Sample records for big chemical processing

  1. Big Data and Chemical Education

    ERIC Educational Resources Information Center

    Pence, Harry E.; Williams, Antony J.

    2016-01-01

    The amount of computerized information that organizations collect and process is growing so large that the term Big Data is commonly being used to describe the situation. Accordingly, Big Data is defined by a combination of the Volume, Variety, Velocity, and Veracity of the data being processed. Big Data tools are already having an impact in…

  2. Big muddy: can a chemical flood breathe new life into a tired old giant

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1978-06-01

    A 9-year, $35.5-million tertiary recovery project has been begun in the Big Muddy Field in Wyoming. It will evaluate a chemical flooding process employing an aqueous surfactant slug followed by polymer. (DLC)

  3. Modern data science for analytical chemical data - A comprehensive review.

    PubMed

    Szymańska, Ewa

    2018-10-22

    Efficient and reliable analysis of chemical analytical data is a great challenge due to the increase in data size, variety and velocity. New methodologies, approaches and methods are being proposed not only by chemometrics but also by other data science communities to extract relevant information from big datasets and provide their value to different applications. Besides the common goal of big data analysis, different perspectives on and terms for big data are being discussed in the scientific literature and public media. The aim of this comprehensive review is to present common trends in the analysis of chemical analytical data across different data scientific fields together with their data type-specific and generic challenges. Firstly, common data science terms used in different data scientific fields are summarized and discussed. Secondly, systematic methodologies to plan and run big data analysis projects are presented together with their steps. Moreover, different analysis aspects like assessing data quality, selecting data pre-processing strategies, data visualization and model validation are considered in more detail. Finally, an overview of standard and new data analysis methods is provided and their suitability for big analytical chemical datasets is briefly discussed. Copyright © 2018 Elsevier B.V. All rights reserved.

  4. Identification and characterization of cuticular hydrocarbons from a rapid species radiation of Hawaiian swordtailed crickets (Gryllidae: Trigonidiinae: Laupala).

    PubMed

    Mullen, Sean P; Millar, Jocelyn G; Schal, Coby; Shaw, Kerry L

    2008-02-01

    A previous investigation of cuticular hydrocarbon variation among Hawaiian swordtail crickets (genus Laupala) revealed that these species differ dramatically in composition of cuticular lipids. Cuticular lipid extracts of Laupala species sampled from the Big Island of Hawaii also possess a greatly reduced number of chemicals (as evidenced by number of gas chromatography peaks) relative to ancestral taxa sampled from the geologically older island of Maui. One possible explanation for this biogeographic pattern is that reduction in chemical diversity observed among the Big Island taxa represents the loss of ancestral hydrocarbons found on Maui. To test this hypothesis, we characterized and identified the structures of cuticular hydrocarbons for seven species of Hawaiian Laupala, two from Maui (ancestral) and five from the Big Island of Hawaii (derived) by using gas chromatography-mass spectrometry. Big Island Laupala possessed a reduced number of alkenes as well as a reduction in the diversity of methyl-branch positions relative to species sampled from Maui (ancestral), thus supporting our hypothesis of a founder-induced loss of chemical diversity. The reduction in diversity of ancestral hydrocarbons was more severe within one of the two sister lineages on the Big Island, suggesting that post-colonizing processes, such as drift or selection, also have influenced hydrocarbon evolution in this group.

  5. Serinol: small molecule - big impact

    PubMed Central

    2011-01-01

    The amino alcohol serinol (2-amino-1,3-propanediol) has become a common intermediate for several chemical processes. Since the 1940s, serinol has been used as a precursor for the synthesis of synthetic antibiotics (chloramphenicol). In recent years, new fields of application have been discovered. Serinol is used for X-ray contrast agents, pharmaceuticals, or for chemical sphingosine/ceramide synthesis. It can be obtained either by chemical processes based on 2-nitro-1,3-propanediol, dihydroxyacetone and ammonia, dihydroxyacetone oxime, or 5-amino-1,3-dioxane, or by biotechnological routes employing amino alcohol dehydrogenases (AMDH) or transaminases. This review provides a survey of the synthesis, properties and applications of serinol. PMID:21906364

  6. Big Data Analytics in Chemical Engineering.

    PubMed

    Chiang, Leo; Lu, Bo; Castillo, Ivan

    2017-06-07

    Big data analytics is the journey to turn data into insights for more informed business and operational decisions. As the chemical engineering community is collecting more data (volume) from different sources (variety), this journey becomes more challenging in terms of using the right data and the right tools (analytics) to make the right decisions in real time (velocity). This article highlights recent big data advancements in five industries, including chemicals, energy, semiconductors, pharmaceuticals, and food, and then discusses technical, platform, and culture challenges. To reach the next milestone in multiplying successes to the enterprise level, government, academia, and industry need to collaboratively focus on workforce development and innovation.

  7. Hedgehogs and foxes (and a bear)

    NASA Astrophysics Data System (ADS)

    Gibb, Bruce

    2017-02-01

    The chemical universe is big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is. Bruce Gibb reminds us that it's somewhat messy too, and so we succeed by recognizing the limits of our knowledge.

  8. The Devil's in the Delta

    ERIC Educational Resources Information Center

    Luyben, William L.

    2007-01-01

    Students frequently confuse and incorrectly apply the several "deltas" that are used in chemical engineering. The deltas come in three different flavors: "out minus in", "big minus little" and "now versus then." The first applies to a change in a stream property as the stream flows through a process. For example, the "ΔH" in an energy…

  9. Use of big data in drug development for precision medicine

    PubMed Central

    Kim, Rosa S.; Goossens, Nicolas; Hoshida, Yujin

    2016-01-01

    Summary Drug development has been a costly and lengthy process with an extremely low success rate and lack of consideration of individual diversity in drug response and toxicity. Over the past decade, an alternative “big data” approach has been expanding at an unprecedented pace based on the development of electronic databases of chemical substances, disease gene/protein targets, functional readouts, and clinical information covering inter-individual genetic variations and toxicities. This paradigm shift has enabled systematic, high-throughput, and accelerated identification of novel drugs or repurposed indications of existing drugs for pathogenic molecular aberrations specifically present in each individual patient. The exploding interest from the information technology and direct-to-consumer genetic testing industries has been further facilitating the use of big data to achieve personalized Precision Medicine. Here we overview currently available resources and discuss future prospects. PMID:27430024

  10. Helping Students Understand the Role of Symmetry in Chemistry Using the Particle-in-a-Box Model

    ERIC Educational Resources Information Center

    Manae, Meghna A.; Hazra, Anirban

    2016-01-01

    In a course on chemical applications of symmetry and group theory, students learn to use several useful tools (like character tables, projection operators, and correlation tables), but in the process of learning the mathematical details, they often miss the conceptual big picture about "why" and "how" symmetry leads to the…

  11. Ecohydrologic processes and soil thickness feedbacks control limestone-weathering rates in a karst landscape

    DOE PAGES

    Dong, Xiaoli; Cohen, Matthew J.; Martin, Jonathan B.; ...

    2018-05-18

    Here, chemical weathering of bedrock plays an essential role in the formation and evolution of Earth's critical zone. Over geologic time, the negative feedback between temperature and chemical weathering rates contributes to the regulation of Earth's climate. The challenge of understanding weathering rates and the resulting evolution of critical zone structures lies in complicated interactions and feedbacks among environmental variables, local ecohydrologic processes, and soil thickness, the relative importance of which remains unresolved. We investigate these interactions using a reactive-transport kinetics model, focusing on a low-relief, wetland-dominated karst landscape (Big Cypress National Preserve, South Florida, USA) as a case study. Across a broad range of environmental variables, model simulations highlight primary controls of climate and soil biological respiration, where soil thickness both supplies and limits transport of biologically derived acidity. Consequently, the weathering rate maximum occurs at intermediate soil thickness. The value of the maximum weathering rate and the precise soil thickness at which it occurs depend on several environmental variables, including precipitation regime, soil inundation, vegetation characteristics, and rate of groundwater drainage. Simulations for environmental conditions specific to Big Cypress suggest that wetland depressions in this landscape began to form around the beginning of the Holocene with gradual dissolution of limestone bedrock and attendant soil development, highlighting the large influence of age-varying soil thickness on weathering rates and consequent landscape development. While climatic variables are often considered most important for chemical weathering, our results indicate that soil thickness and biotic activity are equally important. Weathering rates reflect complex interactions among soil thickness, climate, and local hydrologic and biotic processes, which jointly shape the supply and delivery of chemical reactants, and the resulting trajectories of critical zone and karst landscape development.
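
    The feedback described in the abstract above (soil both supplying biologically derived acidity and limiting its delivery to the bedrock) can be illustrated with a deliberately simple toy model. The sketch below is not the study's reactive-transport kinetics model; the functional forms and parameter values are hypothetical and are chosen only to show why a weathering-rate maximum can appear at intermediate soil thickness.

```python
import numpy as np

# Toy illustration of the soil-thickness feedback: respiration-derived acidity
# increases with soil thickness (saturating), while diffusive delivery of that
# acidity to the bedrock interface weakens as the soil thickens. Their product
# peaks at an intermediate thickness. All parameters are hypothetical; this is
# NOT the study's reactive-transport model.

def toy_weathering_rate(thickness_m, k_resp=1.0, z_sat=0.3, z_transport=0.5):
    acid_supply = k_resp * (1.0 - np.exp(-thickness_m / z_sat))  # saturates with depth
    delivery = np.exp(-thickness_m / z_transport)                # attenuates with depth
    return acid_supply * delivery                                # relative weathering rate

z = np.linspace(0.01, 3.0, 300)            # soil thickness (m)
rate = toy_weathering_rate(z)
z_max = z[np.argmax(rate)]
print(f"toy model: maximum weathering rate at ~{z_max:.2f} m soil thickness")
```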

  12. Ecohydrologic processes and soil thickness feedbacks control limestone-weathering rates in a karst landscape

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dong, Xiaoli; Cohen, Matthew J.; Martin, Jonathan B.

    Here, chemical weathering of bedrock plays an essential role in the formation and evolution of Earth's critical zone. Over geologic time, the negative feedback between temperature and chemical weathering rates contributes to the regulation of Earth's climate. The challenge of understanding weathering rates and the resulting evolution of critical zone structures lies in complicated interactions and feedbacks among environmental variables, local ecohydrologic processes, and soil thickness, the relative importance of which remains unresolved. We investigate these interactions using a reactive-transport kinetics model, focusing on a low-relief, wetland-dominated karst landscape (Big Cypress National Preserve, South Florida, USA) as a case study. Across a broad range of environmental variables, model simulations highlight primary controls of climate and soil biological respiration, where soil thickness both supplies and limits transport of biologically derived acidity. Consequently, the weathering rate maximum occurs at intermediate soil thickness. The value of the maximum weathering rate and the precise soil thickness at which it occurs depend on several environmental variables, including precipitation regime, soil inundation, vegetation characteristics, and rate of groundwater drainage. Simulations for environmental conditions specific to Big Cypress suggest that wetland depressions in this landscape began to form around the beginning of the Holocene with gradual dissolution of limestone bedrock and attendant soil development, highlighting the large influence of age-varying soil thickness on weathering rates and consequent landscape development. While climatic variables are often considered most important for chemical weathering, our results indicate that soil thickness and biotic activity are equally important. Weathering rates reflect complex interactions among soil thickness, climate, and local hydrologic and biotic processes, which jointly shape the supply and delivery of chemical reactants, and the resulting trajectories of critical zone and karst landscape development.

  13. Stellar nucleosynthesis and chemical evolution of the solar neighborhood

    NASA Technical Reports Server (NTRS)

    Clayton, Donald D.

    1988-01-01

    Current theoretical models of nucleosynthesis (N) in stars are reviewed, with an emphasis on their implications for Galactic chemical evolution. Topics addressed include the Galactic population II red giants and early N; N in the big bang; star formation, stellar evolution, and the ejection of thermonuclearly evolved debris; the chemical evolution of an idealized disk galaxy; analytical solutions for a closed-box model with continuous infall; and nuclear burning processes and yields. Consideration is given to shell N in massive stars, N related to degenerate cores, and the types of observational data used to constrain N models. Extensive diagrams, graphs, and tables of numerical data are provided.

  14. Big data processing in the cloud - Challenges and platforms

    NASA Astrophysics Data System (ADS)

    Zhelev, Svetoslav; Rozeva, Anna

    2017-12-01

    Choosing the appropriate architecture and technologies for a big data project is a difficult task, which requires extensive knowledge of both the problem domain and the big data landscape. The paper analyzes the main big data architectures and the most widely implemented technologies used for processing and persisting big data. Clouds provide for dynamic resource scaling, which makes them a natural fit for big data applications. Basic cloud computing service models are presented. Two architectures for processing big data, the Lambda and Kappa architectures, are discussed. Technologies for big data persistence are presented and analyzed. Stream processing, as the most important and most difficult aspect to manage, is outlined. The paper highlights the main advantages of the cloud as well as potential problems.

  15. Sustainability of biofuels and renewable chemicals production from biomass.

    PubMed

    Kircher, Manfred

    2015-12-01

    In the biofuel and renewable chemicals sectors, the large feedstock demand calls, first, for expanding the spectrum of carbon sources beyond primary biomass; second, for establishing circular processing chains; and, third, for prioritizing the product sectors that depend exclusively on carbon: chemicals and heavy-duty fuels. Large-volume production lines will reduce greenhouse gas (GHG) emissions significantly, but low-volume chemicals are also indispensable in building 'low-carbon' industries. The foreseeable feedstock change initiates innovation, securing societal wealth in the industrialized world and creating employment in regions producing biomass. When raising investments today to reroute production toward sustainable biofuels and chemicals, competitiveness with fossil-based fuels and chemicals is a strong issue. Many countries have adopted comprehensive bioeconomy strategies to tackle this challenge. These public actions are mostly biased toward biofuels but should give well-balanced attention to renewable chemicals as well. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Solution structure of leptospiral LigA4 Big domain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mei, Song; Zhang, Jiahai; Zhang, Xuecheng

    Pathogenic Leptospira species express immunoglobulin-like proteins which serve as adhesins to bind to the extracellular matrices of host cells. Leptospiral immunoglobulin-like protein A (LigA), a surface-exposed protein containing tandem repeats of bacterial immunoglobulin-like (Big) domains, has been shown to be involved in the interaction of pathogenic Leptospira with the mammalian host. In this study, the solution structure of the fourth Big domain of LigA (LigA4 Big domain) from Leptospira interrogans was solved by nuclear magnetic resonance (NMR). The structure of the LigA4 Big domain displays a bacterial immunoglobulin-like fold similar to that of other Big domains, implying some common structural aspects of the Big domain family. On the other hand, it displays some structural characteristics significantly different from the classic Ig-like domain. Furthermore, a Stains-all assay and NMR chemical shift perturbation revealed the Ca²⁺-binding property of the LigA4 Big domain. - Highlights: • Determining the solution structure of a bacterial immunoglobulin-like domain from a surface protein of Leptospira. • The solution structure shows some structural characteristics significantly different from the classic Ig-like domains. • A potential Ca²⁺-binding site was identified by Stains-all assay and NMR chemical shift perturbation.

  17. Hydrogeochemistry of Big Soda Lake, Nevada: An alkaline meromictic desert lake

    USGS Publications Warehouse

    Kharaka, Y.K.; Robinson, S.W.; Law, L.M.; Carothers, W.W.

    1984-01-01

    Big Soda Lake, located near Fallon, Nevada, occupies an explosion crater rimmed by basaltic debris; volcanic activity apparently ceased within the last 10,000 years. This lake has been selected for a detailed multidisciplinary study that will ultimately cover the organic and inorganic hydrogeochemistry of water and sediments because the time at which chemical stratification was initiated is known (~1920) and chemical analyses are available for a period of more than 100 years. Detailed chemical analyses of the waters show that the lake is at present alkaline (pH = 9.7), chemically stratified (meromictic) and is extremely anoxic (total reduced sulfur 410 mg/L as H2S) below a depth of about 35 m. The average concentrations (in mg/L) of Na, K, Mg, Ca, NH3, H2S, alkalinity (as HCO3), Cl, SO4, and dissolved organics (as C) in waters of the upper layer (depth 0 to 32 m) are 8,100, 320, 150, 5.0, < 0.1, < 0.5, 4,100, 7,100, 5,800, and 20 respectively; in the deeper layer (depth 37 to 64 m) they are 27,000, 1,200, 5.6, 0.8, 45, 410, 24,000, 27,500, 6,800, and 60, respectively. Chemical and stable isotope analyses of the waters, δ13C and δ14C values of dissolved total carbonate from this lake and surface and ground waters in the area, together with mineral-water equilibrium computations, indicate that the waters in the lake are primarily meteoric in origin, with the present chemical composition resulting from the following geochemical processes: (1) evaporation and exchange with the atmosphere, the dominant processes; (2) mineral-water interactions, including dissolution, precipitation and ion exchange; (3) inflow and outflow of ground water; and (4) biological activity of macro- and microorganisms, including sulfate reduction in the water column of the deeper layer at a very high rate of 6.6 μmol L-1 day-1. © 1984.

  18. Supporting read-across using biological data.

    PubMed

    Zhu, Hao; Bouhifd, Mounir; Donley, Elizabeth; Egnash, Laura; Kleinstreuer, Nicole; Kroese, E Dinant; Liu, Zhichao; Luechtefeld, Thomas; Palmer, Jessica; Pamies, David; Shen, Jie; Strauss, Volker; Wu, Shengde; Hartung, Thomas

    2016-01-01

    Read-across, i.e. filling toxicological data gaps by relating to similar chemicals, for which test data are available, is usually done based on chemical similarity. Besides structure and physico-chemical properties, however, biological similarity based on biological data adds extra strength to this process. In the context of developing Good Read-Across Practice guidance, a number of case studies were evaluated to demonstrate the use of biological data to enrich read-across. In the simplest case, chemically similar substances also show similar test results in relevant in vitro assays. This is a well-established method for the read-across of e.g. genotoxicity assays. Larger datasets of biological and toxicological properties of hundreds and thousands of substances become increasingly available enabling big data approaches in read-across studies. Several case studies using various big data sources are described in this paper. An example is given for the US EPA's ToxCast dataset allowing read-across for high quality uterotrophic assays for estrogenic endocrine disruption. Similarly, an example for REACH registration data enhancing read-across for acute toxicity studies is given. A different approach is taken using omics data to establish biological similarity: Examples are given for stem cell models in vitro and short-term repeated dose studies in rats in vivo to support read-across and category formation. These preliminary biological data-driven read-across studies highlight the road to the new generation of read-across approaches that can be applied in chemical safety assessment.

  19. t4 report1

    PubMed Central

    Zhu, Hao; Bouhifd, Mounir; Kleinstreuer, Nicole; Kroese, E. Dinant; Liu, Zhichao; Luechtefeld, Thomas; Pamies, David; Shen, Jie; Strauss, Volker; Wu, Shengde; Hartung, Thomas

    2016-01-01

    Summary Read-across, i.e. filling toxicological data gaps by relating to similar chemicals, for which test data are available, is usually done based on chemical similarity. Besides structure and physico-chemical properties, however, biological similarity based on biological data adds extra strength to this process. In the context of developing Good Read-Across Practice guidance, a number of case studies were evaluated to demonstrate the use of biological data to enrich read-across. In the simplest case, chemically similar substances also show similar test results in relevant in vitro assays. This is a well-established method for the read-across of e.g. genotoxicity assays. Larger datasets of biological and toxicological properties of hundreds and thousands of substances become increasingly available enabling big data approaches in read-across studies. Several case studies using various big data sources are described in this paper. An example is given for the US EPA’s ToxCast dataset allowing read-across for high quality uterotrophic assays for estrogenic endocrine disruption. Similarly, an example for REACH registration data enhancing read-across for acute toxicity studies is given. A different approach is taken using omics data to establish biological similarity: Examples are given for stem cell models in vitro and short-term repeated dose studies in rats in vivo to support read-across and category formation. These preliminary biological data-driven read-across studies highlight the road to the new generation of read-across approaches that can be applied in chemical safety assessment. PMID:26863516

  20. Impact of chemical polishing on surface roughness and dimensional quality of electron beam melting process (EBM) parts

    NASA Astrophysics Data System (ADS)

    Dolimont, Adrien; Rivière-Lorphèvre, Edouard; Ducobu, François; Backaert, Stéphane

    2018-05-01

    Additive manufacturing is growing faster and faster. This leads us to study the functionalization of the parts that are produced by these processes. Electron beam melting (EBM) is one of these technologies. It is a powder-based additive manufacturing (AM) method. With this process, it is possible to manufacture high-density metal parts with complex topology. One of the big problems with these technologies is the surface finish. To improve the quality of the surface, some finishing operations are needed. In this study, the focus is set on chemical polishing. The goal is to determine how chemical etching impacts the dimensional accuracy and the surface roughness of EBM parts. To this end, an experimental campaign was carried out on the most widely used material in EBM, Ti6Al4V. Different exposure times were tested and their impact on surface quality was evaluated. To help predict the excess thickness to be provided, the dimensional impact of chemical polishing on EBM parts was estimated: 15 parts were measured before and after chemical machining. The improvement of surface quality was also evaluated after each treatment.

  1. A Query Language for Handling Big Observation Data Sets in the Sensor Web

    NASA Astrophysics Data System (ADS)

    Autermann, Christian; Stasch, Christoph; Jirka, Simon; Koppe, Roland

    2017-04-01

    The Sensor Web provides a framework for the standardized Web-based sharing of environmental observations and sensor metadata. While the issue of varying data formats and protocols is addressed by these standards, the fast-growing size of observational data is imposing new challenges for the application of these standards. Most solutions for handling big observational datasets currently focus on remote sensing applications, while big in-situ datasets relying on vector features still lack a solid approach. Conventional Sensor Web technologies may not be adequate, as the sheer size of the data transmitted and the amount of metadata accumulated may render traditional OGC Sensor Observation Services (SOS) unusable. Besides novel approaches to store and process observation data in place, e.g. by harnessing big data technologies from mainstream IT, the access layer has to be amended to utilize and integrate these large observational data archives into applications and to enable analysis. For this, an extension to the SOS will be discussed that establishes a query language to dynamically process and filter observations at the storage level, similar to the OGC Web Coverage Service (WCS) and its Web Coverage Processing Service (WCPS) extension. This will enable applications to request, for example, spatially or temporally aggregated datasets at a resolution they are able to display or that they require. The approach will be developed and implemented in cooperation with the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, whose data catalogue comprises marine observations of physical, chemical and biological phenomena from a wide variety of sensors, including mobile platforms (such as research vessels, aircraft or underwater vehicles) and stationary ones (such as buoys or research stations). Observations are made at a high temporal resolution and the resulting time series may span multiple decades.

  2. Technical challenges for big data in biomedicine and health: data sources, infrastructure, and analytics.

    PubMed

    Peek, N; Holmes, J H; Sun, J

    2014-08-15

    To review technical and methodological challenges for big data research in biomedicine and health. We discuss sources of big datasets, survey infrastructures for big data storage and big data processing, and describe the main challenges that arise when analyzing big data. The life and biomedical sciences are massively contributing to the big data revolution through secondary use of data that were collected during routine care and through new data sources such as social media. Efficient processing of big datasets is typically achieved by distributing computation over a cluster of computers. Data analysts should be aware of pitfalls related to big data such as bias in routine care data and the risk of false-positive findings in high-dimensional datasets. The major challenge for the near future is to transform analytical methods that are used in the biomedical and health domain, to fit the distributed storage and processing model that is required to handle big data, while ensuring confidentiality of the data being analyzed.

  3. Teaching Information & Technology Skills: The Big6[TM] in Elementary Schools. Professional Growth Series.

    ERIC Educational Resources Information Center

    Eisenberg, Michael B.; Berkowitz, Robert E.

    This book about using the Big6 information problem solving process model in elementary schools is organized into two parts. Providing an overview of the Big6 approach, Part 1 includes the following chapters: "Introduction: The Need," including the information problem, the Big6 and other process models, and teaching/learning the Big6;…

  4. Big (Bio)Chemical Data Mining Using Chemometric Methods: A Need for Chemists.

    PubMed

    Tauler, Roma; Parastar, Hadi

    2018-03-23

    This review aims to demonstrate the ability to analyze Big (Bio)Chemical Data (BBCD) with multivariate chemometric methods and to highlight some of the more important challenges of modern analytical research. The capabilities and versatility of chemometric methods are discussed in light of the BBCD challenges encountered in chromatographic, spectroscopic and hyperspectral imaging measurements, with an emphasis on their application to the omics sciences. In addition, insights and perspectives on how to address the analysis of BBCD are provided, along with a discussion of the procedures necessary to obtain more reliable qualitative and quantitative results. The importance of Big Data and its relevance to (bio)chemistry are first discussed. Then, analytical tools that can produce BBCD are presented, together with the basics needed to understand the prospects and limitations of chemometric techniques applied to BBCD. Finally, the significance of combining chemometric approaches with BBCD analysis in different chemical disciplines is highlighted with examples. This coverage of big data analysis in the (bio)chemistry field is, however, not exhaustive. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
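
    As a concrete, minimal example of the kind of multivariate decomposition discussed in the abstract above, the sketch below applies principal component analysis (via an SVD in NumPy) to a simulated "samples x spectral channels" matrix. The data, dimensions, and parameters are invented for illustration; this is not an analysis from the review itself.

```python
import numpy as np

# Minimal chemometric-style decomposition: PCA via SVD on a simulated
# "samples x spectral channels" matrix. Purely illustrative, synthetic data.
rng = np.random.default_rng(0)
n_samples, n_channels = 200, 500
concentrations = rng.random((n_samples, 2))          # two latent chemical components
pure_spectra = rng.random((2, n_channels))           # their (synthetic) pure spectra
X = concentrations @ pure_spectra + 0.01 * rng.standard_normal((n_samples, n_channels))

Xc = X - X.mean(axis=0)                              # mean-center each channel
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = (s ** 2) / np.sum(s ** 2)

scores = U[:, :2] * s[:2]                            # sample scores on first two PCs
loadings = Vt[:2]                                    # channel loadings of first two PCs
print("variance explained by first two PCs:", explained[:2].round(3))
```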

  5. Sports and the Big6: The Information Advantage.

    ERIC Educational Resources Information Center

    Eisenberg, Mike

    1997-01-01

    Explores the connection between sports and the Big6 information problem-solving process and how sports provides an ideal setting for learning and teaching about the Big6. Topics include information aspects of baseball, football, soccer, basketball, figure skating, track and field, and golf; and the Big6 process applied to sports. (LRW)

  6. Harnessing the Big Data Paradigm for ICME: Shifting from Materials Selection to Materials Enabled Design

    NASA Astrophysics Data System (ADS)

    Broderick, Scott R.; Santhanam, Ganesh Ram; Rajan, Krishna

    2016-08-01

    As the size of databases has significantly increased, whether through high throughput computation or through informatics-based modeling, the challenge of selecting the optimal material for specific design requirements has also arisen. Given the multiple, and often conflicting, design requirements, this selection process is not as trivial as sorting the database for a given property value. We suggest that the materials selection process should minimize selector bias, as well as take data uncertainty into account. For this reason, we discuss and apply decision theory for identifying chemical additions to Ni-base alloys. We demonstrate and compare results for both a computational array of chemistries and standard commercial superalloys. We demonstrate how we can use decision theory to select the best chemical additions for enhancing both property and processing, which would not otherwise be easily identifiable. This work is one of the first examples of introducing the mathematical framework of set theory and decision analysis into the domain of the materials selection process.
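
    The abstract above describes screening candidate chemistries against multiple, often conflicting, design requirements. The sketch below is not the authors' decision-theoretic framework; it shows a generic Pareto-dominance filter over hypothetical Ni-base candidate chemistries with invented property values, purely to illustrate why such a selection is more than sorting on a single property.

```python
# Generic multi-objective screening sketch (NOT the paper's decision-analysis
# formalism): keep only Pareto-optimal candidate chemistries when several
# design objectives conflict. Candidate names and property values are hypothetical.

candidates = {
    # name: (creep_strength, oxidation_resistance, relative_cost)
    # higher, higher, lower is better, respectively
    "Ni-8Cr-2Ta": (0.82, 0.64, 0.70),
    "Ni-10Cr-1Re": (0.88, 0.71, 0.95),
    "Ni-12Cr": (0.75, 0.80, 0.40),
    "Ni-9Cr-3W": (0.80, 0.60, 0.75),
}

def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better

pareto = [n for n, p in candidates.items()
          if not any(dominates(q, p) for m, q in candidates.items() if m != n)]
print("Pareto-optimal candidates:", pareto)
```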

  7. On The Cloud Processing of Aerosol Particles: An Entraining Air Parcel Model With Two-dimensional Spectral Cloud Microphysics and A New Formulation of The Collection Kernel

    NASA Astrophysics Data System (ADS)

    Bott, Andreas; Kerkweg, Astrid; Wurzler, Sabine

    A study has been made of the modification of aerosol spectra due to cloud processes and the impact of the modified aerosols on the microphysical structure of future clouds. For this purpose an entraining air parcel model with two-dimensional spectral cloud microphysics has been used. In order to treat collision/coalescence processes in the two-dimensional microphysical module, a new realistic and continuous formulation of the collection kernel has been developed. Based on experimental data, the kernel covers the entire investigated size range of aerosols, cloud drops and rain drops; that is, the kernel combines all important coalescence processes such as the collision of cloud drops as well as the impaction scavenging of small aerosols by big raindrops. Since chemical reactions in the gas phase and in cloud drops have an important impact on the physico-chemical properties of aerosol particles, the parcel model has been extended by a chemical module describing gas-phase and aqueous-phase chemical reactions. However, it will be shown that in the numerical case studies presented in this paper the modification of aerosols by chemical reactions has a minor influence on the microphysical structure of future clouds. The major process yielding enhanced rain formation in a second cloud event is the production of large aerosol particles by collision/coalescence processes in the first cloud.
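
    For orientation, the classical gravitational (hydrodynamic) collection kernel that continuous formulations of this kind build on can be written as below. This is the textbook expression, shown only as background; it is not the new kernel developed in the paper.

```latex
% Classical gravitational collection kernel for two drops of radii r_1 and r_2
% (textbook background only; not the paper's new continuous formulation):
K(r_1, r_2) \;=\; E(r_1, r_2)\,\pi\,(r_1 + r_2)^{2}\,\left| v_\infty(r_1) - v_\infty(r_2) \right|
% E(r_1, r_2) : collision/coalescence efficiency of the drop pair
% v_\infty(r) : terminal fall velocity of a drop of radius r
```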

  8. An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer.

    PubMed

    Yang, Xi; Wu, Chengkun; Lu, Kai; Fang, Lin; Zhang, Yong; Li, Shengkang; Guo, Guixin; Du, YunFei

    2017-12-01

    Big data, cloud computing, and high-performance computing (HPC) are at the verge of convergence. Cloud computing is already playing an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion-a big data interface on the Tianhe-2 supercomputer-to enable big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. It minimizes the effort needed to initiate big data applications on the Tianhe-2 supercomputer via automated configuration. Orion follows the "allocate-when-needed" paradigm, and it avoids the idle occupation of computational resources. We tested the utility and performance of Orion using a big genomic dataset and achieved a satisfactory performance on Tianhe-2 with very few modifications to existing applications that were implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.

  9. Research on Technology Innovation Management in Big Data Environment

    NASA Astrophysics Data System (ADS)

    Ma, Yanhong

    2018-02-01

    With the continuous development and progress of the information age, the demand for information is getting larger. The processing and analysis of information data is also moving toward the direction of scale. The increasing number of information data makes people have higher demands on processing technology. The explosive growth of information data onto the current society have prompted the advent of the era of big data. At present, people have more value and significance in producing and processing various kinds of information and data in their lives. How to use big data technology to process and analyze information data quickly to improve the level of big data management is an important stage to promote the current development of information and data processing technology in our country. To some extent, innovative research on the management methods of information technology in the era of big data can enhance our overall strength and make China be an invincible position in the development of the big data era.

  10. A genetic algorithm-based job scheduling model for big data analytics.

    PubMed

    Lu, Qinghua; Li, Shanshan; Zhang, Weishan; Zhang, Lei

    Big data analytics (BDA) applications are a new category of software applications that process large amounts of data using scalable parallel processing infrastructure to obtain hidden value. Hadoop is the most mature open-source big data analytics framework, which implements the MapReduce programming model to process big data with MapReduce jobs. Big data analytics jobs are often continuous and not mutually separated. The existing work mainly focuses on executing jobs in sequence, which is often inefficient and consumes considerable energy. In this paper, we propose a genetic algorithm-based job scheduling model for big data analytics applications to improve the efficiency of big data analytics. To implement the job scheduling model, we leverage an estimation module to predict the performance of clusters when executing analytics jobs. We have evaluated the proposed job scheduling model in terms of feasibility and accuracy.
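
    A genetic algorithm for job scheduling of the general kind described above can be sketched compactly. The sketch below is not the paper's model: job durations, node count, the makespan fitness function, and all GA parameters are hypothetical, and it only illustrates the encode/select/crossover/mutate loop.

```python
import random

# Generic genetic-algorithm sketch for assigning analytics jobs to nodes so as
# to minimize makespan. NOT the paper's scheduling model; all values hypothetical.
random.seed(42)
JOB_DURATIONS = [12, 7, 25, 9, 14, 30, 5, 18]   # hypothetical job run times
N_NODES = 3

def makespan(assignment):
    """assignment[i] = node index for job i; makespan = busiest node's total time."""
    loads = [0] * N_NODES
    for job, node in enumerate(assignment):
        loads[node] += JOB_DURATIONS[job]
    return max(loads)

def crossover(a, b):
    cut = random.randrange(1, len(a))            # one-point crossover
    return a[:cut] + b[cut:]

def mutate(a, rate=0.1):
    return [random.randrange(N_NODES) if random.random() < rate else g for g in a]

population = [[random.randrange(N_NODES) for _ in JOB_DURATIONS] for _ in range(30)]
for _ in range(100):
    population.sort(key=makespan)                # lower makespan = fitter
    parents = population[:10]                    # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    population = parents + children

best = min(population, key=makespan)
print("best assignment:", best, "makespan:", makespan(best))
```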

  11. Developing a Passive Time-Activity Triage System In support of Consumer Ingredient Exposure Prioritization.

    EPA Science Inventory

    Chemical Hazard/toxicity assessment of chemicals relies on droves of chemical-biological data at the organism, tissue, cell, and biomolecular level of resolution. Big data in the context of exposure science relies on a comprehensive knowledge of societies’ and community act...

  12. Developing a Passive Time-Activity Triage System In support of Consumer Ingredient Exposure Prioritization

    EPA Science Inventory

    Chemical Hazard/toxicity assessment of chemicals relies on droves of chemical-biological data at the organism, tissue, cell, and biomolecular level of resolution. Big data in the context of exposure science relies on a comprehensive knowledge of societies’ and community activity ...

  13. Processing Solutions for Big Data in Astronomy

    NASA Astrophysics Data System (ADS)

    Fillatre, L.; Lepiller, D.

    2016-09-01

    This paper gives a simple introduction to processing solutions applied to massive amounts of data. It proposes a general presentation of the Big Data paradigm. The Hadoop framework, which is considered as the pioneering processing solution for Big Data, is described together with YARN, the integrated Hadoop tool for resource allocation. This paper also presents the main tools for the management of both the storage (NoSQL solutions) and computing capacities (MapReduce parallel processing schema) of a cluster of machines. Finally, more recent processing solutions like Spark are discussed. Big Data frameworks are now able to run complex applications while keeping the programming simple and greatly improving the computing speed.

  14. Systematic Proteomic Approach to Characterize the Impacts of Chemical Interactions on Protein and Cytotoxicity Responses to Metal Mixture Exposures

    EPA Science Inventory

    Chemical interactions have posed a big challenge in toxicity characterization and human health risk assessment of environmental mixtures. To characterize the impacts of chemical interactions on protein and cytotoxicity responses to environmental mixtures, we established a systems...

  15. A Triangulation Method to Dismantling a Disciplinary "Big Deal"

    ERIC Educational Resources Information Center

    Dawson, Diane

    2015-01-01

    In late 2012, it appeared that the University Library, University of Saskatchewan would likely no longer be able to afford to subscribe to the entire American Chemical Society "Big Deal" of 36 journals. Difficult choices would need to be made regarding which titles to retain as individual subscriptions. In an effort to arrive at the most…

  16. Current technologies for biological treatment of textile wastewater--a review.

    PubMed

    Sarayu, K; Sandhya, S

    2012-06-01

    The release of colored wastewater represents a serious environmental problem and public health concern. Color removal from textile wastewater has become a big challenge over the last decades, and up to now there is no single, economically attractive treatment method that can effectively decolorize the wastewater. Effluents from textile manufacturing, dyeing, and finishing processes contain high concentrations of biologically difficult-to-degrade or even inert auxiliaries and chemicals such as acids, waxes, fats, salts, binders, thickeners, urea, surfactants, reducing agents, etc. The various chemicals, such as biocides and stain repellents, used for brightening, sequestering, anticreasing, sizing, softening, and wetting of the yarn or fabric are also present in the wastewater. Textile wastewater therefore needs an environmentally friendly and effective treatment process. This paper provides a critical review of the current technology available for decolorization and degradation of textile wastewater and also suggests effective and economically attractive alternatives.

  17. Supporting diagnosis and treatment in medical care based on Big Data processing.

    PubMed

    Lupşe, Oana-Sorina; Crişan-Vida, Mihaela; Stoicu-Tivadar, Lăcrămioara; Bernard, Elena

    2014-01-01

    With information and data in all domains growing every day, it is difficult to manage them and extract useful knowledge for specific situations. This paper presents an integrated system architecture to support the activity of Ob-Gyn departments, with further developments in using new technology, Google BigQuery, to manage Big Data processing in the medical domain. The data collected and processed with Google BigQuery come from different sources: two Obstetrics & Gynaecology Departments, the TreatSuggest application (an application for suggesting treatments), and a home foetal surveillance system. Data are uploaded to Google BigQuery from Bega Hospital Timişoara, Romania. The analysed data are useful for medical staff, researchers and statisticians from the public health domain. The current work describes the technological architecture and its processing possibilities, which will in the future be validated against quality criteria to support better decision processes in diagnosis and public health.
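
    A minimal sketch of running an aggregate query with the Google BigQuery Python client is shown below, since the abstract centers on BigQuery-based processing. The project, dataset, table, and column names are hypothetical: the paper's actual schema is not described in the abstract, and this is not the system it presents.

```python
# Minimal sketch of querying Google BigQuery from Python (google-cloud-bigquery).
# Project, dataset, table, and column names below are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()  # uses default credentials/project from the environment
sql = """
    SELECT department, COUNT(*) AS n_cases, AVG(gestational_age_weeks) AS mean_ga
    FROM `my-project.obgyn_registry.deliveries`   -- hypothetical table
    GROUP BY department
    ORDER BY n_cases DESC
"""
for row in client.query(sql).result():
    print(row.department, row.n_cases, round(row.mean_ga, 1))
```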

  18. Chemical Space: Big Data Challenge for Molecular Diversity.

    PubMed

    Awale, Mahendra; Visini, Ricardo; Probst, Daniel; Arús-Pous, Josep; Reymond, Jean-Louis

    2017-10-25

    Chemical space describes all possible molecules as well as multi-dimensional conceptual spaces representing the structural diversity of these molecules. Part of this chemical space is available in public databases ranging from thousands to billions of compounds. Exploiting these databases for drug discovery represents a typical big data problem limited by computational power, data storage and data access capacity. Here we review recent developments of our laboratory, including progress in the chemical universe databases (GDB) and the fragment subset FDB-17, tools for ligand-based virtual screening by nearest neighbor searches, such as our multi-fingerprint browser for the ZINC database to select purchasable screening compounds, and their application to discover potent and selective inhibitors for calcium channel TRPV6 and Aurora A kinase, the polypharmacology browser (PPB) for predicting off-target effects, and finally interactive 3D-chemical space visualization using our online tools WebDrugCS and WebMolCS. All resources described in this paper are available for public use at www.gdb.unibe.ch.
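
    As a minimal illustration of the fingerprint-based nearest-neighbor searching mentioned above (not the laboratory's multi-fingerprint browser itself), the sketch below computes Tanimoto similarity over binary fingerprints represented as sets of on-bit indices. The fingerprints and compound names are invented for illustration.

```python
# Tanimoto (Jaccard) nearest-neighbour search over binary molecular fingerprints,
# represented here as sets of "on" bit indices. Generic illustration of
# ligand-based similarity searching; the fingerprints below are invented.

def tanimoto(fp_a, fp_b):
    intersection = len(fp_a & fp_b)
    union = len(fp_a | fp_b)
    return intersection / union if union else 0.0

library = {
    "cpd_1": {3, 17, 42, 128, 305},
    "cpd_2": {3, 17, 99, 305, 511},
    "cpd_3": {8, 64, 200, 256},
}
query = {3, 17, 42, 305}

hits = sorted(library.items(), key=lambda kv: tanimoto(query, kv[1]), reverse=True)
for name, fp in hits:
    print(name, round(tanimoto(query, fp), 3))
```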

  19. NMR Crystallography of Enzyme Active Sites: Probing Chemically-Detailed, Three-Dimensional Structure in Tryptophan Synthase

    PubMed Central

    Dunn, Michael F.

    2013-01-01

    Conspectus NMR crystallography – the synergistic combination of X-ray diffraction, solid-state NMR spectroscopy, and computational chemistry – offers unprecedented insight into three-dimensional, chemically-detailed structure. From its initial role in refining diffraction data of organic and inorganic solids, NMR crystallography is now being developed for application to active sites in biomolecules, where it reveals chemically-rich detail concerning the interactions between enzyme site residues and the reacting substrate that is not achievable when X-ray, NMR, or computational methodologies are applied in isolation. For example, typical X-ray crystal structures (1.5 to 2.5 Å resolution) of enzyme-bound intermediates identify possible hydrogen-bonding interactions between site residues and substrate, but do not directly identify the protonation state of either. Solid-state NMR can provide chemical shifts for selected atoms of enzyme-substrate complexes, but without a larger structural framework in which to interpret them, only empirical correlations with local chemical structure are possible. Ab initio calculations and molecular mechanics can build models for enzymatic processes, but rely on chemical details that must be specified. Together, however, X-ray diffraction, solid-state NMR spectroscopy, and computational chemistry can provide consistent and testable models for structure and function of enzyme active sites: X-ray crystallography provides a coarse framework upon which models of the active site can be developed using computational chemistry; these models can be distinguished by comparison of their calculated NMR chemical shifts with the results of solid-state NMR spectroscopy experiments. Conceptually, each technique is a puzzle piece offering a generous view of the big picture. Only when correctly pieced together, however, can they reveal the big picture at highest resolution. In this Account, we detail our first steps in the development of NMR crystallography for application to enzyme catalysis. We begin with a brief introduction to NMR crystallography and then define the process that we have employed to probe the active site in the β-subunit of tryptophan synthase with unprecedented atomic-level resolution. This approach has resulted in a novel structural hypothesis for the protonation state of the quinonoid intermediate in tryptophan synthase and its surprising role in directing the next step in the catalysis of L-Trp formation. PMID:23537227
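
    The model-selection step described above (candidate active-site models distinguished by comparing their calculated chemical shifts with experimental solid-state NMR shifts) can be sketched as a simple RMSD ranking. The shift values, atom labels, and model names below are hypothetical illustrations, not data from the tryptophan synthase study.

```python
import math

# Rank candidate active-site models by agreement between calculated chemical
# shifts and experimental solid-state NMR shifts (root-mean-square deviation).
# All values are hypothetical; this is only an illustration of the comparison step.

experimental = {"C2": 178.4, "C4'": 160.2, "N1": 255.0}   # ppm (hypothetical)

candidate_models = {
    "protonated_at_N1":   {"C2": 177.9, "C4'": 159.8, "N1": 252.0},
    "deprotonated_at_N1": {"C2": 171.2, "C4'": 148.5, "N1": 301.3},
}

def shift_rmsd(calc, expt):
    return math.sqrt(sum((calc[k] - expt[k]) ** 2 for k in expt) / len(expt))

for name, calc in sorted(candidate_models.items(),
                         key=lambda kv: shift_rmsd(kv[1], experimental)):
    print(f"{name}: RMSD = {shift_rmsd(calc, experimental):.1f} ppm")
```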

  20. Using Big Data Analytics to Address Mixtures Exposure

    EPA Science Inventory

    The assessment of chemical mixtures is a complex issue for regulators and health scientists. We propose that assessing chemical co-occurrence patterns and prevalence rates is a relatively simple yet powerful approach in characterizing environmental mixtures and mixtures exposure...

  1. Granular computing with multiple granular layers for brain big data processing.

    PubMed

    Wang, Guoyin; Xu, Ji

    2014-12-01

    Big data is the term for a collection of datasets so huge and complex that they become difficult to process using on-hand theoretical models and technical tools. Brain big data is one of the most typical and important kinds of big data; it is collected using powerful equipment such as functional magnetic resonance imaging, multichannel electroencephalography, magnetoencephalography, positron emission tomography, near-infrared spectroscopic imaging, and various other devices. Granular computing with multiple granular layers, referred to as multi-granular computing (MGrC) for short hereafter, is an emerging computing paradigm of information processing, which simulates the multi-granular intelligent thinking model of the human brain. It concerns the processing of complex information entities called information granules, which arise in the process of data abstraction and the derivation of information and even knowledge from data. This paper analyzes three basic mechanisms of MGrC, namely granularity optimization, granularity conversion, and multi-granularity joint computation, and discusses the potential of introducing MGrC into the intelligent processing of brain big data.

  2. Big-BOE: Fusing Spanish Official Gazette with Big Data Technology.

    PubMed

    Basanta-Val, Pablo; Sánchez-Fernández, Luis

    2018-06-01

    The proliferation of new data sources stemming from the adoption of open-data schemes, in combination with increasing computing capacity, has given rise to a new type of analytics that processes Internet-of-things data with low-cost engines to speed up data processing using parallel computing. In this context, the article presents an initiative, called Big-BOE, designed to process the Spanish official government gazette (Boletín Oficial del Estado, BOE) with state-of-the-art processing engines, to reduce computation time and to offer additional speed-up for big data analysts. The goal of including a big data infrastructure is to be able to process different BOE documents in parallel with specific analytics, to search for several issues in different documents. The application infrastructure's processing engine is described from an architectural and a performance perspective, showing evidence of how this type of infrastructure improves the performance of different types of simple analytics as several machines cooperate.

  3. Statistical tables and charts showing geochemical variation in the Mesoproterozoic Big Creek, Apple Creek, and Gunsight formations, Lemhi group, Salmon River Mountains and Lemhi Range, central Idaho

    USGS Publications Warehouse

    Lindsey, David A.; Tysdal, Russell G.; Taggart, Joseph E.

    2002-01-01

    The principal purpose of this report is to provide a reference archive for results of a statistical analysis of geochemical data for metasedimentary rocks of Mesoproterozoic age of the Salmon River Mountains and Lemhi Range, central Idaho. Descriptions of geochemical data sets, statistical methods, rationale for interpretations, and references to the literature are provided. Three methods of analysis are used: R-mode factor analysis of major oxide and trace element data for identifying petrochemical processes, analysis of variance for effects of rock type and stratigraphic position on chemical composition, and major-oxide ratio plots for comparison with the chemical composition of common clastic sedimentary rocks.
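
    A generic sketch of the first of the methods named above, R-mode factor analysis, is given below: factoring the correlation matrix of geochemical variables via its eigendecomposition to recover loadings that group covarying oxides. The data are simulated and the variable names are placeholders; this is not the report's analysis.

```python
import numpy as np

# Generic R-mode factor analysis sketch: factor the correlation matrix of
# geochemical variables (major oxides) via its eigendecomposition and inspect
# the loadings of the leading factors. Simulated data; not the report's analysis.
rng = np.random.default_rng(1)
n_samples = 120
detrital = rng.standard_normal(n_samples)      # hypothetical controlling processes
carbonate = rng.standard_normal(n_samples)
X = np.column_stack([
    10 + 2.0 * detrital + 0.3 * rng.standard_normal(n_samples),   # "Al2O3"
    60 + 3.0 * detrital + 0.5 * rng.standard_normal(n_samples),   # "SiO2"
    5 + 2.5 * carbonate + 0.3 * rng.standard_normal(n_samples),   # "CaO"
    3 + 1.5 * carbonate + 0.3 * rng.standard_normal(n_samples),   # "MgO"
])

R = np.corrcoef(X, rowvar=False)               # correlation matrix (R-mode)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]              # sort factors by explained variance
loadings = eigvecs[:, order[:2]] * np.sqrt(eigvals[order[:2]])
print("factor loadings (rows = Al2O3, SiO2, CaO, MgO):")
print(np.round(loadings, 2))
```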

  4. Design and development of a medical big data processing system based on Hadoop.

    PubMed

    Yao, Qin; Tian, Yu; Li, Peng-Fei; Tian, Li-Li; Qian, Yang-Ming; Li, Jing-Song

    2015-03-01

    Secondary use of medical big data is increasingly popular in healthcare services and clinical research. Understanding the logic behind medical big data demonstrates tendencies in hospital information technology and shows great significance for hospital information systems that are designing and expanding services. Big data has four characteristics, Volume, Variety, Velocity and Value (the 4 Vs), that make traditional standalone systems incapable of processing these data. Apache Hadoop MapReduce is a promising software framework for developing applications that process vast amounts of data in parallel with large clusters of commodity hardware in a reliable, fault-tolerant manner. With the Hadoop framework and MapReduce application program interface (API), we can more easily develop our own MapReduce applications to run on a Hadoop framework that can scale up from a single node to thousands of machines. This paper investigates a practical case of a Hadoop-based medical big data processing system. We developed this system to intelligently process medical big data and uncover some features of hospital information system user behaviors. This paper studies user behaviors regarding various data produced by different hospital information systems for daily work. In this paper, we also built a five-node Hadoop cluster to execute distributed MapReduce algorithms. Our distributed algorithms show promise in facilitating efficient data processing with medical big data in healthcare services and clinical research compared with single nodes. Additionally, with medical big data analytics, we can design our hospital information systems to be much more intelligent and easier to use by making personalized recommendations.

  5. Visualization of Big Data Through Ship Maintenance Metrics Analysis for Fleet Maintenance and Revitalization

    DTIC Science & Technology

    2014-03-01

    Master's thesis by Isaac J. Donaldson, March 2014: "Visualization of Big Data through Ship Maintenance Metrics Analysis for Fleet Maintenance and Revitalization." Abstract fragment: "…the overall performance of ship maintenance processes is clearly a big data problem. The current process for presenting data on the more than…"

  6. Using the Big Six Research Process. The Coconut Crab from Guam and Other Stories: Writing Myths, Fables, and Tall Tales.

    ERIC Educational Resources Information Center

    Jansen, Barbara A.; Culpepper, Susan N.

    1996-01-01

    Using the Big Six research process, students at Live Oak Elementary (Round Rock, TX) supplemented information from traditional print and electronic sources with e-mail exchanges around the world to complete a library research collaborative project culminating in an original folk tale. Describes the Big Six process and how it was applied. (PEN)

  7. The research of approaches of applying the results of big data analysis in higher education

    NASA Astrophysics Data System (ADS)

    Kochetkov, O. T.; Prokhorov, I. V.

    2017-01-01

    This article briefly discusses approaches to the use of Big Data in the educational process of higher education institutions. The nature of big data and its spread in the education industry are briefly described, and new ways to use Big Data as part of the educational process are offered as well. The article also describes a method for analyzing relevant search requests using Yandex.Wordstat (for laboratory work on data processing) and Google Trends (for an up-to-date picture of interest and preferences at a higher education institution).

  8. On Study of Application of Big Data and Cloud Computing Technology in Smart Campus

    NASA Astrophysics Data System (ADS)

    Tang, Zijiao

    2017-12-01

    We live in an era of networks and information, which means we produce and face large amounts of data every day. Traditional databases cannot easily store, process, and analyze such mass data, so big data technology emerged at the right moment. Meanwhile, the development and operation of big data rest on cloud computing, which provides sufficient space and resources to process and analyze data with big data technology. The proposal of smart campus construction aims at improving the informatization of colleges and universities; it is therefore necessary to combine big data and cloud computing technology in the construction of the smart campus, so that the campus database system and the campus management system are integrated rather than isolated, and to serve smart campus construction by integrating, storing, processing and analyzing mass data.

  9. [Big data and their perspectives in radiation therapy].

    PubMed

    Guihard, Sébastien; Thariat, Juliette; Clavier, Jean-Baptiste

    2017-02-01

    The concept of big data indicates a change of scale in the use of data and data aggregation into large databases through improved computer technology. One of the current challenges in the creation of big data in the context of radiation therapy is the transformation of routine care items into dark data, i.e. data not yet collected, and the fusion of databases collecting different types of information (dose-volume histograms and toxicity data for example). Processes and infrastructures devoted to big data collection should not impact negatively on the doctor-patient relationship, the general process of care or the quality of the data collected. The use of big data requires a collective effort of physicians, physicists, software manufacturers and health authorities to create, organize and exploit big data in radiotherapy and, beyond, oncology. Big data involve a new culture to build an appropriate infrastructure legally and ethically. Processes and issues are discussed in this article. Copyright © 2016 Société Française du Cancer. Published by Elsevier Masson SAS. All rights reserved.

  10. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends

    PubMed Central

    2014-01-01

    The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so-called “big data” challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital that big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields. PMID:25383096

  11. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends.

    PubMed

    Mohammed, Emad A; Far, Behrouz H; Naugler, Christopher

    2014-01-01

    The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so-called "big data" challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital that big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields.
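    To make the Map and Reduce tasks described above concrete, here is a minimal single-machine sketch in plain Python; it is not Hadoop code, only an illustration of the two-phase model that a real Hadoop job would distribute across HDFS blocks. The token-counting task and record contents are invented.

    ```python
    # Minimal sketch of the Map/Reduce model: map emits (key, value) pairs,
    # a shuffle groups them by key, and reduce aggregates each group.
    from collections import defaultdict
    from itertools import chain

    def map_phase(record: str):
        # Emit (key, 1) pairs, e.g. diagnosis codes found in a clinical record.
        for token in record.split():
            yield token, 1

    def reduce_phase(key, values):
        return key, sum(values)

    def run_job(records):
        # Shuffle: group intermediate pairs by key.
        groups = defaultdict(list)
        for key, value in chain.from_iterable(map_phase(r) for r in records):
            groups[key].append(value)
        return dict(reduce_phase(k, v) for k, v in groups.items())

    print(run_job(["I10 E11 I10", "E11 J45"]))  # {'I10': 2, 'E11': 2, 'J45': 1}
    ```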

  12. Fault Diagnosis Based on Chemical Sensor Data with an Active Deep Neural Network

    PubMed Central

    Jiang, Peng; Hu, Zhixin; Liu, Jun; Yu, Shanen; Wu, Feng

    2016-01-01

    Big sensor data provide significant potential for chemical fault diagnosis, which concerns the baseline requirements of security, stability and reliability in chemical processes. This study presents a deep neural network (DNN) with novel active learning for chemical fault diagnosis. The method uses large amounts of chemical sensor data and combines deep learning with an active learning criterion to address the difficulty of consecutive fault diagnosis. DNNs with deep architectures, instead of shallow ones, are developed through deep learning to learn a suitable feature representation from raw sensor data in an unsupervised manner using a stacked denoising auto-encoder (SDAE), working through a layer-by-layer successive learning process. The features are then fed to a top Softmax regression layer to construct discriminative fault characteristics for diagnosis in a supervised manner. Considering the expensive and time-consuming labeling of sensor data in chemical applications, and in contrast to available methods, we employ a novel active learning criterion suited to the particularities of chemical processes, combining the Best vs. Second Best (BvSB) criterion with a Lowest False Positive (LFP) criterion, to further fine-tune the diagnosis model in an active rather than passive manner. That is, we allow models to rank the most informative sensor data to be labeled for updating the DNN parameters during the interaction phase. The effectiveness of the proposed method is validated on two well-known industrial datasets. Results indicate that the proposed method achieves superior diagnosis accuracy and provides significant performance improvement in accuracy and false positive rate with less labeled chemical sensor data through further active learning, compared with existing methods. PMID:27754386

  13. Fault Diagnosis Based on Chemical Sensor Data with an Active Deep Neural Network.

    PubMed

    Jiang, Peng; Hu, Zhixin; Liu, Jun; Yu, Shanen; Wu, Feng

    2016-10-13

    Big sensor data provide significant potential for chemical fault diagnosis, which concerns the baseline requirements of security, stability and reliability in chemical processes. This study presents a deep neural network (DNN) with novel active learning for chemical fault diagnosis. The method uses large amounts of chemical sensor data and combines deep learning with an active learning criterion to address the difficulty of consecutive fault diagnosis. DNNs with deep architectures, instead of shallow ones, are developed through deep learning to learn a suitable feature representation from raw sensor data in an unsupervised manner using a stacked denoising auto-encoder (SDAE), working through a layer-by-layer successive learning process. The features are then fed to a top Softmax regression layer to construct discriminative fault characteristics for diagnosis in a supervised manner. Considering the expensive and time-consuming labeling of sensor data in chemical applications, and in contrast to available methods, we employ a novel active learning criterion suited to the particularities of chemical processes, combining the Best vs. Second Best (BvSB) criterion with a Lowest False Positive (LFP) criterion, to further fine-tune the diagnosis model in an active rather than passive manner. That is, we allow models to rank the most informative sensor data to be labeled for updating the DNN parameters during the interaction phase. The effectiveness of the proposed method is validated on two well-known industrial datasets. Results indicate that the proposed method achieves superior diagnosis accuracy and provides significant performance improvement in accuracy and false positive rate with less labeled chemical sensor data through further active learning, compared with existing methods.
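    The Best vs. Second Best (BvSB) criterion mentioned in this record can be sketched in a few lines; the snippet below is illustrative only (it is not the authors' code and omits the LFP component), assuming softmax outputs from a diagnosis network.

    ```python
    # Illustrative sketch of the Best-vs-Second-Best (BvSB) active-learning criterion:
    # samples whose top two class probabilities are closest are the most ambiguous
    # and are queried for labelling first.
    import numpy as np

    def bvsb_query(class_probs: np.ndarray, n_query: int) -> np.ndarray:
        """class_probs: (n_samples, n_classes) softmax outputs of the diagnosis DNN.
        Returns indices of the n_query most informative (smallest-margin) samples."""
        ordered = np.sort(class_probs, axis=1)       # ascending per row
        margin = ordered[:, -1] - ordered[:, -2]     # best minus second best
        return np.argsort(margin)[:n_query]

    probs = np.array([[0.50, 0.45, 0.05],   # ambiguous -> queried first
                      [0.90, 0.05, 0.05]])  # confident -> queried last
    print(bvsb_query(probs, n_query=1))     # [0]
    ```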

  14. Nucleosynthesis in relation to cosmology

    NASA Astrophysics Data System (ADS)

    El Eid, Mounib F.

    2018-04-01

    While primordial (or Big Bang) nucleosynthesis delivers important clues about the conditions in the high-redshift universe (termed far-field cosmology), the nucleosynthesis of the heavy elements beyond iron by the r-process or the s-process delivers information about the early phase and history of the Galaxy (termed near-field cosmology). In particular, r-process nucleosynthesis is unique, because it is a primary process that helps to associate individual stars with the composition of the protocloud. The present contribution is intended to give a brief overview of these nucleosynthesis processes and describe their link to the early universe, stellar evolution and the chemical evolution of the Galaxy. The focus is on illuminating the role of nucleosynthesis in the Universe. Owing to the complexity of this subject, a general scenario is more appealing to address interested readers.

  15. Volume and Value of Big Healthcare Data.

    PubMed

    Dinov, Ivo D

    Modern scientific inquiries require significant data-driven evidence and trans-disciplinary expertise to extract valuable information and gain actionable knowledge about natural processes. Effective evidence-based decisions require collection, processing and interpretation of vast amounts of complex data. The Moore's and Kryder's laws of exponential increase of computational power and information storage, respectively, dictate the need for rapid trans-disciplinary advances, technological innovation and effective mechanisms for managing and interrogating Big Healthcare Data. In this article, we review important aspects of Big Data analytics and discuss important questions like: What are the challenges and opportunities associated with this biomedical, social, and healthcare data avalanche? Are there innovative statistical computing strategies to represent, model, analyze and interpret Big heterogeneous data? We present the foundation of a new compressive big data analytics (CBDA) framework for representation, modeling and inference of large, complex and heterogeneous datasets. Finally, we consider specific directions likely to impact the process of extracting information from Big healthcare data, translating that information to knowledge, and deriving appropriate actions.

  16. Volume and Value of Big Healthcare Data

    PubMed Central

    Dinov, Ivo D.

    2016-01-01

    Modern scientific inquiries require significant data-driven evidence and trans-disciplinary expertise to extract valuable information and gain actionable knowledge about natural processes. Effective evidence-based decisions require collection, processing and interpretation of vast amounts of complex data. The Moore's and Kryder's laws of exponential increase of computational power and information storage, respectively, dictate the need for rapid trans-disciplinary advances, technological innovation and effective mechanisms for managing and interrogating Big Healthcare Data. In this article, we review important aspects of Big Data analytics and discuss important questions like: What are the challenges and opportunities associated with this biomedical, social, and healthcare data avalanche? Are there innovative statistical computing strategies to represent, model, analyze and interpret Big heterogeneous data? We present the foundation of a new compressive big data analytics (CBDA) framework for representation, modeling and inference of large, complex and heterogeneous datasets. Finally, we consider specific directions likely to impact the process of extracting information from Big healthcare data, translating that information to knowledge, and deriving appropriate actions. PMID:26998309

  17. Research on Implementing Big Data: Technology, People, & Processes

    ERIC Educational Resources Information Center

    Rankin, Jenny Grant; Johnson, Margie; Dennis, Randall

    2015-01-01

    When many people hear the term "big data", they primarily think of a technology tool for the collection and reporting of data of high variety, volume, and velocity. However, the complexity of big data is not only the technology, but the supporting processes, policies, and people supporting it. This paper was written by three experts to…

  18. THE BERKELEY DATA ANALYSIS SYSTEM (BDAS): AN OPEN SOURCE PLATFORM FOR BIG DATA ANALYTICS

    DTIC Science & Technology

    2017-09-01

    Evan Sparks, Oliver Zahn, Michael J. Franklin, David A. Patterson, Saul Perlmutter. Scientific Computing Meets Big Data Technology: An Astronomy ...Processing Astronomy Imagery Using Big Data Technology. IEEE Transactions on Big Data, 2016. Approved for Public Release; Distribution Unlimited.

  19. [Comparative study of chemical composition of pomegranate peel, pomegranate inside and pomegranate seeds].

    PubMed

    Zhou, Qian; Sun, Li-Li; Dai, Yan-Peng; Wang, Liang; Su, Ben-Zheng

    2013-07-01

    An HPLC fingerprint of pomegranate peel was established. Using the same chromatographic conditions, we compared the chemical composition of pomegranate peel, inside and seeds, and simultaneously determined the contents of gallic acid and ellagic acid. The comparison showed no significant differences between pomegranate peel and inside, but a large difference between pomegranate seeds and the other two. The contents of gallic acid and ellagic acid in pomegranate peel were 0.33% and 0.59%, respectively, while in pomegranate inside they were 0.52% and 0.38%. The ellagic acid content of pomegranate seeds was only 0.01%. We therefore suggest that when pomegranate peel is processed, the seeds should be removed, while the inside can be retained provided it is fully dried.

  20. Big Data Analytics in Medicine and Healthcare.

    PubMed

    Ristevski, Blagoj; Chen, Ming

    2018-05-10

    This paper surveys big data, with an emphasis on big data analytics in medicine and healthcare. The big data characteristics (value, volume, velocity, variety, veracity and variability) are described. Big data analytics in medicine and healthcare covers the integration and analysis of large amounts of complex heterogeneous data such as various omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenomics, diseasomics), biomedical data and electronic health records data. We underline the challenging issues of big data privacy and security. With regard to the big data characteristics, some directions for using suitable and promising open-source distributed data processing software platforms are given.

  1. Linked Data: Forming Partnerships at the Data Layer

    NASA Astrophysics Data System (ADS)

    Shepherd, A.; Chandler, C. L.; Arko, R. A.; Jones, M. B.; Hitzler, P.; Janowicz, K.; Krisnadhi, A.; Schildhauer, M.; Fils, D.; Narock, T.; Groman, R. C.; O'Brien, M.; Patton, E. W.; Kinkade, D.; Rauch, S.

    2015-12-01

    The challenges presented by big data are straining data management software architectures of the past. For smaller existing data facilities, the technical refactoring of software layers becomes costly to scale across the big data landscape. In response to these challenges, data facilities will need partnerships with external entities for improved solutions to perform tasks such as data cataloging, discovery and reuse, and data integration and processing with provenance. At its surface, the concept of linked open data suggests an uncalculated altruism. Yet, in his concept of five star open data, Tim Berners-Lee explains the strategic costs and benefits of deploying linked open data from the perspective of its consumer and producer - a data partnership. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) addresses some of the emerging needs of its research community by partnering with groups doing complementary work and linking their respective data layers using linked open data principles. Examples will show how these links, explicit manifestations of partnerships, reduce technical debt and provide swift flexibility for future considerations.
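    As a hedged illustration of the linked open data principles referred to above, the sketch below publishes a dataset description as RDF triples with rdflib; the URIs and schema.org terms are invented for the example and are not BCO-DMO's actual vocabulary.

    ```python
    # Hedged sketch of publishing a dataset description as linked open data with
    # rdflib; the URIs and vocabulary below are illustrative placeholders.
    from rdflib import Graph, Literal, Namespace, RDF, URIRef

    SCHEMA = Namespace("https://schema.org/")
    g = Graph()
    dataset = URIRef("https://example.org/dataset/ctd-cruise-42")  # hypothetical URI

    g.add((dataset, RDF.type, SCHEMA.Dataset))
    g.add((dataset, SCHEMA.name, Literal("CTD profiles, cruise 42")))
    # A link into a partner repository's data layer: the "partnership" made explicit.
    g.add((dataset, SCHEMA.isBasedOn, URIRef("https://example.org/partner/raw/42")))

    print(g.serialize(format="turtle"))
    ```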

  2. Big Geo Data Management: AN Exploration with Social Media and Telecommunications Open Data

    NASA Astrophysics Data System (ADS)

    Arias Munoz, C.; Brovelli, M. A.; Corti, S.; Zamboni, G.

    2016-06-01

    The term Big Data has been recently used to define big, highly varied, complex data sets, which are created and updated at a high speed and require faster processing, namely, a reduced time to filter and analyse relevant data. These data are also increasingly becoming Open Data (data that can be freely distributed), made public by governments, agencies, private enterprises and others. There are at least two issues that can obstruct the availability and use of Open Big Datasets: firstly, the gathering and geoprocessing of these datasets are very computationally intensive; hence, it is necessary to integrate high-performance solutions, preferably internet based, to achieve the goals. Secondly, the problems of heterogeneity and inconsistency in geospatial data are well known and affect the data integration process, but they are particularly problematic for Big Geo Data. Therefore, Big Geo Data integration will be one of the most challenging issues to solve. With these applications, we demonstrate that it is possible to provide processed Big Geo Data to common users, using open geospatial standards and technologies. NoSQL databases like MongoDB and frameworks like RASDAMAN could offer different functionalities that facilitate working with larger volumes and more heterogeneous geospatial data sources.
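    As a hedged sketch of the NoSQL option mentioned above, the snippet below stores and queries point data in MongoDB with pymongo; the connection string, database, collection and document fields are assumptions for illustration, and a running MongoDB instance is required.

    ```python
    # Sketch of serving point-based Open Big Data from MongoDB with a 2dsphere index.
    from pymongo import GEOSPHERE, MongoClient

    client = MongoClient("mongodb://localhost:27017")
    tweets = client["geo_big_data"]["tweets"]          # hypothetical db/collection

    tweets.create_index([("location", GEOSPHERE)])      # enable geospatial queries
    tweets.insert_one({"text": "example",
                       "location": {"type": "Point", "coordinates": [9.19, 45.46]}})

    # All documents within 1 km of a point (GeoJSON uses [longitude, latitude]).
    nearby = tweets.find({"location": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [9.1919, 45.4642]},
        "$maxDistance": 1000}}})
    print(list(nearby))
    ```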

  3. Big Data and Analytics in Healthcare.

    PubMed

    Tan, S S-L; Gao, G; Koch, S

    2015-01-01

    This editorial is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". The amount of data being generated in the healthcare industry is growing at a rapid rate. This has generated immense interest in leveraging the availability of healthcare data (and "big data") to improve health outcomes and reduce costs. However, the nature of healthcare data, and especially big data, presents unique challenges in processing and analyzing big data in healthcare. This Focus Theme aims to disseminate some novel approaches to address these challenges. More specifically, approaches ranging from efficient methods of processing large clinical data to predictive models that could generate better predictions from healthcare data are presented.

  4. The big data processing platform for intelligent agriculture

    NASA Astrophysics Data System (ADS)

    Huang, Jintao; Zhang, Lichen

    2017-08-01

    Big data technology is another popular technology after the Internet of Things and cloud computing. Big data is widely used in many fields such as social platforms, e-commerce, and financial analysis. Intelligent agriculture produces, in the course of its operation, large amounts of data of complex structure, and fully mining the value of these data would be very meaningful for the development of agriculture. This paper proposes an intelligent data processing platform based on Storm and Cassandra to realize the storage and management of big data for intelligent agriculture.
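    The record names Storm and Cassandra but gives no detail, so the following is only a sketch of the storage side using the DataStax cassandra-driver; the keyspace, table and sensor fields are hypothetical, and a Cassandra node on localhost is assumed.

    ```python
    # Hypothetical sketch: writing and reading farm sensor readings with the
    # cassandra-driver; requires a running Cassandra node on 127.0.0.1.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS farm
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS farm.soil_readings (
            sensor_id text, ts timestamp, moisture double, temperature double,
            PRIMARY KEY (sensor_id, ts))
    """)
    session.execute(
        "INSERT INTO farm.soil_readings (sensor_id, ts, moisture, temperature) "
        "VALUES (%s, toTimestamp(now()), %s, %s)",
        ("field-7", 0.31, 18.5))

    for row in session.execute(
            "SELECT * FROM farm.soil_readings WHERE sensor_id=%s", ("field-7",)):
        print(row.sensor_id, row.moisture, row.temperature)
    ```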

  5. Benchmarking Big Data Systems and the BigData Top100 List.

    PubMed

    Baru, Chaitanya; Bhandarkar, Milind; Nambiar, Raghunath; Poess, Meikel; Rabl, Tilmann

    2013-03-01

    "Big data" has become a major force of innovation across enterprises of all sizes. New platforms with increasingly more features for managing big datasets are being announced almost on a weekly basis. Yet, there is currently a lack of any means of comparability among such platforms. While the performance of traditional database systems is well understood and measured by long-established institutions such as the Transaction Processing Performance Council (TCP), there is neither a clear definition of the performance of big data systems nor a generally agreed upon metric for comparing these systems. In this article, we describe a community-based effort for defining a big data benchmark. Over the past year, a Big Data Benchmarking Community has become established in order to fill this void. The effort focuses on defining an end-to-end application-layer benchmark for measuring the performance of big data applications, with the ability to easily adapt the benchmark specification to evolving challenges in the big data space. This article describes the efforts that have been undertaken thus far toward the definition of a BigData Top100 List. While highlighting the major technical as well as organizational challenges, through this article, we also solicit community input into this process.

  6. The New Improved Big6 Workshop Handbook. Professional Growth Series.

    ERIC Educational Resources Information Center

    Eisenberg, Michael B.; Berkowitz, Robert E.

    This handbook is intended to help classroom teachers, teacher-librarians, technology teachers, administrators, parents, community members, and students to learn about the Big6 Skills approach to information and technology skills, to use the Big6 process in their own activities, and to implement a Big6 information and technology skills program. The…

  7. a Hadoop-Based Distributed Framework for Efficient Managing and Processing Big Remote Sensing Images

    NASA Astrophysics Data System (ADS)

    Wang, C.; Hu, F.; Hu, X.; Zhao, S.; Wen, W.; Yang, C.

    2015-07-01

    Various sensors on airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and other applications. However, it is challenging to efficiently store, query and process such big data because of its data- and computing-intensive nature. In this paper, a Hadoop-based framework is proposed to manage and process big remote sensing data in a distributed and parallel manner. In particular, remote sensing data can be fetched directly from other data platforms into the Hadoop Distributed File System (HDFS). The Orfeo toolbox, a ready-to-use tool for large image processing, is integrated into MapReduce to provide a rich set of image processing operations. With the integration of HDFS, the Orfeo toolbox and MapReduce, these remote sensing images can be processed directly in parallel in a scalable computing environment. The experimental results show that the proposed framework can efficiently manage and process such big remote sensing data.

  8. Analytic Strategies of Streaming Data for eHealth.

    PubMed

    Yoon, Sunmoo

    2016-01-01

    New analytic strategies for streaming big data from wearable devices and social media are emerging in eHealth. Finding meaningful patterns in big data is challenging because researchers face difficulties processing large volumes of streaming data with traditional processing applications. This introductory 180-minute tutorial offers hands-on instruction on analytics of streaming data (e.g., topic modeling, social network analysis). The tutorial aims to provide practical strategies for reducing dimensionality using examples of big data, and will highlight strategies for incorporating domain experts and a comprehensive approach to streaming social media data.
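    As a hedged illustration of one analytic strategy named above, topic modeling on streaming text, the sketch below uses scikit-learn's online (mini-batch) LDA; the example posts are invented, and in practice mini-batches would arrive from a social media or wearable-device stream.

    ```python
    # Illustrative sketch of online topic modeling for streaming text.
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    vectorizer = HashingVectorizer(n_features=2**12, stop_words="english",
                                   alternate_sign=False)  # non-negative counts for LDA
    lda = LatentDirichletAllocation(n_components=5, random_state=0)

    for mini_batch in [["my fitbit says I walked 10000 steps today",
                        "flu shot clinic open late this week"],
                       ["steps goal reached again", "feeling feverish, maybe the flu"]]:
        X = vectorizer.transform(mini_batch)
        lda.partial_fit(X)          # update topics incrementally as data stream in

    print(lda.transform(vectorizer.transform(["walked fewer steps today"])))
    ```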

  9. Beyond the Bells and Whistles: Technology Skills for a Purpose.

    ERIC Educational Resources Information Center

    Eisenberg, Michael B.

    2001-01-01

    Discusses the goal of K-12 education to have students learn to use technology, defines computer literacy, and describes the Big6 process model that helps solve information problems. Highlights include examples of technology in Big6 contexts, Big6 and the Internet, and the Big6 as a conceptual framework for meaningful technology use. (LRW)

  10. Big Data Usage Patterns in the Health Care Domain: A Use Case Driven Approach Applied to the Assessment of Vaccination Benefits and Risks. Contribution of the IMIA Primary Healthcare Working Group.

    PubMed

    Liyanage, H; de Lusignan, S; Liaw, S-T; Kuziemsky, C E; Mold, F; Krause, P; Fleming, D; Jones, S

    2014-08-15

    Generally, the benefits and risks of vaccines can be determined from studies carried out as part of regulatory compliance, followed by surveillance of routine data; however, there are some rarer and longer-term events that require new methods. Big data generated by increasingly affordable personalised computing and by pervasive computing devices are growing rapidly, and low-cost, high-volume cloud computing makes processing these data inexpensive. The objective was to describe how big data and related analytical methods might be applied to assess the benefits and risks of vaccines. We reviewed the literature on the use of big data to improve health, applied to generic vaccine use cases that illustrate benefits and risks of vaccination. We defined a use case as the interaction between a user and an information system to achieve a goal, and used flu vaccination and pre-school childhood immunisation as exemplars. We reviewed three big data use cases relevant to assessing vaccine benefits and risks: (i) big data processing using crowdsourcing, distributed big data processing and predictive analytics; (ii) data integration from heterogeneous big data sources, e.g. the increasing range of devices in the "internet of things"; and (iii) real-time monitoring for the direct monitoring of epidemics as well as vaccine effects via social media and other data sources. Big data raises new ethical dilemmas, though its analysis methods can bring complementary real-time capabilities for monitoring epidemics and assessing vaccine benefit-risk balance.

  11. Big Data Usage Patterns in the Health Care Domain: A Use Case Driven Approach Applied to the Assessment of Vaccination Benefits and Risks

    PubMed Central

    Liyanage, H.; Liaw, S-T.; Kuziemsky, C.; Mold, F.; Krause, P.; Fleming, D.; Jones, S.

    2014-01-01

    Background: Generally, the benefits and risks of vaccines can be determined from studies carried out as part of regulatory compliance, followed by surveillance of routine data; however, there are some rarer and longer-term events that require new methods. Big data generated by increasingly affordable personalised computing and by pervasive computing devices are growing rapidly, and low-cost, high-volume cloud computing makes processing these data inexpensive. Objective: To describe how big data and related analytical methods might be applied to assess the benefits and risks of vaccines. Method: We reviewed the literature on the use of big data to improve health, applied to generic vaccine use cases that illustrate benefits and risks of vaccination. We defined a use case as the interaction between a user and an information system to achieve a goal, and used flu vaccination and pre-school childhood immunisation as exemplars. Results: We reviewed three big data use cases relevant to assessing vaccine benefits and risks: (i) big data processing using crowd-sourcing, distributed big data processing and predictive analytics; (ii) data integration from heterogeneous big data sources, e.g. the increasing range of devices in the “internet of things”; and (iii) real-time monitoring for the direct monitoring of epidemics as well as vaccine effects via social media and other data sources. Conclusions: Big data raises new ethical dilemmas, though its analysis methods can bring complementary real-time capabilities for monitoring epidemics and assessing vaccine benefit-risk balance. PMID:25123718

  12. Big History or the 13800 million years from the Big Bang to the Human Brain

    NASA Astrophysics Data System (ADS)

    Gústafsson, Ludvik E.

    2017-04-01

    Big History is the integrated history of the Cosmos, Earth, Life, and Humanity. It is an attempt to understand our existence as a continuous unfolding of processes leading to ever more complex structures. Three major steps in the development of the Universe can be distinguished: the first is the creation of matter/energy and forces in the context of an expanding universe, while the second and third steps were reached when completely new qualities of matter came into existence.
    1. Matter comes out of nothing. Quantum fluctuations and the inflation event are thought to be responsible for the creation of stable matter particles in what is called the Big Bang. Along with simple particles, the universe is formed. Later, larger particles such as atoms and the simplest chemical elements, hydrogen and helium, evolved. Gravitational contraction of hydrogen and helium formed the first stars and later the first galaxies. Massive stars ended their lives in violent explosions, releasing heavier elements such as carbon, oxygen, nitrogen, sulfur and iron into the universe. Subsequent star formation led to star systems with bodies containing these heavier elements.
    2. Matter starts to live. About 9200 million years after the Big Bang, a rather inconspicuous star of middle size formed in one of a billion galaxies. The leftovers of the star's formation clumped into bodies rotating around the central star. In some of them, elements such as silicon, oxygen, iron and many others became the dominant matter. On the third of these bodies from the central star, much of the surface was covered with an already very common chemical compound in the universe, water. Liquid water and plenty of various elements, especially carbon, were the ingredients of very complex chemical compounds that made up even more complex structures. These were able to replicate themselves. Life had appeared, on the only occasion that we human beings know of. Life subsequently evolved, eventually leading to the formation of multicellular structures such as plants, animals and fungi.
    3. Matter starts to think. A comet or an asteroid crashed into Earth about 66 million years ago, ending the dominance of the dinosaurs. Small animals giving birth to live offspring were then able to evolve into a multitude of species, among them the primates. A group of primates migrated from Africa to other continents less than 100000 years ago. Their brains developed a special quality, self-consciousness. This ability to reflect on oneself boosted their survival considerably. Man (Homo sapiens) had entered the scene, becoming one of the dominant species of this planet. Due to his immense ability today to handle matter and energy, he has become something of a caretaker of planet Earth. Man is responsible for sustainable development for the good of his society and of the whole biosphere. If there is a fourth step in the history of the universe, discoveries in astrobiology may provide us with some clues in the next decades.

  13. Identity processes and personality traits and types in adolescence: directionality of effects and developmental trajectories.

    PubMed

    Luyckx, Koen; Teppers, Eveline; Klimstra, Theo A; Rassart, Jessica

    2014-08-01

    Personality traits are hypothesized to be among the most important factors contributing to individual differences in identity development. However, longitudinal studies linking Big Five personality traits to contemporary identity models (in which multiple exploration and commitment processes are distinguished) are largely lacking. To gain more insight into the directionality of effects and the developmental interdependence of the Big Five and identity processes as forwarded in multilayered personality models, the present study assessed personality and identity in 1,037 adolescents 4 times over a period of 3 years. First, using cross-lagged path analysis, Big Five traits emerged as consistent predictors of identity exploration processes, whereas only one significant path from identity exploration to the Big Five was found. Second, using latent class growth analysis, 3 Big Five trajectory classes were identified, resembling the distinctions typically made between resilients, overcontrollers, and undercontrollers. These classes were characterized by different initial levels and (to a lesser extent) rates of change in commitment and exploration processes. In sum, important developmental associations linking personality traits to identity processes were uncovered, emphasizing the potential role of personality traits in identity development. Developmental implications and suggestions for future research are discussed. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  14. BIGCHEM: Challenges and Opportunities for Big Data Analysis in Chemistry.

    PubMed

    Tetko, Igor V; Engkvist, Ola; Koch, Uwe; Reymond, Jean-Louis; Chen, Hongming

    2016-12-01

    The increasing volume of biomedical data in chemistry and life sciences requires the development of new methods and approaches for their handling. Here, we briefly discuss some challenges and opportunities of this fast growing area of research with a focus on those to be addressed within the BIGCHEM project. The article starts with a brief description of some available resources for "Big Data" in chemistry and a discussion of the importance of data quality. We then discuss challenges with visualization of millions of compounds by combining chemical and biological data, the expectations from mining the "Big Data" using advanced machine-learning methods, and their applications in polypharmacology prediction and target de-convolution in phenotypic screening. We show that the efficient exploration of billions of molecules requires the development of smart strategies. We also address the issue of secure information sharing without disclosing chemical structures, which is critical to enable bi-party or multi-party data sharing. Data sharing is important in the context of the recent trend of "open innovation" in pharmaceutical industry, which has led to not only more information sharing among academics and pharma industries but also the so-called "precompetitive" collaboration between pharma companies. At the end we highlight the importance of education in "Big Data" for further progress of this area. © 2016 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.
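    As a small, hedged illustration of the kind of compound-screening task touched on in this record, the sketch below compares Morgan fingerprints by Tanimoto similarity with RDKit; the query and library molecules are arbitrary examples, not BIGCHEM data.

    ```python
    # Illustrative sketch: similarity screening of a tiny compound list against a
    # query structure using Morgan fingerprints and Tanimoto similarity (RDKit).
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    def fingerprint(smiles: str):
        mol = Chem.MolFromSmiles(smiles)
        return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

    query = fingerprint("CC(=O)Oc1ccccc1C(=O)O")          # aspirin
    library = {"salicylic acid": "OC(=O)c1ccccc1O",
               "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C"}

    for name, smi in library.items():
        sim = DataStructs.TanimotoSimilarity(query, fingerprint(smi))
        print(f"{name}: Tanimoto = {sim:.2f}")
    ```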

  15. BIGCHEM: Challenges and Opportunities for Big Data Analysis in Chemistry

    PubMed Central

    Engkvist, Ola; Koch, Uwe; Reymond, Jean‐Louis; Chen, Hongming

    2016-01-01

    Abstract The increasing volume of biomedical data in chemistry and life sciences requires the development of new methods and approaches for their handling. Here, we briefly discuss some challenges and opportunities of this fast growing area of research with a focus on those to be addressed within the BIGCHEM project. The article starts with a brief description of some available resources for “Big Data” in chemistry and a discussion of the importance of data quality. We then discuss challenges with visualization of millions of compounds by combining chemical and biological data, the expectations from mining the “Big Data” using advanced machine‐learning methods, and their applications in polypharmacology prediction and target de‐convolution in phenotypic screening. We show that the efficient exploration of billions of molecules requires the development of smart strategies. We also address the issue of secure information sharing without disclosing chemical structures, which is critical to enable bi‐party or multi‐party data sharing. Data sharing is important in the context of the recent trend of “open innovation” in pharmaceutical industry, which has led to not only more information sharing among academics and pharma industries but also the so‐called “precompetitive” collaboration between pharma companies. At the end we highlight the importance of education in “Big Data” for further progress of this area. PMID:27464907

  16. The big data-big model (BDBM) challenges in ecological research

    NASA Astrophysics Data System (ADS)

    Luo, Y.

    2015-12-01

    The field of ecology has become a big-data science in the past decades due to the development of new sensors used in numerous studies in the ecological community. Many sensor networks have been established to collect data. For example, satellites, such as Terra and OCO-2 among others, have collected data relevant to the global carbon cycle. Thousands of field manipulative experiments have been conducted to examine feedbacks of the terrestrial carbon cycle to global change. Networks of observations, such as FLUXNET, have measured land processes. In particular, the implementation of the National Ecological Observatory Network (NEON), which is designed to network different kinds of sensors at many locations over the nation, will generate large volumes of ecological data every day. The raw data from sensors from those networks offer an unprecedented opportunity for accelerating advances in our knowledge of ecological processes, educating teachers and students, supporting decision-making, testing ecological theory, and forecasting changes in ecosystem services. Currently, ecologists do not have the infrastructure in place to synthesize massive yet heterogeneous data into resources for decision support. It is urgent to develop an ecological forecasting system that can make the best use of multiple sources of data to assess long-term biosphere change and anticipate future states of ecosystem services at regional and continental scales. Forecasting relies on big models that describe major processes that underlie complex system dynamics. Ecological system models, despite great simplification of the real systems, are still complex in order to address real-world problems. For example, the Community Land Model (CLM) incorporates thousands of processes related to energy balance, hydrology, and biogeochemistry. Integration of massive data from multiple big data sources with complex models has to tackle Big Data-Big Model (BDBM) challenges. Those challenges include interoperability of multiple, heterogeneous data sets; intractability of structural complexity of big models; equifinality of model structure selection and parameter estimation; and computational demand of global optimization with Big Models.
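    To make the "big model plus data" calibration problem concrete at its smallest possible scale, here is a toy sketch (not CLM, and not the author's workflow) that estimates the parameters of a one-pool carbon model dC/dt = u - k*C from synthetic observations with scipy.

    ```python
    # Toy sketch of model-data fusion: fit input rate u and turnover rate k of a
    # one-pool carbon model to (synthetic, noisy) observations.
    import numpy as np
    from scipy.optimize import curve_fit

    def pool_model(t, u, k, c0=100.0):
        """Analytic solution of dC/dt = u - k*C with C(0) = c0."""
        return u / k + (c0 - u / k) * np.exp(-k * t)

    t_obs = np.linspace(0, 50, 25)
    c_obs = pool_model(t_obs, u=5.0, k=0.08) \
            + np.random.default_rng(0).normal(0, 1, t_obs.size)

    (u_hat, k_hat), cov = curve_fit(pool_model, t_obs, c_obs, p0=[1.0, 0.1])
    print(f"estimated u={u_hat:.2f}, k={k_hat:.3f}")
    ```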

  17. Leveraging Big-Data for Business Process Analytics

    ERIC Educational Resources Information Center

    Vera-Baquero, Alejandro; Colomo Palacios, Ricardo; Stantchev, Vladimir; Molloy, Owen

    2015-01-01

    Purpose: This paper aims to present a solution that enables organizations to monitor and analyse the performance of their business processes by means of Big Data technology. Business process improvement can drastically influence the profit of corporations and help them to remain viable. However, the use of traditional Business Intelligence…

  18. Using 'big data' to validate claims made in the pharmaceutical approval process.

    PubMed

    Wasser, Thomas; Haynes, Kevin; Barron, John; Cziraky, Mark

    2015-01-01

    Big Data in the healthcare setting refers to the storage, assimilation, and analysis of large quantities of information regarding patient care. These data can be collected and stored in a wide variety of ways, including electronic medical records collected at the patient bedside, or through medical records that are coded and passed to insurance companies for reimbursement. When these data are processed it is possible to validate claims as a part of the regulatory review process regarding the anticipated performance of medications and devices. In order to properly analyze claims by manufacturers and others, there is a need to express claims in terms that are testable in a timeframe that is useful and meaningful to formulary committees. Claims for the comparative benefits and costs, including budget impact, of products and devices need to be expressed in measurable terms, ideally in the context of submission or validation protocols. Claims should be either consistent with accessible Big Data or able to support observational studies where Big Data identifies target populations. Protocols should identify, in disaggregated terms, key variables that would lead to direct or proxy validation. Once these variables are identified, Big Data can be used to query massive quantities of data in the validation process. Research can be passive, where the data are collected retrospectively, or active, where the researcher prospectively looks for indicators of co-morbid conditions, side-effects or adverse events and tests these indicators to determine whether claims are within the desired ranges set forth by the manufacturer. Additionally, Big Data can be used to assess the effectiveness of therapy through health insurance records; this could indicate, for example, that disease or co-morbid conditions cease to be treated. Understanding the basic strengths and weaknesses of Big Data in the claim validation process provides a glimpse of the value that this research can provide to industry. Big Data can support a research agenda that focuses on the process of claims validation to support formulary submissions as well as inputs to ongoing disease area and therapeutic class reviews.

  19. Based Real Time Remote Health Monitoring Systems: A Review on Patients Prioritization and Related "Big Data" Using Body Sensors information and Communication Technology.

    PubMed

    Kalid, Naser; Zaidan, A A; Zaidan, B B; Salman, Omar H; Hashim, M; Muzammil, H

    2017-12-29

    The growing worldwide population has increased the need for technologies, computerised software algorithms and smart devices that can monitor and assist patients anytime and anywhere and thus enable them to lead independent lives. The real-time remote monitoring of patients is an important issue in telemedicine. In the provision of healthcare services, patient prioritisation poses a significant challenge because of the complex decision-making process it involves when patients are considered 'big data'. To our knowledge, no study has highlighted the link between 'big data' characteristics and real-time remote healthcare monitoring in the patient prioritisation process, as well as the inherent challenges involved. Thus, we present comprehensive insights into the elements of big data characteristics according to the six 'Vs': volume, velocity, variety, veracity, value and variability. Each of these elements is presented and connected to a related part in the study of the connection between patient prioritisation and real-time remote healthcare monitoring systems. Then, we determine the weak points and recommend solutions as potential future work. This study makes the following contributions. (1) The link between big data characteristics and real-time remote healthcare monitoring in the patient prioritisation process is described. (2) The open issues and challenges for big data used in the patient prioritisation process are emphasised. (3) As a recommended solution, decision making using multiple criteria, such as vital signs and chief complaints, is utilised to prioritise the big data of patients with chronic diseases on the basis of the most urgent cases.
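    As an illustration only, the snippet below sketches a weighted-sum triage score over the kinds of criteria mentioned in this record (vital signs and chief complaints); the weights, normalisation ranges and example patients are invented, not taken from the review.

    ```python
    # Illustrative weighted multi-criteria prioritisation of remote-monitoring patients.
    def triage_score(heart_rate: float, spo2: float, chief_complaint: str) -> float:
        score = 0.0
        score += 0.4 * min(abs(heart_rate - 75) / 75, 1.0)   # deviation from ~normal HR
        score += 0.4 * min(max(95 - spo2, 0) / 10, 1.0)      # oxygen saturation deficit
        score += 0.2 * (1.0 if chief_complaint in {"chest pain", "dyspnea"} else 0.0)
        return score                                          # 0 (low) .. 1 (most urgent)

    patients = [("P1", 120, 89, "chest pain"), ("P2", 78, 98, "headache")]
    ranked = sorted(patients, key=lambda p: triage_score(*p[1:]), reverse=True)
    print([p[0] for p in ranked])   # most urgent first -> ['P1', 'P2']
    ```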

  20. Big Data, Internet of Things and Cloud Convergence--An Architecture for Secure E-Health Applications.

    PubMed

    Suciu, George; Suciu, Victor; Martian, Alexandru; Craciunescu, Razvan; Vulpe, Alexandru; Marcu, Ioana; Halunga, Simona; Fratu, Octavian

    2015-11-01

    Big data storage and processing are considered as one of the main applications for cloud computing systems. Furthermore, the development of the Internet of Things (IoT) paradigm has advanced the research on Machine to Machine (M2M) communications and enabled novel tele-monitoring architectures for E-Health applications. However, there is a need for converging current decentralized cloud systems, general software for processing big data and IoT systems. The purpose of this paper is to analyze existing components and methods of securely integrating big data processing with cloud M2M systems based on Remote Telemetry Units (RTUs) and to propose a converged E-Health architecture built on Exalead CloudView, a search based application. Finally, we discuss the main findings of the proposed implementation and future directions.

  1. ETV Program Report: Big Fish Septage and High Strength Waste Water Treatment System

    EPA Science Inventory

    Verification testing of the Big Fish Environmental Septage and High Strength Wastewater Processing System for treatment of high-strength wastewater was conducted at the Big Fish facility in Charlevoix, Michigan. Testing was conducted over a 13-month period to address different c...

  2. Opportunity and Challenges for Migrating Big Data Analytics in Cloud

    NASA Astrophysics Data System (ADS)

    Amitkumar Manekar, S.; Pradeepini, G., Dr.

    2017-08-01

    Big Data Analytics is a big word nowadays. With ever more demanding and scalable data generation capabilities, data acquisition and storage have become crucial issues. Cloud storage is a widely used platform, and the technology will become crucial to executives handling data powered by analytics. The trend towards "big data-as-a-service" is now talked about everywhere. On one hand, cloud-based big data analytics directly tackles ongoing issues of scale, speed, and cost; on the other, researchers are working to solve security and other real-time problems of big data migration to cloud-based platforms. This article is especially focused on finding possible ways to migrate big data to the cloud. Technology that supports coherent data migration and the possibility of doing big data analytics on a cloud platform is in demand for a new era of growth. This article also gives information about available technologies and techniques for the migration of big data to the cloud.

  3. THE FASTEST OODA LOOP: THE IMPLICATIONS OF BIG DATA FOR AIR POWER

    DTIC Science & Technology

    2016-06-01

    AIR COMMAND AND STAFF COLLEGE, AIR UNIVERSITY. THE FASTEST OODA LOOP: THE IMPLICATIONS OF BIG DATA FOR AIR POWER, by Aaron J. Dove, Maj, USAF. Contents include: Use of Big Data Thus Far; The Big Data Boost to the OODA Loop. ...processed with enough accuracy that it required minimal to no human or man-in-the-loop vetting of the information through Command and Control (C2

  4. Big Data Analytics for a Smart Green Infrastructure Strategy

    NASA Astrophysics Data System (ADS)

    Barrile, Vincenzo; Bonfa, Stefano; Bilotta, Giuliana

    2017-08-01

    As is well known, Big Data is a term for data sets so large or complex that traditional data processing applications are not sufficient to process them. The term “Big Data” often refers to the use of predictive analytics, user behavior analytics, or other advanced data analytics methods that extract value from data, and rarely to a particular size of data set. This is especially true for the huge amount of Earth Observation data that the satellites constantly orbiting the Earth transmit daily.

  5. Big Data Provenance: Challenges, State of the Art and Opportunities.

    PubMed

    Wang, Jianwu; Crawl, Daniel; Purawat, Shweta; Nguyen, Mai; Altintas, Ilkay

    2015-01-01

    The ability to track provenance is a key feature of scientific workflows to support data lineage and reproducibility. The challenges introduced by the volume, variety and velocity of Big Data also pose related challenges for the provenance and quality of Big Data, defined as veracity. The increasing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle, including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related to the veracity of the datasets themselves and the provenance of the analytical processes that analyze these datasets. It also explains our current efforts towards tracking and utilizing Big Data provenance using workflows as a programming model to analyze Big Data.
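    A minimal sketch of the idea of recording provenance for an analytical step is shown below; it is not the authors' workflow system, just a decorator that logs what ran, hashes of its inputs and output, and a timestamp.

    ```python
    # Minimal provenance-recording sketch: one log entry per tracked analysis step.
    import functools
    import hashlib
    import json
    import time

    PROVENANCE_LOG = []

    def tracked(step):
        @functools.wraps(step)
        def wrapper(*args, **kwargs):
            result = step(*args, **kwargs)
            PROVENANCE_LOG.append({
                "step": step.__name__,
                "inputs": hashlib.sha256(repr((args, kwargs)).encode()).hexdigest()[:12],
                "output": hashlib.sha256(repr(result).encode()).hexdigest()[:12],
                "timestamp": time.time(),
            })
            return result
        return wrapper

    @tracked
    def normalize(values):
        m = max(values)
        return [v / m for v in values]

    normalize([3, 6, 9])
    print(json.dumps(PROVENANCE_LOG, indent=2))
    ```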

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Agrawal, Rakesh

    This project sought and successfully answered two big challenges facing the creation of low-energy, cost-effective, zeotropic multi-component distillation processes: first, identification of an efficient search space that includes all the useful distillation configurations and no undesired configurations; second, development of an algorithm to search the space efficiently and generate an array of low-energy options for industrial multi-component mixtures. Such mixtures are found in large-scale chemical and petroleum plants. Commercialization of our results was addressed by building a user interface allowing practical application of our methods for industrial problems by anyone with basic knowledge of distillation for a given problem. We also provided our algorithm to a major U.S. Chemical Company for use by the practitioners. The successful execution of this program has provided methods and algorithms at the disposal of process engineers to readily generate low-energy solutions for a large class of multicomponent distillation problems in a typical chemical and petrochemical plant. In a petrochemical complex, the distillation trains within crude oil processing, hydrotreating units containing alkylation, isomerization, reformer, LPG (liquefied petroleum gas) and NGL (natural gas liquids) processing units can benefit from our results. Effluents from naphtha crackers and ethane-propane crackers typically contain mixtures of methane, ethylene, ethane, propylene, propane, butane and heavier hydrocarbons. We have shown that our systematic search method with a more complete search space, along with the optimization algorithm, has a potential to yield low-energy distillation configurations for all such applications with energy savings up to 50%.
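    As a back-of-the-envelope illustration of why an efficient search space matters, the sketch below counts only the classical sharp-split simple-column sequences for an N-component mixture, S_N = (2(N-1))! / (N!(N-1)!); the project's actual search space, which includes additional configurations, is far larger.

    ```python
    # Count of classical sharp-split simple-column distillation sequences for an
    # N-component mixture; grows rapidly with N, which is why smart search matters.
    from math import factorial

    def sharp_split_sequences(n_components: int) -> int:
        n = n_components
        return factorial(2 * (n - 1)) // (factorial(n) * factorial(n - 1))

    for n in range(3, 9):
        print(n, "components ->", sharp_split_sequences(n), "sequences")
    # 3 -> 2, 4 -> 5, 5 -> 14, 6 -> 42, 7 -> 132, 8 -> 429
    ```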

  7. Using the Big6[TM] To Plan Instruction and Services.

    ERIC Educational Resources Information Center

    Kearns, Jodi L.

    2000-01-01

    Explains how the relationship between school library collection development, curriculum development, and information problem solving can be improved by applying the Big6 Skills process to the selection of materials and teacher collaboration. Includes charts for cooperative planning that follow the Big6 Skills. (Contains 3 references.) (LRW)

  8. How to Use TCM Informatics to Study Traditional Chinese Medicine in Big Data Age.

    PubMed

    Shi, Cheng; Gong, Qing-Yue; Zhou, Jinhai

    2017-01-01

    This paper introduces the characteristics and complexity of traditional Chinese medicine (TCM) data, considers that modern big data processing technology has brought new opportunities for the research of TCM, and gives some ideas and methods to apply big data technology in TCM.

  9. Evaluation of Macroinvertebrate Data Based on Autoecological Information

    NASA Astrophysics Data System (ADS)

    Juhász, I.

    2016-12-01

    Various data (biological, chemical, hydrological and morphological) have been gathered within the frame of Water Framework Directive monitoring since 2007 in Hungary. So far these data have only been used for the status assessment of certain water bodies in Hungary. Macroinvertebrates indicate many environmental factors well; therefore, they are very useful in detecting changes in the status of an environment. The main aim of this research was to investigate changes in environmental variables and to determine how these variables cause large changes in the macroinvertebrate fauna. The macroinvertebrate data were processed using the ASTERICS 4.0.4 program, which calculated several important metrics (i.e., microhabitat distributions, longitudinal zonation, functional feeding guilds, etc.). These metrics were compared with the chemical and hydrological data. The main conclusion is that, given sufficient frequency and quality of macroinvertebrate data, we can understand changes in the environment of an ecosystem.

  10. Challenges and potential solutions for big data implementations in developing countries.

    PubMed

    Luna, D; Mayan, J C; García, M J; Almerares, A A; Househ, M

    2014-08-15

    The volume of data, the velocity with which they are generated, and their variety and lack of structure hinder their use. This creates the need to change the way information is captured, stored, processed, and analyzed, leading to the paradigm shift called Big Data. The objective of this review is to describe the challenges and possible solutions for developing countries when implementing Big Data projects in the health sector. A non-systematic review of the literature was performed in PubMed and Google Scholar. The following keywords were used: "big data", "developing countries", "data mining", "health information systems", and "computing methodologies". A thematic review of selected articles was performed. There are challenges when implementing any Big Data program, including the exponential growth of data, special infrastructure needs, the need for a trained workforce, the need to agree on interoperability standards, privacy and security issues, and the need to include people, processes, and policies to ensure their adoption. Developing countries have particular characteristics that hinder further development of these projects. The advent of Big Data promises great opportunities for the healthcare field. In this article, we attempt to describe the challenges developing countries would face and enumerate the options to be used to achieve successful implementations of Big Data programs.

  11. High performance poly(etherketoneketone) (PEKK) composite parts fabricated using Big Area Additive Manufacturing (BAAM) processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kunc, Vlastimil; Kishore, Vidya; Chen, Xun

    ORNL collaborated with Arkema Inc. to investigate poly(etherketoneketone) (PEKK) and its composites as potential feedstock materials for the Big Area Additive Manufacturing (BAAM) system. In this work, thermal and rheological properties were investigated and characterized in order to identify suitable processing conditions and material flow behavior for the BAAM process.

  12. Thinking big: Towards ideal strains and processes for large-scale aerobic biofuels production

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McMillan, James D.; Beckham, Gregg T.

    In this study, global concerns about anthropogenic climate change, energy security and independence, and environmental consequences of continued fossil fuel exploitation are driving significant public and private sector interest and financing to hasten development and deployment of processes to produce renewable fuels, as well as bio-based chemicals and materials, towards scales commensurate with current fossil fuel-based production. Over the past two decades, anaerobic microbial production of ethanol from first-generation hexose sugars derived primarily from sugarcane and starch has reached significant market share worldwide, with fermentation bioreactor sizes often exceeding the million litre scale. More recently, industrial-scale lignocellulosic ethanol plants are emerging that produce ethanol from pentose and hexose sugars using genetically engineered microbes and bioreactor scales similar to first-generation biorefineries.

  13. Thinking big: Towards ideal strains and processes for large-scale aerobic biofuels production

    DOE PAGES

    McMillan, James D.; Beckham, Gregg T.

    2016-12-22

    In this study, global concerns about anthropogenic climate change, energy security and independence, and environmental consequences of continued fossil fuel exploitation are driving significant public and private sector interest and financing to hasten development and deployment of processes to produce renewable fuels, as well as bio-based chemicals and materials, towards scales commensurate with current fossil fuel-based production. Over the past two decades, anaerobic microbial production of ethanol from first-generation hexose sugars derived primarily from sugarcane and starch has reached significant market share worldwide, with fermentation bioreactor sizes often exceeding the million litre scale. More recently, industrial-scale lignocellulosic ethanol plants are emerging that produce ethanol from pentose and hexose sugars using genetically engineered microbes and bioreactor scales similar to first-generation biorefineries.

  14. Water and Sediment Chemical Data and Data Summary for Samples Collected in 1999 and 2001 in the Goodpaster River Basin, Big Delta B-2 Quadrangle, Alaska

    USGS Publications Warehouse

    Wang, Bronwen; Gough, Larry; Wanty, Richard; Vohden, Jim; Crock, Jim; Day, Warren

    2006-01-01

    We report the chemical analysis for water and sediment collected from the Big Delta B-2 quadrangle. These data are part of a study located in the Big Delta B-2 quadrangle that focused on integrating geology and bedrock geochemistry with the biogeochemistry of water, sediments, soil, and vegetation. The discovery of the Pogo lode gold deposit in the northwest corner of the quadrangle was the impetus for this study. The study objectives were to create a geologic map, evaluate the bedrock geochemical influence on the geochemical signature of the surficial environment, and define landscape-level predevelopment geochemical baselines. Important to baseline development is an evaluation of what, if any, geochemical difference exists between the mineralized and non-mineralized areas within a watershed or between mineralized and non-mineralized watersheds. The analytic results for the bedrock, soils, and vegetation are reported elsewhere. Presented here, with minimal interpretation, are the analytic data for the water and sediment samples collected in the summers of 1999 and 2001, together with summary statistics of these analyses.

  15. Rethinking big data: A review on the data quality and usage issues

    NASA Astrophysics Data System (ADS)

    Liu, Jianzheng; Li, Jie; Li, Weifeng; Wu, Jiansheng

    2016-05-01

    The recent explosion of big data studies has well documented the rise of big data and its ongoing prevalence. Different types of "big data" have emerged and have greatly enriched spatial information sciences and related fields in terms of breadth and granularity. Studies that were difficult to conduct in the past due to limited data availability can now be carried out. However, big data brings many "big errors" in data quality and data usage, and it cannot substitute for sound research design and solid theories. We identified and summarized the problems faced by current big data studies with regard to data collection, processing and analysis: inauthentic data collection; information incompleteness and noise; unrepresentativeness; consistency and reliability; and ethical issues. Cases of empirical studies are provided as evidence for each problem. We propose that big data research should closely follow good scientific practice to provide reliable and scientific "stories", as well as explore and develop techniques and methods to mitigate or rectify those "big errors" brought by big data.

  16. Chemical Evolution of Binary Stars

    NASA Astrophysics Data System (ADS)

    Izzard, R. G.

    2013-02-01

    Energy generation by nuclear fusion is the fundamental process that prevents stars from collapsing under their own gravity. Fusion in the core of a star converts hydrogen to heavier elements from helium to uranium. The signature of this nucleosynthesis is often visible in a single star only for a very short time, for example while the star is a red giant or, in massive stars, when it explodes. By contrast, in a binary system nuclear-processed matter can be captured by a secondary star, which remains chemically polluted long after its more massive companion star has evolved and died. By probing old, low-mass stars we gain vital insight into the complex nucleosynthesis that occurred when our Galaxy was much younger than it is today. Stellar evolution itself is also affected by the presence of a companion star. Thermonuclear novae and type Ia supernovae result from mass transfer in binary stars, but big questions still surround the nature of their progenitors. Stars may even merge, and one of the challenges for the future of stellar astrophysics is to quantitatively understand what happens in such extreme systems. Binary stars offer unique insights into stellar, galactic and extragalactic astrophysics through their plethora of exciting phenomena. Understanding the chemical evolution of binary stars is thus of high priority in modern astrophysics.

  17. Big Data Provenance: Challenges, State of the Art and Opportunities

    PubMed Central

    Wang, Jianwu; Crawl, Daniel; Purawat, Shweta; Nguyen, Mai; Altintas, Ilkay

    2017-01-01

    Ability to track provenance is a key feature of scientific workflows to support data lineage and reproducibility. The challenges that are introduced by the volume, variety and velocity of Big Data, also pose related challenges for provenance and quality of Big Data, defined as veracity. The increasing size and variety of distributed Big Data provenance information bring new technical challenges and opportunities throughout the provenance lifecycle including recording, querying, sharing and utilization. This paper discusses the challenges and opportunities of Big Data provenance related to the veracity of the datasets themselves and the provenance of the analytical processes that analyze these datasets. It also explains our current efforts towards tracking and utilizing Big Data provenance using workflows as a programming model to analyze Big Data. PMID:29399671

  18. To What Extent Can the Big Five and Learning Styles Predict Academic Achievement

    ERIC Educational Resources Information Center

    Köseoglu, Yaman

    2016-01-01

    Personality traits and learning styles play defining roles in shaping academic achievement. 202 university students completed the Big Five personality traits questionnaire and the Inventory of Learning Processes Scale and self-reported their grade point averages. Conscientiousness and agreeableness, two of the Big Five personality traits, related…

  19. Pushing the polymer envelope

    NASA Astrophysics Data System (ADS)

    Tolley, Paul R.

    2005-09-01

    The pressure to "push the polymer envelope" is clear, given the exploding range of demanding applications for optical components. There are two keys to success: (1) an expanded range of polymers with suitable optical properties, and (2) sophisticated manufacturing process options with an overall system perspective: tolerances and costs established relative to need (proof-of-concept, prototype, low- to high-volume production); designs that integrate into an assembly meeting all environmental constraints, not just size and weight, which are natural polymer advantages (withstanding extreme temperatures and chemical exposure is often critical, as are easy clean-up and general resistance to surface damage); and highly repeatable processing. The thesis of this paper is that systematically innovating processes we already understand, applied to materials we already know, can deliver big returns. To illustrate, we introduce HRDT (High Refraction Diamond Turning), a patent-pending processing option that significantly reduces total costs for high-index, high-thermal applications.

  20. Selective removal of cesium by ammonium molybdophosphate - polyacrylonitrile bead and membrane.

    PubMed

    Ding, Dahu; Zhang, Zhenya; Chen, Rongzhi; Cai, Tianming

    2017-02-15

    The selective removal of radionuclides at extremely low concentrations from environmental media remains a big challenge. Ammonium molybdophosphate possesses considerable selectivity towards the cesium ion (Cs⁺) due to specific ion exchange between Cs⁺ and NH₄⁺. An ammonium molybdophosphate - polyacrylonitrile (AMP-PAN) membrane was successfully prepared for the first time in this study. Efficient removal of Cs⁺ (95.7%, 94.1% and 91.3% of 1 mg L⁻¹) from solutions with high ionic strength (400 mg L⁻¹ of Na⁺, Ca²⁺ or K⁺) was achieved by the AMP-PAN composite. A multilayer chemical adsorption process was indicated by kinetic and isotherm studies. The estimated maximum adsorption capacity reached 138.9 ± 21.3 mg g⁻¹. Specifically, liquid film diffusion was identified as the rate-limiting step throughout the removal process. Finally, the AMP-PAN membrane could eliminate Cs⁺ from water effectively through a filtration-adsorption process. Copyright © 2016 Elsevier B.V. All rights reserved.
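
    The kinetic and isotherm analysis mentioned above is typically done by fitting standard models, such as the Langmuir isotherm, to equilibrium data. The sketch below shows one way such a fit could be set up; the concentration and uptake values are hypothetical placeholders, not data from the AMP-PAN study.

```python
# Minimal sketch: fitting a Langmuir isotherm to hypothetical Cs+ adsorption data.
# The equilibrium concentrations (Ce, mg/L) and uptakes (qe, mg/g) below are
# illustrative placeholders, not values from the AMP-PAN study.
import numpy as np
from scipy.optimize import curve_fit

def langmuir(ce, q_max, k_l):
    """Langmuir isotherm: qe = q_max * K_L * Ce / (1 + K_L * Ce)."""
    return q_max * k_l * ce / (1.0 + k_l * ce)

ce = np.array([0.1, 0.5, 1.0, 2.0, 5.0, 10.0])          # mg/L (hypothetical)
qe = np.array([12.0, 45.0, 70.0, 95.0, 120.0, 132.0])   # mg/g (hypothetical)

(q_max, k_l), _ = curve_fit(langmuir, ce, qe, p0=[140.0, 1.0])
print(f"Fitted q_max = {q_max:.1f} mg/g, K_L = {k_l:.2f} L/mg")
```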

  1. Big Data in the Industry - Overview of Selected Issues

    NASA Astrophysics Data System (ADS)

    Gierej, Sylwia

    2017-12-01

    This article reviews selected issues related to the use of Big Data in the industry. The aim is to define the potential scope and forms of using large data sets in manufacturing companies. By systematically reviewing scientific and professional literature, selected issues related to the use of mass data analytics in production were analyzed. A definition of Big Data was presented, detailing its main attributes. The importance of mass data processing technology for the development of the Industry 4.0 concept has been highlighted. Subsequently, attention was paid to issues such as production process optimization, decision making and mass production individualisation, and the potential of large data volumes in these areas was indicated. As a result, conclusions were drawn regarding the potential of using Big Data in the industry.

  2. Mechanism Profiling of Hepatotoxicity Caused by Oxidative Stress Using Antioxidant Response Element Reporter Gene Assay Models and Big Data.

    PubMed

    Kim, Marlene Thai; Huang, Ruili; Sedykh, Alexander; Wang, Wenyi; Xia, Menghang; Zhu, Hao

    2016-05-01

    Hepatotoxicity accounts for a substantial number of drugs being withdrawn from the market. Using traditional animal models to detect hepatotoxicity is expensive and time-consuming. Alternative in vitro methods, in particular cell-based high-throughput screening (HTS) studies, have provided the research community with a large amount of data from toxicity assays. Among the various assays used to screen potential toxicants is the antioxidant response element beta lactamase reporter gene assay (ARE-bla), which identifies chemicals that have the potential to induce oxidative stress and was used to test > 10,000 compounds from the Tox21 program. The ARE-bla computational model and HTS data from a big data source (PubChem) were used to profile environmental and pharmaceutical compounds with hepatotoxicity data. Quantitative structure-activity relationship (QSAR) models were developed based on ARE-bla data. The models predicted the potential oxidative stress response for known liver toxicants when no ARE-bla data were available. Liver toxicants were used as probe compounds to search PubChem Bioassay and generate a response profile, which contained thousands of bioassays (> 10 million data points). By ranking the in vitro-in vivo correlations (IVIVCs), the most relevant bioassay(s) related to hepatotoxicity were identified. The liver toxicants profile contained the ARE-bla and relevant PubChem assays. Potential toxicophores for well-known toxicants were created by identifying chemical features that existed only in compounds with high IVIVCs. Profiling chemical IVIVCs created an opportunity to fully explore the source-to-outcome continuum of modern experimental toxicology using cheminformatics approaches and big data sources. Kim MT, Huang R, Sedykh A, Wang W, Xia M, Zhu H. 2016. Mechanism profiling of hepatotoxicity caused by oxidative stress using antioxidant response element reporter gene assay models and big data. Environ Health Perspect 124:634-641; http://dx.doi.org/10.1289/ehp.1509763.

  3. Mechanism Profiling of Hepatotoxicity Caused by Oxidative Stress Using Antioxidant Response Element Reporter Gene Assay Models and Big Data

    PubMed Central

    Kim, Marlene Thai; Huang, Ruili; Sedykh, Alexander; Wang, Wenyi; Xia, Menghang; Zhu, Hao

    2015-01-01

    Background: Hepatotoxicity accounts for a substantial number of drugs being withdrawn from the market. Using traditional animal models to detect hepatotoxicity is expensive and time-consuming. Alternative in vitro methods, in particular cell-based high-throughput screening (HTS) studies, have provided the research community with a large amount of data from toxicity assays. Among the various assays used to screen potential toxicants is the antioxidant response element beta lactamase reporter gene assay (ARE-bla), which identifies chemicals that have the potential to induce oxidative stress and was used to test > 10,000 compounds from the Tox21 program. Objective: The ARE-bla computational model and HTS data from a big data source (PubChem) were used to profile environmental and pharmaceutical compounds with hepatotoxicity data. Methods: Quantitative structure–activity relationship (QSAR) models were developed based on ARE-bla data. The models predicted the potential oxidative stress response for known liver toxicants when no ARE-bla data were available. Liver toxicants were used as probe compounds to search PubChem Bioassay and generate a response profile, which contained thousands of bioassays (> 10 million data points). By ranking the in vitro–in vivo correlations (IVIVCs), the most relevant bioassay(s) related to hepatotoxicity were identified. Results: The liver toxicants profile contained the ARE-bla and relevant PubChem assays. Potential toxicophores for well-known toxicants were created by identifying chemical features that existed only in compounds with high IVIVCs. Conclusion: Profiling chemical IVIVCs created an opportunity to fully explore the source-to-outcome continuum of modern experimental toxicology using cheminformatics approaches and big data sources. Citation: Kim MT, Huang R, Sedykh A, Wang W, Xia M, Zhu H. 2016. Mechanism profiling of hepatotoxicity caused by oxidative stress using antioxidant response element reporter gene assay models and big data. Environ Health Perspect 124:634–641; http://dx.doi.org/10.1289/ehp.1509763 PMID:26383846
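
    As a rough illustration of the IVIVC ranking step described in the Methods, the sketch below scores a set of in vitro assays by how well their responses correlate with a binary in vivo hepatotoxicity label and ranks them. The response matrix and labels are synthetic; this is not PubChem or Tox21 data and not the authors' pipeline.

```python
# Minimal sketch: ranking bioassays by in vitro-in vivo correlation (IVIVC).
# The response matrix and hepatotoxicity labels are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_compounds, n_assays = 50, 8
in_vitro = rng.normal(size=(n_compounds, n_assays))      # assay responses
# Make assay 0 genuinely informative so the ranking has a clear winner.
in_vivo = (in_vitro[:, 0] + rng.normal(scale=0.5, size=n_compounds) > 0).astype(int)

def correlation(x, y):
    """Pearson correlation between a continuous response and a binary label."""
    return np.corrcoef(x, y)[0, 1]

ivivc = np.array([correlation(in_vitro[:, j], in_vivo) for j in range(n_assays)])
for j in np.argsort(ivivc)[::-1][:3]:                    # top 3 assays
    print(f"assay {j}: IVIVC = {ivivc[j]:+.2f}")
```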

  4. Lexical Link Analysis Application: Improving Web Service to Acquisition Visibility Portal Phase III

    DTIC Science & Technology

    2015-04-30

    It is a supervised learning method, but best for Big Data with low dimensions. It is an approximate inference good for Big Data and Hadoop... Each process produces large amounts of information (Big Data). There is a critical need for automation, validation, and discovery to help acquisition... can inform managers where areas might have higher program risk and how resource and big data management might affect the desired return on investment.

  5. [Big data in medicine and healthcare].

    PubMed

    Rüping, Stefan

    2015-08-01

    Healthcare is one of the business fields with the highest Big Data potential. According to the prevailing definition, Big Data refers to the fact that data today are often too large and heterogeneous, and change too quickly, to be stored, processed, and transformed into value by previous technologies. Several technological trends drive Big Data: business processes are increasingly executed electronically, consumers produce more and more data themselves, e.g. in social networks, and digitalization keeps increasing. Currently, several new trends towards new data sources and innovative data analysis are appearing in medicine and healthcare. From the research perspective, omics research is one clear Big Data topic. In practice, electronic health records, free open data, and the "quantified self" offer new perspectives for data analytics. Regarding analytics, significant advances have been made in information extraction from text data, which unlocks a lot of data from clinical documentation for analytics purposes. At the same time, medicine and healthcare are lagging behind in the adoption of Big Data approaches. This can be traced to particular problems regarding data complexity and organizational, legal, and ethical challenges. The growing uptake of Big Data in general, and first best-practice examples in medicine and healthcare in particular, indicate that innovative solutions are coming. This paper gives an overview of the potential of Big Data in medicine and healthcare.

  6. Mineralogy and grain size of surficial sediment from the Big Lost River drainage and vicinity, with chemical and physical characteristics of geologic materials from selected sites at the Idaho National Engineering Laboratory, Idaho

    USGS Publications Warehouse

    Bartholomay, R.C.; Knobel, L.L.; Davis, L.C.

    1989-01-01

    The U.S. Geological Survey's Idaho National Engineering Laboratory project office, in cooperation with the U.S. Department of Energy, collected 35 samples of surficial sediments from the Big Lost River drainage and vicinity from July 1987 through August 1988 for analysis of grain-size distribution, bulk mineralogy, and clay mineralogy. Samples were collected from 11 sites in the channel and 5 sites in overbank deposits of the Big Lost River, 6 sites in the spreading areas that receive excess flow from the Big Lost River during peak flow conditions, 7 sites in the natural sinks and playas of the Big Lost River, 1 site in the Little Lost River Sink, and 5 sites from other small, isolated closed basins. Eleven samples from the Big Lost River channel deposits had a mean of 1.9 and median of 0.8 weight percent in the less than 0.062 mm fraction. The other 24 samples had a mean of 63.3 and median of 63.7 weight percent for the same size fraction. Mineralogy data are consistent with grain-size data. The Big Lost River channel deposits had mean and median percent mineral abundances of total clays and detrital mica of 10 and 10%, respectively, whereas the remaining 24 samples had mean and median values of 24% and 22.5%, respectively. (USGS)

  7. Modeling regeneration responses of big sagebrush (Artemisia tridentata) to abiotic conditions

    USGS Publications Warehouse

    Schlaepfer, Daniel R.; Lauenroth, William K.; Bradford, John B.

    2014-01-01

    Ecosystems dominated by big sagebrush, Artemisia tridentata Nuttall (Asteraceae), which are the most widespread ecosystems in semiarid western North America, have been affected by land use practices and invasive species. Loss of big sagebrush and the decline of associated species, such as greater sage-grouse, are a concern to land managers and conservationists. However, big sagebrush regeneration remains difficult to achieve by restoration and reclamation efforts and there is no regeneration simulation model available. We present here the first process-based, daily time-step, simulation model to predict yearly big sagebrush regeneration including relevant germination and seedling responses to abiotic factors. We estimated values, uncertainty, and importance of 27 model parameters using a total of 1435 site-years of observation. Our model explained 74% of variability of number of years with successful regeneration at 46 sites. It also achieved 60% overall accuracy predicting yearly regeneration success/failure. Our results identify specific future research needed to improve our understanding of big sagebrush regeneration, including data at the subspecies level and improved parameter estimates for start of seed dispersal, modified wet thermal-time model of germination, and soil water potential influences. We found that relationships between big sagebrush regeneration and climate conditions were site specific, varying across the distribution of big sagebrush. This indicates that statistical models based on climate are unsuitable for understanding range-wide regeneration patterns or for assessing the potential consequences of changing climate on sagebrush regeneration and underscores the value of this process-based model. We used our model to predict potential regeneration across the range of sagebrush ecosystems in the western United States, which confirmed that seedling survival is a limiting factor, whereas germination is not. Our results also suggested that modeled regeneration suitability is necessary but not sufficient to explain sagebrush presence. We conclude that future assessment of big sagebrush responses to climate change will need to account for responses of regenerative stages using a process-based understanding, such as provided by our model.
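
    The wet thermal-time idea referenced above can be illustrated with a toy rule: degree-days accumulate only on days when the soil is wet enough, and germination is predicted once a target sum is reached. The thresholds and daily inputs below are hypothetical and are not the parameter estimates from this model.

```python
# Toy sketch of a wet thermal-time germination rule: degree-days accumulate only
# while soil water potential exceeds a wetness threshold; germination is
# predicted once a thermal-time target is reached. All values are hypothetical.
BASE_TEMP_C = 0.0           # base temperature for thermal-time accumulation
WET_THRESHOLD_MPA = -1.5    # soil counts as "wet" above this water potential
THERMAL_TIME_TARGET = 60.0  # degree-days required for germination

def germination_day(daily_temp_c, daily_swp_mpa):
    """Return the first day index on which cumulative wet thermal time
    reaches the target, or None if it never does."""
    accumulated = 0.0
    for day, (temp, swp) in enumerate(zip(daily_temp_c, daily_swp_mpa)):
        if swp >= WET_THRESHOLD_MPA and temp > BASE_TEMP_C:
            accumulated += temp - BASE_TEMP_C
        if accumulated >= THERMAL_TIME_TARGET:
            return day
    return None

temps = [2, 5, 8, 10, 12, 14, 15, 16]                      # daily mean temperature (C)
swp = [-0.5, -0.8, -2.0, -0.9, -0.7, -0.6, -0.5, -0.4]     # soil water potential (MPa)
print(germination_day(temps, swp))                         # prints 7 for these inputs
```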

  8. Telecom Big Data for Urban Transport Analysis - a Case Study of Split-Dalmatia County in Croatia

    NASA Astrophysics Data System (ADS)

    Baučić, M.; Jajac, N.; Bućan, M.

    2017-09-01

    Today, big data has become widely available and new technologies are being developed for big data storage architectures and big data analytics. An ongoing challenge is how to incorporate big data into GIS applications supporting various domains. The International Transport Forum explains how the arrival of big data and real-time data, together with new data processing algorithms, leads to new insights and operational improvements in transport. Based on telecom customer data, the Study of Tourist Movement and Traffic in Split-Dalmatia County in Croatia was carried out as part of the "IPA Adriatic CBC//N.0086/INTERMODAL" project. This paper briefly explains the big data used in the study and the results of the study. Furthermore, it investigates the main considerations when using telecom customer big data: data privacy and data quality. The paper concludes with GIS visualisation and proposes further uses of the big data employed in the study.

  9. iMARS--mutation analysis reporting software: an analysis of spontaneous cII mutation spectra.

    PubMed

    Morgan, Claire; Lewis, Paul D

    2006-01-31

    The sensitivity of any mutational assay is determined by the level at which spontaneous mutations occur in the corresponding untreated controls. Establishing the type and frequency at which mutations occur naturally within a test system is essential if one is to draw scientifically sound conclusions regarding chemically induced mutations. Currently, mutation-spectra analysis is laborious and time-consuming. Thus, we have developed iMARS, a comprehensive mutation-spectrum analysis package that utilises routinely used methodologies and visualisation tools. To demonstrate the use and capabilities of iMARS, we have analysed the distribution, types and sequence context of spontaneous base substitutions derived from the cII gene mutation assay in transgenic animals. Analysis of spontaneous mutation spectra revealed variation both within and between the transgenic rodent test systems Big Blue Mouse, MutaMouse and Big Blue Rat. The most common spontaneous base substitutions were G:C-->A:T transitions and G:C-->T:A transversions. All Big Blue Mouse spectra were significantly different from each other by distribution and nearly all by mutation type, whereas the converse was true for the other test systems. Twenty-eight mutation hotspots were observed across all spectra, generally occurring in CG, GA/TC, GG and GC dinucleotides. A mutation hotspot at nucleotide 212 occurred at a higher frequency in MutaMouse and Big Blue Rat. In addition, CG dinucleotides were the most mutable in all spectra except two Big Blue Mouse spectra. Thus, spontaneous base-substitution spectra showed more variation in distribution, type and sequence context in Big Blue Mouse relative to spectra derived from MutaMouse and Big Blue Rat. The results of our analysis provide a baseline reference for mutation studies utilising the cII gene in transgenic rodent models. The potential differences in spontaneous base-substitution spectra should be considered when making comparisons between these test systems. The ease with which iMARS has allowed us to carry out an exhaustive investigation of mutation distribution, mutation type, strand bias, target sequences and motifs, as well as to predict mutation hotspots, makes it a valuable tool for distinguishing true chemically induced hotspots from background mutations and for obtaining a true reflection of mutation frequency.
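
    The core of a spectrum summary of the kind iMARS automates is a tally of base substitutions collapsed into the six canonical classes (e.g., G:C-->A:T), plus a count of recurrently mutated positions. A minimal sketch, using an invented mutation list rather than cII assay data, is shown below.

```python
# Minimal sketch of a base-substitution tally; the mutation list is invented.
from collections import Counter

# Each record: (position, reference_base, mutant_base)
mutations = [(103, "G", "A"), (212, "G", "T"), (212, "G", "T"),
             (57, "C", "T"), (331, "A", "G"), (103, "G", "A")]

def substitution_class(ref, mut):
    """Collapse a substitution into one of the six canonical classes,
    e.g. C->T and G->A both become 'G:C->A:T'."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    if ref in ("C", "T"):                 # report on the purine strand
        ref, mut = comp[ref], comp[mut]
    pair = {"G": "G:C", "A": "A:T"}[ref]
    target = {"A": "A:T", "T": "T:A", "C": "C:G", "G": "G:C"}[mut]
    return f"{pair}->{target}"

spectrum = Counter(substitution_class(r, m) for _, r, m in mutations)
hotspots = Counter(pos for pos, _, _ in mutations)
print(dict(spectrum))                                   # class frequencies
print([pos for pos, n in hotspots.items() if n > 1])    # recurrently mutated sites
```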

  10. Informatics in neurocritical care: new ideas for Big Data.

    PubMed

    Flechet, Marine; Grandas, Fabian Güiza; Meyfroidt, Geert

    2016-04-01

    Big data is the new hype in business and healthcare. Data storage and processing has become cheap, fast, and easy. Business analysts and scientists are trying to design methods to mine these data for hidden knowledge. Neurocritical care is a field that typically produces large amounts of patient-related data, and these data are increasingly being digitized and stored. This review will try to look beyond the hype, and focus on possible applications in neurointensive care amenable to Big Data research that can potentially improve patient care. The first challenge in Big Data research will be the development of large, multicenter, and high-quality databases. These databases could be used to further investigate recent findings from mathematical models, developed in smaller datasets. Randomized clinical trials and Big Data research are complementary. Big Data research might be used to identify subgroups of patients that could benefit most from a certain intervention, or can be an alternative in areas where randomized clinical trials are not possible. The processing and the analysis of the large amount of patient-related information stored in clinical databases is beyond normal human cognitive ability. Big Data research applications have the potential to discover new medical knowledge, and improve care in the neurointensive care unit.

  11. Challenges and Potential Solutions for Big Data Implementations in Developing Countries

    PubMed Central

    Mayan, J.C; García, M.J.; Almerares, A.A.; Househ, M.

    2014-01-01

    Summary Background The volume of data, the velocity with which they are generated, and their variety and lack of structure hinder their use. This creates the need to change the way information is captured, stored, processed, and analyzed, leading to the paradigm shift called Big Data. Objectives To describe the challenges and possible solutions for developing countries when implementing Big Data projects in the health sector. Methods A non-systematic review of the literature was performed in PubMed and Google Scholar. The following keywords were used: “big data”, “developing countries”, “data mining”, “health information systems”, and “computing methodologies”. A thematic review of selected articles was performed. Results There are challenges when implementing any Big Data program including exponential growth of data, special infrastructure needs, need for a trained workforce, need to agree on interoperability standards, privacy and security issues, and the need to include people, processes, and policies to ensure their adoption. Developing countries have particular characteristics that hinder further development of these projects. Conclusions The advent of Big Data promises great opportunities for the healthcare field. In this article, we attempt to describe the challenges developing countries would face and enumerate the options to be used to achieve successful implementations of Big Data programs. PMID:25123719

  12. The Ethics of Big Data and Nursing Science.

    PubMed

    Milton, Constance L

    2017-10-01

    Big data is a scientific, social, and technological trend referring to the process and size of datasets available for analysis. Ethical implications arise as healthcare disciplines, including nursing, struggle over questions of informed consent, privacy, ownership of data, and its possible use in epistemology. The author offers straight-thinking possibilities for the use of big data in nursing science.

  13. Big Data's Call to Philosophers of Education

    ERIC Educational Resources Information Center

    Blanken-Webb, Jane

    2017-01-01

    This paper investigates the intersection of big data and philosophy of education by considering big data's potential for addressing learning via a holistic process of coming-to-know. Learning, in this sense, cannot be reduced to the difference between a pre- and post-test, for example, as it is constituted at least as much by qualities of…

  14. Big Books for Little Readers: Works in the ESL Classroom Too.

    ERIC Educational Resources Information Center

    Nambiar, Mohana K.

    Big books, magnified or enlarged versions of children's books, are recommended for use in the English-as-a-Second-Language (ESL) classroom. The big book approach is based on the idea that shared reading and enlarged texts support joint adult-child participation in the reading process and emphasizes reading for meaning and enjoyment rather than…

  15. Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data.

    PubMed

    Dinov, Ivo D

    2016-01-01

    Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analyzing of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy and their hallmark will be 'team science'.

  16. Big data - smart health strategies. Findings from the yearbook 2014 special theme.

    PubMed

    Koutkias, V; Thiessard, F

    2014-08-15

    To select best papers published in 2013 in the field of big data and smart health strategies, and summarize outstanding research efforts. A systematic search was performed using two major bibliographic databases for relevant journal papers. The references obtained were reviewed in a two-stage process, starting with a blinded review performed by the two section editors, and followed by a peer review process operated by external reviewers recognized as experts in the field. The complete review process selected four best papers, illustrating various aspects of the special theme, among them: (a) using large volumes of unstructured data and, specifically, clinical notes from Electronic Health Records (EHRs) for pharmacovigilance; (b) knowledge discovery via querying large volumes of complex (both structured and unstructured) biological data using big data technologies and relevant tools; (c) methodologies for applying cloud computing and big data technologies in the field of genomics, and (d) system architectures enabling high-performance access to and processing of large datasets extracted from EHRs. The potential of big data in biomedicine has been pinpointed in various viewpoint papers and editorials. The review of current scientific literature illustrated a variety of interesting methods and applications in the field, but still the promises exceed the current outcomes. As we are getting closer towards a solid foundation with respect to common understanding of relevant concepts and technical aspects, and the use of standardized technologies and tools, we can anticipate to reach the potential that big data offer for personalized medicine and smart health strategies in the near future.

  17. Big Data - Smart Health Strategies

    PubMed Central

    2014-01-01

    Summary Objectives To select best papers published in 2013 in the field of big data and smart health strategies, and summarize outstanding research efforts. Methods A systematic search was performed using two major bibliographic databases for relevant journal papers. The references obtained were reviewed in a two-stage process, starting with a blinded review performed by the two section editors, and followed by a peer review process operated by external reviewers recognized as experts in the field. Results The complete review process selected four best papers, illustrating various aspects of the special theme, among them: (a) using large volumes of unstructured data and, specifically, clinical notes from Electronic Health Records (EHRs) for pharmacovigilance; (b) knowledge discovery via querying large volumes of complex (both structured and unstructured) biological data using big data technologies and relevant tools; (c) methodologies for applying cloud computing and big data technologies in the field of genomics, and (d) system architectures enabling high-performance access to and processing of large datasets extracted from EHRs. Conclusions The potential of big data in biomedicine has been pinpointed in various viewpoint papers and editorials. The review of current scientific literature illustrated a variety of interesting methods and applications in the field, but still the promises exceed the current outcomes. As we are getting closer towards a solid foundation with respect to common understanding of relevant concepts and technical aspects, and the use of standardized technologies and tools, we can anticipate to reach the potential that big data offer for personalized medicine and smart health strategies in the near future. PMID:25123721

  18. Impacts of fire on hydrology and erosion in steep mountain big sagebrush communities

    Treesearch

    Frederick B. Pierson; Peter R. Robichaud; Kenneth E. Spaeth; Corey A. Moffet

    2003-01-01

    Wildfire is an important ecological process and management issue on western rangelands. Major unknowns associated with wildfire are its effects on vegetation and soil conditions that influence hydrologic processes including infiltration, surface runoff, erosion, sediment transport, and flooding. Post-wildfire hydrologic response was studied in big sagebrush plant...

  19. Tunable and Reconfigurable Optical Negative-Index Materials with Low Losses

    DTIC Science & Technology

    2012-01-21

    ...h-MMs) can be used to study metric signature transitions and the cosmological “Big Bang”. • A theory for basic nonlinear optical processes in NIMs and in double...

  20. Big Data in Plant Science: Resources and Data Mining Tools for Plant Genomics and Proteomics.

    PubMed

    Popescu, George V; Noutsos, Christos; Popescu, Sorina C

    2016-01-01

    In modern plant biology, progress is increasingly defined by the scientists' ability to gather and analyze data sets of high volume and complexity, otherwise known as "big data". Arguably, the largest increase in the volume of plant data sets over the last decade is a consequence of the application of next-generation sequencing and mass-spectrometry technologies to the study of experimental model and crop plants. The increase in quantity and complexity of biological data brings challenges, mostly associated with data acquisition, processing, and sharing within the scientific community. Nonetheless, big data in plant science create unique opportunities for advancing our understanding of complex biological processes with a level of accuracy without precedent, and establish a base for plant systems biology. In this chapter, we summarize the major drivers of big data in plant science and big data initiatives in life sciences, with a focus on the scope and impact of iPlant, a representative cyberinfrastructure platform for plant science.

  1. From ecological records to big data: the invention of global biodiversity.

    PubMed

    Devictor, Vincent; Bensaude-Vincent, Bernadette

    2016-12-01

    This paper is a critical assessment of the epistemological impact of the systematic quantification of nature, with the accumulation of big datasets, on the practice and orientation of ecological science. We examine the contents of big databases and argue that they are not just accumulated information; records are translated into digital data in a process that changes their meanings. In order to better understand what is at stake in the 'datafication' process, we explore the context for the emergence and quantification of biodiversity in the 1980s, along with the concept of the global environment. In tracing the origin and development of the global biodiversity information facility (GBIF), we describe big data biodiversity projects as a techno-political construction dedicated to monitoring a new object: global diversity. We argue that biodiversity big data became a powerful driver behind the invention of the concept of the global environment, and a way to embed ecological science in the political agenda.

  2. Information Retrieval Using Hadoop Big Data Analysis

    NASA Astrophysics Data System (ADS)

    Motwani, Deepak; Madan, Madan Lal

    This paper concerns big data analysis, the cognitive operation of probing huge amounts of information in an attempt to uncover hidden patterns. Through big data analytics, organizations in both the public and private sectors have made a strategic decision to turn big data into competitive advantage. Extracting value from big data requires a process that pulls information from multiple different sources; this process is known as extract, transform, and load (ETL). The approach in this paper extracts information from log files and research papers, reducing the effort needed for pattern finding and document summarization from several sources. The work helps users better understand basic Hadoop concepts and improves the user experience for research. In this paper, we propose an approach for analyzing log files to find concise, useful information in a time-saving way by using Hadoop. Our proposed approach will be applied to different research papers in a specific domain to obtain summarized content for further improvement and the creation of new content.
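
    A minimal sketch of the kind of Hadoop-based log summarization described here uses Hadoop Streaming, where the mapper and reducer are plain scripts that read standard input and write tab-separated key/value pairs. The log line format assumed below (timestamp, level, message) is an illustration, not the paper's format.

```python
# mapper.py -- minimal Hadoop Streaming mapper: emit one count per log level.
# Assumes whitespace-separated log lines of the form: timestamp level message...
import sys

for line in sys.stdin:
    parts = line.split()
    if len(parts) >= 2:
        level = parts[1]              # e.g. INFO, WARN, ERROR
        print(f"{level}\t1")
```

```python
# reducer.py -- minimal Hadoop Streaming reducer: sum the counts per key.
# Hadoop delivers the mapper output grouped and sorted by key.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key == current_key:
        count += int(value)
    else:
        if current_key is not None:
            print(f"{current_key}\t{count}")
        current_key, count = key, int(value)
if current_key is not None:
    print(f"{current_key}\t{count}")
```

    Locally, the pipeline can be smoke-tested with `cat app.log | python mapper.py | sort | python reducer.py` before submitting it through the Hadoop Streaming jar.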

  3. Geohydrology of Big Bear Valley, California: phase 1--geologic framework, recharge, and preliminary assessment of the source and age of groundwater

    USGS Publications Warehouse

    Flint, Lorraine E.; Brandt, Justin; Christensen, Allen H.; Flint, Alan L.; Hevesi, Joseph A.; Jachens, Robert; Kulongoski, Justin T.; Martin, Peter; Sneed, Michelle

    2012-01-01

    The Big Bear Valley, located in the San Bernardino Mountains of southern California, has increased in population in recent years. Most of the water supply for the area is pumped from the alluvial deposits that form the Big Bear Valley groundwater basin. This study was conducted to better understand the thickness and structure of the groundwater basin in order to estimate the quantity and distribution of natural recharge to Big Bear Valley. A gravity survey was used to estimate the thickness of the alluvial deposits that form the Big Bear Valley groundwater basin. This determined that the alluvial deposits reach a maximum thickness of 1,500 to 2,000 feet beneath the center of Big Bear Lake and the area between Big Bear and Baldwin Lakes, and decrease to less than 500 feet thick beneath the eastern end of Big Bear Lake. Interferometric Synthetic Aperture Radar (InSAR) was used to measure pumping-induced land subsidence and to locate structures, such as faults, that could affect groundwater movement. The measurements indicated small amounts of land deformation (uplift and subsidence) in the area between Big Bear Lake and Baldwin Lake, the area near the city of Big Bear Lake, and the area near Sugarloaf, California. Both the gravity and InSAR measurements indicated the possible presence of subsurface faults in subbasins between Big Bear and Baldwin Lakes, but additional data are required for confirmation. The distribution and quantity of groundwater recharge in the area were evaluated by using a regional water-balance model (Basin Characterization Model, or BCM) and a daily rainfall-runoff model (INFILv3). The BCM calculated spatially distributed potential recharge in the study area of approximately 12,700 acre-feet per year (acre-ft/yr) of potential in-place recharge and 30,800 acre-ft/yr of potential runoff. Using the assumption that only 10 percent of the runoff becomes recharge, this approach indicated there is approximately 15,800 acre-ft/yr of total recharge in Big Bear Valley. The INFILv3 model was modified for this study to include a perched zone beneath the root zone to better simulate lateral seepage and recharge in the shallow subsurface in mountainous terrain. The climate input used in the INFILv3 model was developed by using daily climate data from 84 National Climatic Data Center stations and published Parameter Regression on Independent Slopes Model (PRISM) average monthly precipitation maps to match the drier average monthly precipitation measured in the Baldwin Lake drainage basin. This model resulted in a good representation of localized rain-shadow effects and calibrated well to measured lake volumes at Big Bear and Baldwin Lakes. The simulated average annual recharge was about 5,480 acre-ft/yr in the Big Bear study area, with about 2,800 acre-ft/yr in the Big Bear Lake surface-water drainage basin and about 2,680 acre-ft/yr in the Baldwin Lake surface-water drainage basin. One spring and eight wells were sampled and analyzed for chemical and isotopic data in 2005 and 2006 to determine if isotopic techniques could be used to assess the sources and ages of groundwater in the Big Bear Valley. This approach showed that the predominant source of recharge to the Big Bear Valley is winter precipitation falling on the surrounding mountains. 
The tritium and uncorrected carbon-14 ages of samples collected from wells for this study indicated that the groundwater basin contains water of different ages, ranging from modern to about 17,200 years old. The results of these investigations provide an understanding of the lateral and vertical extent of the groundwater basin, the spatial distribution of groundwater recharge, the processes responsible for the recharge, and the source and age of groundwater in the groundwater basin. Although the studies do not provide an understanding of the detailed water-bearing properties necessary to determine the groundwater availability of the basin, they do provide a framework for the future development of a groundwater model that would help to improve the understanding of the potential hydrologic effects of water-management alternatives in Big Bear Valley.
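
    The recharge total quoted above follows directly from the stated assumption that only 10 percent of potential runoff becomes recharge; the quick check below reproduces the arithmetic.

```python
# Quick check of the water-balance arithmetic described above: total recharge
# is taken as in-place recharge plus 10 percent of potential runoff.
potential_in_place_af = 12_700   # acre-feet per year (BCM estimate)
potential_runoff_af = 30_800     # acre-feet per year (BCM estimate)
runoff_recharge_fraction = 0.10  # assumption stated in the report

total_recharge_af = potential_in_place_af + runoff_recharge_fraction * potential_runoff_af
print(f"{total_recharge_af:.0f} acre-ft/yr")   # ~15,780, reported as ~15,800
```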

  4. New developments in understanding the r-process from observations of metal-poor stars

    NASA Astrophysics Data System (ADS)

    Frebel, Anna

    2015-04-01

    In their atmospheres, old metal-poor Galactic stars retain detailed information about the chemical composition of the interstellar medium at the time of their birth. Extracting such stellar abundances enables us to reconstruct the beginning of the chemical evolution shortly after the Big Bang. About 5% of metal-poor stars with [Fe/H] < -2.5 display in their spectrum a strong enhancement of neutron-capture elements associated with the rapid (r-) nucleosynthesis process that is responsible for the production of the heaviest elements in the Universe. This fortuity provides a unique opportunity to bring together astrophysics and nuclear physics, because these objects act as a "cosmic lab" for both fields of study. The so-called r-process stars are thought to have formed from material enriched in heavy neutron-capture elements that were created during an r-process event in a previous-generation supernova. It appears that the few stars known with this rare chemical signature all follow the scaled solar r-process pattern (for the heaviest elements, with 56 <= Z <= 90). This suggests that the r-process is universal: a surprising empirical finding and a solid result that cannot be obtained in any laboratory on Earth. While much research has been devoted to establishing this pattern, little attention has been given to the overall level of enhancement. New results will be presented on the full extent of r-process element enrichment as observed in metal-poor stars. The challenge lies in determining how the r-process material in the earliest gas clouds was mixed and diluted. Assuming that individual r-process events contributed the observed r-process elements, we provide empirical estimates of the amount of r-process material produced. This should become a crucial constraint for theoretical nuclear physics models of heavy element nucleosynthesis.

  5. Ultrafast graphene and carbon nanotube film patterning by picosecond laser pulses

    NASA Astrophysics Data System (ADS)

    Bobrinetskiy, Ivan I.; Emelianov, Alexey V.; Otero, Nerea; Romero, Pablo M.

    2016-03-01

    Carbon nanomaterials are among the most promising technologies for advanced electronic applications, due to their extraordinary chemical and physical properties. Nonetheless, after more than two decades of intensive research, the application of carbon-based nanostructures in real electronic and optoelectronic devices is still a big challenge due to the lack of scalable integration in microelectronic manufacturing. Laser processing is an attractive tool for graphene device manufacturing, providing a large variety of processes through direct and indirect interaction of laser beams with the graphene lattice: functionalization, oxidation, reduction, etching and ablation, growth, etc., with resolution down to the nanoscale. Focused laser radiation allows freeform processing, enabling fully mask-less fabrication of devices from graphene and carbon nanotube films. This concept is attractive for reducing costs, improving flexibility, and reducing alignment operations, by producing fully functional devices in single direct-write operations. In this paper, a picosecond laser with a wavelength of 515 nm and a pulse width of 30 ps is used to pattern carbon nanostructures in two ways: ablation and chemical functionalization. Light absorption leads to thermal ablation of the graphene and carbon nanotube films at fluences of 60-90 J/cm² with scanning speeds up to 2 m/s. Just below the ablation threshold, two-photon absorption adds functional groups to the carbon lattice, which changes the optical properties of graphene. This paper presents the results of controlled modification of the geometrical configuration and the physical and chemical properties of carbon-based nanostructures by laser direct writing.

  6. Sampling Operations on Big Data

    DTIC Science & Technology

    2015-11-29

    ...gories. These include edge sampling methods, where edges are selected by a predetermined criterion; snowball sampling methods, where algorithms start... ...process and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and...

  7. The importance of tree size and fecundity for wind dispersal of big-leaf mahogany

    Treesearch

    Julian M. Norghauer; Charles A. Nock; James Grogan

    2011-01-01

    Seed dispersal by wind is a critical yet poorly understood process in tropical forest trees. How tree size and fecundity affect this process at the population level remains largely unknown because of insufficient replication across adults. We measured seed dispersal by the endangered neotropical timber species big-leaf mahogany (Swietenia macrophylla King, Meliaceae)...

  8. Concurrence of big data analytics and healthcare: A systematic review.

    PubMed

    Mehta, Nishita; Pandit, Anil

    2018-06-01

    The application of Big Data analytics in healthcare has immense potential for improving the quality of care, reducing waste and error, and reducing the cost of care. This systematic review of the literature aims to determine the scope of Big Data analytics in healthcare, including its applications and the challenges in its adoption in healthcare. It also intends to identify strategies to overcome those challenges. A systematic search of articles was carried out on five major scientific databases: ScienceDirect, PubMed, Emerald, IEEE Xplore and Taylor & Francis. Articles on Big Data analytics in healthcare published in the English language literature from January 2013 to January 2018 were considered. Descriptive articles and usability studies of Big Data analytics in healthcare and medicine were selected. Two reviewers independently extracted information on definitions of Big Data analytics; sources and applications of Big Data analytics in healthcare; and challenges and strategies to overcome the challenges in healthcare. A total of 58 articles were selected as per the inclusion criteria and analyzed. The analyses of these articles found that: (1) researchers lack consensus about the operational definition of Big Data in healthcare; (2) Big Data in healthcare comes from internal sources within hospitals or clinics as well as external sources including government, laboratories, pharma companies, data aggregators, medical journals, etc.; (3) natural language processing (NLP) is the most widely used Big Data analytical technique for healthcare, and most of the processing tools used for analytics are based on Hadoop; (4) Big Data analytics finds its application in clinical decision support, optimization of clinical operations and reduction of the cost of care; and (5) the major challenge in the adoption of Big Data analytics is the non-availability of evidence of its practical benefits in healthcare. This review study reveals that there is a paucity of information on evidence of real-world use of Big Data analytics in healthcare. This is because the usability studies have taken only a qualitative approach, which describes potential benefits but does not include quantitative evaluation. Also, the majority of the studies were from developed countries, which brings out the need to promote research on healthcare Big Data analytics in developing countries. Copyright © 2018 Elsevier B.V. All rights reserved.

  9. bigSCale: an analytical framework for big-scale single-cell data.

    PubMed

    Iacono, Giovanni; Mereu, Elisabetta; Guillaumet-Adkins, Amy; Corominas, Roser; Cuscó, Ivon; Rodríguez-Esteban, Gustavo; Gut, Marta; Pérez-Jurado, Luis Alberto; Gut, Ivo; Heyn, Holger

    2018-06-01

    Single-cell RNA sequencing (scRNA-seq) has significantly deepened our insights into complex tissues, with the latest techniques capable of processing tens of thousands of cells simultaneously. Analyzing increasing numbers of cells, however, generates extremely large data sets, extending processing time and challenging computing resources. Current scRNA-seq analysis tools are not designed to interrogate large data sets and often lack sensitivity to identify marker genes. With bigSCale, we provide a scalable analytical framework to analyze millions of cells, which addresses the challenges associated with large data sets. To handle the noise and sparsity of scRNA-seq data, bigSCale uses large sample sizes to estimate an accurate numerical model of noise. The framework further includes modules for differential expression analysis, cell clustering, and marker identification. A directed convolution strategy allows processing of extremely large data sets, while preserving transcript information from individual cells. We evaluated the performance of bigSCale using both a biological model of aberrant gene expression in patient-derived neuronal progenitor cells and simulated data sets, which underlines the speed and accuracy in differential expression analysis. To test its applicability for large data sets, we applied bigSCale to assess 1.3 million cells from the mouse developing forebrain. Its directed down-sampling strategy accumulates information from single cells into index cell transcriptomes, thereby defining cellular clusters with improved resolution. Accordingly, index cell clusters identified rare populations, such as reelin (Reln)-positive Cajal-Retzius neurons, for which we report previously unrecognized heterogeneity associated with distinct differentiation stages, spatial organization, and cellular function. Together, bigSCale presents a solution to address future challenges of large single-cell data sets. © 2018 Iacono et al.; Published by Cold Spring Harbor Laboratory Press.
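
    The index-cell idea described above, pooling transcriptionally similar cells into aggregate profiles, can be sketched generically as clustering followed by per-cluster summation of counts. The snippet below is such a sketch on synthetic data; it is not the bigSCale implementation of directed down-sampling.

```python
# Rough, generic sketch of pooling similar cells into "index cell" profiles.
# NOT the bigSCale implementation: it simply clusters cells and sums their counts.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
counts = rng.poisson(1.0, size=(2000, 100))    # cells x genes, synthetic counts

n_index_cells = 50
labels = KMeans(n_clusters=n_index_cells, n_init=3, random_state=0).fit_predict(
    np.log1p(counts))                          # cluster on log-transformed counts

# Accumulate the raw counts of all cells assigned to each index cell.
index_profiles = np.zeros((n_index_cells, counts.shape[1]))
np.add.at(index_profiles, labels, counts)
print(index_profiles.shape)                    # (50, 100)
```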

  10. 'Big data', Hadoop and cloud computing in genomics.

    PubMed

    O'Driscoll, Aisling; Daugelaite, Jurate; Sleator, Roy D

    2013-10-01

    Since the completion of the Human Genome project at the turn of the Century, there has been an unprecedented proliferation of genomic sequence data. A consequence of this is that the medical discoveries of the future will largely depend on our ability to process and analyse large genomic data sets, which continue to expand as the cost of sequencing decreases. Herein, we provide an overview of cloud computing and big data technologies, and discuss how such expertise can be used to deal with biology's big data sets. In particular, big data technologies such as the Apache Hadoop project, which provides distributed and parallelised data processing and analysis of petabyte (PB) scale data sets will be discussed, together with an overview of the current usage of Hadoop within the bioinformatics community. Copyright © 2013 Elsevier Inc. All rights reserved.

  11. Natural regeneration processes in big sagebrush (Artemisia tridentata)

    USGS Publications Warehouse

    Schlaepfer, Daniel R.; Lauenroth, William K.; Bradford, John B.

    2014-01-01

    Big sagebrush, Artemisia tridentata Nuttall (Asteraceae), is the dominant plant species of large portions of semiarid western North America. However, much of historical big sagebrush vegetation has been removed or modified. Thus, regeneration is recognized as an important component for land management. Limited knowledge about key regeneration processes, however, represents an obstacle to identifying successful management practices and to gaining greater insight into the consequences of increasing disturbance frequency and global change. Therefore, our objective is to synthesize knowledge about natural big sagebrush regeneration. We identified and characterized the controls of big sagebrush seed production, germination, and establishment. The largest knowledge gaps and associated research needs include quiescence and dormancy of embryos and seedlings; variation in seed production and germination percentages; wet-thermal time model of germination; responses to frost events (including freezing/thawing of soils), CO2 concentration, and nutrients in combination with water availability; suitability of microsite vs. site conditions; competitive ability as well as seedling growth responses; and differences among subspecies and ecoregions. Potential impacts of climate change on big sagebrush regeneration could include that temperature increases may not have a large direct influence on regeneration due to the broad temperature optimum for regeneration, whereas indirect effects could include selection for populations with less stringent seed dormancy. Drier conditions will have direct negative effects on germination and seedling survival and could also lead to lighter seeds, which lowers germination success further. The short seed dispersal distance of big sagebrush may limit its tracking of suitable climate; whereas, the low competitive ability of big sagebrush seedlings may limit successful competition with species that track climate. An improved understanding of the ecology of big sagebrush regeneration should benefit resource management activities and increase the ability of land managers to anticipate global change impacts.

  12. Differential Privacy Preserving in Big Data Analytics for Connected Health.

    PubMed

    Lin, Chi; Song, Zihao; Song, Houbing; Zhou, Yanhong; Wang, Yi; Wu, Guowei

    2016-04-01

    In Body Area Networks (BANs), the big data collected by wearable sensors usually contain sensitive information, which must be appropriately protected. Previous methods neglected the privacy protection issue, leading to privacy exposure. In this paper, a differential privacy protection scheme for big data in body sensor networks is developed. Compared with previous methods, this scheme provides privacy protection with higher availability and reliability. We introduce the concept of dynamic noise thresholds, which makes our scheme more suitable for processing big data. Experimental results demonstrate that, even when the attacker has full background knowledge, the proposed scheme can still add enough interference to big sensitive data to preserve privacy.
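
    The paper's dynamic-noise-threshold scheme is not reproduced here, but the kind of perturbation involved can be illustrated with the standard Laplace mechanism, which adds noise scaled to the query's sensitivity divided by the privacy budget epsilon. A minimal sketch on synthetic sensor readings:

```python
# Generic Laplace-mechanism sketch for a differentially private mean.
# Illustrative only; it does not reproduce the paper's dynamic-noise-threshold scheme.
import numpy as np

def dp_mean(values, lower, upper, epsilon, rng=None):
    """Release an epsilon-differentially-private mean of bounded values."""
    rng = rng or np.random.default_rng()
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(values)   # L1 sensitivity of the mean
    return values.mean() + rng.laplace(0.0, sensitivity / epsilon)

heart_rates = [62, 75, 80, 68, 90, 72]            # synthetic wearable readings (bpm)
print(dp_mean(heart_rates, lower=40, upper=180, epsilon=0.5))
```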

  13. Differential School Contextual Effects for Math and English: Integrating the Big-Fish-Little-Pond Effect and the Internal/External Frame of Reference

    ERIC Educational Resources Information Center

    Parker, Philip D.; Marsh, Herbert W.; Ludtke, Oliver; Trautwein, Ulrich

    2013-01-01

    The internal/external frame of reference and the big-fish-little-pond effect are two major models of academic self-concept formation which have considerable theoretical and empirical support. Integrating the domain specific and compensatory processes of the internal/external frame of reference model with the big-fish-little-pond effect suggests a…

  14. A Scalable, Open Source Platform for Data Processing, Archiving and Dissemination

    DTIC Science & Technology

    2016-01-01

    Object Oriented Data Technology (OODT) big data toolkit developed by NASA and the Work-flow INstance Generation and Selection (WINGS) scientific work...to several challenge big data problems and demonstrated the utility of OODT-WINGS in addressing them. Specific demonstrated analyses address i...source software, Apache, Object Oriented Data Technology, OODT, semantic work-flows, WINGS, big data, work-flow management

  15. BIG: a calossin-like protein required for polar auxin transport in Arabidopsis

    PubMed Central

    Gil, Pedro; Dewey, Elizabeth; Friml, Jiri; Zhao, Yunde; Snowden, Kimberley C.; Putterill, Jo; Palme, Klaus; Estelle, Mark; Chory, Joanne

    2001-01-01

    Polar auxin transport is crucial for the regulation of auxin action and required for some light-regulated responses during plant development. We have found that two mutants of Arabidopsis—doc1, which displays altered expression of light-regulated genes, and tir3, known for its reduced auxin transport—have similar defects and define mutations in a single gene that we have renamed BIG. BIG is very similar to the Drosophila gene Calossin/Pushover, a member of a gene family also present in Caenorhabditis elegans and human genomes. The protein encoded by BIG is extraordinary in size, 560 kD, and contains several putative Zn-finger domains. Expression-profiling experiments indicate that altered expression of multiple light-regulated genes in doc1 mutants can be suppressed by elevated levels of auxin caused by overexpression of an auxin biosynthetic gene, suggesting that normal auxin distribution is required to maintain low-level expression of these genes in the dark. Double mutants of tir3 with the auxin mutants pin1, pid, and axr1 display severe defects in auxin-dependent growth of the inflorescence. Chemical inhibitors of auxin transport change the intracellular localization of the auxin efflux carrier PIN1 in doc1/tir3 mutants, supporting the idea that BIG is required for normal auxin efflux. PMID:11485992

  16. News | Frederick National Laboratory for Cancer Research

    Cancer.gov

    Consortium aims to accelerate drug discovery process (Physics Today); Why big pharma and biotech are betting big on AI (NBC News); Scientists launch SF-based effort to dramatically cut cancer drug discovery time (SF Chronicle)

  17. Analyzing big data with the hybrid interval regression methods.

    PubMed

    Huang, Chia-Hui; Yang, Keng-Chieh; Kao, Han-Ying

    2014-01-01

    Big data is a current trend that is having a significant impact on information technologies. In big data applications, one of the most pressing issues is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently becomes a big challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has proved more efficient than the traditional SVM in processing large-scale data. In addition, a soft margin method is proposed to modify the excursion of the separation margin and to remain effective in the gray zone, where the distribution of the data becomes hard to describe and the separation margin between classes is unclear.

  18. Analyzing Big Data with the Hybrid Interval Regression Methods

    PubMed Central

    Kao, Han-Ying

    2014-01-01

    Big data is a current trend that is having a significant impact on information technologies. In big data applications, one of the most pressing issues is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently becomes a big challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has proved more efficient than the traditional SVM in processing large-scale data. In addition, a soft margin method is proposed to modify the excursion of the separation margin and to remain effective in the gray zone, where the distribution of the data becomes hard to describe and the separation margin between classes is unclear. PMID:25143968
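    The two records above combine interval regression with a smooth support vector machine (SSVM); that method is not reimplemented here. As a hedged sketch of the general idea of attaching an interval to an SVM-style regression fit, the example below trains scikit-learn's LinearSVR and reports the epsilon-insensitive tube around each prediction as a crude interval; the synthetic data and the choice of epsilon are assumptions made for the example.

```python
import numpy as np
from sklearn.svm import LinearSVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic data standing in for a large-scale regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(5000, 4))
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(0, 0.3, size=5000)

EPS = 0.5  # width of the epsilon-insensitive tube
model = make_pipeline(StandardScaler(), LinearSVR(epsilon=EPS, C=1.0, max_iter=5000))
model.fit(X, y)

y_hat = model.predict(X[:5])
# Crude interval estimate: the epsilon tube around each point prediction.
lower, upper = y_hat - EPS, y_hat + EPS
print(np.c_[lower, y_hat, upper].round(2))
```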

  19. Exascale computing and big data

    DOE PAGES

    Reed, Daniel A.; Dongarra, Jack

    2015-06-25

    Scientific discovery and engineering innovation require unifying traditionally separated high-performance computing and big data analytics. The tools and cultures of high-performance computing and big data analytics have diverged, to the detriment of both; unification is essential to address a spectrum of major research domains. The challenges of scale tax our ability to transmit data, compute complicated functions on that data, or store a substantial part of it; new approaches are required to meet these challenges. Finally, the international nature of science demands further development of advanced computer architectures and global standards for processing data, even as international competition complicates the openness of the scientific process.

  20. Intelligent Control of Micro Grid: A Big Data-Based Control Center

    NASA Astrophysics Data System (ADS)

    Liu, Lu; Wang, Yanping; Liu, Li; Wang, Zhiseng

    2018-01-01

    In this paper, a structure of a micro grid system with a big data-based control center is introduced. Energy data from distributed generation, storage, and load are analyzed through the control center, and from the results new trends are predicted and applied as feedback to optimize the control. Therefore, each step performed in the micro grid can be adjusted and organized under comprehensive management. A framework for real-time data collection, data processing, and data analysis is proposed by employing big data technology. Consequently, an integrated distributed generation and an optimized energy storage and transmission process can be implemented in the micro grid system.
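    The record above describes a big data-based control center that feeds predictions back into micro grid control. The toy sketch below is an illustrative assumption rather than the authors' architecture: it shows only the bare feedback idea of forecasting load with a moving average and dispatching storage to cover the gap between forecast load and generation.

```python
from collections import deque

class MicroGridController:
    """Toy control-center sketch; the class name and dispatch rule are hypothetical."""

    def __init__(self, window=4):
        self.load_history = deque(maxlen=window)

    def step(self, load_kw, generation_kw):
        # Forecast the next load as a moving average of recent observations.
        self.load_history.append(load_kw)
        forecast = sum(self.load_history) / len(self.load_history)
        # Positive -> discharge storage, negative -> charge storage.
        storage_dispatch_kw = forecast - generation_kw
        return forecast, storage_dispatch_kw

controller = MicroGridController()
for load, gen in [(120, 100), (130, 105), (125, 140), (150, 110)]:
    forecast, dispatch = controller.step(load, gen)
    print(f"forecast={forecast:.1f} kW, storage dispatch={dispatch:.1f} kW")
```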

  1. Exascale computing and big data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reed, Daniel A.; Dongarra, Jack

    Scientific discovery and engineering innovation require unifying traditionally separated high-performance computing and big data analytics. The tools and cultures of high-performance computing and big data analytics have diverged, to the detriment of both; unification is essential to address a spectrum of major research domains. The challenges of scale tax our ability to transmit data, compute complicated functions on that data, or store a substantial part of it; new approaches are required to meet these challenges. Finally, the international nature of science demands further development of advanced computer architectures and global standards for processing data, even as international competition complicates the openness of the scientific process.

  2. Vertical Structure of The Polluted Low Troposphere During Escompte 2001.

    NASA Astrophysics Data System (ADS)

    Saïd, F.; Escompte Team

    ESCOMPTE 2001 is a field experiment that took place in the south-east of France, from June 11th to July 13th, with the aim of understanding chemical constituent transformation and transport and of improving numerical models devoted to pollution study and forecasting. Information about the experiment can be found at http://medias.obs-mip.fr/escompte. The studied area was roughly 120x120 km, including a big town, Marseille, and a petroleum complex around the Fos-Berre pond. Various experimental means such as radiosoundings, UHF and VHF radars, lidars, and aircraft were involved in order to study the 3D distribution of chemical species in relationship with the dynamical processes. The vertical distribution of horizontal wind, ozone, aerosols, and water vapor content revealed several cases with complex stratification. This stratification could also be detected in the lidar extinction coefficients or in the radar reflectivity. The superposed layers extended over large areas and were steady. The aim is to understand how these staggered layers have formed and whether they interfere with the mixed layer. If they do, they could play a major part in the pollution process.

  3. Introduction to Big Bang nucleosynthesis - Open and closed models, anisotropies

    NASA Astrophysics Data System (ADS)

    Tayler, R. J.

    1982-10-01

    A variety of observations suggest that the universe had a hot dense origin and that the pregalactic composition of the universe was determined by nuclear reactions that occurred in the first few minutes. There is no unique hot Big Bang theory, but the simplest version produces a primeval chemical composition that is in good qualitative agreement with the abundances deduced from observation. Whether or not any Big Bang theory will provide quantitative agreement with observations depends on a variety of factors in elementary particle physics (number and masses of stable or long-lived particles, half-life of neutron, structure of grand unified theories) and from observational astronomy (present mean baryon density of the universe, the Hubble constant and deceleration parameter). The influence of these factors on the abundances is discussed, as is the effect of departures from homogeneity and isotropy in the early universe.

  4. SEAS (Surveillance Environmental Acoustic Support Program) Support

    DTIC Science & Technology

    1984-02-29

    ASEPS software - Provide support for AMES - Support for OUTPOST CREOLE, BIG DIPPER and MFA. First, a summary of the tasks as delineated in the contract...addition, the contractor will provide an engineer/scientist to support the BIG DIPPER data processing activities at NOSC. Task 3: SEAS Inventory - The...SI to provide support to SEAS for the OUTPOST CREOLE III exercise which followed immediately after the BIG DIPPER exercise. OUTPOST CREOLE III

  5. Too Big for the Sieve

    NASA Image and Video Library

    2012-10-11

    In this image, the scoop on NASA's Curiosity rover shows the larger soil particles that were too big to filter through a sample-processing sieve that is porous only to particles less than 0.006 inches (150 microns) across.

  6. The caBIG Terminology Review Process

    PubMed Central

    Cimino, James J.; Hayamizu, Terry F.; Bodenreider, Olivier; Davis, Brian; Stafford, Grace A.; Ringwald, Martin

    2009-01-01

    The National Cancer Institute (NCI) is developing an integrated biomedical informatics infrastructure, the cancer Biomedical Informatics Grid (caBIG®), to support collaboration within the cancer research community. A key part of the caBIG architecture is the establishment of terminology standards for representing data. In order to evaluate the suitability of existing controlled terminologies, the caBIG Vocabulary and Data Elements Workspace (VCDE WS) working group has developed a set of criteria that serve to assess a terminology's structure, content, documentation, and editorial process. This paper describes the evolution of these criteria and the results of their use in evaluating four standard terminologies: the Gene Ontology (GO), the NCI Thesaurus (NCIt), the Common Terminology Criteria for Adverse Events (CTCAE), and the laboratory portion of the Logical Observation Identifiers Names and Codes (LOINC). The resulting caBIG criteria are presented as a matrix that may be applicable to any terminology standardization effort. PMID:19154797

  7. CHIPMUNK: A Virtual Synthesizable Small-Molecule Library for Medicinal Chemistry, Exploitable for Protein-Protein Interaction Modulators.

    PubMed

    Humbeck, Lina; Weigang, Sebastian; Schäfer, Till; Mutzel, Petra; Koch, Oliver

    2018-03-20

    A common issue during drug design and development is the discovery of novel scaffolds for protein targets. On the one hand the chemical space of purchasable compounds is rather limited; on the other hand artificially generated molecules suffer from a grave lack of accessibility in practice. Therefore, we generated a novel virtual library of small molecules which are synthesizable from purchasable educts, called CHIPMUNK (CHemically feasible In silico Public Molecular UNiverse Knowledge base). Altogether, CHIPMUNK covers over 95 million compounds and encompasses regions of the chemical space that are not covered by existing databases. The coverage of CHIPMUNK exceeds the chemical space spanned by the Lipinski rule of five to foster the exploration of novel and difficult target classes. The analysis of the generated property space reveals that CHIPMUNK is well suited for the design of protein-protein interaction inhibitors (PPIIs). Furthermore, a recently developed structural clustering algorithm (StruClus) for big data was used to partition the sub-libraries into meaningful subsets and assist scientists to process the large amount of data. These clustered subsets also contain the target space based on ChEMBL data which was included during clustering. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
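    The record above mentions partitioning the CHIPMUNK library with the StruClus structural clustering algorithm; StruClus itself is not reimplemented here. As a hedged stand-in, the sketch below clusters a few hypothetical SMILES strings by Tanimoto distance between Morgan fingerprints using RDKit's Butina algorithm, a common small-scale alternative.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina

# A handful of hypothetical SMILES standing in for a library subset.
smiles = ["c1ccccc1O", "c1ccccc1N", "CCO", "CCN", "CCCC", "CCCCC"]
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]

# Flat lower-triangular Tanimoto *distance* matrix, as Butina.ClusterData expects.
dists = []
for i in range(1, len(fps)):
    sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
    dists.extend(1.0 - s for s in sims)

clusters = Butina.ClusterData(dists, len(fps), distThresh=0.4, isDistData=True)
for cid, members in enumerate(clusters):
    print(cid, [smiles[i] for i in members])
```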

  8. FUn: a framework for interactive visualizations of large, high-dimensional datasets on the web.

    PubMed

    Probst, Daniel; Reymond, Jean-Louis

    2018-04-15

    During the past decade, big data have become a major tool in scientific endeavors. Although statistical methods and algorithms are well-suited for analyzing and summarizing enormous amounts of data, the results do not allow for a visual inspection of the entire data. Current scientific software, including R packages and Python libraries such as ggplot2, matplotlib and plot.ly, does not support interactive visualizations of datasets exceeding 100 000 data points on the web. Other solutions enable the web-based visualization of big data only through data reduction or statistical representations. However, recent hardware developments, especially advancements in graphical processing units, allow for the rendering of millions of data points on a wide range of consumer hardware such as laptops, tablets and mobile phones. Similar to the challenges and opportunities brought to virtually every scientific field by big data, both the visualization of and interaction with copious amounts of data are demanding and hold great promise. Here we present FUn, a framework consisting of a client (Faerun) and server (Underdark) module, facilitating the creation of web-based, interactive 3D visualizations of large datasets, enabling record-level visual inspection. We also introduce a reference implementation providing access to SureChEMBL, a database containing patent information on more than 17 million chemical compounds. The source code and the most recent builds of Faerun and Underdark, Lore.js and the data preprocessing toolchain used in the reference implementation, are available on the project website (http://doc.gdb.tools/fun/). Contact: daniel.probst@dcb.unibe.ch or jean-louis.reymond@dcb.unibe.ch.

  9. Translating Big Data into Smart Data for Veterinary Epidemiology.

    PubMed

    VanderWaal, Kimberly; Morrison, Robert B; Neuhauser, Claudia; Vilalta, Carles; Perez, Andres M

    2017-01-01

    The increasing availability and complexity of data has led to new opportunities and challenges in veterinary epidemiology around how to translate abundant, diverse, and rapidly growing "big" data into meaningful insights for animal health. Big data analytics are used to understand health risks and minimize the impact of adverse animal health issues through identifying high-risk populations, combining data or processes acting at multiple scales through epidemiological modeling approaches, and harnessing high velocity data to monitor animal health trends and detect emerging health threats. The advent of big data requires the incorporation of new skills into veterinary epidemiology training, including, for example, machine learning and coding, to prepare a new generation of scientists and practitioners to engage with big data. Establishing pipelines to analyze big data in near real-time is the next step for progressing from simply having "big data" to create "smart data," with the objective of improving understanding of health risks, effectiveness of management and policy decisions, and ultimately preventing or at least minimizing the impact of adverse animal health issues.

  10. Flexible Description and Adaptive Processing of Earth Observation Data through the BigEarth Platform

    NASA Astrophysics Data System (ADS)

    Gorgan, Dorian; Bacu, Victor; Stefanut, Teodor; Nandra, Cosmin; Mihon, Danut

    2016-04-01

    Earth Observation data repositories, expanding periodically by several terabytes, are becoming a critical issue for organizations. Managing the storage capacity of such big datasets, access policy, data protection, searching, and complex processing entails high costs and demands efficient solutions that balance the cost and value of the data. Data create value only when they are used, and data protection has to be oriented toward allowing innovation, which sometimes depends on creative people achieving unexpected valuable results in a flexible and adaptive manner. Users need to describe and experiment with different complex algorithms themselves through analytics in order to valorize the data. Analytics uses descriptive and predictive models to gain valuable knowledge and information from data analysis. Possible solutions for advanced processing of big Earth Observation data are given by HPC platforms such as the cloud. With platforms becoming more complex and heterogeneous, developing applications is even harder, and efficiently mapping these applications to a suitable and optimal platform, working on huge distributed data repositories, is challenging and complex as well, even when specialized software services are used. From the user's point of view, an optimal environment gives acceptable execution times, offers a high level of usability by hiding the complexity of the computing infrastructure, and supports open accessibility and control of application entities and functionality. The BigEarth platform [1] supports the entire flow, from flexible description of processing by basic operators to adaptive execution over cloud infrastructure [2]. The basic modules of the pipeline, such as the KEOPS [3] set of basic operators, the WorDeL language [4], the Planner for sequential and parallel processing, and the Executor through virtual machines, are detailed as the main components of the BigEarth platform [5]. The presentation exemplifies the development of some Earth Observation oriented applications based on flexible description of processing and on adaptive, portable execution over cloud infrastructure. Main references for further information: [1] BigEarth project, http://cgis.utcluj.ro/projects/bigearth [2] Gorgan, D., "Flexible and Adaptive Processing of Earth Observation Data over High Performance Computation Architectures", International Conference and Exhibition Satellite 2015, August 17-19, Houston, Texas, USA. [3] Mihon, D., Bacu, V., Colceriu, V., Gorgan, D., "Modeling of Earth Observation Use Cases through the KEOPS System", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE-Press, pp. 455-460, (2015). [4] Nandra, C., Gorgan, D., "Workflow Description Language for Defining Big Earth Data Processing Tasks", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE-Press, pp. 461-468, (2015). [5] Bacu, V., Stefan, T., Gorgan, D., "Adaptive Processing of Earth Observation Data on Cloud Infrastructures Based on Workflow Description", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE-Press, pp. 444-454, (2015).

  11. Effect of fungicide on Wyoming big sagebrush seed germination

    USDA-ARS?s Scientific Manuscript database

    Because fungal infection may complicate both the logistics and the interpretation of germination tests, seeds are sometimes treated with chemical fungicides. Fungicides may reduce the germination rate and/or germination percentage, and should be avoided unless fungal contamination is severe enough ...

  12. Multiscale modeling and simulation of embryogenesis for in silico predictive toxicology (WC9)

    EPA Science Inventory

    Translating big data from alternative and HTS platforms into hazard identification and risk assessment is an important need for predictive toxicology and for elucidating adverse outcome pathways (AOPs) in developmental toxicity. Understanding how chemical disruption of molecular ...

  13. BigDataScript: a scripting language for data pipelines.

    PubMed

    Cingolani, Pablo; Sladek, Rob; Blanchette, Mathieu

    2015-01-01

    The analysis of large biological datasets often requires complex processing pipelines that run for a long time on large computational infrastructures. We designed and implemented a simple script-like programming language with a clean and minimalist syntax to develop and manage pipeline execution and provide robustness to various types of software and hardware failures as well as portability. We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness. Hardware abstraction allows BDS pipelines to run without modification on a wide range of computer architectures, from a small laptop to multi-core servers, server farms, clusters and clouds. BDS achieves robustness by incorporating the concepts of absolute serialization and lazy processing, thus allowing pipelines to recover from errors. By abstracting pipeline concepts at programming language level, BDS simplifies implementation, execution and management of complex bioinformatics pipelines, resulting in reduced development and debugging cycles as well as cleaner code. BigDataScript is available under open-source license at http://pcingola.github.io/BigDataScript. © The Author 2014. Published by Oxford University Press.
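    BigDataScript's own syntax is not reproduced here. As a hedged, plain-Python sketch of one concept the abstract highlights (lazy processing, so that a re-run resumes after a failure instead of redoing finished work), the helper below re-executes a step only when its outputs are missing or older than its inputs; the two pipeline commands and file names are hypothetical.

```python
import os
from pathlib import Path

def run_step(cmd, inputs, outputs):
    """Run `cmd` only if any output is missing or older than any input
    (a plain-Python stand-in for lazy processing; not BDS syntax)."""
    outs = [Path(o) for o in outputs]
    ins = [Path(i) for i in inputs]
    up_to_date = all(o.exists() for o in outs) and (
        not ins or min(o.stat().st_mtime for o in outs)
        >= max(i.stat().st_mtime for i in ins)
    )
    if up_to_date:
        print(f"skip (up to date): {cmd}")
        return
    if os.system(cmd) != 0:  # re-running the pipeline resumes at the failed step
        raise RuntimeError(f"step failed: {cmd}")

# Hypothetical two-step pipeline over a hypothetical input file.
run_step("gzip -kf reads.fastq", ["reads.fastq"], ["reads.fastq.gz"])
run_step("md5sum reads.fastq.gz > reads.md5", ["reads.fastq.gz"], ["reads.md5"])
```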

  14. Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework

    PubMed Central

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive, in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. Finally, a service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012

  15. BigDataScript: a scripting language for data pipelines

    PubMed Central

    Cingolani, Pablo; Sladek, Rob; Blanchette, Mathieu

    2015-01-01

    Motivation: The analysis of large biological datasets often requires complex processing pipelines that run for a long time on large computational infrastructures. We designed and implemented a simple script-like programming language with a clean and minimalist syntax to develop and manage pipeline execution and provide robustness to various types of software and hardware failures as well as portability. Results: We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness. Hardware abstraction allows BDS pipelines to run without modification on a wide range of computer architectures, from a small laptop to multi-core servers, server farms, clusters and clouds. BDS achieves robustness by incorporating the concepts of absolute serialization and lazy processing, thus allowing pipelines to recover from errors. By abstracting pipeline concepts at programming language level, BDS simplifies implementation, execution and management of complex bioinformatics pipelines, resulting in reduced development and debugging cycles as well as cleaner code. Availability and implementation: BigDataScript is available under open-source license at http://pcingola.github.io/BigDataScript. Contact: pablo.e.cingolani@gmail.com PMID:25189778

  16. Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.

    PubMed

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive, in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. Finally, a service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.

  17. Results of experiments related to contact of mine-spoils water with coal, West Decker and Big Sky Mines, southeastern Montana

    USGS Publications Warehouse

    Davis, R.E.; Dodge, K.A.

    1986-01-01

    Batch-mixing experiments using spoils water and coal from the West Decker and Big Sky Mines were conducted to determine possible chemical changes in water moving from coal-mine spoils through a coal aquifer. The spoils water was combined with air-dried and oven-dried chunks of coal and air-dried and oven-dried crushed coal at a 1:1 weight ratio, mixed for 2 hr, and separated after a total contact time of 24 hr. The dissolved-solids concentration in water used in the experiments decreased by an average of 210 mg/liter (5-10%). Other chemical changes included general decreases in the concentrations of magnesium, potassium, and bicarbonate, and general increases in the concentrations of barium and boron. The magnitude of the changes increased as the surface area of the coal increased. The quantity of extractable cations and exchangeable cations on the post-mixing coal was larger than on the pre-mixing coal. Equilibrium and mass-transfer relations indicate that adsorption reactions or ion-exchange and precipitation reactions, or both, probably are the major reactions responsible for the chemical changes observed in the experiments. (Authors' abstract)

  18. Hydrologic conditions and distribution of selected radiochemical and chemical constituents in water, Snake River Plain aquifer, Idaho National Engineering Laboratory, Idaho, 1989 through 1991

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bartholomay, R.C.; Orr, B.R.; Liszewski, M.J.

    Radiochemical and chemical wastewater discharged since 1952 to infiltration ponds and disposal wells at the Idaho National Engineering Laboratory (INEL) has affected water quality in the Snake River Plain aquifer. The U.S. Geological Survey, in cooperation with the U.S. Department of Energy, maintains a continuous monitoring network at the INEL to determine hydrologic trends and to delineate the movement of radiochemical and chemical wastes in the aquifer. This report presents an analysis of water-level and water-quality data collected from the Snake River Plain aquifer during 1989-91. Water in the eastern Snake River Plain aquifer moves principally through fractures and interflow zones in basalt, generally flows southwestward, and eventually discharges at springs along the Snake River. The aquifer is recharged principally from irrigation water, infiltration of streamflow, and ground-water inflow from adjoining mountain drainage basins. Water levels in wells throughout the INEL generally declined during 1989-91 due to drought. Detectable concentrations of radiochemical constituents in water samples from wells in the Snake River Plain aquifer at the INEL decreased or remained constant during 1989-91. Decreased concentrations are attributed to reduced rates of radioactive-waste disposal, sorption processes, radioactive decay, and changes in waste-disposal practices. Detectable concentrations of chemical constituents in water from the Snake River Plain aquifer at the INEL were variable during 1989-91. Sodium and chloride concentrations in the southern part of the INEL increased slightly during 1989-91 because of increased waste-disposal rates and a lack of recharge from the Big Lost River. Plumes of 1,1,1-trichloroethane have developed near the Idaho Chemical Processing Plant and the Radioactive Waste Management Complex as a result of waste-disposal practices.

  19. A Home for Toad: Using Storytelling To Teach Big6 Skills.

    ERIC Educational Resources Information Center

    Jansen, Barbara

    1998-01-01

    Describes how to use storytelling in elementary education to teach the Big6 research process. Strategies for implementation are presented, including modifying a story, writing a story based on the curriculum connection, and using puppets. (LRW)

  20. Laboratory Astrophysics Prize: Laboratory Astrophysics with Nuclei

    NASA Astrophysics Data System (ADS)

    Wiescher, Michael

    2018-06-01

    Nuclear astrophysics is concerned with nuclear reaction and decay processes from the Big Bang to the present star generation controlling the chemical evolution of our universe. Such nuclear reactions maintain stellar life, determine stellar evolution, and finally drive stellar explosion in the circle of stellar life. Laboratory nuclear astrophysics seeks to simulate and understand the underlying processes using a broad portfolio of nuclear instrumentation, from reactor to accelerator from stable to radioactive beams to map the broad spectrum of nucleosynthesis processes. This talk focuses on only two aspects of the broad field, the need of deep underground accelerator facilities in cosmic ray free environments in order to understand the nucleosynthesis in stars, and the need for high intensity radioactive beam facilities to recreate the conditions found in stellar explosions. Both concepts represent the two main frontiers of the field, which are being pursued in the US with the CASPAR accelerator at the Sanford Underground Research Facility in South Dakota and the FRIB facility at Michigan State University.

  1. Big Data Analyses in Health and Opportunities for Research in Radiology.

    PubMed

    Aphinyanaphongs, Yindalon

    2017-02-01

    This article reviews examples of big data analyses in health care with a focus on radiology. We review the defining characteristics of big data, the use of natural language processing, traditional and novel data sources, and large clinical data repositories available for research. This article aims to invoke novel research ideas through a combination of examples of analyses and domain knowledge. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.
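    As a hedged illustration of the natural language processing mentioned in the review above (not an analysis from the article itself), the sketch below fits a simple bag-of-words classifier to a few hypothetical radiology report snippets with toy labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical snippets of radiology report text with toy labels (1 = abnormal finding).
reports = [
    "No acute cardiopulmonary abnormality.",
    "Right lower lobe opacity concerning for pneumonia.",
    "Lungs are clear. No pleural effusion.",
    "Large left pleural effusion with adjacent atelectasis.",
]
labels = [0, 1, 0, 1]

# TF-IDF features of unigrams and bigrams feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reports, labels)
print(clf.predict(["Small right pleural effusion."]))
```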

  2. Soil impact on the radial growth of Taxodium ( Taxodium distichum (L.) Rich.) in Serbia

    NASA Astrophysics Data System (ADS)

    Jokanovic, D.; Popovic, V.; Vilotic, D.; Mitrovic, S.; Brasanac-Bosanac, Lj.

    2012-04-01

    This work presents the results of analyses of radial development, radial growth, and soil factors in two plantations in Serbia. One is located near Backa Palanka, and the other in Belgrade, in the area of the Big War Island. Both were established on the same soil type, and the research was conducted on 27-year-old trees. At both locations a number of parameters were measured, the most important being the physical-chemical characteristics of the soil and the current radial growth of the 20% widest trees. A comparison of the values and form of current radial growth showed that the Taxodium trees in the plantation near Backa Palanka reach the culmination of current radial growth somewhat earlier and with somewhat higher values than those from the Big War Island in Belgrade. A comparison of radial development likewise showed that, at the same age, the trees from Backa Palanka reach larger radial dimensions than those from the Big War Island. The physical-chemical analysis of the soil revealed differences between the two locations, so the differences in radial development and in the form and values of current radial growth can be attributed to soil influence; this will be examined further over the following period of this research.

  3. Building and Characterizing Volcanic Landscapes with a Numerical Landscape Evolution Model and Spectral Techniques

    NASA Astrophysics Data System (ADS)

    Richardson, P. W.; Karlstrom, L.

    2016-12-01

    Constructional volcanic processes such as lava flows, cinder cones, and tumuli compete with physical and chemical erosional processes to control the morphology of mafic volcanic landscapes. If volcanic effusion rates are high, these landscapes are primarily constructional, but over the timescales associated with hot spot volcanism (1-10 Myr) and arcs (10-50 Myr), chemical and physical erosional processes are important. For fluvial incision to occur, initially high infiltration rates must be overcome by chemical weathering or input of fine-grained sediment. We incorporate lava flow resurfacing, using a new lava flow algorithm that can be calibrated for specific flows and eruption magnitude/frequency relationships, into a landscape evolution model and carry out two modeling experiments to investigate the interplay between volcanic resurfacing and fluvial incision. We use a stochastic spatial vent distribution calibrated from the Hawaiian eruption record to resurface a synthetically produced ocean island. In one experiment, we investigate the consequences of including time-dependent channel incision efficiency. This effectively mimics the behavior of transient hydrological development of lava flows. In the second experiment, we explore the competition between channel incision and lava flow resurfacing. The relative magnitudes of channel incision versus lava flow resurfacing are captured in landscape topography. For example, during the shield building period for ocean islands, effusion rates are high and the signature of lava flow resurfacing dominates. In contrast, after the shield building phase, channel incision begins and eventually dominates the topographic signature. We develop a dimensionless ratio of resurfacing rate to erosion rate to characterize the transition between these processes. We use spectral techniques to characterize volcanic features and to pinpoint the transition between constructional and erosional morphology on modeled landscapes and on the Big Island of Hawaii.

  4. Preclinical drug development.

    PubMed

    Brodniewicz, Teresa; Grynkiewicz, Grzegorz

    2010-01-01

    Life sciences provide a reasonably sound prognosis for the number and nature of therapeutic targets on which drug design could be based, and the search for new chemical entities--future new drugs--is now more than ever based on scientific principles. Nevertheless, the current, very long and incredibly costly drug discovery and development process is highly inefficient, with an attrition rate spanning from many thousands of new chemical structures, through a handful of validated drug leads, to single successful new drug launches, achieved on average after 13 years, with compounded cost estimates from hundreds of thousands to over one billion US dollars. Since radical pharmaceutical innovation is critically needed, the number of new research projects concerning this area is rising steeply outside of the big pharma industry--both in the academic environment and in small private companies. Their prospective success will critically depend on project management, which requires combined knowledge of scientific, technical and legal matters, comprising regulations concerning admission of new drug candidates as subjects of clinical studies. This paper attempts to explain the basic rules and requirements of drug development within the preclinical study period, in the case of new chemical entities of natural or synthetic origin, which belong to the low-molecular-weight category.

  5. Complex optimization for big computational and experimental neutron datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bao, Feng; Oak Ridge National Lab.; Archibald, Richard

    Here, we present a framework to use high performance computing to determine accurate solutions to the inverse optimization problem of big experimental data against computational models. We demonstrate how image processing, mathematical regularization, and hierarchical modeling can be used to solve complex optimization problems on big data. We also demonstrate how both model and data information can be used to further increase solution accuracy of optimization by providing confidence regions for the processing and regularization algorithms. Finally, we use the framework in conjunction with the software package SIMPHONIES to analyze results from neutron scattering experiments on silicon single crystals, and refine first principles calculations to better describe the experimental data.

  6. Complex optimization for big computational and experimental neutron datasets

    DOE PAGES

    Bao, Feng; Oak Ridge National Lab.; Archibald, Richard; ...

    2016-11-07

    Here, we present a framework to use high performance computing to determine accurate solutions to the inverse optimization problem of big experimental data against computational models. We demonstrate how image processing, mathematical regularization, and hierarchical modeling can be used to solve complex optimization problems on big data. We also demonstrate how both model and data information can be used to further increase solution accuracy of optimization by providing confidence regions for the processing and regularization algorithms. Finally, we use the framework in conjunction with the software package SIMPHONIES to analyze results from neutron scattering experiments on silicon single crystals, and refine first principles calculations to better describe the experimental data.

  7. Big-data-based edge biomarkers: study on dynamical drug sensitivity and resistance in individuals.

    PubMed

    Zeng, Tao; Zhang, Wanwei; Yu, Xiangtian; Liu, Xiaoping; Li, Meiyi; Chen, Luonan

    2016-07-01

    Big-data-based edge biomarker is a new concept to characterize disease features based on biomedical big data in a dynamical and network manner, which also provides alternative strategies to indicate disease status in single samples. This article gives a comprehensive review on big-data-based edge biomarkers for complex diseases in an individual patient, which are defined as biomarkers based on network information and high-dimensional data. Specifically, we first introduce the sources and structures of biomedical big data accessible in public for edge biomarker and disease study. We show that biomedical big data are typically 'small-sample size in high-dimension space', i.e. small samples but with high dimensions on features (e.g. omics data) for each individual, in contrast to traditional big data in many other fields characterized as 'large-sample size in low-dimension space', i.e. big samples but with low dimensions on features. Then, we demonstrate the concept, model and algorithm for edge biomarkers and further big-data-based edge biomarkers. Dissimilar to conventional biomarkers, edge biomarkers, e.g. module biomarkers in module network rewiring-analysis, are able to predict the disease state by learning differential associations between molecules rather than differential expressions of molecules during disease progression or treatment in individual patients. In particular, in contrast to using the information of the common molecules or edges (i.e. molecule-pairs) across a population in traditional biomarkers including network and edge biomarkers, big-data-based edge biomarkers are specific for each individual and thus can accurately evaluate the disease state by considering the individual heterogeneity. Therefore, the measurement of big data in a high-dimensional space is required not only in the learning process but also in the diagnosing or predicting process of the tested individual. Finally, we provide a case study on analyzing the temporal expression data from a malaria vaccine trial by big-data-based edge biomarkers from module network rewiring-analysis. The illustrative results show that the identified module biomarkers can accurately distinguish vaccines with or without protection and outperform previously reported gene signatures in terms of effectiveness and efficiency. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  8. Thermophilic versus Mesophilic Anaerobic Digestion of Sewage Sludge: A Comparative Review

    PubMed Central

    Gebreeyessus, Getachew D.; Jenicek, Pavel

    2016-01-01

    During advanced biological wastewater treatment, a huge amount of sludge is produced as a by-product of the treatment process. Hence, reuse and recovery of resources and energy from the sludge is a big technological challenge. The processing of sludge produced by Wastewater Treatment Plants (WWTPs) is massive and takes up a big part of the overall operational costs. In this regard, anaerobic digestion (AD) of sewage sludge continues to be an attractive option to produce biogas that could contribute to reducing wastewater management costs and foster the sustainability of those WWTPs. At the same time, AD reduces sludge amounts, which again contributes to the reduction of sludge disposal costs. However, sludge volume minimization remains a challenge; thus, improvement of dewatering efficiency is an inevitable part of WWTP operation. As a result, AD parameters can have a significant impact on sludge properties. One of the most important operational parameters influencing the AD process is temperature. Consequently, the thermophilic and the mesophilic modes of sludge AD have been compared for their pros and cons by many researchers. However, most comparisons focus on biogas yield, process speed, and stability. Regarding biogas yield, thermophilic sludge AD is preferred over the mesophilic one because of its faster biochemical reaction rate. Equally important, but not studied sufficiently until now, is the influence of temperature on digestate quality, which is expressed mainly by the sludge dewaterability, and on reject water quality (chemical oxygen demand, ammonia nitrogen, and pH). Unfortunately, few and often inconclusive studies comparing the thermophilic and mesophilic digestion processes have been published so far. Hence, recommendations for optimized technologies have not yet been made. The review presented provides a comparison of existing sludge AD technologies and the gaps that need to be filled so as to optimize the connection between the two systems. In addition, many other relevant AD process parameters, including sludge rheology, which need to be addressed, are also reviewed and presented. PMID:28952577

  9. Chemical Expertise: Chemistry in the Royal Prussian Porcelain Manufactory.

    PubMed

    Klein, Ursula

    2014-01-01

    Eighteenth-century chemists defined chemistry as both a "science and an art." By "chemical art" they meant not merely experimentation but also parts of certain arts and crafts. This raises the question of how to identify the "chemical parts" of the arts and crafts in eighteenth-century Europe. In this essay I tackle this question with respect to porcelain manufacture. My essay begins with a brief discussion of historiographical problems related to this question. It then analyzes practices involved in porcelain manufacture that can be reasonably identified as chemical practices or a chemical art. My analysis yields evidence for the argument that chemical experts and expertise fulfilled distinct technical functions in porcelain manufacture and, by extension, in eighteenth-century "big industry," along with its system of division of labor.

  10. Results from the Big Spring basin water quality monitoring and demonstration projects, Iowa, USA

    USGS Publications Warehouse

    Rowden, R.D.; Liu, H.; Libra, R.D.

    2001-01-01

    Agricultural practices, hydrology, and water quality of the 267-km2 Big Spring groundwater drainage basin in Clayton County, Iowa, have been monitored since 1981. Land use is agricultural; nitrate-nitrogen (nitrate-N) and herbicides are the resulting contaminants in groundwater and surface water. Ordovician Galena Group carbonate rocks comprise the main aquifer in the basin. Recharge to this karstic aquifer is by infiltration, augmented by sinkhole-captured runoff. Groundwater is discharged at Big Spring, where quantity and quality of the discharge are monitored. Monitoring has shown a threefold increase in groundwater nitrate-N concentrations from the 1960s to the early 1980s. The nitrate-N discharged from the basin typically is equivalent to over one-third of the nitrogen fertilizer applied, with larger losses during wetter years. Atrazine is present in groundwater all year; however, contaminant concentrations in the groundwater respond directly to recharge events, and unique chemical signatures of infiltration versus runoff recharge are detectable in the discharge from Big Spring. Education and demonstration efforts have reduced nitrogen fertilizer application rates by one-third since 1981. Relating declines in nitrate and pesticide concentrations to inputs of nitrogen fertilizer and pesticides at Big Spring is problematic. Annual recharge has varied five-fold during monitoring, overshadowing any water-quality improvements resulting from incrementally decreased inputs. © Springer-Verlag 2001.

  11. Semantic Web technologies for the big data in life sciences.

    PubMed

    Wu, Hongyan; Yamaguchi, Atsuko

    2014-08-01

    The life sciences field is entering an era of big data with the breakthroughs of science and technology. More and more big data-related projects and activities are being performed in the world. Life sciences data generated by new technologies are continuing to grow in not only size but also variety and complexity, with great speed. To ensure that big data has a major influence in the life sciences, comprehensive data analysis across multiple data sources and even across disciplines is indispensable. The increasing volume of data and the heterogeneous, complex varieties of data are two principal issues mainly discussed in life science informatics. The ever-evolving next-generation Web, characterized as the Semantic Web, is an extension of the current Web, aiming to provide information for not only humans but also computers to semantically process large-scale data. The paper presents a survey of big data in life sciences, big data related projects and Semantic Web technologies. The paper introduces the main Semantic Web technologies and their current situation, and provides a detailed analysis of how Semantic Web technologies address the heterogeneous variety of life sciences big data. The paper helps to understand the role of Semantic Web technologies in the big data era and how they provide a promising solution for the big data in life sciences.
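    As a hedged illustration of the Semantic Web machinery the survey above discusses, the sketch below loads a tiny hand-written RDF graph with Python's rdflib and runs a SPARQL query over it; the URIs and predicates are made up for the example and are not a real life-science vocabulary.

```python
from rdflib import Graph

# A tiny in-memory RDF snippet standing in for linked life-science data.
turtle = """
@prefix ex: <http://example.org/> .
ex:TP53  ex:encodes      ex:p53 .
ex:p53   ex:involvedIn   ex:ApoptosisPathway .
ex:MDM2  ex:regulates    ex:p53 .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

# SPARQL lets heterogeneous sources be queried uniformly once mapped to RDF.
query = """
PREFIX ex: <http://example.org/>
SELECT ?gene WHERE {
    ?gene ex:encodes ?protein .
    ?protein ex:involvedIn ex:ApoptosisPathway .
}
"""
for row in g.query(query):
    print(row.gene)
```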

  12. Internet of things and Big Data as potential solutions to the problems in waste electrical and electronic equipment management: An exploratory study.

    PubMed

    Gu, Fu; Ma, Buqing; Guo, Jianfeng; Summers, Peter A; Hall, Philip

    2017-10-01

    Management of Waste Electrical and Electronic Equipment (WEEE) is a vital part of solid waste management, but some difficult issues still require attention. This paper investigates the potential of applying the Internet of Things (IoT) and Big Data as solutions to WEEE management problems. The massive data generated during the production, consumption and disposal of Electrical and Electronic Equipment (EEE) fit the characteristics of Big Data. Through the use of state-of-the-art communication technologies, the IoT derives the WEEE "Big Data" from the life cycle of EEE, and Big Data technologies process the WEEE "Big Data" to support decision making in WEEE management. A framework for implementing the IoT and Big Data technologies is proposed, and its multiple layers are illustrated. Case studies with potential application scenarios of the framework are presented and discussed. As an unprecedented exploration, the combined application of the IoT and Big Data technologies in WEEE management brings a series of opportunities as well as new challenges. This study provides insights and visions for stakeholders in solving WEEE management problems in the context of IoT and Big Data. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset.

    PubMed

    Kamal, Sarwar; Ripon, Shamim Hasnat; Dey, Nilanjan; Ashour, Amira S; Santhi, V

    2016-07-01

    In the age of the information superhighway, big data play a significant role in information processing, extraction, retrieval and management. In computational biology, the continuous challenge is to manage the biological data. Data mining techniques are sometimes imperfect for new space and time requirements. Thus, it is critical to process massive amounts of data to retrieve knowledge. The existing software and automated tools to handle big data sets are not sufficient. As a result, an expandable mining technique that enfolds the large storage and processing capability of distributed or parallel processing platforms is essential. In this analysis, a contemporary distributed clustering methodology for imbalance data reduction using a k-nearest neighbor (K-NN) classification approach is introduced. The pivotal objective of this work is to illustrate real training data sets with a reduced number of elements or instances. These reduced data sets will ensure faster data classification and standard storage management with less sensitivity. However, general data reduction methods cannot manage very big data sets. To minimize these difficulties, a MapReduce-oriented framework is designed using various clusters of automated contents, comprising multiple algorithmic approaches. To test the proposed approach, a real DNA (deoxyribonucleic acid) dataset that consists of 90 million pairs has been used. The proposed model reduces the imbalanced data sets drawn from large-scale data without loss of accuracy. The obtained results show that the MapReduce-based K-NN classifier provided accurate results for big DNA data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
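    The record above reduces imbalanced DNA data with a MapReduce K-NN approach; the exact method is not reproduced here. The sketch below illustrates the same map-then-reduce shape on synthetic data using Python's multiprocessing: each worker prunes majority-class points whose nearest neighbours are all majority class (an ENN-style rule chosen for illustration), and the surviving points are concatenated.

```python
import numpy as np
from multiprocessing import Pool
from sklearn.neighbors import NearestNeighbors

def reduce_chunk(chunk):
    """Map step: within one chunk, drop majority-class points (label 0) whose
    three nearest neighbours are all majority class."""
    X, y = chunk
    nn = NearestNeighbors(n_neighbors=4).fit(X)
    _, idx = nn.kneighbors(X)
    keep = []
    for i in range(len(X)):
        neighbour_labels = y[idx[i][1:]]          # skip the point itself
        redundant = y[i] == 0 and np.all(neighbour_labels == 0)
        keep.append(not redundant)
    return X[keep], y[keep]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(20000, 8))
    y = (rng.random(20000) < 0.05).astype(int)       # 5% minority class
    chunks = [(X[i::4], y[i::4]) for i in range(4)]  # split across 4 workers
    with Pool(4) as pool:
        parts = pool.map(reduce_chunk, chunks)       # map
    X_red = np.vstack([p[0] for p in parts])         # reduce: concatenate survivors
    y_red = np.concatenate([p[1] for p in parts])
    print(len(y), "->", len(y_red), "samples;", y_red.mean().round(3), "minority fraction")
```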

  14. Anticipated Changes in Conducting Scientific Data-Analysis Research in the Big-Data Era

    NASA Astrophysics Data System (ADS)

    Kuo, Kwo-Sen; Seablom, Michael; Clune, Thomas; Ramachandran, Rahul

    2014-05-01

    A Big-Data environment is one that is capable of orchestrating quick-turnaround analyses involving large volumes of data for numerous simultaneous users. Based on our experiences with a prototype Big-Data analysis environment, we anticipate some important changes in research behaviors and processes while conducting scientific data-analysis research in the near future as such Big-Data environments become the mainstream. The first anticipated change will be the reduced effort and difficulty in most parts of the data management process. A Big-Data analysis environment is likely to house most of the data required for a particular research discipline along with appropriate analysis capabilities. This will reduce the need for researchers to download local copies of data. In turn, this also reduces the need for compute and storage procurement by individual researchers or groups, as well as associated maintenance and management afterwards. It is almost certain that Big-Data environments will require a different "programming language" to fully exploit the latent potential. In addition, the process of extending the environment to provide new analysis capabilities will likely be more involved than, say, compiling a piece of new or revised code. We thus anticipate that researchers will require support from dedicated organizations associated with the environment that are composed of professional software engineers and data scientists. A major benefit will likely be that such extensions are of higher-quality and broader applicability than ad hoc changes by physical scientists. Another anticipated significant change is improved collaboration among the researchers using the same environment. Since the environment is homogeneous within itself, many barriers to collaboration are minimized or eliminated. For example, data and analysis algorithms can be seamlessly shared, reused and re-purposed. In conclusion, we will be able to achieve a new level of scientific productivity in the Big-Data analysis environments.

  15. Anticipated Changes in Conducting Scientific Data-Analysis Research in the Big-Data Era

    NASA Technical Reports Server (NTRS)

    Kuo, Kwo-Sen; Seablom, Michael; Clune, Thomas; Ramachandran, Rahul

    2014-01-01

    A Big-Data environment is one that is capable of orchestrating quick-turnaround analyses involving large volumes of data for numerous simultaneous users. Based on our experiences with a prototype Big-Data analysis environment, we anticipate some important changes in research behaviors and processes while conducting scientific data-analysis research in the near future as such Big-Data environments become the mainstream. The first anticipated change will be the reduced effort and difficulty in most parts of the data management process. A Big-Data analysis environment is likely to house most of the data required for a particular research discipline along with appropriate analysis capabilities. This will reduce the need for researchers to download local copies of data. In turn, this also reduces the need for compute and storage procurement by individual researchers or groups, as well as associated maintenance and management afterwards. It is almost certain that Big-Data environments will require a different "programming language" to fully exploit the latent potential. In addition, the process of extending the environment to provide new analysis capabilities will likely be more involved than, say, compiling a piece of new or revised code. We thus anticipate that researchers will require support from dedicated organizations associated with the environment that are composed of professional software engineers and data scientists. A major benefit will likely be that such extensions are of higher-quality and broader applicability than ad hoc changes by physical scientists. Another anticipated significant change is improved collaboration among the researchers using the same environment. Since the environment is homogeneous within itself, many barriers to collaboration are minimized or eliminated. For example, data and analysis algorithms can be seamlessly shared, reused and re-purposed. In conclusion, we will be able to achieve a new level of scientific productivity in the Big-Data analysis environments.

  16. A Robust and Resilient Network Design Paradigm for Region-Based Faults Inflicted by WMD Attack

    DTIC Science & Technology

    2016-04-01

    We investigated big data processing of PMU measurements for grid monitoring and control against possible WMD attacks. Big data processing and analytics of synchrophasor measurements, collected from multiple locations of power grids, ...

  17. Developing a Hadoop-based Middleware for Handling Multi-dimensional NetCDF

    NASA Astrophysics Data System (ADS)

    Li, Z.; Yang, C. P.; Schnase, J. L.; Duffy, D.; Lee, T. J.

    2014-12-01

    Climate observations and model simulations are generating vast amounts of climate data, and these data are accumulating at an ever-increasing rate. Effectively managing and analyzing these data is essential for climate change studies. Hadoop, a distributed storage and processing framework for large data sets, has attracted increasing attention for dealing with the Big Data challenge. The maturity of Infrastructure as a Service (IaaS) cloud computing further accelerates the adoption of Hadoop for solving Big Data problems. However, Hadoop is designed to process unstructured data such as texts, documents and web pages, and cannot effectively handle scientific data formats such as array-based NetCDF files and other binary formats. In this paper, we propose to build a Hadoop-based middleware for transparently handling big NetCDF data by 1) designing a distributed climate data storage mechanism based on a POSIX-enabled parallel file system to enable parallel big data processing with MapReduce, as well as to support data access by other systems; 2) modifying the Hadoop framework to transparently process NetCDF data in parallel without sequencing the data, converting them into other file formats, or loading them into HDFS; and 3) seamlessly integrating Hadoop, cloud computing and climate data in a highly scalable and fault-tolerant framework.
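
    As a rough illustration of the chunk-based access pattern such a middleware must support, the sketch below partitions a NetCDF variable along its time axis and reduces each chunk in a separate worker with Python's netCDF4 and multiprocessing modules. The file name "tas.nc" and variable name "tas" are hypothetical, and this is not the authors' Hadoop implementation.

```python
# A rough, hypothetical sketch (not the paper's Hadoop middleware): partition a
# NetCDF variable along its time axis and reduce each chunk in a separate worker,
# mimicking the map/reduce split described above. "tas.nc" and "tas" are made up.
from multiprocessing import Pool

import numpy as np
from netCDF4 import Dataset

FILE, VAR, CHUNK = "tas.nc", "tas", 100   # hypothetical file, variable, chunk length

def chunk_sum(time_slice):
    """Open the file independently in each worker and reduce one time chunk."""
    with Dataset(FILE) as nc:
        block = nc.variables[VAR][time_slice, :, :]
        return float(np.sum(block)), int(block.size)

if __name__ == "__main__":
    with Dataset(FILE) as nc:
        n_time = nc.variables[VAR].shape[0]
    # Partition step: one slice per chunk along the time dimension.
    slices = [slice(i, min(i + CHUNK, n_time)) for i in range(0, n_time, CHUNK)]
    with Pool() as pool:
        partial = pool.map(chunk_sum, slices)          # "map" over chunks
    total, count = map(sum, zip(*partial))             # "reduce" the partial results
    print("grand mean of", VAR, "=", total / count)
```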

  18. Cincinnati Big Area Additive Manufacturing (BAAM)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duty, Chad E.; Love, Lonnie J.

    Oak Ridge National Laboratory (ORNL) worked with Cincinnati Incorporated (CI) to demonstrate Big Area Additive Manufacturing which increases the speed of the additive manufacturing (AM) process by over 1000X, increases the size of parts by over 10X and shows a cost reduction of over 100X. ORNL worked with CI to transition the Big Area Additive Manufacturing (BAAM) technology from a proof-of-principle (TRL 2-3) demonstration to a prototype product stage (TRL 7-8).

  19. LSP 156, Low Power Embedded Analytics: FY15 Line Supported Information, Computation, and Exploitation Program

    DTIC Science & Technology

    2015-12-04

    from back-office big-data analytics to fieldable hot-spot systems providing storage-processing-communication services for off-grid sensors. Speed and power efficiency are the key metrics. Current state-of-the-art approaches for big data aim toward scaling out to many computers to meet ... pursued within Lincoln Laboratory as well as external sponsors. Our vision is to bring new capabilities in big-data and internet-of-things applications ...

  20. Development of a Computational Framework for Big Data-Driven Prediction of Long-Term Bridge Performance and Traffic Flow

    DOT National Transportation Integrated Search

    2018-04-01

    Consistent efforts with dense sensor deployment and data gathering processes for bridge big data have accumulated profound information regarding bridge performance, associated environments, and traffic flows. However, direct applications of bridge bi...

  1. Big data need big theory too

    PubMed Central

    Dougherty, Edward R.; Highfield, Roger R.

    2016-01-01

    The current interest in big data, machine learning and data analytics has generated the widespread impression that such methods are capable of solving most problems without the need for conventional scientific methods of inquiry. Interest in these methods is intensifying, accelerated by the ease with which digitized data can be acquired in virtually all fields of endeavour, from science, healthcare and cybersecurity to economics, social sciences and the humanities. In multiscale modelling, machine learning appears to provide a shortcut to reveal correlations of arbitrary complexity between processes at the atomic, molecular, meso- and macroscales. Here, we point out the weaknesses of pure big data approaches with particular focus on biology and medicine, which fail to provide conceptual accounts for the processes to which they are applied. No matter their ‘depth’ and the sophistication of data-driven methods, such as artificial neural nets, in the end they merely fit curves to existing data. Not only do these methods invariably require far larger quantities of data than anticipated by big data aficionados in order to produce statistically reliable results, but they can also fail in circumstances beyond the range of the data used to train them because they are not designed to model the structural characteristics of the underlying system. We argue that it is vital to use theory as a guide to experimental design for maximal efficiency of data collection and to produce reliable predictive models and conceptual knowledge. Rather than continuing to fund, pursue and promote ‘blind’ big data projects with massive budgets, we call for more funding to be allocated to the elucidation of the multiscale and stochastic processes controlling the behaviour of complex systems, including those of life, medicine and healthcare. This article is part of the themed issue ‘Multiscale modelling at the physics–chemistry–biology interface’. PMID:27698035

  2. Translating the Science of Measuring Ecosystems at a National Scale: NEON's Online Learning Portal

    NASA Astrophysics Data System (ADS)

    Wasser, L. A.

    2015-12-01

    "Big Data" are becoming increasingly common in many fields. The National Ecological Observatory Network (NEON) will collect data over the 30 years, using consistent, standardized methods across the United States. These freely available new data provide an opportunity for increased understanding of continental- and global scale processes such as changes in vegetation structure and condition, biodiversity and landuse. However, while "big data" are becoming more accessible and available, working with big data is challenging. New and potentially unfamiliar data types and associated processing methods, required to work with a growing diversity of available data take time and resources to learn. Analysis of these big datasets may further present a challenge given large file sizes, and uncertainty regarding best methods to properly statistically summarize and analyze results. Finally, resources that support learning these concepts and approaches, are distributed widely across multiple online spaces and may take time to find. This presentation will overview the development of NEON's collaborative University-focused online education portal. It will also cover content testing, community feedback and results from workshops using online content. Portal content is hosted in github to facilitate community input, accessibility version control. Content includes 1) videos and supporting graphics that explain key concepts related to NEON and related big spatio-temporal and 2) data tutorials that include subsets of spatio-temporal data that can be used to learn key big data skills in a self-paced approach, or that can be used as a teaching tool in the classroom or in a workshop. All resources utilize free and open data processing, visualization and analysis tools, techniques and scripts. All NEON materials are being developed in collaboration with the scientific community and are being tested via in-person workshops. Visit the portal online: www.neondataskills.org.

  3. Increasing the value of geospatial informatics with open approaches for Big Data

    NASA Astrophysics Data System (ADS)

    Percivall, G.; Bermudez, L. E.

    2017-12-01

    Open approaches to big data provide geoscientists with new capabilities to address problems of unmatched size and complexity. Consensus approaches for Big Geo Data have been addressed in multiple international workshops and testbeds organized by the Open Geospatial Consortium (OGC) in the past year. Participants came from government (NASA, ESA, USGS, NOAA, DOE); research (ORNL, NCSA, IU, JPL, CRIM, RENCI); industry (ESRI, Digital Globe, IBM, rasdaman); standards (JTC 1/NIST); and open source software communities. Results from the workshops and testbeds are documented in Testbed reports and a White Paper published by the OGC. The White Paper identifies the following set of use cases: Collection and Ingest (remotely sensed data processing; data stream processing); Prepare and Structure (SQL and NoSQL databases; data linking; feature identification); Analytics and Visualization (spatial-temporal analytics; machine learning; data exploration); and Modeling and Prediction (integrated environmental models; urban 4D models). Open implementations were developed in the Arctic Spatial Data Pilot using Discrete Global Grid Systems (DGGS) and in Testbeds using WPS and ESGF to publish climate predictions. Further development activities to advance open implementations of Big Geo Data include the following. Open Cloud Computing: avoid vendor lock-in through API interoperability and application portability. Open Source Extensions: implement geospatial data representations in projects from Apache, LocationTech, and OSGeo; investigate parallelization strategies for N-dimensional spatial data. Geospatial Data Representations: schemas to improve processing and analysis using geospatial concepts (Features, Coverages, DGGS); use geospatial encodings like NetCDF and GeoPackage. Big Linked Geodata: use linked data methods scaled to big geodata. Analysis Ready Data: support "download as last resort" and "analytics as a service"; promote elements common to "datacubes."

  4. Big data need big theory too.

    PubMed

    Coveney, Peter V; Dougherty, Edward R; Highfield, Roger R

    2016-11-13

    The current interest in big data, machine learning and data analytics has generated the widespread impression that such methods are capable of solving most problems without the need for conventional scientific methods of inquiry. Interest in these methods is intensifying, accelerated by the ease with which digitized data can be acquired in virtually all fields of endeavour, from science, healthcare and cybersecurity to economics, social sciences and the humanities. In multiscale modelling, machine learning appears to provide a shortcut to reveal correlations of arbitrary complexity between processes at the atomic, molecular, meso- and macroscales. Here, we point out the weaknesses of pure big data approaches with particular focus on biology and medicine, which fail to provide conceptual accounts for the processes to which they are applied. No matter their 'depth' and the sophistication of data-driven methods, such as artificial neural nets, in the end they merely fit curves to existing data. Not only do these methods invariably require far larger quantities of data than anticipated by big data aficionados in order to produce statistically reliable results, but they can also fail in circumstances beyond the range of the data used to train them because they are not designed to model the structural characteristics of the underlying system. We argue that it is vital to use theory as a guide to experimental design for maximal efficiency of data collection and to produce reliable predictive models and conceptual knowledge. Rather than continuing to fund, pursue and promote 'blind' big data projects with massive budgets, we call for more funding to be allocated to the elucidation of the multiscale and stochastic processes controlling the behaviour of complex systems, including those of life, medicine and healthcare. This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'. © 2015 The Authors.

  5. Solution structure of the Big domain from Streptococcus pneumoniae reveals a novel Ca2+-binding module

    PubMed Central

    Wang, Tao; Zhang, Jiahai; Zhang, Xuecheng; Xu, Chao; Tu, Xiaoming

    2013-01-01

    Streptococcus pneumoniae is a pathogen causing acute respiratory infection, otitis media and some other severe diseases in humans. In this study, the solution structure of a bacterial immunoglobulin-like (Big) domain from a putative S. pneumoniae surface protein, SP0498, was determined by NMR spectroscopy. The SP0498 Big domain adopts an eight-β-strand barrel-like fold, which differs in some aspects from the two-sheet sandwich-like fold of the canonical Ig-like domains. Intriguingly, we identified the SP0498 Big domain as a Ca2+-binding domain. The structure of the Big domain is different from those of the well-known Ca2+-binding domains, thereby revealing a novel Ca2+-binding module. Furthermore, we identified the critical residues responsible for binding to Ca2+. We are the first to report the interactions between the Big domain and Ca2+ in structural terms, suggesting an important role of the Big domain in many essential calcium-dependent cellular processes such as pathogenesis. PMID:23326635

  6. Endogenous estrogen status, but not genistein supplementation, modulates 7,12-dimethylbenz[a]anthracene-induced mutation in the liver cII gene of transgenic big blue rats.

    PubMed

    Chen, Tao; Hutts, Robert C; Mei, Nan; Liu, Xiaoli; Bishop, Michelle E; Shelton, Sharon; Manjanatha, Mugimane G; Aidoo, Anane

    2005-06-01

    A growing number of studies suggest that isoflavones found in soybeans have estrogenic activity and may safely alleviate the symptoms of menopause. One of these isoflavones, genistein, is commonly used by postmenopausal women as an alternative to hormone replacement therapy. Although sex hormones have been implicated as an important risk factor for the development of hepatocellular carcinoma, there are limited data on the potential effects of the estrogens, including phytoestrogens, on chemical mutagenesis in liver. Because of the association between mutation induction and the carcinogenesis process, we investigated whether endogenous estrogen and supplemental genistein affect 7,12-dimethylbenz[a]anthracene (DMBA)-induced mutagenesis in rat liver. Intact and ovariectomized female Big Blue rats were treated with 80 mg DMBA/kg body weight. Some of the rats also received a supplement of 1,000 ppm genistein. Sixteen weeks after the carcinogen treatment, the rats were sacrificed, their livers were removed, and mutant frequencies (MFs) and types of mutations were determined in the liver cII gene. DMBA significantly increased the MFs in liver for both the intact and ovariectomized rats. While there was no significant difference in MF between the ovariectomized and intact control animals, the mutation induction by DMBA in the ovariectomized groups was significantly higher than that in the intact groups. Dietary genistein did not alter these responses. Molecular analysis of the mutants showed that DMBA induced chemical-specific types of mutations in the liver cII gene. These results suggest that endogenous ovarian hormones have an inhibitory effect on liver mutagenesis by DMBA, whereas dietary genistein does not modulate spontaneous or DMBA-induced mutagenesis in either intact or ovariectomized rats.

  7. Economic analysis of open space box model utilization in spacecraft

    NASA Astrophysics Data System (ADS)

    Mohammad, Atif F.; Straub, Jeremy

    2015-05-01

    It is well known that the amount of stored data about space is growing on an everyday basis. The utilization of Big Data and related tools to perform ETL (Extract, Transform and Load) applications will soon be pervasive in the space sciences. We have entered a crucial time when using Big Data can be the difference (for terrestrial applications) between organizations underperforming and outperforming their peers. The same is true for NASA and other space agencies, as well as for individual missions and the highly competitive process of mission data analysis and publication. In most industries, established players and new entrants alike will use data-driven approaches to revolutionize operations and capture the value of Big Data archives. The Open Space Box Model is poised to take the proverbial "giant leap", as it provides autonomic data processing and communications for spacecraft. Economic value generated from such data processing can be found in terrestrial organizations in every sector, such as healthcare and retail. Retailers, for example, are performing research on Big Data by utilizing sensor-driven embedded data in products within their stores and warehouses to determine how these products are actually used in the real world.

  8. Flexible Description Language for HPC based Processing of Remote Sense Data

    NASA Astrophysics Data System (ADS)

    Nandra, Constantin; Gorgan, Dorian; Bacu, Victor

    2016-04-01

    When talking about Big Data, the most challenging aspect lies in processing them in order to gain new insight, find new patterns and gain knowledge from them. This problem is likely most apparent in the case of Earth Observation (EO) data. With ever higher numbers of data sources and increasing data acquisition rates, dealing with EO data is indeed a challenge [1]. Geoscientists should address this challenge by using flexible and efficient tools and platforms. To answer this trend, the BigEarth project [2] aims to combine the advantages of high performance computing solutions with flexible processing description methodologies in order to reduce both task execution times and task definition time and effort. As a component of the BigEarth platform, WorDeL (Workflow Description Language) [3] is intended to offer a flexible, compact and modular approach to the task definition process. WorDeL, unlike other description alternatives such as Python or shell scripts, is oriented towards the description of processing topologies, using them as abstractions for the processing programs. This feature is intended to make it an attractive alternative for users lacking programming experience. By promoting modular designs, WorDeL not only makes processing descriptions more readable and intuitive, but also helps organize processing tasks into independent sub-tasks, which can be executed in parallel on multi-processor platforms in order to improve execution times. As a BigEarth platform [4] component, WorDeL represents the means by which the user interacts with the system, describing processing algorithms in terms of existing operators and workflows [5], which are ultimately translated into sets of executable commands. The WorDeL language has been designed to help in the definition of compute-intensive batch tasks which can be distributed and executed on high-performance, cloud or grid-based architectures in order to improve processing time. Main references for further information: [1] Gorgan, D., "Flexible and Adaptive Processing of Earth Observation Data over High Performance Computation Architectures", International Conference and Exhibition Satellite 2015, August 17-19, Houston, Texas, USA. [2] BigEarth project - flexible processing of big Earth data over high performance computing architectures. http://cgis.utcluj.ro/bigearth, (2014). [3] Nandra, C., Gorgan, D., "Workflow Description Language for Defining Big Earth Data Processing Tasks", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE Press, pp. 461-468, (2015). [4] Bacu, V., Stefan, T., Gorgan, D., "Adaptive Processing of Earth Observation Data on Cloud Infrastructures Based on Workflow Description", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE Press, pp. 444-454, (2015). [5] Mihon, D., Bacu, V., Colceriu, V., Gorgan, D., "Modeling of Earth Observation Use Cases through the KEOPS System", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE Press, pp. 455-460, (2015).

  9. Population-based imaging biobanks as source of big data.

    PubMed

    Gatidis, Sergios; Heber, Sophia D; Storz, Corinna; Bamberg, Fabian

    2017-06-01

    Advances of computational sciences over the last decades have enabled the introduction of novel methodological approaches in biomedical research. Acquiring extensive and comprehensive data about a research subject and subsequently extracting significant information has opened new possibilities in gaining insight into biological and medical processes. This so-called big data approach has recently found entrance into medical imaging and numerous epidemiological studies have been implementing advanced imaging to identify imaging biomarkers that provide information about physiological processes, including normal development and aging but also on the development of pathological disease states. The purpose of this article is to present existing epidemiological imaging studies and to discuss opportunities, methodological and organizational aspects, and challenges that population imaging poses to the field of big data research.

  10. Heavy element production in inhomogeneous big bang nucleosynthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matsuura, Shunji; Fujimoto, Shin-ichirou; Nishimura, Sunao

    2005-12-15

    We present a new astrophysical site of big bang nucleosynthesis (BBN) that is very peculiar compared with the standard BBN. Some models of baryogenesis suggest that very high baryon density regions were formed in the early universe. On the other hand, recent observations suggest that heavy elements already exist at high redshifts, and the origin of these elements has become a big puzzle. Motivated by these, we investigate BBN in very high baryon density regions. BBN proceeds in a proton-rich environment, which is known to be like the p-process. However, by taking very heavy nuclei into account, we find that BBN proceeds through both the p-process and the r-process simultaneously. P-nuclei such as 92Mo, 94Mo, 96Ru, and 98Ru, whose origin is not well known, are also synthesized.

  11. Applications and Benefits for Big Data Sets Using Tree Distances and The T-SNE Algorithm

    DTIC Science & Technology

    2016-03-01

    Master's thesis by Suyoung Lee, March 2016; thesis advisor: Samuel E. Buttrey. In this work we use tree distance, computed using Buttrey's treeClust package in R as discussed by Buttrey and Whitaker in 2015, to process mixed data ...
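
    The pairing of a precomputed dissimilarity matrix with t-SNE that this record describes can be illustrated with the short Python sketch below. scikit-learn stands in for the R treeClust package named in the record, and the Euclidean distances on random data are placeholders rather than real tree distances.

```python
# Illustrative only: embed objects described by a precomputed dissimilarity matrix
# with t-SNE, analogous to feeding tree-based distances into the algorithm.
# The distances on random data below are placeholders, not real tree distances.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                     # toy data standing in for mixed records
D = squareform(pdist(X))                          # any symmetric dissimilarity matrix works

tsne = TSNE(metric="precomputed", init="random",  # "precomputed" requires random init
            perplexity=30, random_state=0)
embedding = tsne.fit_transform(D)
print(embedding.shape)                            # (200, 2) coordinates for plotting
```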

  12. Baryon symmetric big-bang cosmology. [matter-antimatter symmetry

    NASA Technical Reports Server (NTRS)

    Stecker, F. W.

    1978-01-01

    The framework of baryon-symmetric big-bang cosmology offers the greatest potential for deducing the evolution of the universe as a consequence of physical laws and processes with the minimum number of arbitrary assumptions as to initial conditions in the big-bang. In addition, it offers the possibility of explaining the photon-baryon ratio in the universe and how galaxies and galaxy clusters are formed, and also provides the only acceptable explanation at present for the origin of the cosmic gamma ray background radiation.

  13. A Framework for Identifying and Analyzing Major Issues in Implementing Big Data and Data Analytics in E-Learning: Introduction to Special Issue on Big Data and Data Analytics

    ERIC Educational Resources Information Center

    Corbeil, Maria Elena; Corbeil, Joseph Rene; Khan, Badrul H.

    2017-01-01

    Due to rapid advancements in our ability to collect, process, and analyze massive amounts of data, it is now possible for educational institutions to gain new insights into how people learn (Kumar, 2013). E-learning has become an important part of education, and this form of learning is especially suited to the use of big data and data analysis,…

  14. The community-driven BiG CZ software system for integration and analysis of bio- and geoscience data in the critical zone

    NASA Astrophysics Data System (ADS)

    Aufdenkampe, A. K.; Mayorga, E.; Horsburgh, J. S.; Lehnert, K. A.; Zaslavsky, I.; Valentine, D. W., Jr.; Richard, S. M.; Cheetham, R.; Meyer, F.; Henry, C.; Berg-Cross, G.; Packman, A. I.; Aronson, E. L.

    2014-12-01

    Here we present the prototypes of a new scientific software system designed around the new Observations Data Model version 2.0 (ODM2, https://github.com/UCHIC/ODM2) to substantially enhance integration of biological and Geological (BiG) data for Critical Zone (CZ) science. The CZ science community takes as its charge the effort to integrate theory, models and data from the multitude of disciplines collectively studying processes on the Earth's surface. The central scientific challenge of the CZ science community is to develop a "grand unifying theory" of the critical zone through a theory-model-data fusion approach, for which the key missing need is a cyberinfrastructure for seamless 4D visual exploration of the integrated knowledge (data, model outputs and interpolations) from all the bio and geoscience disciplines relevant to critical zone structure and function, similar to today's ability to easily explore historical satellite imagery and photographs of the earth's surface using Google Earth. This project takes the first "BiG" steps toward answering that need. The overall goal of this project is to co-develop with the CZ science and broader community, including natural resource managers and stakeholders, a web-based integration and visualization environment for joint analysis of cross-scale bio and geoscience processes in the critical zone (BiG CZ), spanning experimental and observational designs. We will: (1) Engage the CZ and broader community to co-develop and deploy the BiG CZ software stack; (2) Develop the BiG CZ Portal web application for intuitive, high-performance map-based discovery, visualization, access and publication of data by scientists, resource managers, educators and the general public; (3) Develop the BiG CZ Toolbox to enable cyber-savvy CZ scientists to access BiG CZ Application Programming Interfaces (APIs); and (4) Develop the BiG CZ Central software stack to bridge data systems developed for multiple critical zone domains into a single metadata catalog. The entire BiG CZ Software system is being developed on public repositories as a modular suite of open source software projects. It will be built around a new Observations Data Model Version 2.0 (ODM2) that has been developed by members of the BiG CZ project team, with community input, under separate funding.

  15. Initial-stage examination of a testbed for the big data transfer over parallel links. The SDN approach

    NASA Astrophysics Data System (ADS)

    Khoruzhnikov, S. E.; Grudinin, V. A.; Sadov, O. L.; Shevel, A. E.; Titov, V. B.; Kairkanov, A. B.

    2015-04-01

    The transfer of Big Data over a computer network has been an important and unavoidable operation in the past and present, and will remain so in any feasible future. A large variety of astronomical projects produce Big Data. There are a number of methods and a range of tools to transfer the data over a global computer network (the Internet). In this paper we consider the transfer of one piece of Big Data from one point on the Internet to another, in general over a long distance of many thousands of kilometers. Several free-of-charge systems for transferring Big Data are analyzed here. The most important architectural features are emphasized, and the idea of adding the SDN OpenFlow protocol technique for fine-grained tuning of the data transfer process over several parallel data links is discussed.

  16. BIG MAC: A bolometer array for mid-infrared astronomy, Center Director's Discretionary Fund

    NASA Technical Reports Server (NTRS)

    Telesco, C. M.; Decher, R.; Baugher, C.

    1985-01-01

    The infrared array referred to as Big Mac (for Marshall Array Camera) was designed for ground-based astronomical observations in the wavelength range 5 to 35 microns. It contains 20 discrete gallium-doped germanium bolometer detectors at a temperature of 1.4 K. Each bolometer is irradiated by a square field mirror constituting a single pixel of the array. The mirrors are arranged contiguously in four columns and five rows, thus defining the array configuration. Big Mac utilizes cold reimaging optics and an up-looking dewar. The total Big Mac system also contains a telescope interface tube for mounting the dewar and a computer for data acquisition and processing. Initial astronomical observations at a major infrared observatory indicate that Big Mac performance is excellent, having achieved the design specifications and making this instrument an outstanding tool for astrophysics.

  17. MapFactory - Towards a mapping design pattern for big geospatial data

    NASA Astrophysics Data System (ADS)

    Rautenbach, Victoria; Coetzee, Serena

    2018-05-01

    With big geospatial data emerging, cartographers and geographic information scientists have to find new ways of dealing with the volume, variety, velocity, and veracity (4Vs) of the data. This requires the development of tools that allow processing, filtering, analysing, and visualising of big data through multidisciplinary collaboration. In this paper, we present the MapFactory design pattern that will be used for the creation of different maps according to the (input) design specification for big geospatial data. The design specification is based on elements from ISO19115-1:2014 Geographic information - Metadata - Part 1: Fundamentals that would guide the design and development of the map or set of maps to be produced. The results of the exploratory research suggest that the MapFactory design pattern will help with software reuse and communication. The MapFactory design pattern will aid software developers to build the tools that are required to automate map making with big geospatial data. The resulting maps would assist cartographers and others to make sense of big geospatial data.
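
    A purely illustrative Python sketch of the factory idea underlying such a design pattern is given below. The map classes and specification fields are invented here to show the shape of the approach; they are not the MapFactory implementation or the ISO 19115-1:2014 metadata model it builds on.

```python
# Purely illustrative factory-pattern sketch: the map classes and the spec fields
# are invented to show the shape of the idea, not the MapFactory implementation.
from dataclasses import dataclass

@dataclass
class MapSpec:
    theme: str        # e.g. "choropleth" or "heatmap" (hypothetical themes)
    title: str

class ChoroplethMap:
    def __init__(self, spec: MapSpec): self.spec = spec
    def render(self) -> str: return f"choropleth map: {self.spec.title}"

class HeatMap:
    def __init__(self, spec: MapSpec): self.spec = spec
    def render(self) -> str: return f"heat map: {self.spec.title}"

_REGISTRY = {"choropleth": ChoroplethMap, "heatmap": HeatMap}

def map_factory(spec: MapSpec):
    """Return the map object that matches the design specification."""
    try:
        return _REGISTRY[spec.theme](spec)
    except KeyError as err:
        raise ValueError(f"no map type registered for theme {spec.theme!r}") from err

print(map_factory(MapSpec("heatmap", "Population density")).render())
```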

  18. Nursing Knowledge: Big Data Science-Implications for Nurse Leaders.

    PubMed

    Westra, Bonnie L; Clancy, Thomas R; Sensmeier, Joyce; Warren, Judith J; Weaver, Charlotte; Delaney, Connie W

    2015-01-01

    The integration of Big Data from electronic health records and other information systems within and across health care enterprises provides an opportunity to develop actionable predictive models that can increase the confidence of nursing leaders' decisions to improve patient outcomes and safety and control costs. As health care shifts to the community, mobile health applications add to the Big Data available. There is an evolving national action plan that includes nursing data in Big Data science, spearheaded by the University of Minnesota School of Nursing. For the past 3 years, diverse stakeholders from practice, industry, education, research, and professional organizations have collaborated through the "Nursing Knowledge: Big Data Science" conferences to create and act on recommendations for inclusion of nursing data, integrated with patient-generated, interprofessional, and contextual data. It is critical for nursing leaders to understand the value of Big Data science and the ways to standardize data and workflow processes to take advantage of newer cutting edge analytics to support analytic methods to control costs and improve patient quality and safety.

  19. Classification and virtual screening of androgen receptor antagonists.

    PubMed

    Li, Jiazhong; Gramatica, Paola

    2010-05-24

    Computational tools, such as quantitative structure-activity relationship (QSAR) models, are highly useful as screening support for the prioritization of substances of very high concern (SVHC). From the practical point of view, QSAR models should be effective in picking out active rather than inactive compounds, expressed as sensitivity in classification work. This research investigates the classification of a big data set of endocrine-disrupting chemicals (EDCs), namely androgen receptor (AR) antagonists, mainly aiming to improve the external sensitivity and to screen for potential AR binders. The kNN, lazy IB1, and ADTree methods and the consensus approach were used to build different models, which improve the sensitivity on external chemicals from 57.1% (literature) to 76.4%. Additionally, the models' predictive abilities were further validated on a blind collected data set (sensitivity: 85.7%). The proposed classifiers were then used: (i) to distinguish a set of AR binders into antagonists and agonists; (ii) to screen a combined estrogen receptor (ER) binder database to find possible chemicals that can bind to both AR and ER; and (iii) to virtually screen our in-house environmental chemical database. The in silico screening results suggest: (i) that some compounds can affect the normal endocrine system through a complex mechanism, binding both to ER and AR; (ii) that new EDCs, which are non-ER binders but can in silico bind to AR, are recognized; and (iii) that about 20% of compounds in a big data set of environmental chemicals are predicted as new AR antagonists. Priority should be given to them to experimentally test their binding activities with AR.
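
    The kind of consensus classification and sensitivity check the record describes can be sketched in a few lines of Python. The sketch below uses scikit-learn stand-ins (a k-nearest-neighbour classifier and a decision tree in place of lazy IB1 and ADTree) on synthetic data; it is illustrative only, not the authors' QSAR workflow.

```python
# Sketch of consensus (majority-vote) classification with sensitivity as the metric.
# KNeighborsClassifier and DecisionTreeClassifier stand in for the lazy IB1 and
# ADTree learners named in the record; the data below are synthetic, not QSAR data.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, weights=[0.7, 0.3],
                           random_state=0)          # 1 = "active" (minority) class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

consensus = VotingClassifier([
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
], voting="hard").fit(X_tr, y_tr)

# External sensitivity = recall on the held-out active compounds.
print("sensitivity:", recall_score(y_te, consensus.predict(X_te)))
```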

  20. Sagebrush identification, ecology, and palatability relative to sage-grouse

    Treesearch

    Roger Rosentreter

    2005-01-01

    Basic identification keys and comparison tables for 23 low and big sagebrush (Artemisia) taxa are presented. Differences in sagebrush ecology, soil temperature regimes, geographic range, palatability, mineralogy, and chemistry are discussed. Coumarin, a chemical produced in the glands of some Artemisia species, causes UV-light fluorescence of the...

  1. Chemistry--The Big Picture

    ERIC Educational Resources Information Center

    Cassell, Anne

    2011-01-01

    Chemistry produces materials and releases energy by ionic or electronic rearrangements. Three structure types affect the ease with which a reaction occurs. In the Earth's crust, "solid crystals" change chemically only with extreme heat and pressure, unless their fixed ions touch moving fluids. On the other hand, in living things, "liquid crystals"…

  2. How Did the Universe Make People? A Brief History of the Universe from the Beginning to the End

    NASA Technical Reports Server (NTRS)

    Mather, John C.

    2009-01-01

    Astronomers are beginning to know the easy part: How did the Big Bang make stars and galaxies and the chemical elements? How did solar systems form and evolve? How did the Earth and the Moon form, and how did water and carbon come to the Earth? Geologists are piecing together the history of the Earth, and biologists are coming to know the history and process of life from the earliest times. But is our planet the only life-supporting place in the universe, or are there many? Astronomers are working on that too. I will tell the story of the discovery of the Big Bang by Edwin Hubble, and how the primordial heat radiation tells the details of that universal explosion. I will tell how the James Webb Space Telescope will extend the discoveries of the Hubble Space Telescope to ever greater distances, will look inside dust clouds to see stars being born today, will measure planets around other stars, and examine the dwarf planets in the outer Solar System. I will show concepts for great new space telescopes to follow the JWST and how they could use future moon rockets to hunt for signs of life on planets around other stars.

  3. Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoo, Wucherl; Koo, Michelle; Cao, Yu

    Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze the performance of complex workflows running over a large number of nodes with multiple parallel task executions, involving terabytes or petabytes of workflow data and measurement data of the executions. To help identify performance bottlenecks or debug the performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply the most sophisticated statistical tools and data mining methods on the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from a genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workflows.

  4. Big data analysis framework for healthcare and social sectors in Korea.

    PubMed

    Song, Tae-Min; Ryu, Seewon

    2015-01-01

    We reviewed applications of big data analysis of healthcare and social services in developed countries, and subsequently devised a framework for such an analysis in Korea. We reviewed the status of implementing big data analysis of health care and social services in developed countries, and strategies used by the Ministry of Health and Welfare of Korea (Government 3.0). We formulated a conceptual framework of big data in the healthcare and social service sectors at the national level. As a specific case, we designed a process and method of social big data analysis on suicide buzz. Developed countries (e.g., the United States, the UK, Singapore, Australia, and even OECD and EU) are emphasizing the potential of big data, and using it as a tool to solve their long-standing problems. Big data strategies for the healthcare and social service sectors were formulated based on an ICT-based policy of current government and the strategic goals of the Ministry of Health and Welfare. We suggest a framework of big data analysis in the healthcare and welfare service sectors separately and assigned them tentative names: 'health risk analysis center' and 'integrated social welfare service network'. A framework of social big data analysis is presented by applying it to the prevention and proactive detection of suicide in Korea. There are some concerns with the utilization of big data in the healthcare and social welfare sectors. Thus, research on these issues must be conducted so that sophisticated and practical solutions can be reached.

  5. 76 FR 59420 - Proposed Information Collection; Alaska Guide Service Evaluation

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-09-26

    ... Office of Management and Budget (OMB) to approve the information collection (IC) described below. As... lands, we issue permits for commercial guide services, including big game hunting, sport fishing... information during the competitive selection process for big game and sport fishing guide permits to evaluate...

  6. Big Science and the Large Hadron Collider

    NASA Astrophysics Data System (ADS)

    Giudice, Gian Francesco

    2012-03-01

    The Large Hadron Collider (LHC), the particle accelerator operating at CERN, is probably the most complex and ambitious scientific project ever accomplished by humanity. The sheer size of the enterprise, in terms of financial and human resources, naturally raises the question whether society should support such costly basic-research programs. I address this question by first reviewing the process that led to the emergence of Big Science and the role of large projects in the development of science and technology. I then compare the methodologies of Small and Big Science, emphasizing their mutual linkage. Finally, after examining the cost of Big Science projects, I highlight several general aspects of their beneficial implications for society.

  7. Raster Data Partitioning for Supporting Distributed GIS Processing

    NASA Astrophysics Data System (ADS)

    Nguyen Thai, B.; Olasz, A.

    2015-08-01

    In the geospatial sector, the big data concept has also already had an impact. Several studies apply techniques originally from computer science to GIS processing of huge amounts of geospatial data. In other research studies, geospatial data is considered as if it had always been big data (Lee and Kang, 2015). Nevertheless, data acquisition methods have improved substantially, increasing not only the amount but also the resolution of raw data in spectral, spatial and temporal terms. A significant portion of big data is geospatial data, and the size of such data is growing rapidly, by at least 20% every year (Dasgupta, 2013). From the ever-increasing volume of raw data, produced in different formats and representations and for different purposes, the wealth of information derived from these data sets is the truly valuable result. However, computing capability and processing speed still face limitations, even when semi-automatic or automatic procedures are applied to complex geospatial data (Kristóf et al., 2014). Lately, distributed computing has reached many interdisciplinary areas of computer science, including remote sensing and geographic information processing approaches. Cloud computing requires, all the more, appropriate processing algorithms that can be distributed to handle geospatial big data. The MapReduce programming model and distributed file systems have proven their capability to process non-GIS big data. But sometimes it is inconvenient or inefficient to rewrite existing algorithms to the MapReduce programming model; moreover, GIS data cannot be partitioned like text-based data, by lines or by bytes. Hence, we would like to find an alternative solution for data partitioning, data distribution and execution of existing algorithms without rewriting them, or with only minor modifications. This paper focuses on a technical overview of currently available distributed computing environments, as well as GIS (raster) data partitioning, distribution and distributed processing of GIS algorithms. A proof-of-concept implementation has been made for raster data partitioning, distribution and processing. The first results on performance have been compared against the commercial software ERDAS IMAGINE 2011 and 2014. Partitioning methods heavily depend on application areas; therefore, we may consider data partitioning as a preprocessing step before applying processing services to the data. As a proof of concept, we have implemented a simple tile-based partitioning method, splitting an image into smaller grids (NxM tiles), and compared the processing time for an NDVI calculation against existing methods. The concept is demonstrated using our own open-source processing framework.
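
    A minimal NumPy sketch of the tile-based idea described above is given below: the red and near-infrared bands are split into an N x M grid of tiles and NDVI is computed per tile in parallel. The synthetic arrays stand in for real raster bands; this is not the paper's framework.

```python
# Minimal sketch, assuming synthetic bands: split red and near-infrared rasters
# into an N x M grid of tiles and compute NDVI per tile in parallel workers.
from multiprocessing import Pool
import numpy as np

H, W, N, M = 1024, 1024, 4, 4                 # raster size and tile grid (illustrative)
rng = np.random.default_rng(0)
red = rng.uniform(0.0, 1.0, (H, W)).astype(np.float32)
nir = rng.uniform(0.0, 1.0, (H, W)).astype(np.float32)

def tile_bounds(h, w, n, m):
    """Yield (row_slice, col_slice) pairs covering an n x m grid of tiles."""
    for i in range(n):
        for j in range(m):
            yield (slice(i * h // n, (i + 1) * h // n),
                   slice(j * w // m, (j + 1) * w // m))

def ndvi_tile(bounds):
    rs, cs = bounds
    r, nr = red[rs, cs], nir[rs, cs]
    return (nr - r) / (nr + r + 1e-9)          # NDVI, guarded against division by zero

if __name__ == "__main__":
    with Pool() as pool:
        tiles = pool.map(ndvi_tile, list(tile_bounds(H, W, N, M)))
    print(len(tiles), "tiles computed; first tile shape:", tiles[0].shape)
```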

  8. Evaluation of lamprey larvicides in the Big Garlic River and Saux Head Lake

    USGS Publications Warehouse

    Manion, Patrick J.

    1969-01-01

    Bayluscide (5,2'-dichloro-4'-nitrosalicylanilide) and TFM (3-trifluoromethyl-4-nitrophenol) were evaluated as selective larvicides for control of the sea lamprey, Petromyzon marinus, in the Big Garlic River and Saux Head Lake in Marquette County, Michigan. Population estimates and movement of ammocetes were determined from the recapture of marked ammocetes released before chemical treatment. In 1966 the estimated population of 3136 ammocetes off the stream mouth in Saux Head Lake was reduced 89% by treatment with granular Bayluscide; this percentage was supported by a population estimate of 120 ammocetes in 1967, an indicated reduction of 96% from 1966. Post-marking movement of ammocetes was greater upstream than downstream.

  9. Big data and clinicians: a review on the state of the science.

    PubMed

    Wang, Weiqi; Krishnan, Eswar

    2014-01-17

    In the past few decades, medically related data collection saw a huge increase, referred to as big data. These huge datasets bring challenges in storage, processing, and analysis. In clinical medicine, big data is expected to play an important role in identifying causality of patient symptoms, in predicting hazards of disease incidence or reoccurrence, and in improving primary-care quality. The objective of this review was to provide an overview of the features of clinical big data, describe a few commonly employed computational algorithms, statistical methods, and software toolkits for data manipulation and analysis, and discuss the challenges and limitations in this realm. We conducted a literature review to identify studies on big data in medicine, especially clinical medicine. We used different combinations of keywords to search PubMed, Science Direct, Web of Knowledge, and Google Scholar for literature of interest from the past 10 years. This paper reviewed studies that analyzed clinical big data and discussed issues related to storage and analysis of this type of data. Big data is becoming a common feature of biological and clinical studies. Researchers who use clinical big data face multiple challenges, and the data itself has limitations. It is imperative that methodologies for data analysis keep pace with our ability to collect and store data.

  10. The Challenge of Handling Big Data Sets in the Sensor Web

    NASA Astrophysics Data System (ADS)

    Autermann, Christian; Stasch, Christoph; Jirka, Simon

    2016-04-01

    More and more Sensor Web components are deployed in different domains such as hydrology, oceanography or air quality in order to make observation data accessible via the Web. However, besides variability of data formats and protocols in environmental applications, the fast growing volume of data with high temporal and spatial resolution is imposing new challenges for Sensor Web technologies when sharing observation data and metadata about sensors. Variability, volume and velocity are the core issues that are addressed by Big Data concepts and technologies. Most solutions in the geospatial sector focus on remote sensing and raster data, whereas big in-situ observation data sets relying on vector features require novel approaches. Hence, in order to deal with big data sets in infrastructures for observational data, the following questions need to be answered: 1. How can big heterogeneous spatio-temporal datasets be organized, managed, and provided to Sensor Web applications? 2. How can views on big data sets and derived information products be made accessible in the Sensor Web? 3. How can big observation data sets be processed efficiently? We illustrate these challenges with examples from the marine domain and outline how we address these challenges. We therefore show how big data approaches from mainstream IT can be re-used and applied to Sensor Web application scenarios.

  11. Big Data in industry

    NASA Astrophysics Data System (ADS)

    Latinović, T. S.; Preradović, D. M.; Barz, C. R.; Latinović, M. T.; Petrica, P. P.; Pop-Vadean, A.

    2016-08-01

    The amount of data at the global level has grown exponentially. Along with this phenomenon, we need new units of measure, such as the exabyte, zettabyte, and yottabyte, to express the amount of data. The growth of data creates a situation where the classic systems for the collection, storage, processing, and visualization of data are losing the battle against the large amount, speed, and variety of data that is generated continuously. Much of this data is created by the Internet of Things (IoT): cameras, satellites, cars, GPS navigation, and so on. Our challenge is to come up with new technologies and tools for the management and exploitation of these large amounts of data. Big Data has been a hot topic in IT circles in recent years. It is also recognized in the business world, and increasingly in public administration. This paper proposes an ontology of big data analytics and examines how to enhance business intelligence through big data analytics as a service by presenting a big data analytics services-oriented architecture. This paper also discusses the interrelationship between business intelligence and big data analytics. The proposed approach in this paper might facilitate the research and development of business analytics, big data analytics, and business intelligence as well as intelligent agents.

  12. Contemporary Research Discourse and Issues on Big Data in Higher Education

    ERIC Educational Resources Information Center

    Daniel, Ben

    2017-01-01

    The increasing availability of digital data in higher education provides an extraordinary resource for researchers to undertake educational research, targeted at understanding challenges facing the sector. Big data can stimulate new ways to transform processes relating to learning and teaching, and helps identify useful data, sources of evidence…

  13. More on Sports and the Big6.

    ERIC Educational Resources Information Center

    Eisenberg, Mike

    1998-01-01

    Presents strategies for relating the Big6 information problem-solving process to sports to gain students' attention, sustain it, and make instruction relevant to their interests. Lectures by coaches, computer-based sports games, sports information sources, the use of technology in sports, and judging sports events are discussed. (LRW)

  14. Native bunchgrass response to prescribed fire in ungrazed Mountain Big Sagebrush ecosystems

    Treesearch

    Lisa M. Ellsworth; J. Boone Kauffman

    2010-01-01

    Fire was historically a dominant ecological process throughout mountain big sagebrush (Artemisia tridentata Nutt. ssp. vaseyana [Rydb.] Beetle) ecosystems of western North America, and the native biota have developed many adaptations to persist in a regime typified by frequent fires. Following spring and fall prescribed fires...

  15. Reconnaissance-level assessment of water quality near Flandreau, South Dakota

    USGS Publications Warehouse

    Schaap, Bryan D.

    2002-01-01

    This report presents water-quality data that have been compiled and collected for a reconnaissance-level assessment of water quality near Flandreau, South Dakota. The investigation was initiated as a cooperative effort between the U.S. Geological Survey and the Flandreau Santee Sioux Tribe. Members of the Flandreau Santee Sioux Tribe have expressed concern that Tribal members residing in the city of Flandreau experience more health problems than the general population in the surrounding area. Prior to December 2000, water for the city of Flandreau was supplied by wells completed in the Big Sioux aquifer within the city of Flandreau. After December 2000, water for the city of Flandreau was supplied by the Big Sioux Community Water System from wells completed in the Big Sioux aquifer along the Big Sioux River near Egan, about 8 river miles downstream of Flandreau. There is some concern that the public and private water supplies provided by wells completed in the Big Sioux aquifer near the Big Sioux River may contain chemicals that contribute to the health problems. Data compiled from other investigations provide information about the water quality of the Big Sioux River and the Big Sioux aquifer in the Flandreau area from 1978 through 2001. The median, minimum, and maximum values are presented for fecal bacteria, nitrate, arsenic, and atrazine. Nitrate concentrations of water from Flandreau public-supply wells occasionally exceeded the Maximum Contaminant Level of 10 milligrams per liter for public drinking water. For this study, untreated-water samples were collected from the Big Sioux River in Flandreau and from five wells completed in the Big Sioux aquifer in and near Flandreau. Treated-water samples from the Big Sioux Community Water System were collected at a site about midway between the treatment facility near Egan and the city of Flandreau. The first round of sampling occurred during July 9-12, 2001, and the second round of sampling occurred during August 20-27, 2001. Samples were analyzed for a broad range of compounds, including major ions, nutrients, trace elements, pesticides, antibiotics, and organic wastewater compounds, some of which might cause adverse health effects after long-term exposure. Samples collected on August 27, 2001, from the Big Sioux River also were analyzed for human pharmaceutical compounds. The quality of the water in the Big Sioux River and the Big Sioux aquifer in the Flandreau area cannot be thoroughly characterized with the limited number of samples collected within a 2-month period, and for many analytes, neither drinking-water standards nor associations with adverse health effects have been established. Concentrations of some selected analytes were less than U.S. Environmental Protection Agency drinking-water standards at the time of the sampling, and concentrations of most organic compounds were less than the respective method reporting levels for most of the samples.

  16. Direct Self-Sustained Fragmentation Cascade of Reactive Droplets

    NASA Astrophysics Data System (ADS)

    Inoue, Chihiro; Izato, Yu-ichiro; Miyake, Atsumi; Villermaux, Emmanuel

    2017-02-01

    A traditional hand-held firework generates light streaks similar to branched pine needles, with ever smaller ramifications. These streaks are the trajectories of incandescent reactive liquid droplets bursting from a melted powder. We have uncovered the detailed sequence of events, which involve a chemical reaction with the oxygen of air, thermal decomposition of metastable compounds in the melt, gas bubble nucleation and bursting, liquid ligaments and droplets formation, all occurring in a sequential fashion. We have also evidenced a rare instance in nature of a spontaneous fragmentation process involving a direct cascade from big to smaller droplets. Here, the self-sustained direct cascade is shown to proceed over up to eight generations, with well-defined time and length scales, thus answering a century old question, and enriching, with a new example, the phenomenology of comminution.

  17. Storage capacity: how big should it be

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Malina, M.A.

    1980-01-28

    A mathematical model was developed for determining the economically optimal storage capacity for a given material or product at a manufacturing plant. The optimum was defined as a trade-off between the inventory-holding costs and the cost of customer-service failures caused by insufficient stocks for a peak-demand period. The order-arrival, production, storage, and shipment process was simulated by Monte Carlo techniques to calculate the probability of order delays of various lengths of time as a function of storage capacity. Example calculations for the storage of a bulk liquid chemical in tanks showed that the conclusions arrived at via this model are comparatively insensitive to errors made in estimating the capital cost of storage or the risk of losing an order because of a late delivery.
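
    A toy Monte Carlo in the spirit of the model described above is sketched below: daily production flows into a tank of fixed capacity, daily shipments are drawn at random, and the probability of a delayed order is estimated for several capacities. All rates, capacities, and distributions here are invented for illustration and are not the values from the report.

```python
# Toy Monte Carlo sketch of the capacity vs. delay-probability trade-off.
# All rates, capacities, and distributions are invented for illustration.
import random

def delay_probability(capacity, days=100_000, production=10.0, seed=1):
    rng = random.Random(seed)
    stock, delayed, orders = capacity / 2, 0, 0
    for _ in range(days):
        stock = min(capacity, stock + production)      # make product, cap at tank size
        demand = rng.expovariate(1 / 10.0)             # random daily order size
        orders += 1
        if demand > stock:
            delayed += 1                               # order waits for more production
            stock = 0.0
        else:
            stock -= demand
    return delayed / orders

for cap in (20, 40, 80, 160):
    print(f"capacity {cap:4d}: P(delay) ~ {delay_probability(cap):.3f}")
```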

  18. Advancing Nucleosynthesis in Core-Collapse Supernovae Models Using 2D CHIMERA Simulations

    NASA Astrophysics Data System (ADS)

    Harris, J. A.; Hix, W. R.; Chertkow, M. A.; Bruenn, S. W.; Lentz, E. J.; Messer, O. B.; Mezzacappa, A.; Blondin, J. M.; Marronetti, P.; Yakunin, K.

    2014-01-01

    The deaths of massive stars as core-collapse supernovae (CCSN) serve as a crucial link in understanding galactic chemical evolution since the birth of the universe via the Big Bang. We investigate CCSN in polar axisymmetric simulations using the multidimensional radiation hydrodynamics code CHIMERA. Computational costs have traditionally constrained the evolution of the nuclear composition in CCSN models to, at best, a 14-species α-network. However, the limited capacity of the α-network to accurately evolve detailed composition, the neutronization and the nuclear energy generation rate has fettered the ability of prior CCSN simulations to accurately reproduce the chemical abundances and energy distributions as known from observations. These deficits can be partially ameliorated by "post-processing" with a more realistic network. Lagrangian tracer particles placed throughout the star record the temporal evolution of the initial simulation and enable the extension of the nuclear network evolution by incorporating larger systems in post-processing nucleosynthesis calculations. We present post-processing results of the four ab initio axisymmetric CCSN 2D models of Bruenn et al. (2013) evolved with the smaller α-network, and initiated from stellar metallicity, non-rotating progenitors of mass 12, 15, 20, and 25 M⊙ from Woosley & Heger (2007). As a test of the limitations of post-processing, we provide preliminary results from an ongoing simulation of the 15 M⊙ model evolved with a realistic 150 species nuclear reaction network in situ. With more accurate energy generation rates and an improved determination of the thermodynamic trajectories of the tracer particles, we can better unravel the complicated multidimensional "mass-cut" in CCSN simulations and probe for less energetically significant nuclear processes like the νp-process and the r-process, which require still larger networks.

  19. -Omic and Electronic Health Records Big Data Analytics for Precision Medicine

    PubMed Central

    Wu, Po-Yen; Cheng, Chih-Wen; Kaddi, Chanchala D.; Venugopalan, Janani; Hoffman, Ryan; Wang, May D.

    2017-01-01

    Objective Rapid advances of high-throughput technologies and wide adoption of electronic health records (EHRs) have led to fast accumulation of -omic and EHR data. These voluminous complex data contain abundant information for precision medicine, and big data analytics can extract such knowledge to improve the quality of health care. Methods In this article, we present -omic and EHR data characteristics, associated challenges, and data analytics including data pre-processing, mining, and modeling. Results To demonstrate how big data analytics enables precision medicine, we provide two case studies, including identifying disease biomarkers from multi-omic data and incorporating -omic information into EHR. Conclusion Big data analytics is able to address -omic and EHR data challenges for a paradigm shift towards precision medicine. Significance Big data analytics makes sense of -omic and EHR data to improve healthcare outcomes. It has long-lasting societal impact. PMID:27740470

  20. [Empowerment of women in difficult life situations: the BIG project].

    PubMed

    Rütten, A; Röger, U; Abu-Omar, K; Frahsa, A

    2008-12-01

    BIG is a project for the promotion of physical activity among women in difficult life situations. Following the main health promotion principles of the WHO, the women shall be enabled or empowered to take control of determinants of their health. A comprehensive participatory approach was applied and women were included in planning, implementing and evaluating the project. For measuring the effects of BIG on the empowerment of participating women, qualitative semi-structured interviews with 15 women participating in BIG were conducted. For data analysis, qualitative content analysis was used. Results showed the empowerment of the women on the individual level as they gained different competencies and perceived self-efficacy. These effects were supported through the empowerment process on the organizational and community levels where women gained control over their life situations and over policies influencing them. Therefore, the participatory approach of BIG is a key success factor for empowerment promotion of women in difficult life situations.

  1. Integrating Remote and Social Sensing Data for a Scenario on Secure Societies in Big Data Platform

    NASA Astrophysics Data System (ADS)

    Albani, Sergio; Lazzarini, Michele; Koubarakis, Manolis; Taniskidou, Efi Karra; Papadakis, George; Karkaletsis, Vangelis; Giannakopoulos, George

    2016-08-01

    In the framework of the Horizon 2020 project BigDataEurope (Integrating Big Data, Software & Communities for Addressing Europe's Societal Challenges), a pilot for the Secure Societies Societal Challenge was designed considering the requirements coming from relevant stakeholders. The pilot focuses on the integration into a Big Data platform of data coming from remote and social sensing. The information on land changes coming from the Copernicus Sentinel 1A sensor (Change Detection workflow) is integrated with information coming from selected Twitter and news agency accounts (Event Detection workflow) in order to provide the user with multiple sources of information. The Change Detection workflow implements a processing chain in a distributed parallel manner, exploiting the Big Data capabilities in place; the Event Detection workflow implements parallel and distributed social media and news agency monitoring as well as suitable mechanisms to detect and geo-annotate the related events.

  2. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    NASA Astrophysics Data System (ADS)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing (Matlab) has been used. Key considerations for the system are high-speed access due to the large data volume, persistence of the large data volumes, and a precise process-time scheduling capability.

  3. Big Data and Biomedical Informatics: A Challenging Opportunity

    PubMed Central

    2014-01-01

    Summary Big data are receiving increasing attention in biomedicine and healthcare. It is therefore important to understand the reason why big data are assuming a crucial role for the biomedical informatics community. The capability of handling big data is becoming an enabler to carry out unprecedented research studies and to implement new models of healthcare delivery. Therefore, it is first necessary to deeply understand the four elements that constitute big data, namely Volume, Variety, Velocity, and Veracity, and their meaning in practice. Then, it is mandatory to understand where big data are present, and where they can be beneficially collected. There are research fields, such as translational bioinformatics, which need to rely on big data technologies to withstand the shock wave of data that is generated every day. Other areas, ranging from epidemiology to clinical care, can benefit from the exploitation of the large amounts of data that are nowadays available, from personal monitoring to primary care. However, building big data-enabled systems carries relevant implications in terms of reproducibility of research studies and management of privacy and data access; proper actions should be taken to deal with these issues. An interesting consequence of the big data scenario is the availability of new software, methods, and tools, such as map-reduce, cloud computing, and concept drift machine learning algorithms, which will not only contribute to big data research, but may be beneficial in many biomedical informatics applications. The way forward with the big data opportunity will require properly applied engineering principles to design studies and applications, to avoid preconceptions or over-enthusiasm, to fully exploit the available technologies, and to improve data processing and data management regulations. PMID:24853034

  4. Big data and biomedical informatics: a challenging opportunity.

    PubMed

    Bellazzi, R

    2014-05-22

    Big data are receiving increasing attention in biomedicine and healthcare. It is therefore important to understand the reason why big data are assuming a crucial role for the biomedical informatics community. The capability of handling big data is becoming an enabler to carry out unprecedented research studies and to implement new models of healthcare delivery. Therefore, it is first necessary to deeply understand the four elements that constitute big data, namely Volume, Variety, Velocity, and Veracity, and their meaning in practice. Then, it is mandatory to understand where big data are present, and where they can be beneficially collected. There are research fields, such as translational bioinformatics, which need to rely on big data technologies to withstand the shock wave of data that is generated every day. Other areas, ranging from epidemiology to clinical care, can benefit from the exploitation of the large amounts of data that are nowadays available, from personal monitoring to primary care. However, building big data-enabled systems carries relevant implications in terms of reproducibility of research studies and management of privacy and data access; proper actions should be taken to deal with these issues. An interesting consequence of the big data scenario is the availability of new software, methods, and tools, such as map-reduce, cloud computing, and concept drift machine learning algorithms, which will not only contribute to big data research, but may be beneficial in many biomedical informatics applications. The way forward with the big data opportunity will require properly applied engineering principles to design studies and applications, to avoid preconceptions or over-enthusiasm, to fully exploit the available technologies, and to improve data processing and data management regulations.

  5. MS-REDUCE: an ultrafast technique for reduction of big mass spectrometry data for high-throughput processing.

    PubMed

    Awan, Muaaz Gul; Saeed, Fahad

    2016-05-15

    Modern proteomics studies utilize high-throughput mass spectrometers which can produce data at an astonishing rate. These big mass spectrometry (MS) datasets can easily reach peta-scale level, creating storage and analytic problems for large-scale systems biology studies. Each spectrum consists of thousands of peaks which have to be processed to deduce the peptide. However, only a small percentage of peaks in a spectrum are useful for peptide deduction, as most of the peaks are either noise or not useful for a given spectrum. This redundant processing of non-useful peaks is a bottleneck for streaming high-throughput processing of big MS data. One way to reduce the amount of computation required in a high-throughput environment is to eliminate non-useful peaks. Existing noise-removal algorithms are limited in their data-reduction capability and are compute intensive, making them unsuitable for big data and high-throughput environments. In this paper we introduce a novel low-complexity technique based on classification, quantization and sampling of MS peaks. We present a novel data-reductive strategy for analysis of big MS data. Our algorithm, called MS-REDUCE, is capable of eliminating noisy peaks as well as peaks that do not contribute to peptide deduction before any peptide deduction is attempted. Our experiments have shown up to 100× speedup over existing state-of-the-art noise elimination algorithms while maintaining comparably high-quality matches. Using our approach we were able to process a million spectra in just under an hour on a moderate server. The developed tool and strategy have been made available to the wider proteomics and parallel computing communities; the code can be found at https://github.com/pcdslab/MSREDUCE. Contact: fahad.saeed@wmich.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
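
    The following sketch illustrates the general classification/quantization/sampling idea in Python; it is not the published MS-REDUCE algorithm, and the class count and keep fractions are arbitrary assumptions.

```python
# Hedged sketch of intensity-class-based peak reduction: rank peaks by intensity,
# quantize them into classes, and sample more aggressively from low-intensity classes.
# This is NOT the published MS-REDUCE algorithm, only an illustration of the idea.
import random

def reduce_spectrum(peaks, n_classes=4, keep_fraction=(0.05, 0.10, 0.25, 1.0)):
    """peaks: list of (mz, intensity). Returns a reduced list of peaks."""
    ranked = sorted(peaks, key=lambda p: p[1])           # rank peaks by intensity
    class_size = max(1, len(ranked) // n_classes)
    kept = []
    for c in range(n_classes):
        if c < n_classes - 1:
            members = ranked[c * class_size:(c + 1) * class_size]
        else:
            members = ranked[(n_classes - 1) * class_size:]   # top-intensity class
        k = max(1, int(round(keep_fraction[c] * len(members))))
        kept.extend(random.sample(members, min(k, len(members))))
    return sorted(kept)                                   # back to m/z order

if __name__ == "__main__":
    random.seed(0)
    spectrum = [(100 + i * 0.5, random.expovariate(1 / 50.0)) for i in range(2000)]
    print(len(spectrum), "->", len(reduce_spectrum(spectrum)), "peaks kept")
```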

  6. Big Data Analysis Framework for Healthcare and Social Sectors in Korea

    PubMed Central

    Song, Tae-Min

    2015-01-01

    Objectives We reviewed applications of big data analysis of healthcare and social services in developed countries, and subsequently devised a framework for such an analysis in Korea. Methods We reviewed the status of implementing big data analysis of health care and social services in developed countries, and strategies used by the Ministry of Health and Welfare of Korea (Government 3.0). We formulated a conceptual framework of big data in the healthcare and social service sectors at the national level. As a specific case, we designed a process and method of social big data analysis on suicide buzz. Results Developed countries (e.g., the United States, the UK, Singapore, Australia, and even OECD and EU) are emphasizing the potential of big data, and using it as a tool to solve their long-standing problems. Big data strategies for the healthcare and social service sectors were formulated based on an ICT-based policy of current government and the strategic goals of the Ministry of Health and Welfare. We suggest a framework of big data analysis in the healthcare and welfare service sectors separately and assigned them tentative names: 'health risk analysis center' and 'integrated social welfare service network'. A framework of social big data analysis is presented by applying it to the prevention and proactive detection of suicide in Korea. Conclusions There are some concerns with the utilization of big data in the healthcare and social welfare sectors. Thus, research on these issues must be conducted so that sophisticated and practical solutions can be reached. PMID:25705552

  7. Modelling and scale-up of chemical flooding

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pope, G.A.; Lake, L.W.; Sepehrnoori, K.

    1990-03-01

    The objective of this research is to develop, validate, and apply a comprehensive chemical flooding simulator for chemical recovery processes involving surfactants, polymers, and alkaline chemicals in various combinations. This integrated program includes components of laboratory experiments, physical property modelling, scale-up theory, and numerical analysis as necessary and integral components of the simulation activity. We have continued to develop, test, and apply our chemical flooding simulator (UTCHEM) to a wide variety of laboratory and reservoir problems involving tracers, polymers, polymer gels, surfactants, and alkaline agents. Part I is an update on the Application of Higher-Order Methods in Chemical Flooding Simulation. This update focuses on the comparison of grid orientation effects for four different numerical methods implemented in UTCHEM. Part II is on Simulation Design Studies and is a continuation of Saad's Big Muddy surfactant pilot simulation study reported last year. Part III reports on the Simulation of Gravity Effects under conditions similar to those of some of the oil reservoirs in the North Sea. Part IV is on Determining Oil Saturation from Interwell Tracers; UTCHEM is used for large-scale interwell tracer tests. A systematic procedure for estimating oil saturation from interwell tracer data is developed and a specific example based on the actual field data provided by Sun E P Co. is given. Part V reports on the Application of Vectorization and Microtasking for Reservoir Simulation. Part VI reports on Alkaline Simulation. The alkaline/surfactant/polymer flood compositional simulator (UTCHEM) reported last year is further extended to include reactions involving chemical species containing magnesium, aluminium and silicon as constituent elements. Part VII reports on permeability and trapping of microemulsion.

  8. Single-cell Transcriptome Study as Big Data

    PubMed Central

    Yu, Pingjian; Lin, Wei

    2016-01-01

    The rapid growth of single-cell RNA-seq studies (scRNA-seq) demands efficient data storage, processing, and analysis. Big-data technology provides a framework that facilitates the comprehensive discovery of biological signals from inter-institutional scRNA-seq datasets. The strategies to solve the stochastic and heterogeneous single-cell transcriptome signal are discussed in this article. After extensively reviewing the available big-data applications of next-generation sequencing (NGS)-based studies, we propose a workflow that accounts for the unique characteristics of scRNA-seq data and primary objectives of single-cell studies. PMID:26876720

  9. Organochlorine residues in females and nursing young of the big brown bat (Eptesicus fuscus)

    USGS Publications Warehouse

    Clark, D.R.; Lamont, T.G.

    1976-01-01

    Carcasses and brains of 18 big brown bats from Gaithersburg, Maryland, were analyzed for residues of organochlorine insecticides and PCB's. Eleven bats were adult females, and six of these had seven nursing young associated with them....Young bats resembled their parents in microgram amounts of PCB and DDE present in carcasses. However, concentrations of chemicals (expressed as ppm) were significantly higher in young. Brains of three young contained detectable residues of PCB and DDE....Younger adult females contained higher levels of PCB and DDE than did older ones. However, among the oldest females, amounts appeared to begin rising again. This pattern resembles that in free-tailed bats from Bracken Cave, Texas, but differs from the continuous linear decline seen in a Laurel, Maryland population of big brown bats, in which initial levels among younger females were higher than those in the Gaithersburg population....DDE was transferred from female to young more readily than was PCB by nursing. Five of 51 neonate big brown bats from the Laurel population were thought to have been born dead because of residues of PCB that were transferred across the placenta. Present data show that even greater amounts of PCB may be transferred to young by lactation and nursing.

  10. Hierarchy of the Collective Effects in Water Clusters.

    PubMed

    Bakó, Imre; Mayer, István

    2016-02-04

    The results of dipole moment as well as of intra- and intermolecular bond order calculations indicate the big importance of collective electrostatic effects caused by the nonimmediate environment in liquid water models. It is also discussed how these collective effects are built up as consequences of the electrostatic and quantum chemical interactions in water clusters.

  11. Biogeochemical controls on diel cycling of stable isotopes of dissolved O2 and dissolved inorganic carbon in the Big Hole River, Montana

    USGS Publications Warehouse

    Parker, Stephen R.; Poulson, Simon R.; Gammons, Christopher H.; DeGrandpre, Michael D.

    2005-01-01

    Rivers with high biological productivity typically show substantial increases in pH and dissolved oxygen (DO) concentration during the day and decreases at night, in response to changes in the relative rates of aquatic photosynthesis and respiration. These changes, coupled with temperature variations, may impart diel (24-h) fluctuations in the concentration of trace metals, nutrients, and other chemical species. A better understanding of diel processes in rivers is needed and will lead to improved methods of data collection for both monitoring and research purposes. Previous studies have used stable isotopes of dissolved oxygen (DO) and dissolved inorganic carbon (DIC) as tracers of geochemical and biological processes in streams, lakes, and marine systems. Although seasonal variation in δ18O of DO in rivers and lakes has been documented, no study has investigated diel changes in this parameter. Here, we demonstrate large (up to 13‰) cycles in δ18O-DO for two late summer sampling periods in the Big Hole River of southwest Montana and illustrate that these changes are correlated to variations in the DO concentration, the C-isotopic composition of DIC, and the primary productivity of the system. The magnitude of the diel cycle in δ18O-DO was greater in August versus September because of the longer photoperiod and warmer water temperatures. This study provides another biogeochemical tool for investigating the O2 and C budgets in rivers and may also be applicable to lake and groundwater systems.

  12. Towards a Cloud Based Smart Traffic Management Framework

    NASA Astrophysics Data System (ADS)

    Rahimi, M. M.; Hakimpour, F.

    2017-09-01

    Traffic big data has brought many opportunities for traffic management applications. However, several challenges such as heterogeneity, storage, management, processing and analysis of traffic big data may hinder their efficient and real-time application. All these challenges call for a well-adapted distributed framework for smart traffic management that can efficiently handle big traffic data integration, indexing, query processing, mining and analysis. In this paper, we present a novel, distributed, scalable and efficient framework for traffic management applications. The proposed cloud computing based framework can address the technical challenges of efficient and real-time storage, management, processing and analysis of traffic big data. For evaluation of the framework, we have used OpenStreetMap (OSM) real trajectories and road network data on a distributed environment. Our evaluation results indicate that the speed of data importing into this framework exceeds 8000 records per second when the dataset size is close to 5 million records. We also evaluate the performance of data retrieval in the proposed framework; the data retrieval speed exceeds 15000 records per second at the same dataset size. We have also evaluated the scalability and performance of the framework using parallelisation of a critical pre-analysis task in transportation applications. The results show that the proposed framework achieves considerable performance and efficiency in traffic management applications.

  13. Properties of TiNi intermetallic compound industrially produced by combustion synthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaieda, Yoshinari

    Most TiNi shape memory intermetallic compounds are conventionally produced by a process including high-frequency induction vacuum melting and casting. Gravity segregation occurs in a cast TiNi ingot because of the big difference in specific gravity between Ti and Ni. It is difficult to accurately control the phase transformation temperature of a TiNi shape memory intermetallic compound produced by the conventional process, because the martensitic transformation temperature shifts by 10 K for a change of 0.1% in Ni content. Homogeneous TiNi intermetallic compound is produced by an industrial process including the combustion synthesis method, which is a newly developed manufacturing process. In the new process, phase transformation temperatures of TiNi can be controlled accurately by controlling the ratio of the Ti and Ni elemental starting powders. The chemical composition, the impurities and the phase transformation temperatures of the TiNi products industrially produced by the process are revealed. These properties are vitally important when the combustion synthesis method is applied to an industrial mass production process for producing TiNi shape memory intermetallic compounds. TiNi shape memory products are industrially and commercially produced today by the industrial process including combustion synthesis; the total production weight was 30 tons in 1994.

  14. Interpreting Evidence-of-Learning: Educational Research in the Era of Big Data

    ERIC Educational Resources Information Center

    Cope, Bill; Kalantzis, Mary

    2015-01-01

    In this article, we argue that big data can offer new opportunities and roles for educational researchers. In the traditional model of evidence-gathering and interpretation in education, researchers are independent observers, who pre-emptively create instruments of measurement, and insert these into the educational process in specialized times and…

  15. No Child Left Behind-The Implications for Educators.

    ERIC Educational Resources Information Center

    Serim, Ferdi

    2002-01-01

    Presents an interview with Mike Eisenberg, co-founder of Big6, that explains how the Big6 process for information-based problem-solving can be used to address the challenges educators face related to the No Child Left Behind legislation. Highlights include the need for teamwork between principals, library media specialists, teachers, and…

  16. Moving Every Child Ahead: The Big6 Success Strategy.

    ERIC Educational Resources Information Center

    Berkowitz, Bob; Serim, Ferdi

    2002-01-01

    Explains the Big6 approach to teaching information skills and describes its use in a high school social studies class to improve student test scores, teach them how to learn, and improve the teachers' skills. Highlights include the balance between content and process, formative and summative evaluation, assignment organizers, and study tips. (LRW)

  17. Michael Eisenberg and Robert Berkowitz's Big6[TM] Information Problem-Solving Model.

    ERIC Educational Resources Information Center

    Carey, James O.

    2003-01-01

    Reviews the Big6 information problem-solving model. Highlights include benefits and dangers of the simplicity of the model; theories of instruction; testing of the model; the model as a process for completing research projects; and advice for school library media specialists considering use of the model. (LRW)

  18. GraphStore: A Distributed Graph Storage System for Big Data Networks

    ERIC Educational Resources Information Center

    Martha, VenkataSwamy

    2013-01-01

    Networks, such as social networks, are a universal solution for modeling complex problems in real time, especially in the Big Data community. While previous studies have attempted to enhance network processing algorithms, none have paved a path for the development of a persistent storage system. The proposed solution, GraphStore, provides an…

  19. Evolution of the Air Toxics under the Big Sky Program

    ERIC Educational Resources Information Center

    Marra, Nancy; Vanek, Diana; Hester, Carolyn; Holian, Andrij; Ward, Tony; Adams, Earle; Knuth, Randy

    2011-01-01

    As a yearlong exploration of air quality and its relation to respiratory health, the "Air Toxics Under the Big Sky" program offers opportunities for students to learn and apply science process skills through self-designed inquiry-based research projects conducted within their communities. The program follows a systematic scope and sequence…

  20. Where do the Field Plots Belong? A Multiple-Constraint Sampling Design for the BigFoot Project

    NASA Astrophysics Data System (ADS)

    Kennedy, R. E.; Cohen, W. B.; Kirschbaum, A. A.; Gower, S. T.

    2002-12-01

    A key component of a MODIS validation project is effective characterization of biophysical measures on the ground. Fine-grain ecological field measurements must be placed strategically to capture variability at the scale of the MODIS imagery. Here we describe the BigFoot project's revised sampling scheme, designed to simultaneously meet three important goals: capture landscape variability, avoid spatial autocorrelation between field plots, and minimize time and expense of field sampling. A stochastic process places plots in clumped constellations to reduce field sampling costs, while minimizing spatial autocorrelation. This stochastic process is repeated, creating several hundred realizations of plot constellations. Each constellation is scored and ranked according to its ability to match landscape variability in several Landsat-based spectral indices, and its ability to minimize field sampling costs. We show how this approach has recently been used to place sample plots at the BigFoot project's two newest study areas, one in a desert system and one in a tundra system. We also contrast this sampling approach to that already used at the four prior BigFoot project sites.
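
    A hedged sketch of this kind of multiple-constraint design is given below: many random clumped constellations are generated, each is scored on how well it matches landscape variability and on a walking-distance cost proxy, and the best-ranked constellation is kept. The grid, scores, and weights are hypothetical, not the BigFoot project's actual procedure.

```python
# Hedged sketch of a multiple-constraint sampling design: generate many random
# clumped plot constellations, score each on (i) how well the sampled values match
# the landscape's variability in a spectral index and (ii) total walking distance
# as a cost proxy, then keep the best-ranked candidate. All values are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
landscape = rng.normal(size=(100, 100))          # stand-in for a Landsat spectral index

def make_constellation(n_clumps=5, plots_per_clump=4, spread=3):
    centers = rng.integers(10, 90, size=(n_clumps, 2))
    offsets = rng.integers(-spread, spread + 1, size=(n_clumps, plots_per_clump, 2))
    return (centers[:, None, :] + offsets).reshape(-1, 2)

def score(plots):
    vals = landscape[plots[:, 0], plots[:, 1]]
    variability_mismatch = abs(vals.std() - landscape.std())    # capture variability
    cost = sum(np.linalg.norm(plots[i + 1] - plots[i]) for i in range(len(plots) - 1))
    return variability_mismatch + 0.01 * cost                   # weighted criterion

candidates = [make_constellation() for _ in range(300)]
best = min(candidates, key=score)
print("best constellation score:", round(score(best), 3))
```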

  1. A practical guide to big data research in psychology.

    PubMed

    Chen, Eric Evan; Wojcik, Sean P

    2016-12-01

    The massive volume of data that now covers a wide variety of human behaviors offers researchers in psychology an unprecedented opportunity to conduct innovative theory- and data-driven field research. This article is a practical guide to conducting big data research, covering data management, acquisition, processing, and analytics (including key supervised and unsupervised learning data mining methods). It is accompanied by walkthrough tutorials on data acquisition, text analysis with latent Dirichlet allocation topic modeling, and classification with support vector machines. Big data practitioners in academia, industry, and the community have built a comprehensive base of tools and knowledge that makes big data research accessible to researchers in a broad range of fields. However, big data research does require knowledge of software programming and a different analytical mindset. For those willing to acquire the requisite skills, innovative analyses of unexpected or previously untapped data sources can offer fresh ways to develop, test, and extend theories. When conducted with care and respect, big data research can become an essential complement to traditional research. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
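
    As a minimal illustration of the two analysis steps named above (topic modeling with latent Dirichlet allocation and classification with support vector machines), the sketch below uses scikit-learn on a few toy documents; the documents and labels are placeholders, not the article's tutorial data.

```python
# Minimal sketch of LDA topic modeling followed by SVM classification with
# scikit-learn; the toy documents and labels are placeholders for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC

docs = ["happy great day wonderful friends",
        "sad terrible awful lonely day",
        "great wonderful happy weekend",
        "awful sad lonely terrible night"]
labels = [1, 0, 1, 0]                      # hypothetical sentiment labels

X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(X)              # per-document topic proportions

clf = SVC(kernel="linear").fit(topics, labels)
print("predicted labels:", clf.predict(topics))
```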

  2. From big data to deep insight in developmental science.

    PubMed

    Gilmore, Rick O

    2016-01-01

    The use of the term 'big data' has grown substantially over the past several decades and is now widespread. In this review, I ask what makes data 'big' and what implications the size, density, or complexity of datasets have for the science of human development. A survey of existing datasets illustrates how existing large, complex, multilevel, and multimeasure data can reveal the complexities of developmental processes. At the same time, significant technical, policy, ethics, transparency, cultural, and conceptual issues associated with the use of big data must be addressed. Most big developmental science data are currently hard to find and cumbersome to access, the field lacks a culture of data sharing, and there is no consensus about who owns or should control research data. But, these barriers are dissolving. Developmental researchers are finding new ways to collect, manage, store, share, and enable others to reuse data. This promises a future in which big data can lead to deeper insights about some of the most profound questions in behavioral science. © 2016 The Authors. WIREs Cognitive Science published by Wiley Periodicals, Inc.

  3. How Does National Scientific Funding Support Emerging Interdisciplinary Research: A Comparison Study of Big Data Research in the US and China

    PubMed Central

    Huang, Ying; Zhang, Yi; Youtie, Jan; Porter, Alan L.; Wang, Xuefeng

    2016-01-01

    How do funding agencies ramp-up their capabilities to support research in a rapidly emerging area? This paper addresses this question through a comparison of research proposals awarded by the US National Science Foundation (NSF) and the National Natural Science Foundation of China (NSFC) in the field of Big Data. Big data is characterized by its size and difficulties in capturing, curating, managing and processing it in reasonable periods of time. Although Big Data has its legacy in longstanding information technology research, the field grew very rapidly over a short period. We find that the extent of interdisciplinarity is a key aspect in how these funding agencies address the rise of Big Data. Our results show that both agencies have been able to marshal funding to support Big Data research in multiple areas, but the NSF relies to a greater extent on multi-program funding from different fields. We discuss how these interdisciplinary approaches reflect the research hot-spots and innovation pathways in these two countries. PMID:27219466

  4. Translating Big Data into Smart Data for Veterinary Epidemiology

    PubMed Central

    VanderWaal, Kimberly; Morrison, Robert B.; Neuhauser, Claudia; Vilalta, Carles; Perez, Andres M.

    2017-01-01

    The increasing availability and complexity of data has led to new opportunities and challenges in veterinary epidemiology around how to translate abundant, diverse, and rapidly growing “big” data into meaningful insights for animal health. Big data analytics are used to understand health risks and minimize the impact of adverse animal health issues through identifying high-risk populations, combining data or processes acting at multiple scales through epidemiological modeling approaches, and harnessing high velocity data to monitor animal health trends and detect emerging health threats. The advent of big data requires the incorporation of new skills into veterinary epidemiology training, including, for example, machine learning and coding, to prepare a new generation of scientists and practitioners to engage with big data. Establishing pipelines to analyze big data in near real-time is the next step for progressing from simply having “big data” to create “smart data,” with the objective of improving understanding of health risks, effectiveness of management and policy decisions, and ultimately preventing or at least minimizing the impact of adverse animal health issues. PMID:28770216

  5. Advanced Research and Data Methods in Women's Health: Big Data Analytics, Adaptive Studies, and the Road Ahead.

    PubMed

    Macedonia, Christian R; Johnson, Clark T; Rajapakse, Indika

    2017-02-01

    Technical advances in science have had broad implications in reproductive and women's health care. Recent innovations in population-level data collection and storage have made available an unprecedented amount of data for analysis while computational technology has evolved to permit processing of data previously thought too dense to study. "Big data" is a term used to describe data that are a combination of dramatically greater volume, complexity, and scale. The number of variables in typical big data research can readily be in the thousands, challenging the limits of traditional research methodologies. Regardless of what it is called, advanced data methods, predictive analytics, or big data, this unprecedented revolution in scientific exploration has the potential to dramatically assist research in obstetrics and gynecology broadly across subject matter. Before implementation of big data research methodologies, however, potential researchers and reviewers should be aware of strengths, strategies, study design methods, and potential pitfalls. Examination of big data research examples contained in this article provides insight into the potential and the limitations of this data science revolution and practical pathways for its useful implementation.

  6. How Does National Scientific Funding Support Emerging Interdisciplinary Research: A Comparison Study of Big Data Research in the US and China.

    PubMed

    Huang, Ying; Zhang, Yi; Youtie, Jan; Porter, Alan L; Wang, Xuefeng

    2016-01-01

    How do funding agencies ramp-up their capabilities to support research in a rapidly emerging area? This paper addresses this question through a comparison of research proposals awarded by the US National Science Foundation (NSF) and the National Natural Science Foundation of China (NSFC) in the field of Big Data. Big data is characterized by its size and difficulties in capturing, curating, managing and processing it in reasonable periods of time. Although Big Data has its legacy in longstanding information technology research, the field grew very rapidly over a short period. We find that the extent of interdisciplinarity is a key aspect in how these funding agencies address the rise of Big Data. Our results show that both agencies have been able to marshal funding to support Big Data research in multiple areas, but the NSF relies to a greater extent on multi-program funding from different fields. We discuss how these interdisciplinary approaches reflect the research hot-spots and innovation pathways in these two countries.

  7. Big Data Approaches for the Analysis of Large-Scale fMRI Data Using Apache Spark and GPU Processing: A Demonstration on Resting-State fMRI Data from the Human Connectome Project

    PubMed Central

    Boubela, Roland N.; Kalcher, Klaudius; Huf, Wolfgang; Našel, Christian; Moser, Ewald

    2016-01-01

    Technologies for scalable analysis of very large datasets have emerged in the domain of internet computing, but are still rarely used in neuroimaging despite the existence of data and research questions in need of efficient computation tools especially in fMRI. In this work, we present software tools for the application of Apache Spark and Graphics Processing Units (GPUs) to neuroimaging datasets, in particular providing distributed file input for 4D NIfTI fMRI datasets in Scala for use in an Apache Spark environment. Examples for using this Big Data platform in graph analysis of fMRI datasets are shown to illustrate how processing pipelines employing it can be developed. With more tools for the convenient integration of neuroimaging file formats and typical processing steps, big data technologies could find wider endorsement in the community, leading to a range of potentially useful applications especially in view of the current collaborative creation of a wealth of large data repositories including thousands of individual fMRI datasets. PMID:26778951
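
    A hedged Python/PySpark sketch of this kind of pipeline is shown below (the original tooling described above is in Scala): a 4D NIfTI file is read with nibabel, voxel time series are distributed as an RDD, and voxels strongly correlated with a seed are counted as a crude graph-analysis step. The file name is a hypothetical placeholder.

```python
# Hedged sketch of a Spark-based fMRI analysis step: read a 4D NIfTI volume,
# distribute voxel time series, and count voxels strongly correlated with a seed.
# The file path is hypothetical; this is an illustration, not the authors' tooling.
import numpy as np
import nibabel as nib
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fmri-corr-sketch").getOrCreate()
sc = spark.sparkContext

img = nib.load("rest_4d.nii.gz")                    # hypothetical resting-state scan
data = img.get_fdata()                              # shape: (x, y, z, t)
t = data.shape[-1]
voxels = data.reshape(-1, t)
series = voxels[voxels.std(axis=1) > 0]             # drop empty voxels

seed = series[0]                                    # arbitrary seed time course
def corr_with_seed(ts):
    return float(np.corrcoef(ts, seed)[0, 1])

edges = (sc.parallelize(list(series), numSlices=64)
           .map(corr_with_seed)
           .filter(lambda r: abs(r) > 0.5)          # keep strong functional "edges"
           .count())
print("voxels strongly correlated with seed:", edges)

spark.stop()
```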

  8. Identifying key climate and environmental factors affecting rates of post-fire big sagebrush (Artemisia tridentata) recovery in the northern Columbia Basin, USA

    USGS Publications Warehouse

    Shinneman, Douglas; McIlroy, Susan

    2016-01-01

    Sagebrush steppe of North America is considered highly imperilled, in part owing to increased fire frequency. Sagebrush ecosystems support numerous species, and it is important to understand those factors that affect rates of post-fire sagebrush recovery. We explored recovery of Wyoming big sagebrush (Artemisia tridentata ssp. wyomingensis) and basin big sagebrush (A. tridentata ssp. tridentata) communities following fire in the northern Columbia Basin (Washington, USA). We sampled plots across 16 fires that burned in big sagebrush communities from 5 to 28 years ago, and also sampled nearby unburned locations. Mixed-effects models demonstrated that density of large–mature big sagebrush plants and percentage cover of big sagebrush were higher with time since fire and in plots with more precipitation during the winter immediately following fire, but were lower when precipitation the next winter was higher than average, especially on soils with higher available water supply, and with greater post-fire mortality of mature big sagebrush plants. Bunchgrass cover 5 to 28 years after fire was predicted to be lower with higher cover of both shrubs and non-native herbaceous species, and only slightly higher with time. Post-fire recovery of big sagebrush in the northern Columbia Basin is a slow process that may require several decades on average, but faster recovery rates may occur under specific site and climate conditions.
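
    A sketch of the kind of mixed-effects model described above, fitted with statsmodels on synthetic data, is shown below; the variable names (years since fire, first- and second-winter precipitation, soil available water supply, fire identifier) mirror the predictors mentioned in the abstract, but the data and coefficients are invented.

```python
# Sketch of a mixed-effects model in the spirit of the study: fixed effects for
# time since fire, post-fire winter precipitation and soil available water supply,
# with fire as a random grouping factor. Data are synthetic placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 160
df = pd.DataFrame({
    "years_since_fire": rng.uniform(5, 28, n),
    "winter1_ppt": rng.normal(200, 40, n),      # precipitation, first post-fire winter
    "winter2_ppt": rng.normal(200, 40, n),      # precipitation, second post-fire winter
    "soil_aws": rng.normal(10, 2, n),           # soil available water supply
    "fire_id": rng.integers(0, 16, n),          # 16 sampled fires as random groups
})
df["density"] = (0.4 * df.years_since_fire + 0.02 * df.winter1_ppt
                 - 0.03 * df.winter2_ppt * df.soil_aws / 10 + rng.normal(0, 2, n))

model = smf.mixedlm("density ~ years_since_fire + winter1_ppt + winter2_ppt * soil_aws",
                    data=df, groups=df["fire_id"])
print(model.fit().summary())
```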

  9. Empowering Personalized Medicine with Big Data and Semantic Web Technology: Promises, Challenges, and Use Cases.

    PubMed

    Panahiazar, Maryam; Taslimitehrani, Vahid; Jadhav, Ashutosh; Pathak, Jyotishman

    2014-10-01

    In healthcare, big data tools and technologies have the potential to create significant value by improving outcomes while lowering costs for each individual patient. Diagnostic images, genetic test results and biometric information are increasingly generated and stored in electronic health records, presenting us with challenges in data that are by nature high in volume, variety and velocity, thereby necessitating novel ways to store, manage and process big data. This presents an urgent need to develop new, scalable and expandable big data infrastructure and analytical methods that can enable healthcare providers to access knowledge for the individual patient, yielding better decisions and outcomes. In this paper, we briefly discuss the nature of big data and the role of semantic web and data analysis for generating "smart data" which offer actionable information that supports better decisions for personalized medicine. In our view, the biggest challenge is to create a system that makes big data robust and smart for healthcare providers and patients that can lead to more effective clinical decision-making, improved health outcomes, and ultimately, managed healthcare costs. We highlight some of the challenges in using big data and propose the need for a semantic data-driven environment to address them. We illustrate our vision with practical use cases, and discuss a path for empowering personalized medicine using big data and semantic web technology.

  10. The Need for a Definition of Big Data for Nursing Science: A Case Study of Disaster Preparedness.

    PubMed

    Wong, Ho Ting; Chiang, Vico Chung Lim; Choi, Kup Sze; Loke, Alice Yuen

    2016-10-17

    The rapid development of technology has made enormous volumes of data available and achievable anytime and anywhere around the world. Data scientists call this change a data era and have introduced the term "Big Data", which has drawn the attention of nursing scholars. Nevertheless, the concept of Big Data is quite fuzzy and there is no agreement on its definition among researchers of different disciplines. Without a clear consensus on this issue, nursing scholars who are relatively new to the concept may consider Big Data to be merely a dataset of a bigger size. Having a suitable definition for nurse researchers in their context of research and practice is essential for the advancement of nursing research. In view of the need for a better understanding on what Big Data is, the aim in this paper is to explore and discuss the concept. Furthermore, an example of a Big Data research study on disaster nursing preparedness involving six million patient records is used for discussion. The example demonstrates that a Big Data analysis can be conducted from many more perspectives than would be possible in traditional sampling, and is superior to traditional sampling. Experience gained from the process of using Big Data in this study will shed light on future opportunities for conducting evidence-based nursing research to achieve competence in disaster nursing.

  11. Big Data and Clinicians: A Review on the State of the Science

    PubMed Central

    Wang, Weiqi

    2014-01-01

    Background In the past few decades, medically related data collection saw a huge increase, referred to as big data. These huge datasets bring challenges in storage, processing, and analysis. In clinical medicine, big data is expected to play an important role in identifying causality of patient symptoms, in predicting hazards of disease incidence or reoccurrence, and in improving primary-care quality. Objective The objective of this review was to provide an overview of the features of clinical big data, describe a few commonly employed computational algorithms, statistical methods, and software toolkits for data manipulation and analysis, and discuss the challenges and limitations in this realm. Methods We conducted a literature review to identify studies on big data in medicine, especially clinical medicine. We used different combinations of keywords to search PubMed, Science Direct, Web of Knowledge, and Google Scholar for literature of interest from the past 10 years. Results This paper reviewed studies that analyzed clinical big data and discussed issues related to storage and analysis of this type of data. Conclusions Big data is becoming a common feature of biological and clinical studies. Researchers who use clinical big data face multiple challenges, and the data itself has limitations. It is imperative that methodologies for data analysis keep pace with our ability to collect and store data. PMID:25600256

  12. The Need for a Definition of Big Data for Nursing Science: A Case Study of Disaster Preparedness

    PubMed Central

    Wong, Ho Ting; Chiang, Vico Chung Lim; Choi, Kup Sze; Loke, Alice Yuen

    2016-01-01

    The rapid development of technology has made enormous volumes of data available and achievable anytime and anywhere around the world. Data scientists call this change a data era and have introduced the term “Big Data”, which has drawn the attention of nursing scholars. Nevertheless, the concept of Big Data is quite fuzzy and there is no agreement on its definition among researchers of different disciplines. Without a clear consensus on this issue, nursing scholars who are relatively new to the concept may consider Big Data to be merely a dataset of a bigger size. Having a suitable definition for nurse researchers in their context of research and practice is essential for the advancement of nursing research. In view of the need for a better understanding on what Big Data is, the aim in this paper is to explore and discuss the concept. Furthermore, an example of a Big Data research study on disaster nursing preparedness involving six million patient records is used for discussion. The example demonstrates that a Big Data analysis can be conducted from many more perspectives than would be possible in traditional sampling, and is superior to traditional sampling. Experience gained from the process of using Big Data in this study will shed light on future opportunities for conducting evidence-based nursing research to achieve competence in disaster nursing. PMID:27763525

  13. Chemical Resistance of Ornamental Compound Stone Produced with Marble Waste and Unsaturated Polyester

    NASA Astrophysics Data System (ADS)

    Ribeiro, Carlos E. Gomes; Rodriguez, Rubén J. Sánchez; Vieira, Carlos M. Fontes

    Ornamental compound stones have been produced by industry for decades; however, few published studies describe these materials. Brazil has many deposits of stone waste and a big potential to produce these materials. This work aims to evaluate the chemical resistance of ornamental compound stones produced with marble waste and unsaturated polyester. An adaptation of Annex H of the ABNT NBR 13818:97 standard, with reagents commonly used in household products, was used. The results were compared with those obtained for natural stone used in composite production.

  14. Ground-water appraisal in northwestern Big Stone County, west-central Minnesota

    USGS Publications Warehouse

    Soukup, W.G.

    1980-01-01

    Samples of water were collected for chemical analysis from wells in the surficial outwash, buried outwash, and Cretaceous aquifers. With the exception of nitrate, the greatest difference in chemical quality was between samples from the buried outwash and Cretaceous aquifers. Water from Cretaceous aquifers is softer than water from the outwash aquifers, but contains concentrations of sodium and boron that are high enough to damage soils and crops if used for irrigation. Nitrate concentrations exceeded the Minnesota Pollution Control Agency's recommended limits for drinking water in one sample from the surficial aquifer.

  15. Fluid Enhanced Deformation and Metamorphism in Exhumed Lower Crust from the Northern Madison Range, Southwestern Montana, USA

    NASA Astrophysics Data System (ADS)

    Condit, Cailey Brown

    Deep crustal processes during collisional orogenesis exert first-order controls on the development, scale and behavior of an orogenic belt. The presence or absence of fluids plays an important role in these processes by enhancing deformation, catalyzing chemical reactions, and facilitating wholesale alteration of lithologic properties. However, the scales over which these fluid-related interactions occur and the specific feedbacks among them remain poorly constrained. The late Paleoproterozoic Big Sky orogen, expressed as high-grade deep crust exposed in the Laramide basement-cored uplifts of SW Montana, USA, offers an exceptional natural laboratory to address some of these questions. New data are presented from field and structural analysis, petrology, geochemistry, and geochronology in the Northern Madison Range, a key locality for constraining the hinterland-foreland transition of the orogen. Combined with other regional data, the age of high-grade metamorphism youngs by 80-40 Myr across a 100 km transect, suggesting propagation of the orogenic core towards its foreland over time. In the southeastern part of the Northern Madison Range, two domains, separated by a km-scale ductile shear zone, were transformed by hydrous fluids at significantly different spatial scales. The Gallatin Peak terrane was widely metamorphosed, metasomatized, and penetratively deformed in the presence of fluids at upper amphibolite facies during the Big Sky orogeny. Together, these data suggest that this area was pervasively hydrated and deformed over scales of several kilometers during thermotectonism at 30-25 km paleodepths. In the Moon Lake block, fluid flow at similar crustal depths and temperatures played a more localized but equally important role. Discrete flow along brittle fractures in metagabbronorite dikes led to nucleation of cm-scale ductile shear zones and metasomatic alteration. A model for shear zone evolution is presented that requires feedbacks between mechanical and chemical processes for strain localization. Seismic anisotropy was calculated for one of these shear zones. Deformation-induced crystallographic preferred orientation (CPO) of anisotropic minerals typically produces seismic anisotropy in the deep crust. However, this shear zone deformed by mechanisms that yielded no significant CPO, in part due to the fluid-rich environment, and very low seismic anisotropy, suggesting that high anisotropy does not always correlate with high strain.

  16. Real-time analysis of healthcare using big data analytics

    NASA Astrophysics Data System (ADS)

    Basco, J. Antony; Senthilkumar, N. C.

    2017-11-01

    Big Data Analytics (BDA) provides a tremendous advantage where there is a need for revolutionary performance in handling large amounts of data covering four characteristics: Volume, Velocity, Variety and Veracity. BDA has the ability to handle such dynamic data, providing functioning effectiveness and exceptionally beneficial output in several day-to-day applications for various organizations. Healthcare is one of the sectors which generates data constantly, covering all four characteristics with outstanding growth. There are several challenges in processing patient records, which involve a variety of structured and unstructured formats. Inducing BDA into healthcare (HBDA) deals with sensitive, patient-driven information, mostly in unstructured formats comprising prescriptions, reports, data from imaging systems, etc.; these challenges will be overcome by big data with enhanced efficiency in fetching and storing of data. In this project, datasets resembling Electronic Medical Records (EMR) produced from numerous medical devices and mobile applications will be induced into MongoDB using the Hadoop framework with an improvised processing technique to improve the outcome of processing patient records.
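
    A minimal sketch of loading EMR-like records into MongoDB with pymongo is given below; the connection string, database and collection names, and record fields are hypothetical placeholders, and the Hadoop-based processing step is omitted.

```python
# Minimal sketch of storing and retrieving EMR-like records in MongoDB; the
# connection string, names and fields are hypothetical placeholders.
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")     # assumed local MongoDB instance
emr = client["hbda_demo"]["patient_records"]

records = [
    {"patient_id": "P001", "source": "imaging", "report": "no acute findings",
     "timestamp": datetime(2017, 6, 1, 10, 30)},
    {"patient_id": "P002", "source": "prescription", "report": "metformin 500 mg bid",
     "timestamp": datetime(2017, 6, 1, 11, 5)},
]
emr.insert_many(records)                              # unstructured text stored as-is

# simple retrieval: all records for one patient, newest first
for doc in emr.find({"patient_id": "P001"}).sort("timestamp", -1):
    print(doc["source"], "-", doc["report"])
```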

  17. Biometric Attendance and Big Data Analysis for Optimizing Work Processes.

    PubMed

    Verma, Neetu; Xavier, Teenu; Agrawal, Deepak

    2016-01-01

    Although biometric attendance management is available, large healthcare organizations have difficulty in big data analysis for optimization of work processes. The aim of this project was to assess the implementation of a biometric attendance system and its utility following big data analysis. In this prospective study, the implementation of the biometric system was evaluated over a 3-month period at our institution. Software integration with other existing systems for data analysis was also evaluated. Implementation of the biometric system could be successfully done over a two-month period, with enrollment of 10,000 employees into the system. However, generating reports and taking action for this large number of staff was a challenge. For this purpose, software was made for capturing the duty roster of each employee, integrating it with the biometric system, and adding an SMS gateway. This helped in automating the process of sending SMSs to each employee who had not signed in. Standalone biometric systems have limited functionality in large organizations unless they are meshed with the employee duty roster.
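
    The cross-check that drives the SMS reminders can be sketched as below; the roster, sign-in log, and SMS gateway call are hypothetical placeholders for the institution's actual systems.

```python
# Hedged sketch of the roster/biometric cross-check: compare the day's duty roster
# with biometric sign-ins and queue SMS reminders for absentees. The SMS gateway
# call is a placeholder function; all identifiers and numbers are hypothetical.

def find_absentees(duty_roster, signed_in):
    """duty_roster: {employee_id: phone}; signed_in: set of employee_ids."""
    return {emp: phone for emp, phone in duty_roster.items() if emp not in signed_in}

def send_sms(phone, message):
    # placeholder for the institution's SMS gateway API
    print(f"SMS to {phone}: {message}")

if __name__ == "__main__":
    roster = {"E1001": "+91-98xxxxxx01", "E1002": "+91-98xxxxxx02", "E1003": "+91-98xxxxxx03"}
    signed_in_today = {"E1001", "E1003"}              # from the biometric system's log
    for emp, phone in find_absentees(roster, signed_in_today).items():
        send_sms(phone, f"Employee {emp}: attendance not recorded for today's shift.")
```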

  18. Use Cases for Combining Web Services with ArcPython Tools for Enabling Quality Control of Land Remote Sensing Data Products.

    NASA Astrophysics Data System (ADS)

    Krehbiel, C.; Maiersperger, T.; Friesz, A.; Harriman, L.; Quenzer, R.; Impecoven, K.

    2016-12-01

    Three major obstacles facing big Earth data users include data storage, management, and analysis. As the amount of satellite remote sensing data increases, so does the need for better data storage and management strategies to exploit the plethora of data now available. Standard GIS tools can help big Earth data users who interact with and analyze increasingly large and diverse datasets. In this presentation we highlight how NASA's Land Processes Distributed Active Archive Center (LP DAAC) is tackling these big Earth data challenges. We provide a real-life use case example to describe three tools and services provided by the LP DAAC to more efficiently exploit big Earth data in a GIS environment. First, we describe the Open-source Project for a Network Data Access Protocol (OPeNDAP), which enables calls to specific subsets of data, minimizing the amount of data that a user downloads and improving the efficiency of data downloading and processing. Next, we cover the LP DAAC's Application for Extracting and Exploring Analysis Ready Samples (AppEEARS), a web application interface for extracting and analyzing land remote sensing data. From there, we review an ArcPython toolbox that was developed to provide quality control services for land remote sensing data products. Locating and extracting specific subsets of larger big Earth datasets improves data storage and management efficiency for the end user, and quality control services provide a straightforward interpretation of big Earth data. These tools and services are beneficial to the GIS user community in terms of standardizing workflows and improving data storage, management, and analysis tactics.
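
    The value of OPeNDAP-style subsetting can be sketched with xarray, which can open a remote endpoint lazily and transfer only the requested slice; the URL and variable name below are hypothetical placeholders, not actual LP DAAC endpoints.

```python
# Hedged sketch of server-side subsetting via OPeNDAP using xarray: only the
# requested slice is transferred, not the whole granule. The URL, variable name
# and coordinate names are hypothetical placeholders.
import xarray as xr

url = "https://example.gov/opendap/MOD13Q1/granule.nc"   # hypothetical OPeNDAP endpoint
ds = xr.open_dataset(url)                                # lazy open; no bulk download

# request only a small spatial window for one time step; data are fetched on .load()
subset = ds["NDVI"].sel(lat=slice(45.0, 44.0), lon=slice(-112.5, -111.5)).isel(time=0)
print(subset.load().mean().item())
```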

  19. Implementing Operational Analytics using Big Data Technologies to Detect and Predict Sensor Anomalies

    NASA Astrophysics Data System (ADS)

    Coughlin, J.; Mital, R.; Nittur, S.; SanNicolas, B.; Wolf, C.; Jusufi, R.

    2016-09-01

    Operational analytics, when combined with Big Data technologies and predictive techniques, has been shown to be valuable in detecting mission-critical sensor anomalies that might be missed by conventional analytical techniques. Our approach helps analysts and leaders make informed and rapid decisions by analyzing large volumes of complex data in near real-time and presenting it in a manner that facilitates decision making. It provides cost savings by being able to alert and predict when sensor degradations pass a critical threshold and impact mission operations. Operational analytics, which uses Big Data tools and technologies, can process very large data sets containing a variety of data types to uncover hidden patterns, unknown correlations, and other relevant information. When combined with predictive techniques, it provides a mechanism to monitor and visualize these data sets and provide insight into degradations encountered in large sensor systems such as the space surveillance network. In this study, data from a notional sensor is simulated and we use big data technologies, predictive algorithms and operational analytics to process the data and predict sensor degradations. This study uses data products that would commonly be analyzed at a site and builds on a big data architecture that has previously been proven valuable in detecting anomalies. This paper outlines our methodology of implementing an operational analytic solution through data discovery, learning and training of data modeling and predictive techniques, and deployment. Through this methodology, we implement a functional architecture focused on exploring available big data sets and determining practical analytic, visualization, and predictive technologies.
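
    An illustrative sketch of the alerting idea (not the study's architecture) is given below: a synthetic sensor series with an injected degradation is monitored against a rolling baseline, and the first sample whose z-score crosses a critical threshold raises an alert.

```python
# Illustrative sketch (not the study's architecture): monitor a synthetic sensor
# reading against a rolling baseline and flag the first sample whose z-score
# crosses a critical threshold. All values are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 1000
reading = rng.normal(0.0, 1.0, n)
reading[800:] += 8.0                               # injected sensor degradation

s = pd.Series(reading)
baseline_mean = s.rolling(200).mean().shift(1)     # baseline from the previous window
baseline_std = s.rolling(200).std().shift(1)
zscore = (s - baseline_mean) / baseline_std

alerts = zscore[zscore.abs() > 4]                  # critical-threshold crossings
print("first alert at sample:", int(alerts.index[0]) if len(alerts) else "none")
```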

  20. Big Data and medicine: a big deal?

    PubMed

    Mayer-Schönberger, V; Ingelsson, E

    2018-05-01

    Big Data promises huge benefits for medical research. Looking beyond superficial increases in the amount of data collected, we identify three key areas where Big Data differs from conventional analyses of data samples: (i) data are captured more comprehensively relative to the phenomenon under study; this reduces some bias but surfaces important trade-offs, such as between data quantity and data quality; (ii) data are often analysed using machine learning tools, such as neural networks rather than conventional statistical methods resulting in systems that over time capture insights implicit in data, but remain black boxes, rarely revealing causal connections; and (iii) the purpose of the analyses of data is no longer simply answering existing questions, but hinting at novel ones and generating promising new hypotheses. As a consequence, when performed right, Big Data analyses can accelerate research. Because Big Data approaches differ so fundamentally from small data ones, research structures, processes and mindsets need to adjust. The latent value of data is being reaped through repeated reuse of data, which runs counter to existing practices not only regarding data privacy, but data management more generally. Consequently, we suggest a number of adjustments such as boards reviewing responsible data use, and incentives to facilitate comprehensive data sharing. As data's role changes to a resource of insight, we also need to acknowledge the importance of collecting and making data available as a crucial part of our research endeavours, and reassess our formal processes from career advancement to treatment approval. © 2017 The Association for the Publication of the Journal of Internal Medicine.

  1. Wisdom within: unlocking the potential of big data for nursing regulators.

    PubMed

    Blumer, L; Giblin, C; Lemermeyer, G; Kwan, J A

    2017-03-01

    This paper explores the potential for incorporating big data in nursing regulators' decision-making and policy development. Big data, commonly described as the extensive volume of information that individuals and agencies generate daily, is a concept familiar to the business community but is only beginning to be explored by the public sector. Using insights gained from a recent research project, the College and Association of Registered Nurses of Alberta, in Canada, is creating an organizational culture of data-driven decision-making throughout its regulatory and professional functions. The goal is to enable the organization to respond quickly and profoundly to nursing issues in a rapidly changing healthcare environment. The evidence includes a review of the Learning from Experience: Improving the Process of Internationally Educated Nurses' Applications for Registration (LFE) research project (2011-2016), combined with a literature review on data-driven decision-making within nursing and healthcare settings, and the incorporation of big data in the private and public sectors, primarily in North America. This paper discusses this experience and, more broadly, how data can enhance the rigour and integrity of nursing and health policy. Nursing regulatory bodies have access to extensive data, and the opportunity to use these data to inform decision-making and policy development by investing in how they are captured, analysed and incorporated into decision-making processes. Understanding and using big data is a critical part of developing relevant, sound and credible policy. Rigorous collection and analysis of big data supports the integrity of the evidence used by nurse regulators in developing nursing and health policy. © 2016 International Council of Nurses.

  2. [Determination of proximal chemical composition of squid (Dosidicus gigas) and development of gel products].

    PubMed

    Abugoch, L; Guarda, A; Pérez, L M; Paredes, M P

    1999-06-01

    The good nutritional properties of meat from the big squid (Dosidicus gigas) living off the Chilean coast were determined through its proximal composition: 70 cal/100 g fresh meat; 82.23 +/- 0.98% moisture; 15.32 +/- 0.93% protein; 1.31 +/- 0.12% ashes; 0.87 +/- 0.18% fat and 0.27% NNE (non-nitrogen extract). The big squid meat was used to develop a gel product containing NaCl and TPP. Because squid meat lacks gelling properties, additives such as carrageenan, alginate or egg albumin were necessary for gel preparation. Formulations containing egg albumin showed the highest gel force, measured by penetration, compared to those containing carrageenan or alginate.

  3. Big data challenges for large radio arrays

    NASA Astrophysics Data System (ADS)

    Jones, D. L.; Wagstaff, K.; Thompson, D. R.; D'Addario, L.; Navarro, R.; Mattmann, C.; Majid, W.; Lazio, J.; Preston, J.; Rebbapragada, U.

    2012-03-01

    Future large radio astronomy arrays, particularly the Square Kilometre Array (SKA), will be able to generate data at rates far higher than can be analyzed or stored affordably with current practices. This is, by definition, a "big data" problem, and requires an end-to-end solution if future radio arrays are to reach their full scientific potential. Similar data processing, transport, storage, and management challenges face next-generation facilities in many other fields. The Jet Propulsion Laboratory is developing technologies to address big data issues, with an emphasis on three areas: 1) lower-power digital processing architectures to make high-volume data generation operationally affordable, 2) data-adaptive machine learning algorithms for real-time analysis (or "data triage") of large data volumes, and 3) scalable data archive systems that allow efficient data mining and remote user code to run locally where the data are stored.

  4. A peek into the future of radiology using big data applications.

    PubMed

    Kharat, Amit T; Singhal, Shubham

    2017-01-01

    Big data refers to the extremely large volumes of data available in the radiology department. Big data is identified by four Vs - Volume, Velocity, Variety, and Veracity. By applying algorithmic tools that convert raw data into transformed data in such large datasets, there is a possibility of understanding and using radiology data to gain new knowledge and insights. Big data analytics consists of 6Cs - Connection, Cloud, Cyber, Content, Community, and Customization. The global per-capita capacity to store digital information has roughly doubled every 40 months since the 1980s. By using big data, the planning and implementation of radiological procedures in radiology departments can be given a great boost. Potential future applications of big data include scheduling of scans, creating patient-specific personalized scanning protocols, radiologist decision support, emergency reporting, virtual quality assurance for the radiologist, and more. Targeted use of big data applications can support the analytic process for images. Screening software tools designed on big data can highlight a region of interest, such as subtle changes in parenchymal density, a solitary pulmonary nodule, or focal hepatic lesions, by plotting its multidimensional anatomy. Following this, more complex applications such as three-dimensional multiplanar reconstructions (MPR), volumetric rendering (VR), and curved planar reconstruction, which consume higher system resources, can be run on targeted data subsets rather than querying the complete cross-sectional imaging dataset. This pre-emptive selection of the dataset can substantially reduce system requirements, such as memory and server load, and provide prompt results. However, a word of caution: big data should not become "dump data" because of inadequate or poor analysis and non-structured, improperly stored data. In the near future, big data can ring in the era of personalized and individualized healthcare.

  5. From Big Data to Smart Data for Pharmacovigilance: The Role of Healthcare Databases and Other Emerging Sources.

    PubMed

    Trifirò, Gianluca; Sultana, Janet; Bate, Andrew

    2018-02-01

    In the last decade 'big data' has become a buzzword used in several industrial sectors, including but not limited to telephony, finance and healthcare. Despite its popularity, it is not always clear what big data refers to exactly. Big data has become a very popular topic in healthcare, where the term primarily refers to the vast and growing volumes of computerized medical information available in the form of electronic health records, administrative or health claims data, disease and drug monitoring registries and so on. This kind of data is generally collected routinely during administrative processes and clinical practice by different healthcare professionals: from doctors recording their patients' medical history, drug prescriptions or medical claims to pharmacists registering dispensed prescriptions. For a long time, this data accumulated without its value being fully recognized and leveraged. Today big data has an important place in healthcare, including in pharmacovigilance. The expanding role of big data in pharmacovigilance includes signal detection, substantiation and validation of drug or vaccine safety signals, and increasingly new sources of information such as social media are also being considered. The aim of the present paper is to discuss the uses of big data for drug safety post-marketing assessment.

  6. Raman hyperspectral imaging of iron transport across membranes in cells

    NASA Astrophysics Data System (ADS)

    Das, Anupam; Costa, Xavier Felipe; Khmaladze, Alexander; Barroso, Margarida; Sharikova, Anna

    2016-09-01

    Raman scattering microscopy is a powerful imaging technique used to identify chemical composition, structural and conformational state of molecules of complex samples in biology, biophysics, medicine and materials science. In this work, we have shown that Raman techniques allow the measurement of the iron content in protein mixtures and cells. Since the mechanisms of iron acquisition, storage, and excretion by cells are not completely understood, improved knowledge of iron metabolism can offer insight into many diseases in which iron plays a role in the pathogenic process, such as diabetes, neurodegenerative diseases, cancer, and metabolic syndrome. Understanding of the processes involved in cellular iron metabolism will improve our knowledge of cell functioning. It will also have a big impact on treatment of diseases caused by iron deficiency (anemias) and iron overload (hereditary hemochromatosis). Previously, Raman studies have shown substantial differences in spectra of transferrin with and without bound iron, thus proving that it is an appropriate technique to determine the levels of bound iron in the protein mixture. We have extended these studies to obtain hyperspectral images of transferrin in cells. By employing a Raman scanning microscope together with spectral detection by a highly sensitive back-illuminated cooled CCD camera, we were able to rapidly acquire and process images of fixed cells with chemical selectivity. We discuss and compare various methods of hyperspectral Raman image analysis and demonstrate the use of these methods to characterize cellular iron content without the need for dye labeling.
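
    One common step in hyperspectral Raman image analysis is reducing the spectral cube to a chemically selective map of band intensity. The sketch below illustrates that idea on a purely synthetic cube; the band window, cube dimensions, and baseline treatment are assumptions of this example, not the authors' protocol.

```python
# Minimal sketch (not the authors' pipeline): given a hyperspectral Raman cube
# of shape (rows, cols, wavenumbers), map the integrated intensity of an
# assumed iron-sensitive band to produce a chemically selective image.
import numpy as np

rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 1800, 700)           # synthetic cm^-1 axis
cube = rng.random((64, 64, wavenumbers.size))       # synthetic hypercube

# Hypothetical band window; a real study would use bands identified for
# transferrin with and without bound iron.
band = (wavenumbers > 1230) & (wavenumbers < 1290)

# Crude per-pixel baseline removal followed by band integration.
baseline = cube.min(axis=2, keepdims=True)
band_map = (cube - baseline)[:, :, band].sum(axis=2)

print(band_map.shape)   # (64, 64) image of relative band intensity
```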

  7. A Systematic Literature Mapping of Risk Analysis of Big Data in Cloud Computing Environment

    NASA Astrophysics Data System (ADS)

    Bee Yusof Ali, Hazirah; Marziana Abdullah, Lili; Kartiwi, Mira; Nordin, Azlin; Salleh, Norsaremah; Sham Awang Abu Bakar, Normi

    2018-05-01

    This paper investigates previous literature that focuses on three elements: risk assessment, big data and cloud. We use a systematic literature mapping method to search for journals and proceedings. The systematic literature mapping process is used to obtain a properly screened and focused body of literature. With the help of inclusion and exclusion criteria, the search is further narrowed. Classification helps us group the literature into categories. At the end of the mapping, gaps can be seen; these gaps are where our focus should be in analysing the risk of big data in a cloud computing environment. Thus, a framework for assessing the risks of security, privacy and trust associated with big data in a cloud computing environment is highly needed.

  8. A study on specialist or special disease clinics based on big data.

    PubMed

    Fang, Zhuyuan; Fan, Xiaowei; Chen, Gong

    2014-09-01

    Correlation analysis and processing of massive medical information can be implemented with big data technology to find the relevance of different factors across the life cycle of a disease and to provide a basis for scientific research and clinical practice. This paper explores the concept of constructing a big medical data platform and introduces clinical model construction. Medical data can be collected and consolidated using distributed computing technology. Through analysis techniques such as artificial neural networks and grey models, a medical model can be built. Big data frameworks, such as Hadoop, can be used to construct early prediction and intervention models as well as clinical decision-making models for specialist and special disease clinics. This establishes a new model for common clinical research in specialist and special disease clinics.
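
    As a concrete reference for one of the techniques mentioned above, the following is a generic textbook GM(1,1) grey-model forecast fitted to a short, made-up series of monthly clinic visit counts; it is not the authors' implementation, and all numbers are invented.

```python
# Hypothetical GM(1,1) grey-model illustration on an assumed series of counts.
import numpy as np

x0 = np.array([112.0, 118.0, 125.0, 131.0, 140.0, 148.0])  # made-up monthly counts
n = x0.size

x1 = np.cumsum(x0)                              # accumulated generating series
z1 = 0.5 * (x1[1:] + x1[:-1])                   # background (mean) values

# Least-squares estimate of the development coefficient a and grey input b.
B = np.column_stack([-z1, np.ones(n - 1)])
Y = x0[1:]
a, b = np.linalg.lstsq(B, Y, rcond=None)[0]

def x1_hat(k):
    # Time-response function of the whitened equation.
    return (x0[0] - b / a) * np.exp(-a * k) + b / a

# One-step-ahead forecast of the original series by differencing x1_hat.
forecast = x1_hat(n) - x1_hat(n - 1)
print(f"GM(1,1) forecast for the next period: {forecast:.1f}")
```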

  9. Embracing Big Data in Complex Educational Systems: The Learning Analytics Imperative and the Policy Challenge

    ERIC Educational Resources Information Center

    Macfadyen, Leah P.; Dawson, Shane; Pardo, Abelardo; Gaševic, Dragan

    2014-01-01

    In the new era of big educational data, learning analytics (LA) offer the possibility of implementing real-time assessment and feedback systems and processes at scale that are focused on improvement of learning, development of self-regulated learning skills, and student success. However, to realize this promise, the necessary shifts in the…

  10. Beyond simple charts: Design of visualizations for big health data

    PubMed Central

    Ola, Oluwakemi; Sedig, Kamran

    2016-01-01

    Health data is often big data due to its high volume, low veracity, great variety, and high velocity. Big health data has the potential to improve productivity, eliminate waste, and support a broad range of tasks related to disease surveillance, patient care, research, and population health management. Interactive visualizations have the potential to amplify big data’s utilization. Visualizations can be used to support a variety of tasks, such as tracking the geographic distribution of diseases, analyzing the prevalence of disease, triaging medical records, predicting outbreaks, and discovering at-risk populations. Currently, many health visualization tools use simple charts, such as bar charts and scatter plots, that represent only a few facets of the data. These tools, while beneficial for simple perceptual and cognitive tasks, are ineffective when dealing with more complex sensemaking tasks that involve exploration of various facets and elements of big data simultaneously. There is a need for sophisticated and elaborate visualizations that encode many facets of data and support human-data interaction with big data and more complex tasks. When not approached systematically, design of such visualizations is labor-intensive, and the resulting designs may not facilitate big-data-driven tasks. Conceptual frameworks that guide the design of visualizations for big data can make the design process more manageable and result in more effective visualizations. In this paper, we demonstrate how a framework-based approach can help designers create novel, elaborate, non-trivial visualizations for big health data. We present four visualizations that are components of a larger tool for making sense of large-scale public health data. PMID:28210416

  11. Beyond simple charts: Design of visualizations for big health data.

    PubMed

    Ola, Oluwakemi; Sedig, Kamran

    2016-01-01

    Health data is often big data due to its high volume, low veracity, great variety, and high velocity. Big health data has the potential to improve productivity, eliminate waste, and support a broad range of tasks related to disease surveillance, patient care, research, and population health management. Interactive visualizations have the potential to amplify big data's utilization. Visualizations can be used to support a variety of tasks, such as tracking the geographic distribution of diseases, analyzing the prevalence of disease, triaging medical records, predicting outbreaks, and discovering at-risk populations. Currently, many health visualization tools use simple charts, such as bar charts and scatter plots, that represent only a few facets of the data. These tools, while beneficial for simple perceptual and cognitive tasks, are ineffective when dealing with more complex sensemaking tasks that involve exploration of various facets and elements of big data simultaneously. There is a need for sophisticated and elaborate visualizations that encode many facets of data and support human-data interaction with big data and more complex tasks. When not approached systematically, design of such visualizations is labor-intensive, and the resulting designs may not facilitate big-data-driven tasks. Conceptual frameworks that guide the design of visualizations for big data can make the design process more manageable and result in more effective visualizations. In this paper, we demonstrate how a framework-based approach can help designers create novel, elaborate, non-trivial visualizations for big health data. We present four visualizations that are components of a larger tool for making sense of large-scale public health data.

  12. Mutation Analysis in Cultured Cells of Transgenic Rodents

    PubMed Central

    Zheng, Albert; Bates, Steven E.; Tommasi, Stella

    2018-01-01

    To comply with guiding principles for the ethical use of animals for experimental research, the field of mutation research has witnessed a shift of interest from large-scale in vivo animal experiments to small-sized in vitro studies. Mutation assays in cultured cells of transgenic rodents constitute, in many ways, viable alternatives to in vivo mutagenicity experiments in the corresponding animals. A variety of transgenic rodent cell culture models and mutation detection systems have been developed for mutagenicity testing of carcinogens. Of these, transgenic Big Blue® (Stratagene Corp., La Jolla, CA, USA, acquired by Agilent Technologies Inc., Santa Clara, CA, USA, BioReliance/Sigma-Aldrich Corp., Darmstadt, Germany) mouse embryonic fibroblasts and the λ Select cII Mutation Detection System have been used by many research groups to investigate the mutagenic effects of a wide range of chemical and/or physical carcinogens. Here, we review techniques and principles involved in preparation and culturing of Big Blue® mouse embryonic fibroblasts, treatment in vitro with chemical/physical agent(s) of interest, determination of the cII mutant frequency by the λ Select cII assay and establishment of the mutation spectrum by DNA sequencing. We describe various approaches for data analysis and interpretation of the results. Furthermore, we highlight representative studies in which the Big Blue® mouse cell culture model and the λ Select cII assay have been used for mutagenicity testing of diverse carcinogens. We delineate the advantages of this approach and discuss its limitations, while underscoring auxiliary methods, where applicable. PMID:29337872

  13. [Big Data- challenges and risks].

    PubMed

    Krauß, Manuela; Tóth, Tamás; Hanika, Heinrich; Kozlovszky, Miklós; Dinya, Elek

    2015-12-06

    The term "Big Data" is commonly used to describe the growing mass of information being created recently. New conclusions can be drawn and new services can be developed by the connection, processing and analysis of these information. This affects all aspects of life, including health and medicine. The authors review the application areas of Big Data, and present examples from health and other areas. However, there are several preconditions of the effective use of the opportunities: proper infrastructure, well defined regulatory environment with particular emphasis on data protection and privacy. These issues and the current actions for solution are also presented.

  14. Big data mining: In-database Oracle data mining over hadoop

    NASA Astrophysics Data System (ADS)

    Kovacheva, Zlatinka; Naydenova, Ina; Kaloyanova, Kalinka; Markov, Krasimir

    2017-07-01

    Big data challenges different aspects of storing, processing and managing data, as well as analyzing and using data for business purposes. Applying data mining methods to big data is a further challenge because of the huge data volumes, the variety of information, and the dynamics of the sources. Various applications have been developed in this area, but their successful use depends on understanding many specific parameters. In this paper we present several opportunities for using the data mining techniques provided by the analytical engine of the Oracle RDBMS over data stored in the Hadoop Distributed File System (HDFS). Experimental results are presented and discussed.

  15. Integrating continental-scale ecological data into university courses: Developing NEON's Online Learning Portal

    NASA Astrophysics Data System (ADS)

    Wasser, L. A.; Gram, W.; Lunch, C. K.; Petroy, S. B.; Elmendorf, S.

    2013-12-01

    'Big Data' are becoming increasingly common in many fields. The National Ecological Observatory Network (NEON) will be collecting data over 30 years, using consistent, standardized methods across the United States. Similar efforts are underway in other parts of the globe (e.g. Australia's Terrestrial Ecosystem Research Network, TERN). These freely available new data provide an opportunity for increased understanding of continental- and global-scale processes such as changes in vegetation structure and condition, biodiversity and land use. However, while 'big data' are becoming more accessible and available, integrating big data into university courses is challenging. New and potentially unfamiliar data types, and the associated processing methods required to work with a growing diversity of available data, may demand time and resources that present a barrier to classroom integration. Analysis of these big datasets presents a further challenge given large file sizes and uncertainty regarding the best methods to statistically summarize and analyze results. Finally, teaching resources, in the form of demonstrative illustrations and other supporting media that might help teach key data concepts, take time to find and more time to develop, and available resources are often spread widely across many online spaces. This presentation will provide an overview of the development of NEON's collaborative university-focused online education portal. Portal content will include 1) interactive, online multimedia content that explains key concepts related to NEON's data products, including collection methods, key metadata to consider, and potential error and uncertainty surrounding data analysis; and 2) packaged 'lab' activities that include supporting data to be used in ecology, biology or earth science classrooms. To facilitate broad use in classrooms, lab activities will take advantage of freely and commonly available processing tools, techniques and scripts. All NEON materials are being developed in collaboration with labs and organizations across the globe. Integrating data analysis and processing techniques early in students' careers will support and facilitate student advancement in the sciences, contributing to a larger body of knowledge and understanding of continental- and global-scale issues. Facilitating understanding of data use and empowering young ecologists with the tools required to process the data is thus as integral to the observatory as the data itself. In this presentation, we discuss the integral role of freely available education materials that demonstrate the use of big data to address ecological questions and concepts. We also review gaps in existing educational resources related to big data and associated tools. Further, we address the great potential for including big data in existing ecological, physical and environmental science courses and in self-paced learning models through engaging and interactive multimedia presentation. Finally, we present beta versions of the interactive multimedia modules and results from feedback following early piloting and review.

  16. Cosmochemistry

    NASA Astrophysics Data System (ADS)

    Esteban, C.; García López, R. J.; Herrero, A.; Sánchez, F.

    2004-03-01

    1. Primordial alchemy: from the Big Bang to the present Universe G. Steigman; 2. Stellar nucleosynthesis N. Langer; 3. Observational aspects of stellar nucleosynthesis D. L. Lambert; 4. Abundance determinations in HII regions and planetary nebulae G. Stasinska; 5. Element abundances in nearby galaxies D. R. Garnett; 6. Chemical evolution of galaxies and intracluster medium F. Matteucci; 7. Element abundances through the cosmic ages M. Pettini.

  17. Cosmochemistry

    NASA Astrophysics Data System (ADS)

    Esteban, C.; García López, R. J.; Herrero, A.; Sánchez, F.

    2011-01-01

    1. Primordial alchemy: from the Big Bang to the present Universe G. Steigman; 2. Stellar nucleosynthesis N. Langer; 3. Observational aspects of stellar nucleosynthesis D. L. Lambert; 4. Abundance determinations in HII regions and planetary nebulae G. Stasinska; 5. Element abundances in nearby galaxies D. R. Garnett; 6. Chemical evolution of galaxies and intracluster medium F. Matteucci; 7. Element abundances through the cosmic ages M. Pettini.

  18. ConfChem Conference on Select 2016 BCCE Presentations: Twentieth Year of the OLCC

    ERIC Educational Resources Information Center

    Belford, Robert E.

    2017-01-01

    The ACS CHED Committee on Computers in Chemical Education (CCCE) ran the first intercollegiate OnLine Chemistry Course (OLCC) on Environmental and Industrial Chemistry in 1996, and is offering the seventh OLCC on Cheminformatics and Public Compound Databases: An Introduction to Big Data in Chemistry in 2017. This Communication summarizes the past,…

  19. Overall View of Chemical and Biochemical Weapons

    PubMed Central

    Pitschmann, Vladimír

    2014-01-01

    This article describes a brief history of chemical warfare, which culminated in the signing of the Chemical Weapons Convention. It describes the current state of chemical weapons and the risk of their use. Furthermore, some traditional technologies for the development of chemical weapons, such as increasing toxicity, methods of overcoming chemical protection, research on natural toxins or the introduction of binary technology, are described. By many measures, chemical weapons based on traditional technologies have reached the limit of their development. There is, however, great potential for their further development based on the most recent knowledge of modern scientific and technical disciplines, particularly at the boundary of chemistry and biology. The risk is heightened by the fact that the development of non-lethal chemical weapons at a technologically higher level is already generally accepted. In the future, the chemical arsenal will be based on the accumulation of important information from the fields of chemical, biological and toxin weapons. Data banks obtained in this way will be hardly accessible, and the risk of their materialization will persist. PMID:24902078

  20. Overall view of chemical and biochemical weapons.

    PubMed

    Pitschmann, Vladimír

    2014-06-04

    This article describes a brief history of chemical warfare, which culminated in the signing of the Chemical Weapons Convention. It describes the current state of chemical weapons and the risk of their use. Furthermore, some traditional technologies for the development of chemical weapons, such as increasing toxicity, methods of overcoming chemical protection, research on natural toxins or the introduction of binary technology, are described. By many measures, chemical weapons based on traditional technologies have reached the limit of their development. There is, however, great potential for their further development based on the most recent knowledge of modern scientific and technical disciplines, particularly at the boundary of chemistry and biology. The risk is heightened by the fact that the development of non-lethal chemical weapons at a technologically higher level is already generally accepted. In the future, the chemical arsenal will be based on the accumulation of important information from the fields of chemical, biological and toxin weapons. Data banks obtained in this way will be hardly accessible, and the risk of their materialization will persist.

  1. Some experiences and opportunities for big data in translational research.

    PubMed

    Chute, Christopher G; Ullman-Cullere, Mollie; Wood, Grant M; Lin, Simon M; He, Min; Pathak, Jyotishman

    2013-10-01

    Health care has become increasingly information intensive. The advent of genomic data, integrated into patient care, significantly accelerates the complexity and amount of clinical data. Translational research in the present day increasingly embraces new biomedical discovery in this data-intensive world, thus entering the domain of "big data." The Electronic Medical Records and Genomics consortium has taught us many lessons, while simultaneously advances in commodity computing methods enable the academic community to affordably manage and process big data. Although great promise can emerge from the adoption of big data methods and philosophy, the heterogeneity and complexity of clinical data, in particular, pose additional challenges for big data inferencing and clinical application. However, the ultimate comparability and consistency of heterogeneous clinical information sources can be enhanced by existing and emerging data standards, which promise to bring order to clinical data chaos. Meaningful Use data standards in particular have already simplified the task of identifying clinical phenotyping patterns in electronic health records.

  2. Some experiences and opportunities for big data in translational research

    PubMed Central

    Chute, Christopher G.; Ullman-Cullere, Mollie; Wood, Grant M.; Lin, Simon M.; He, Min; Pathak, Jyotishman

    2014-01-01

    Health care has become increasingly information intensive. The advent of genomic data, integrated into patient care, significantly accelerates the complexity and amount of clinical data. Translational research in the present day increasingly embraces new biomedical discovery in this data-intensive world, thus entering the domain of “big data.” The Electronic Medical Records and Genomics consortium has taught us many lessons, while simultaneously advances in commodity computing methods enable the academic community to affordably manage and process big data. Although great promise can emerge from the adoption of big data methods and philosophy, the heterogeneity and complexity of clinical data, in particular, pose additional challenges for big data inferencing and clinical application. However, the ultimate comparability and consistency of heterogeneous clinical information sources can be enhanced by existing and emerging data standards, which promise to bring order to clinical data chaos. Meaningful Use data standards in particular have already simplified the task of identifying clinical phenotyping patterns in electronic health records. PMID:24008998

  3. Valorisation of waste tyre by pyrolysis in a moving bed reactor.

    PubMed

    Aylón, E; Fernández-Colino, A; Murillo, R; Navarro, M V; García, T; Mastral, A M

    2010-07-01

    The aim of this work is to assess the behaviour of a moving bed reactor, based on a screw transporter design, in waste tyre pyrolysis under several experimental conditions. Waste tyre represents a significant problem in developed countries, and it is necessary to develop new technology that can easily process large amounts of this potential raw material. In this work, the influence of the main pyrolysis process variables (temperature, solid residence time, mass flow rate and inert gas flow) has been studied by a thorough analysis of product yields and properties. It has been found that, regardless of the process operational parameters, total waste tyre devolatilisation is achieved, producing a pyrolytic carbon black with a volatile matter content under 5 wt.%. In addition, it has been proven that, in the range studied, the most influential process variables are temperature and solid mass flow rate, mainly because both variables modify the gas residence time inside the reactor. It has also been found that modification of these variables affects the chemical properties of the products, which is mainly associated with the different cracking reactions of the primary pyrolysis products. Copyright (c) 2009 Elsevier Ltd. All rights reserved.

  4. Valorisation of waste tyre by pyrolysis in a moving bed reactor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aylon, E.; Fernandez-Colino, A.; Murillo, R., E-mail: ramonm@icb.csic.e

    2010-07-15

    The aim of this work is to assess the behaviour of a moving bed reactor, based on a screw transporter design, in waste tyre pyrolysis under several experimental conditions. Waste tyre represents a significant problem in developed countries, and it is necessary to develop new technology that can easily process large amounts of this potential raw material. In this work, the influence of the main pyrolysis process variables (temperature, solid residence time, mass flow rate and inert gas flow) has been studied by a thorough analysis of product yields and properties. It has been found that, regardless of the process operational parameters, total waste tyre devolatilisation is achieved, producing a pyrolytic carbon black with a volatile matter content under 5 wt.%. In addition, it has been proven that, in the range studied, the most influential process variables are temperature and solid mass flow rate, mainly because both variables modify the gas residence time inside the reactor. It has also been found that modification of these variables affects the chemical properties of the products, which is mainly associated with the different cracking reactions of the primary pyrolysis products.

  5. Thickness of surficial sediment at and near the Idaho National Engineering Laboratory, Idaho

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, S.R.; Liszewski, M.J.; Ackerman, D.J.

    1996-06-01

    Thickness of surficial sediment was determined from natural-gamma logs in 333 wells at and near the Idaho National Engineering Laboratory in eastern Idaho to provide reconnaissance data for future site-characterization studies. Surficial sediment, which is defined as the unconsolidated clay, silt, sand, and gravel that overlie the uppermost basalt flow at each well, ranges in thickness from 0 feet in seven wells drilled through basalt outcrops east of the Idaho Chemical Processing Plant to 313 feet in well Site 14 southeast of the Big Lost River sinks. Surficial sediment includes alluvial, lacustrine, eolian, and colluvial deposits that generally accumulated during the past 200 thousand years. Additional thickness data, not included in this report, are available from numerous auger holes and foundation borings at and near most facilities.

  6. Physics of primordial star formation

    NASA Astrophysics Data System (ADS)

    Yoshida, Naoki

    2012-09-01

    The study of primordial star formation has a history of nearly sixty years. It is generally thought that primordial stars are one of the key elements in a broad range of topics in astronomy and cosmology, from Galactic chemical evolution to the formation of supermassive black holes. We review recent progress in the theory of primordial star formation. The standard theory of cosmic structure formation posits that the present-day rich structure of the Universe developed through gravitational amplification of tiny matter density fluctuations left over from the Big Bang. It has become possible to study primordial star formation rigorously within the framework of the standard cosmological model. We first lay out the key physical processes in a primordial gas. Then, we introduce recent developments in computer simulations. Finally, we discuss prospects for future observations of the first generation of stars.

  7. Corrosion Challenges for the Oil and Gas Industry in the State of Qatar

    NASA Astrophysics Data System (ADS)

    Johnsen, Roy

    In Qatar, oil and gas have been produced from onshore fields for more than 70 years, while the first offshore field delivered its first crude oil in 1965. Due to the atmospheric conditions in Qatar, with periodically high humidity, high chloride content, and dust/sand combined with temperature variations, external corrosion is a big threat to the installations and connecting infrastructure. Internal corrosion in tubing, piping and process systems is also a challenge due to the high H2S content in the hydrocarbon mixture and exposure to corrosive aquifer water. To avoid corrosion, different types of mitigation, such as the application of coatings, chemical treatment and material selection, are important elements. This presentation will review experiences with corrosion challenges for oil and gas installations in Qatar, including some examples of corrosion failures that have been observed.

  8. Exploitation of biotechnology in a large company.

    PubMed

    Dart, E C

    1989-08-31

    Almost from the outset, most large companies saw the 'new biotechnology' not as a new business but as a set of very powerful techniques that, in time, would radically improve the understanding of biological systems. This new knowledge was generally seen by them as enhancing the process of invention and not as a substitute for tried and tested ways of meeting clearly identified targets. As the knowledge base grows, so the big-company response to biotechnology becomes more positive. Within ICI, biotechnology is now integrated into five bio-businesses (Pharmaceuticals, Agrochemicals, Seeds, Diagnostics and Biological Products). Within the Central Toxicology Laboratory it also contributes to the understanding of the mechanisms of toxic action of chemicals as part of assessing risk. ICI has entered two of these businesses (Seeds and Diagnostics) because it sees biotechnology making a major contribution to the profitability of each.

  9. Automatized Assessment of Protective Group Reactivity: A Step Toward Big Reaction Data Analysis.

    PubMed

    Lin, Arkadii I; Madzhidov, Timur I; Klimchuk, Olga; Nugmanov, Ramil I; Antipin, Igor S; Varnek, Alexandre

    2016-11-28

    We report a new method to assess protective group (PG) reactivity as a function of reaction conditions (catalyst, solvent) using raw reaction data. It is based on an intuitive similarity principle for chemical reactions: similar reactions proceed under similar conditions. Technically, reaction similarity can be assessed using the Condensed Graph of Reaction (CGR) approach, which represents an ensemble of reactants and products as a single molecular graph, i.e., as a pseudomolecule for which molecular descriptors or fingerprints can be calculated. CGR-based in-house tools were used to process data for 142,111 catalytic hydrogenation reactions extracted from the Reaxys database. Our results reveal some contradictions with the well-known Greene's Reactivity Charts, which are based on manual expert analysis. Models developed in this study show high accuracy (ca. 90%) for predicting the optimal experimental conditions for protective group deprotection.
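
    The underlying "similar reactions, similar conditions" idea can be illustrated, in a much cruder form than the authors' CGR descriptors, with RDKit reaction fingerprints. The example reactions, the use of rdChemReactions.CreateStructuralFingerprintForReaction, and the interpretation of the similarity score are all assumptions of this sketch, not part of the paper.

```python
# Not the authors' CGR tooling: a rough analogue of reaction-similarity search
# using RDKit reaction fingerprints (assumed to be available in rdkit).
from rdkit import DataStructs
from rdkit.Chem import rdChemReactions

# Two hypothetical deprotection-style reactions written as reaction SMILES.
rxn_a = rdChemReactions.ReactionFromSmarts(
    "COc1ccccc1C(=O)OCc1ccccc1>>COc1ccccc1C(=O)O", useSmiles=True)
rxn_b = rdChemReactions.ReactionFromSmarts(
    "CC(C)(C)OC(=O)NCCc1ccccc1>>NCCc1ccccc1", useSmiles=True)

fp_a = rdChemReactions.CreateStructuralFingerprintForReaction(rxn_a)
fp_b = rdChemReactions.CreateStructuralFingerprintForReaction(rxn_b)

# A high similarity would suggest that conditions reported for one reaction
# are a reasonable starting point for the other.
print(DataStructs.TanimotoSimilarity(fp_a, fp_b))
```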

  10. Evolutionary Metal Oxide Clusters for Novel Applications: Toward High-Density Data Storage in Nonvolatile Memories.

    PubMed

    Chen, Xiaoli; Zhou, Ye; Roy, Vellaisamy A L; Han, Su-Ting

    2018-01-01

    Because of current fabrication limitations, miniaturizing nonvolatile memory devices for managing the explosive increase in big data is challenging. Molecular memories constitute a promising candidate for next-generation memories because their properties can be readily modulated through chemical synthesis. Moreover, these memories can be fabricated through mild solution processing, which can be easily scaled up. Among the various materials, polyoxometalate (POM) molecules have attracted considerable attention for use as novel data-storage nodes for nonvolatile memories. Here, an overview of recent advances in the development of POMs for nonvolatile memories is presented. The general background knowledge of the structure and property diversity of POMs is also summarized. Finally, the challenges and perspectives in the application of POMs in memories are discussed. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. What is Your Cosmic Connection to the Elements?

    NASA Technical Reports Server (NTRS)

    Lochner, J.

    2003-01-01

    This booklet provides information and classroom activities covering topics in astronomy, physics, and chemistry. Chemistry teachers will find information about the cosmic origin of the chemical elements. The astronomy topics include the big bang, life cycles of small and large stars, supernovae, and cosmic rays. Physics teachers will find information on fusion processes and physical principles important in stellar evolution. While not meant to replace a textbook, the information provided here is meant to give the necessary background for the theme of "our cosmic connection to the elements." The activities can be used to reinforce the material across a number of disciplines, using a variety of techniques, and to engage and excite students about the topic. Additional activities, and on-line versions of the activities published here, are available at http://imagine.gsfc.nasa.gov/docs/teachers/elements/.

  12. Vacancy clustering and acceptor activation in nitrogen-implanted ZnO

    NASA Astrophysics Data System (ADS)

    Børseth, Thomas Moe; Tuomisto, Filip; Christensen, Jens S.; Monakhov, Edouard V.; Svensson, Bengt G.; Kuznetsov, Andrej Yu.

    2008-01-01

    The role of vacancy clustering and acceptor activation on resistivity evolution in N ion-implanted n-type hydrothermally grown bulk ZnO has been investigated by positron annihilation spectroscopy, resistivity measurements, and chemical profiling. Room-temperature 220 keV N implantation using doses in the low 10¹⁵ cm⁻² range induces small and big vacancy clusters containing at least 2 and 3-4 Zn vacancies, respectively. The small clusters are present already in as-implanted samples and remain stable up to 1000°C with no significant effect on the resistivity evolution. In contrast, formation of the big clusters at 600°C is associated with a significant increase in the free electron concentration attributed to gettering of amphoteric Li impurities by these clusters. Further annealing at 800°C results in a dramatic decrease in the free electron concentration correlated with activation of 10¹⁶-10¹⁷ cm⁻³ acceptors likely to be N and/or Li related. The samples remain n type, however, and further annealing at 1000°C results in passivation of the acceptor states while the big clusters dissociate.

  13. Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Science and Real-Time Decision Support

    NASA Astrophysics Data System (ADS)

    Wright, D. J.; Raad, M.; Hoel, E.; Park, M.; Mollenkopf, A.; Trujillo, R.

    2016-12-01

    Introduced is a new approach for processing spatiotemporal big data by leveraging distributed analytics and storage. A suite of temporally-aware analysis tools summarizes data nearby or within variable windows, aggregates points (e.g., for various sensor observations or vessel positions), reconstructs time-enabled points into tracks (e.g., for mapping and visualizing storm tracks), joins features (e.g., to find associations between features based on attributes, spatial relationships, temporal relationships or all three simultaneously), calculates point densities, finds hot spots (e.g., in species distributions), and creates space-time slices and cubes (e.g., in microweather applications with temperature, humidity, and pressure, or within human mobility studies). These "feature geo analytics" tools run in both batch and streaming spatial analysis mode as distributed computations across a cluster of servers on typical "big" data sets, where static data exist in traditional geospatial formats (e.g., shapefile) locally on a disk or file share, attached as static spatiotemporal big data stores, or streamed in near-real-time. In other words, the approach registers large datasets or data stores with ArcGIS Server, then distributes analysis across a cluster of machines for parallel processing. Several brief use cases will be highlighted based on a 16-node server cluster at 14 Gb RAM per node, allowing, for example, the buffering of over 8 million points or thousands of polygons in 1 minute. The approach is "hybrid" in that ArcGIS Server integrates open-source big data frameworks such as Apache Hadoop and Apache Spark on the cluster in order to run the analytics. In addition, the user may devise and connect custom open-source interfaces and tools developed in Python or Python Notebooks; the common denominator being the familiar REST API.
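
    As a rough, tool-agnostic illustration of the space-time aggregation described above (not the ArcGIS Server feature geo analytics tools themselves), the sketch below bins synthetic point observations into 0.5-degree, 6-hour cells with pandas; all field names and bin sizes are invented.

```python
# Illustrative only: coarse "space-time cube" aggregation of synthetic point
# observations. Field names, extents, and bin sizes are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
obs = pd.DataFrame({
    "lon": rng.uniform(-125, -120, 10_000),
    "lat": rng.uniform(44, 49, 10_000),
    "t": pd.to_datetime("2016-01-01")
         + pd.to_timedelta(rng.integers(0, 72, 10_000), unit="h"),
    "temp_c": rng.normal(8, 3, 10_000),
})

# Snap points to 0.5-degree spatial bins and 6-hour temporal slices,
# then summarize each space-time cell.
obs["lon_bin"] = (obs["lon"] // 0.5) * 0.5
obs["lat_bin"] = (obs["lat"] // 0.5) * 0.5
obs["t_bin"] = obs["t"].dt.floor("6h")

cube = (obs.groupby(["lon_bin", "lat_bin", "t_bin"])["temp_c"]
           .agg(["count", "mean"])
           .reset_index())
print(cube.head())
```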

  14. Big Soda Lake (Nevada). 1. Pelagic bacterial heterotrophy and biomass

    USGS Publications Warehouse

    Zehr, Jon P.; Harvey, Ronald W.; Oremland, Ronald S.; Cloern, James E.; George, Leah H.; Lane, Judith L.

    1987-01-01

    Bacterial activities and abundance were measured seasonally in the water column of meromictic Big Soda Lake which is divided into three chemically distinct zones: aerobic mixolimnion, anaerobic mixolimnion, and anaerobic monimolimnion. Bacterial abundance ranged between 5 and 52 × 10⁶ cells ml⁻¹, with highest biomass at the interfaces between these zones: 2–4 mg C liter⁻¹ in the photosynthetic bacterial layer (oxycline) and 0.8–2.0 mg C liter⁻¹ in the chemocline. Bacterial cell size and morphology also varied with depth: small coccoid cells were dominant in the aerobic mixolimnion, whereas the monimolimnion had a more diverse population that included cocci, rods, and large filaments. Heterotrophic activity was measured by [methyl-³H]thymidine incorporation and [¹⁴C]glutamate uptake. Highest uptake rates were at or just below the photosynthetic bacterial layer and were attributable to small (<1 µm) heterotrophs rather than the larger photosynthetic bacteria. These high rates of heterotrophic uptake were apparently linked with fermentation; rates of other mineralization processes (e.g. sulfate reduction, methanogenesis, denitrification) in the anoxic mixolimnion were insignificant. Heterotrophic activity in the highly reduced monimolimnion was generally much lower than elsewhere in the water column. Therefore, although the monimolimnion contained most of the bacterial abundance and biomass (∼60%), most of the cells there were inactive.

  15. Water-quality, phytoplankton, and trophic-status characteristics of Big Base and Little Base lakes, Little Rock Air Force Base, Arkansas, 2003-2004

    USGS Publications Warehouse

    Justus, B.G.

    2005-01-01

    Little Rock Air Force Base is the largest C-130 base in the Air Force and is the only C-130 training base in the Department of Defense. Little Rock Air Force Base is located in central Arkansas near the eastern edge of the Ouachita Mountains, near the Mississippi Alluvial Plain, and within the Arkansas Valley Ecoregion. Habitats include upland pine forests, upland deciduous forest, broad-leaved deciduous swamps, and two small freshwater lakes: Big Base Lake and Little Base Lake. Big Base and Little Base Lakes are used primarily for recreational fishing by base personnel and the civilian public. Under normal (rainfall) conditions, Big Base Lake has a surface area of approximately 39 acres, while the surface area of Little Base Lake is approximately 1 acre. Little Rock Air Force Base personnel are responsible for managing the fishery in these two lakes and since 1999 have operated a nutrient enhancement program that involves sporadically adding fertilizer to Big Base Lake. As a means of determining the relations between water quality and primary production, Little Rock Air Force Base personnel have a need for biological (phytoplankton density), chemical (dissolved-oxygen and nutrient concentrations), and physical (water temperature and light transparency) data. To address these monitoring needs, the U.S. Geological Survey, in cooperation with Little Rock Air Force Base, conducted a study to collect and analyze biological, chemical, and physical data. The U.S. Geological Survey sampled water quality in Big Base Lake and Little Base Lake on nine occasions from July 2003 through June 2004. Because of the difference in size, two sampling sites were established on Big Base Lake, while only one site was established on Little Base Lake. Lake profile data for Big Base Lake indicate that low dissolved-oxygen concentrations in the hypolimnion probably constrain most fish species to the upper 5-6 feet of depth during the summer stratification period. Dissolved-oxygen concentrations in Big Base Lake below a depth of 6 feet generally were less than 3 milligrams per liter for the summer months sampled in 2003 and 2004. Some evidence indicates that phosphorus was limiting primary production during the sampling period. Dissolved nitrogen constituents frequently were detected in water samples (indicating availability), but dissolved phosphorus constituents (orthophosphorus and dissolved phosphorus) were not detected in any samples collected at the two lakes. The absence of dissolved phosphorus constituents and presence of total phosphorus indicates that all phosphorus was bound to suspended material (sediment particles and living organisms). Nitrogen:phosphorus ratios on most sampling occasions tended to be slightly higher than 16:1, which can be interpreted as further indication that phosphorus could be limiting primary production to some extent. An alkalinity of 20 milligrams per liter of calcium carbonate or higher is recommended to optimize nutrient availability and buffering capacity in recreational fishing lakes and ponds. Median values for water samples collected at the three sites ranged from 12 to 13 milligrams per liter of calcium carbonate. Alkalinities ranged from 9 to 60 milligrams per liter of calcium carbonate, but 13 of 17 samples collected at the deepest site had alkalinities less than 20 milligrams per liter of calcium carbonate. Results of three trophic-state indices and a general trophic classification, as well as abundant green algae and large growths of blue-green algae, indicate that Big Base Lake may be eutrophic. Trophic-state index values calculated using total phosphorus, chlorophyll a, and Secchi disc measurements from both lakes generally exceeded criteria at which lakes are considered to be eutrophic. A second method of determining lake trophic status, the general trophic classification, categorized the three sampling sites as mesotrophic or eutrophic. Green algae were found to be in abundance throughout mos

  16. NETIMIS: Dynamic Simulation of Health Economics Outcomes Using Big Data.

    PubMed

    Johnson, Owen A; Hall, Peter S; Hulme, Claire

    2016-02-01

    Many healthcare organizations are now making good use of electronic health record (EHR) systems to record clinical information about their patients and the details of their healthcare. Electronic data in EHRs is generated by people engaged in complex processes within complex environments, and their human input, albeit shaped by computer systems, is compromised by many human factors. These data are potentially valuable to health economists and outcomes researchers but are sufficiently large and complex to be considered part of the new frontier of 'big data'. This paper describes emerging methods that draw together data mining, process modelling, activity-based costing and dynamic simulation models. Our research infrastructure includes safe links to Leeds hospital's EHRs with 3 million secondary and tertiary care patients. We created a multidisciplinary team of health economists, clinical specialists, and data and computer scientists, and developed a dynamic simulation tool called NETIMIS (Network Tools for Intervention Modelling with Intelligent Simulation; http://www.netimis.com ) suitable for visualization of both human-designed and data-mined processes, which can then be used for 'what-if' analysis by stakeholders interested in costing, designing and evaluating healthcare interventions. We present two examples of model development to illustrate how dynamic simulation can be informed by big data from an EHR. We found the tool provided a focal point for multidisciplinary teamwork, helping the team iteratively and collaboratively 'deep dive' into big data.
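
    In the spirit of the 'what-if' analysis described above (this is not the NETIMIS tool), a toy Monte Carlo of a single care pathway with per-activity costs shows the flavour of such a simulation; every probability and cost below is invented.

```python
# Toy 'what-if' pathway simulation: an intervention changes the probability of
# readmission, and we compare expected cost per patient. All numbers invented.
import random

random.seed(42)

def simulate_pathway(p_readmit, n_patients=10_000,
                     cost_clinic=120.0, cost_readmission=2400.0):
    """Return the mean cost per patient for a given readmission probability."""
    total = 0.0
    for _ in range(n_patients):
        total += cost_clinic                     # every patient attends the clinic
        if random.random() < p_readmit:
            total += cost_readmission            # some patients are readmitted
    return total / n_patients

baseline = simulate_pathway(p_readmit=0.18)
with_intervention = simulate_pathway(p_readmit=0.12) + 60.0   # add intervention cost

print(f"Baseline cost/patient:     {baseline:8.2f}")
print(f"Intervention cost/patient: {with_intervention:8.2f}")
```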

  17. Questioning the "big assumptions". Part I: addressing personal contradictions that impede professional development.

    PubMed

    Bowe, Constance M; Lahey, Lisa; Armstrong, Elizabeth; Kegan, Robert

    2003-08-01

    The ultimate success of recent medical curriculum reforms is, in large part, dependent upon the faculty's ability to adopt and sustain new attitudes and behaviors. However, like many New Year's resolutions, sincere intent to change may be short lived and followed by a discouraging return to old behaviors. Failure to sustain the initial resolve to change can be misinterpreted as a lack of commitment to one's original goals and eventually lead to greater effort expended in rationalizing the status quo rather than changing it. The present article outlines how a transformative process that has proven to be effective in managing personal change, Questioning the Big Assumptions, was successfully used in an international faculty development program for medical educators to enhance individual personal satisfaction and professional effectiveness. This process systematically encouraged participants to explore and proactively address currently operative mechanisms that could stall their attempts to change at the professional level. The applications of the Big Assumptions process in faculty development helped individuals to recognize and subsequently utilize unchallenged and deep rooted personal beliefs to overcome unconscious resistance to change. This approach systematically led participants away from circular griping about what was not right in their current situation to identifying the actions that they needed to take to realize their individual goals. By thoughtful testing of personal Big Assumptions, participants designed behavioral changes that could be broadly supported and, most importantly, sustained.

  18. BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark.

    PubMed

    Gulzar, Muhammad Ali; Interlandi, Matteo; Yoo, Seunghyun; Tetali, Sai Deep; Condie, Tyson; Millstein, Todd; Kim, Miryung

    2016-05-01

    Developers use cloud computing platforms to process a large quantity of data in parallel when developing big data analytics. Debugging the massive parallel computations that run in today's data-centers is time consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next generation data-intensive scalable cloud computing platform. This requires re-thinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay and naively inspecting millions of records using a watchpoint is too time consuming for an end user. First, BIGDEBUG's simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time saving compared to the baseline replay debugger. The results show that BIGDEBUG supports debugging at interactive speeds with minimal performance impact.
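
    The BIGDEBUG primitives themselves are not reproduced here; the sketch below uses plain PySpark (assumed installed, running in local mode) to show the underlying idea of capturing crash-inducing records for inspection instead of letting the whole job fail and be replayed.

```python
# Plain PySpark illustration (not the BIGDEBUG API) of isolating culprit
# records: bad inputs are captured alongside results rather than crashing.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("triage").getOrCreate()
sc = spark.sparkContext

raw = sc.parallelize(["12", "7", "oops", "40", ""])   # toy input with bad records

def safe_parse(rec):
    try:
        return [("ok", int(rec))]
    except ValueError:
        return [("bad", rec)]                         # keep the culprit for inspection

parsed = raw.flatMap(safe_parse).cache()
print("culprits:", parsed.filter(lambda kv: kv[0] == "bad").map(lambda kv: kv[1]).collect())
print("sum:", parsed.filter(lambda kv: kv[0] == "ok").map(lambda kv: kv[1]).sum())

spark.stop()
```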

  19. Big Data: A Parallel Particle Swarm Optimization-Back-Propagation Neural Network Algorithm Based on MapReduce.

    PubMed

    Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan

    2016-01-01

    A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we propose a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform. The PSO algorithm was used to optimize the BP neural network's initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, which represents a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data.
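
    A serial, single-machine toy version of the core idea (PSO choosing the starting weights that gradient descent then refines) is sketched below; the MapReduce parallelization, the SUN Database images, and analytic back-propagation are not reproduced, and a numerical gradient stands in for BP purely for brevity. All sizes and constants are invented.

```python
# Toy PSO-initialized network training on synthetic data (not the paper's code).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                        # toy inputs
y = (X[:, 0] * X[:, 1] > 0).astype(float)[:, None]   # toy binary target

H = 6                                                # hidden units
DIM = 4 * H + H + H * 1 + 1                          # total number of weights

def unpack(w):
    i = 0
    W1 = w[i:i + 4 * H].reshape(4, H); i += 4 * H
    b1 = w[i:i + H]; i += H
    W2 = w[i:i + H].reshape(H, 1); i += H
    return W1, b1, W2, w[i:]

def forward(w, X):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

def mse(w):
    return float(np.mean((forward(w, X) - y) ** 2))

# --- particle swarm over the flattened weight vector ---
P = 20
pos = rng.normal(scale=0.5, size=(P, DIM))
vel = np.zeros((P, DIM))
pbest, pbest_val = pos.copy(), np.array([mse(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(50):
    r1, r2 = rng.random((P, DIM)), rng.random((P, DIM))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([mse(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

# --- gradient-descent refinement from the PSO optimum (numerical gradient) ---
w, eps, lr = gbest.copy(), 1e-4, 0.5
for _ in range(100):
    grad = np.array([(mse(w + eps * e) - mse(w - eps * e)) / (2 * eps)
                     for e in np.eye(DIM)])
    w -= lr * grad

print(f"MSE after PSO: {mse(gbest):.4f}, after refinement: {mse(w):.4f}")
```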

  20. The BIG protein distinguishes the process of CO2-induced stomatal closure from the inhibition of stomatal opening by CO2.


    PubMed

    He, Jingjing; Zhang, Ruo-Xi; Peng, Kai; Tagliavia, Cecilia; Li, Siwen; Xue, Shaowu; Liu, Amy; Hu, Honghong; Zhang, Jingbo; Hubbard, Katharine E; Held, Katrin; McAinsh, Martin R; Gray, Julie E; Kudla, Jörg; Schroeder, Julian I; Liang, Yun-Kuan; Hetherington, Alistair M

    2018-04-01

    We conducted an infrared thermal imaging-based genetic screen to identify Arabidopsis mutants displaying aberrant stomatal behavior in response to elevated concentrations of CO2. This approach resulted in the isolation of a novel allele of the Arabidopsis BIG locus (At3g02260) that we have called CO2 insensitive 1 (cis1). BIG mutants are compromised in elevated CO2-induced stomatal closure and bicarbonate activation of S-type anion channel currents. In contrast with the wild-type, they fail to exhibit reductions in stomatal density and index when grown in elevated CO2. However, like the wild-type, BIG mutants display inhibition of stomatal opening when exposed to elevated CO2. BIG mutants also display wild-type stomatal aperture responses to the closure-inducing stimulus abscisic acid (ABA). Our results indicate that BIG is a signaling component involved in the elevated CO2-mediated control of stomatal development. In the control of stomatal aperture by CO2, BIG is only required in elevated CO2-induced closure and not in the inhibition of stomatal opening by this environmental signal. These data show that, at the molecular level, the CO2-mediated inhibition of opening and promotion of stomatal closure signaling pathways are separable and BIG represents a distinguishing element in these two CO2-mediated responses. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.

  1. [Relevance of big data for molecular diagnostics].

    PubMed

    Bonin-Andresen, M; Smiljanovic, B; Stuhlmüller, B; Sörensen, T; Grützkau, A; Häupl, T

    2018-04-01

    Big data analysis raises the expectation that computerized algorithms may extract new knowledge from otherwise unmanageable, vast data sets. What are the algorithms behind the big data discussion? In principle, high-throughput technologies in molecular research introduced big data, and the development and application of analysis tools, into the field of rheumatology some 15 years ago. This includes especially omics technologies, such as genomics, transcriptomics and cytomics. Some basic methods of data analysis are provided along with the technology; however, functional analysis and interpretation require the adaptation of existing software tools or the development of new ones. For these steps, structuring and evaluating the data according to their biological context is extremely important and not merely a mathematical problem. This aspect has to be considered much more for molecular big data than for data analyzed in health economics or epidemiology. Molecular data are structured in a first order determined by the applied technology and present quantitative characteristics that follow the principles of their biological nature. These biological dependencies have to be integrated into software solutions, which may require networks of molecular big data from the same or even different technologies in order to achieve cross-technology confirmation. Increasingly extensive recording of molecular processes, including in individual patients, is generating personal big data and requires new management strategies in order to develop data-driven, individualized interpretation concepts. With this perspective in mind, translating the information derived from molecular big data will also require new specifications for education and professional competence.

  2. Change in the quality of atmospheric air under the influence of chemical enterprises in the Sverdlovsk Region cities

    NASA Astrophysics Data System (ADS)

    Ivantsova, Maria N.; Selezneva, Irina S.; Bezmaternykh, Maxim A.

    2017-06-01

    In this article, we investigated the dynamics of gaseous emissions from the Sverdlovsk region chemical complex from 2011 to 2015. We analyzed changes in the state of atmospheric air in cities where large chemical enterprises are located. Our main attention is paid to emissions of gases such as carbon monoxide, nitrogen oxides and sulfur dioxide. We also examined the share of purified gas in the total gaseous emissions from chemical industry enterprises. The contribution of the chemical industry is relatively low, accounting for approximately 4% of emissions from all manufacturing industries, including metallurgical complexes. There is a general tendency toward reduced pollutant emissions in connection with the implementation of environmental measures. Emissions of gaseous substances can be reduced both by improving production technology and by installing newer, more sophisticated equipment that captures harmful emissions.

  3. ClimateSpark: An in-memory distributed computing framework for big climate data analytics

    NASA Astrophysics Data System (ADS)

    Hu, Fei; Yang, Chaowei; Schnase, John L.; Duffy, Daniel Q.; Xu, Mengchao; Bowen, Michael K.; Lee, Tsengdar; Song, Weiwei

    2018-06-01

    The unprecedented growth of climate data creates new opportunities for climate studies, and yet big climate data pose a grand challenge to climatologists to efficiently manage and analyze big data. The complexity of climate data content and analytical algorithms increases the difficulty of implementing algorithms on high performance computing systems. This paper proposes an in-memory, distributed computing framework, ClimateSpark, to facilitate complex big data analytics and time-consuming computational tasks. Chunking data structure improves parallel I/O efficiency, while a spatiotemporal index is built for the chunks to avoid unnecessary data reading and preprocessing. An integrated, multi-dimensional, array-based data model (ClimateRDD) and ETL operations are developed to address big climate data variety by integrating the processing components of the climate data lifecycle. ClimateSpark utilizes Spark SQL and Apache Zeppelin to develop a web portal to facilitate the interaction among climatologists, climate data, analytic operations and computing resources (e.g., using SQL query and Scala/Python notebook). Experimental results show that ClimateSpark conducts different spatiotemporal data queries/analytics with high efficiency and data locality. ClimateSpark is easily adaptable to other big multiple-dimensional, array-based datasets in various geoscience domains.
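
    ClimateSpark's ClimateRDD and index are internal to the framework, so the following is only a hedged illustration of the general pattern it describes: consult a small spatiotemporal chunk index to prune reads, then answer the query with Spark SQL over the surviving chunks. The chunk metadata, paths, and Parquet layout are assumptions.

    ```python
    # Illustrative sketch (not ClimateSpark's actual API): prune array chunks with
    # a small spatiotemporal index before any heavy reads, then run Spark SQL over
    # the chunks that survive pruning. Paths and chunk metadata are assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("climate-chunk-index-sketch").getOrCreate()

    # Hypothetical chunk index: one row of metadata per stored chunk.
    chunk_index = spark.createDataFrame(
        [("chunk_000", -90.0, 0.0, 1980, 1989, "hdfs:///clim/c000.parquet"),
         ("chunk_001",   0.0, 90.0, 1980, 1989, "hdfs:///clim/c001.parquet"),
         ("chunk_002", -90.0, 0.0, 1990, 1999, "hdfs:///clim/c002.parquet")],
        ["chunk_id", "lat_min", "lat_max", "year_min", "year_max", "path"],
    )

    # Query window: northern hemisphere, 1985-1995. Only overlapping chunks are read.
    hits = chunk_index.filter(
        (chunk_index.lat_max > 0.0)
        & (chunk_index.year_min <= 1995)
        & (chunk_index.year_max >= 1985)
    )
    paths = [r.path for r in hits.collect()]

    # Load only the selected chunks and answer the query with Spark SQL.
    data = spark.read.parquet(*paths)          # assumes chunks were written as Parquet
    data.createOrReplaceTempView("tas")
    spark.sql("""
        SELECT year, AVG(value) AS mean_tas
        FROM tas
        WHERE lat BETWEEN 0 AND 90 AND year BETWEEN 1985 AND 1995
        GROUP BY year ORDER BY year
    """).show()
    ```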

  4. A Big Data Guide to Understanding Climate Change: The Case for Theory-Guided Data Science.

    PubMed

    Faghmous, James H; Kumar, Vipin

    2014-09-01

    Global climate change and its impact on human life has become one of our era's greatest challenges. Despite the urgency, data science has had little impact on furthering our understanding of our planet in spite of the abundance of climate data. This is a stark contrast from other fields such as advertising or electronic commerce where big data has been a great success story. This discrepancy stems from the complex nature of climate data as well as the scientific questions climate science brings forth. This article introduces a data science audience to the challenges and opportunities to mine large climate datasets, with an emphasis on the nuanced difference between mining climate data and traditional big data approaches. We focus on data, methods, and application challenges that must be addressed in order for big data to fulfill their promise with regard to climate science applications. More importantly, we highlight research showing that solely relying on traditional big data techniques results in dubious findings, and we instead propose a theory-guided data science paradigm that uses scientific theory to constrain both the big data techniques as well as the results-interpretation process to extract accurate insight from large climate data .

  5. Big Ozone Holes Headed For Extinction By 2040

    NASA Image and Video Library

    2015-05-06

    Caption: This is a conceptual animation showing ozone-depleting chemicals moving from the equator to the poles. The chemicals become trapped by the winds of the polar vortex, a ring of fast moving air that circles the South Pole. Watch full video: youtu.be/7n2km69jZu8 -- The next three decades will see an end of the era of big ozone holes. In a new study, scientists from NASA Goddard Space Flight Center say that the ozone hole will be consistently smaller than 12 million square miles by the year 2040. Ozone-depleting chemicals in the atmosphere cause an ozone hole to form over Antarctica during the winter months in the Southern Hemisphere. Since the Montreal Protocol agreement in 1987, emissions have been regulated and chemical levels have been declining. However, the ozone hole has still remained bigger than 12 million square miles since the early 1990s, with exact sizes varying from year to year. The size of the ozone hole varies due to both temperature and levels of ozone-depleting chemicals in the atmosphere. In order to get a more accurate picture of the future size of the ozone hole, scientists used NASA’s AURA satellite to determine how much the levels of these chemicals in the atmosphere varied each year. With this new knowledge, scientists can confidently say that the ozone hole will be consistently smaller than 12 million square miles by the year 2040. Scientists will continue to use satellites to monitor the recovery of the ozone hole and they hope to see its full recovery by the end of the century. Research: Inorganic chlorine variability in the Antarctic vortex and implications for ozone recovery. Journal: Geophysical Research: Atmospheres, December 18, 2014. Link to paper: onlinelibrary.wiley.com/doi/10.1002/2014JD022295/abstract.

  6. caBIG compatibility review system: software to support the evaluation of applications using defined interoperability criteria.

    PubMed

    Freimuth, Robert R; Schauer, Michael W; Lodha, Preeti; Govindrao, Poornima; Nagarajan, Rakesh; Chute, Christopher G

    2008-11-06

    The caBIG Compatibility Review System (CRS) is a web-based application to support compatibility reviews, which certify that software applications that pass the review meet a specific set of criteria that allow them to interoperate. The CRS contains workflows that support both semantic and syntactic reviews, which are performed by the caBIG Vocabularies and Common Data Elements (VCDE) and Architecture workspaces, respectively. The CRS increases the efficiency of compatibility reviews by reducing administrative overhead and it improves uniformity by ensuring that each review is conducted according to a standard process. The CRS provides metrics that allow the review team to evaluate the level of data element reuse in an application, a first step towards quantifying the extent of harmonization between applications. Finally, functionality is being added that will provide automated validation of checklist criteria, which will further simplify the review process.

  7. Limnology of Big Lake, south-central Alaska, 1983-84

    USGS Publications Warehouse

    Woods, Paul F.

    1992-01-01

    The limnological characteristics and trophic state of Big Lake in south-central Alaska were determined from the results of an intensive study during 1983-84. The study was begun in response to concern over the potential for eutrophication of Big Lake, which has experienced substantial residential development and recreational use because of its proximity to Anchorage. The east and west basins of the 1,213 square-hectometer lake were each visited 36 times during the 2-year study to obtain a wide variety of physical, chemical, and biological data. During 1984, an estimate was made of the lake's annual primary production. Big Lake was classified as oligotrophic on the basis of its annual mean values for total phosphorus (9.5 micrograms per liter), total nitrogen (209 micrograms per liter), chlorophyll-a (2.5 micrograms per liter), secchi-disc transparency (6.3 meters), and its mean daily integral primary production of 81.1 milligrams of carbon fixed per square meter. The lake was, however, uncharacteristic of oligotrophic lakes in that a severe dissolved-oxygen deficit developed within the hypolimnion during summer stratification and under winter ice cover. The summer dissolved-oxygen deficit resulted from the combination of strong and persistent thermal stratification, which developed within 1 week of the melting of the lake's ice cover in May, and the failure of the spring circulation to fully reaerate the hypolimnion. The autumn circulation did reaerate the entire water column, but the ensuing 6 months of ice and snow cover prevented atmospheric reaeration of the water column and led to development of the winter dissolved-oxygen deficit. The anoxic conditions that eventually developed near the lake bottom allowed the release of nutrients from the bottom sediments and facilitated ammonification reactions. These processes yielded hypolimnetic concentrations of nitrogen and phosphorus compounds, which were much larger than the oligotrophic concentrations measured within the epilimnion. An analysis of nitrogen-to-phosphorus ratios showed that nitrogen was the nutrient most likely to limit phytoplankton growth during the summer. Although mean chlorophyll-a concentrations were at oligotrophic levels, concentrations did peak at 46.5 micrograms per liter in the east basin. During each year and in both basins, the peak chlorophyll-a concentrations were measured within the hypolimnion because the euphotic zone commonly was deeper than the epilimnion during the summer. The annual integral primary production of Big Lake in 1984 was 29.6 grams of carbon fixed per square meter with about 90 percent of that produced during May through October. During this time period, the lake received 76 percent of its annual input of solar irradiance. Monthly integral primary production, in milligrams of carbon fixed per square meter, ranged from 1.5 in January to 7,050 in July. When compared with the range of annual integral primary production measured in 50 International Biological Program lakes throughout the world, Big Lake had a low value of annual integral primary production. The results of this study lend credence to the concerns about the potential eutrophication of Big Lake. Increases in the supply of oxygen-demanding materials to Big Lake could worsen the hypolimnetic dissolved-oxygen deficit and possibly shift the lake's trophic state toward mesotrophy or eutrophy.

  8. The Measurand Framework: Scaling Exploratory Data Analysis

    NASA Astrophysics Data System (ADS)

    Schneider, D.; MacLean, L. S.; Kappler, K. N.; Bleier, T.

    2017-12-01

    Since 2005 QuakeFinder (QF) has acquired a unique dataset with outstanding spatial and temporal sampling of earth's time varying magnetic field along several active fault systems. This QF network consists of 124 stations in California and 45 stations along fault zones in Greece, Taiwan, Peru, Chile and Indonesia. Each station is equipped with three feedback induction magnetometers, two ion sensors, a 4 Hz geophone, a temperature sensor, and a humidity sensor. Data are continuously recorded at 50 Hz with GPS timing and transmitted daily to the QF data center in California for analysis. QF is attempting to detect and characterize anomalous EM activity occurring ahead of earthquakes. In order to analyze this sizable dataset, QF has developed an analytical framework to support processing the time series input data and hypothesis testing to evaluate the statistical significance of potential precursory signals. The framework was developed with a need to support legacy, in-house processing but with an eye towards big-data processing with Apache Spark and other modern big data technologies. In this presentation, we describe our framework, which supports rapid experimentation and iteration of candidate signal processing techniques via modular data transformation stages, tracking of provenance, and automatic re-computation of downstream data when upstream data is updated. Furthermore, we discuss how the processing modules can be ported to big data platforms like Apache Spark and demonstrate a migration path from local, in-house processing to cloud-friendly processing.
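
    As a hedged, toy illustration of the framework pattern described above (modular stages, provenance tracking, automatic recomputation when upstream data change), the sketch below chains three made-up stages in plain Python; it is not QuakeFinder's code, and the stage functions are invented.

    ```python
    # Toy sketch of the pattern described above (not QuakeFinder's code): each
    # stage records provenance and is recomputed automatically when its upstream
    # input changes. Stage names and transform functions are made up.
    import hashlib, json, time

    class Stage:
        def __init__(self, name, func, upstream=None):
            self.name, self.func, self.upstream = name, func, upstream
            self.output, self.provenance, self._stamp = None, None, None

        def _input(self):
            return self.upstream.compute() if self.upstream else None

        def compute(self):
            data = self._input()
            stamp = hashlib.md5(
                json.dumps(data, sort_keys=True, default=str).encode()).hexdigest()
            if stamp != self._stamp:                 # upstream changed -> recompute
                self.output = self.func(data)
                self._stamp = stamp
                self.provenance = {
                    "stage": self.name,
                    "input_hash": stamp,
                    "computed_at": time.time(),
                    "upstream": self.upstream.provenance if self.upstream else None,
                }
            return self.output

    # Example chain: raw samples -> detrend -> range "feature"
    raw     = Stage("raw",     lambda _:  [1.0, 2.0, 4.0, 2.0, 1.0])
    detrend = Stage("detrend", lambda xs: [x - sum(xs) / len(xs) for x in xs], raw)
    feature = Stage("feature", lambda xs: max(xs) - min(xs),                   detrend)

    print(feature.compute(), feature.provenance["stage"])
    ```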

  9. Innovating Big Data Computing Geoprocessing for Analysis of Engineered-Natural Systems

    NASA Astrophysics Data System (ADS)

    Rose, K.; Baker, V.; Bauer, J. R.; Vasylkivska, V.

    2016-12-01

    Big data computing and analytical techniques offer opportunities to improve predictions about subsurface systems while quantifying and characterizing associated uncertainties from these analyses. Spatial analysis, big data and otherwise, of subsurface natural and engineered systems are based on variable resolution, discontinuous, and often point-driven data to represent continuous phenomena. We will present examples from two spatio-temporal methods that have been adapted for use with big datasets and big data geo-processing capabilities. The first approach uses regional earthquake data to evaluate spatio-temporal trends associated with natural and induced seismicity. The second algorithm, the Variable Grid Method (VGM), is a flexible approach that presents spatial trends and patterns, such as those resulting from interpolation methods, while simultaneously visualizing and quantifying uncertainty in the underlying spatial datasets. In this presentation we will show how we are utilizing Hadoop to store and perform spatial analyses to efficiently consume and utilize large geospatial data in these custom analytical algorithms through the development of custom Spark and MapReduce applications that incorporate ESRI Hadoop libraries. The team will present custom `Big Data' geospatial applications that run on the Hadoop cluster and integrate with ESRI ArcMap with the team's probabilistic VGM approach. The VGM-Hadoop tool has been specially built as a multi-step MapReduce application running on the Hadoop cluster for the purpose of data reduction. This reduction is accomplished by generating multi-resolution, non-overlapping, attributed topology that is then further processed using ESRI's geostatistical analyst to convey a probabilistic model of a chosen study region. Finally, we will share our approach for implementation of data reduction and topology generation via custom multi-step Hadoop applications, performance benchmarking comparisons, and Hadoop-centric opportunities for greater parallelization of geospatial operations.
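
    The VGM itself runs as multi-step MapReduce jobs with ESRI's Hadoop libraries; the single-process Python sketch below only illustrates the underlying variable-grid idea under assumed parameters: bin points coarsely, then recursively split any cell that is too dense, so cell size adapts to data density and each output cell carries a mean value plus a count as a crude uncertainty proxy.

    ```python
    # Single-process sketch of the variable-grid idea (not the VGM-Hadoop code):
    # map each point to a coarse cell, then recursively split any cell whose
    # point count exceeds a threshold, so dense areas get finer cells and sparse
    # areas stay coarse. Cell size, threshold, and the uncertainty proxy are
    # assumptions for illustration only.
    from collections import defaultdict
    import random, statistics

    random.seed(1)
    points = [(random.uniform(0, 10), random.uniform(0, 10), random.gauss(50, 5))
              for _ in range(2000)]                    # (x, y, measured value)

    def reduce_cell(cell_points, x0, y0, size, max_pts=100, min_size=0.625):
        """Return [(x0, y0, size, mean, n)] cells, splitting while too dense."""
        if len(cell_points) <= max_pts or size <= min_size:
            if not cell_points:
                return []
            vals = [v for _, _, v in cell_points]
            return [(x0, y0, size, statistics.mean(vals), len(vals))]
        half, out = size / 2.0, []
        for dx in (0, 1):
            for dy in (0, 1):
                sub = [(x, y, v) for x, y, v in cell_points
                       if x0 + dx * half <= x < x0 + (dx + 1) * half
                       and y0 + dy * half <= y < y0 + (dy + 1) * half]
                out += reduce_cell(sub, x0 + dx * half, y0 + dy * half, half,
                                   max_pts, min_size)
        return out

    # "Map" phase: bin points into top-level 10x10 cells; "reduce" phase: refine.
    top = defaultdict(list)
    for x, y, v in points:
        top[(int(x // 10) * 10, int(y // 10) * 10)].append((x, y, v))

    cells = []
    for (x0, y0), pts in top.items():
        cells += reduce_cell(pts, float(x0), float(y0), 10.0)

    print(len(cells), "variable-size cells; example:", cells[0])
    ```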

  10. Implementation of a Big Data Accessing and Processing Platform for Medical Records in Cloud.

    PubMed

    Yang, Chao-Tung; Liu, Jung-Chun; Chen, Shuo-Tsung; Lu, Hsin-Wen

    2017-08-18

    Big Data analysis has become a key factor in being innovative and competitive. Along with worldwide population growth and the aging trend in developed countries, national use of medical care has been increasing. Because individual medical data are usually scattered across different institutions and stored in varied formats, integrating these ever-growing data is challenging. To give such data platforms scalable load capacity, they must be built on a sound platform architecture, and several issues must be considered when using cloud computing to quickly integrate big medical data into a database for analysis, searching, and filtering. This work builds a cloud storage system with HBase on Hadoop for storing and analyzing big medical-record data and improves the performance of importing data into the database. Medical records are stored in the HBase database platform for big data analysis. The system performs distributed processing of medical-record data through Hadoop MapReduce programming and provides functions including keyword search, data filtering, and basic statistics over the HBase database. Two import mechanisms are used: single-threaded Put operations and the CompleteBulkload mechanism. Experimental results show that single-threaded Put is preferable when the file size is less than 300 MB, while CompleteBulkload improves import performance when the file size is larger than 300 MB. The system provides a web interface that allows users to search and filter data and to analyze and convert it into forms that are helpful for medical staff and institutions.
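
    A hedged sketch of the size-based import decision described above is given below, using the community happybase client for the Put path; the 300 MB threshold matches the abstract, but the table name, column family, host, and file layout are assumptions, and the bulk-load branch is only a labeled placeholder for HBase's ImportTsv/completebulkload tooling.

    ```python
    # Hedged sketch of the size-based import choice described above (table and
    # column names, host, and file layout are assumptions, not the paper's exact
    # setup): small files go through ordinary Puts, large files are handed to a
    # bulk-load path.
    import os
    import happybase   # Thrift-based HBase client

    THRESHOLD_BYTES = 300 * 1024 * 1024          # ~300 MB, as suggested by the paper

    def import_small(csv_path, table):
        """Row-by-row Put import for small files."""
        with open(csv_path, encoding="utf-8") as f:
            for line in f:
                row_key, record = line.rstrip("\n").split(",", 1)
                table.put(row_key.encode(), {b"info:record": record.encode()})

    def import_large(csv_path):
        # Placeholder: in practice the file would be staged to HDFS and loaded
        # with HBase's bulk-load tooling (write HFiles, then complete the bulk
        # load), which avoids issuing millions of individual Puts.
        print("bulk-load path (not implemented in this sketch):", csv_path)

    def import_medical_records(csv_path, host="hbase-master", table_name="medical_records"):
        size = os.path.getsize(csv_path)
        if size < THRESHOLD_BYTES:
            conn = happybase.Connection(host)
            try:
                import_small(csv_path, conn.table(table_name))
            finally:
                conn.close()
        else:
            import_large(csv_path)

    if __name__ == "__main__":
        import_medical_records("records_2017.csv")   # hypothetical input file
    ```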

  11. Big Data and Knowledge Management: A Possible Course to Combine Them Together

    ERIC Educational Resources Information Center

    Hijazi, Sam

    2017-01-01

    Big data (BD) is the buzz phrase these days. Everyone is talking about its potential, its volume, its variety, and its velocity. Knowledge management (KM) has been around since the mid-1990s. The goals of KM have been to collect, store, categorize, mine, and process data into knowledge. The methods of knowledge acquisition varied from…

  12. Distilling Big Data: Refining Quality Information in the Era of Yottabytes

    PubMed Central

    Subramaniam, Srinivasan; Ramasamy, Chandrasekeran

    2015-01-01

    Big Data is the buzzword of the modern century. With the invasion of pervasive computing, we live in a data centric environment, where we always leave a track of data related to our day to day activities. Be it a visit to a shopping mall or hospital or surfing Internet, we create voluminous data related to credit card transactions, user details, location information, and so on. These trails of data simply define an individual and form the backbone for user-profiling. With the mobile phones and their easy access to online social networks on the go, sensor data such as geo-taggings and events and sentiments around them contribute to the already overwhelming data containers. With reductions in the cost of storage and computational devices and with increasing proliferation of Cloud, we never felt any constraints in storing or processing such data. Eventually we end up having several exabytes of data and analysing them for their usefulness has introduced new frontiers of research. Effective distillation of these data is the need of the hour to improve the veracity of the Big Data. This research targets the utilization of the Fuzzy Bayesian process model to improve the quality of information in Big Data. PMID:26495424

  13. Distilling Big Data: Refining Quality Information in the Era of Yottabytes.

    PubMed

    Ramachandramurthy, Sivaraman; Subramaniam, Srinivasan; Ramasamy, Chandrasekeran

    2015-01-01

    Big Data is the buzzword of the modern century. With the invasion of pervasive computing, we live in a data centric environment, where we always leave a track of data related to our day to day activities. Be it a visit to a shopping mall or hospital or surfing Internet, we create voluminous data related to credit card transactions, user details, location information, and so on. These trails of data simply define an individual and form the backbone for user-profiling. With the mobile phones and their easy access to online social networks on the go, sensor data such as geo-taggings and events and sentiments around them contribute to the already overwhelming data containers. With reductions in the cost of storage and computational devices and with increasing proliferation of Cloud, we never felt any constraints in storing or processing such data. Eventually we end up having several exabytes of data and analysing them for their usefulness has introduced new frontiers of research. Effective distillation of these data is the need of the hour to improve the veracity of the Big Data. This research targets the utilization of the Fuzzy Bayesian process model to improve the quality of information in Big Data.

  14. [Big data, medical language and biomedical terminology systems].

    PubMed

    Schulz, Stefan; López-García, Pablo

    2015-08-01

    A variety of rich terminology systems, such as thesauri, classifications, nomenclatures and ontologies support information and knowledge processing in health care and biomedical research. Nevertheless, human language, manifested as individually written texts, persists as the primary carrier of information, in the description of disease courses or treatment episodes in electronic medical records, and in the description of biomedical research in scientific publications. In the context of the discussion about big data in biomedicine, we hypothesize that the abstraction of the individuality of natural language utterances into structured and semantically normalized information facilitates the use of statistical data analytics to distil new knowledge out of textual data from biomedical research and clinical routine. Computerized human language technologies are constantly evolving and are increasingly ready to annotate narratives with codes from biomedical terminology. However, this depends heavily on linguistic and terminological resources. The creation and maintenance of such resources is labor-intensive. Nevertheless, it is sensible to assume that big data methods can be used to support this process. Examples include the learning of hierarchical relationships, the grouping of synonymous terms into concepts and the disambiguation of homonyms. Although clear evidence is still lacking, the combination of natural language technologies, semantic resources, and big data analytics is promising.

  15. Toward a Learning Health-care System – Knowledge Delivery at the Point of Care Empowered by Big Data and NLP

    PubMed Central

    Kaggal, Vinod C.; Elayavilli, Ravikumar Komandur; Mehrabi, Saeed; Pankratz, Joshua J.; Sohn, Sunghwan; Wang, Yanshan; Li, Dingcheng; Rastegar, Majid Mojarad; Murphy, Sean P.; Ross, Jason L.; Chaudhry, Rajeev; Buntrock, James D.; Liu, Hongfang

    2016-01-01

    The concept of optimizing health care by understanding and generating knowledge from previous evidence, ie, the Learning Health-care System (LHS), has gained momentum and now has national prominence. Meanwhile, the rapid adoption of electronic health records (EHRs) enables the data collection required to form the basis for facilitating LHS. A prerequisite for using EHR data within the LHS is an infrastructure that enables access to EHR data longitudinally for health-care analytics and real time for knowledge delivery. Additionally, significant clinical information is embedded in the free text, making natural language processing (NLP) an essential component in implementing an LHS. Herein, we share our institutional implementation of a big data-empowered clinical NLP infrastructure, which not only enables health-care analytics but also has real-time NLP processing capability. The infrastructure has been utilized for multiple institutional projects including the MayoExpertAdvisor, an individualized care recommendation solution for clinical care. We compared the advantages of big data over two other environments. Big data infrastructure significantly outperformed other infrastructure in terms of computing speed, demonstrating its value in making the LHS a possibility in the near future. PMID:27385912

  16. Toward a Learning Health-care System - Knowledge Delivery at the Point of Care Empowered by Big Data and NLP.

    PubMed

    Kaggal, Vinod C; Elayavilli, Ravikumar Komandur; Mehrabi, Saeed; Pankratz, Joshua J; Sohn, Sunghwan; Wang, Yanshan; Li, Dingcheng; Rastegar, Majid Mojarad; Murphy, Sean P; Ross, Jason L; Chaudhry, Rajeev; Buntrock, James D; Liu, Hongfang

    2016-01-01

    The concept of optimizing health care by understanding and generating knowledge from previous evidence, ie, the Learning Health-care System (LHS), has gained momentum and now has national prominence. Meanwhile, the rapid adoption of electronic health records (EHRs) enables the data collection required to form the basis for facilitating LHS. A prerequisite for using EHR data within the LHS is an infrastructure that enables access to EHR data longitudinally for health-care analytics and real time for knowledge delivery. Additionally, significant clinical information is embedded in the free text, making natural language processing (NLP) an essential component in implementing an LHS. Herein, we share our institutional implementation of a big data-empowered clinical NLP infrastructure, which not only enables health-care analytics but also has real-time NLP processing capability. The infrastructure has been utilized for multiple institutional projects including the MayoExpertAdvisor, an individualized care recommendation solution for clinical care. We compared the advantages of big data over two other environments. Big data infrastructure significantly outperformed other infrastructure in terms of computing speed, demonstrating its value in making the LHS a possibility in the near future.

  17. Medical Big Data Warehouse: Architecture and System Design, a Case Study: Improving Healthcare Resources Distribution.

    PubMed

    Sebaa, Abderrazak; Chikh, Fatima; Nouicer, Amina; Tari, AbdelKamel

    2018-02-19

    The huge increase in medical devices and clinical applications that generate enormous amounts of data has raised a major issue in managing, processing, and mining this massive amount of data. Indeed, traditional data warehousing frameworks cannot effectively manage the volume, variety, and velocity of current medical applications. As a result, several data warehouses face many issues over medical data, and many challenges need to be addressed. New solutions have emerged, and Hadoop is one of the best examples; it can be used to process these streams of medical data. However, without an efficient system design and architecture, its performance will not be significant or valuable for medical managers. In this paper, we provide a short review of the literature on research issues of traditional data warehouses and present some important Hadoop-based data warehouses. In addition, a Hadoop-based architecture and a conceptual data model for designing a medical Big Data warehouse are given. In our case study, we provide implementation details of a big data warehouse based on the proposed architecture and data model on the Apache Hadoop platform to ensure an optimal allocation of health resources.

  18. Suppression of copper thin film loss during graphene synthesis.

    PubMed

    Lee, Alvin L; Tao, Li; Akinwande, Deji

    2015-01-28

    Thin metal films can be used to catalyze the growth of nanomaterials in place of the bulk metal, while greatly reducing the amount of material used. A big drawback of copper thin films (0.5-1.5 μm thick) is that, under high temperature/vacuum synthesis, the mass loss of films severely reduces the process time due to discontinuities in the metal film, thereby limiting the time scale for controlling metal grain and film growth. In this work, we have developed a facile method, namely "covered growth" to extend the time copper thin films can be exposed to high temperature/vacuum environment for graphene synthesis. The key to preventing severe mass loss of copper film during the high temperature chemical vapor deposition (CVD) process is to have a cover piece on top of the growth substrate. This new "covered growth" method enables the high-temperature annealing of the copper film upward of 4 h with minimal mass loss, while increasing copper film grain and graphene domain size. Graphene was then successfully grown on the capped copper film with subsequent transfer for device fabrication. Device characterization indicated equivalent physical, chemical, and electrical properties to conventional CVD graphene. Our "covered growth" provides a convenient and effective solution to the mass loss issue of thin films that serve as catalysts for a variety of 2D material syntheses.

  19. Towards Limits on Neutrino Mixing Parameters from Nucleosynthesis in the Big Bang and Supernovae

    NASA Astrophysics Data System (ADS)

    Cardall, Christian Young

    1997-11-01

    Astrophysical environments can often provide stricter limits on neutrino mass and mixing parameters than terrestrial experiments. However, before firm limits can be found, there must be confidence in the understanding of the astrophysical environment being used to make these limits. In this dissertation, progress towards limits on neutrino mixing parameters from big bang nucleosynthesis and supernova r-process nucleosynthesis is sought. By way of assessment of current knowledge of neutrino oscillation parameters, we examine the potential for a 'natural' three-neutrino mixing scheme (one without sterile neutrinos) to satisfy available data and astrophysical arguments. A small parameter space currently exists for a natural three-neutrino oscillation solution meeting known constraints. If such a solution is ruled out, and current hints about neutrino oscillations are confirmed, mixing between active and sterile neutrinos will probably be required. Because mixing between active and sterile neutrinos with parameters appropriate for the atmospheric or solar neutrino problems increases the primordial 4He abundance, big bang nucleosynthesis considerations can place limits on such mixing. In the present work the overall consistency of standard big bang nucleosynthesis is discussed in light of recent discordant determinations of the primordial deuterium abundance. Cosmological considerations favor a larger baryon density, which supports the lower reported value of D/H. Studies of limits on active-sterile neutrino mixing derived from big bang nucleosynthesis considerations are here extended to consider the dependence of these constraints on the primordial deuterium abundance. If the neutrino-heated ejecta in the post-core-bounce supernova environment is the site of r-process nucleosynthesis, limits can be placed on mixing between νe and sterile neutrinos. Refined limits will require a better understanding of this r-process environment, since current supernova models do not show a completely successful r-process. In this work it is shown that general relativistic effects associated with a more compact supernova core can provide more suitable conditions for the r-process. As a step towards analyzing the effects of neutrino mixing in such a relativistic environment, neutrino oscillations in curved spacetime are studied.

  20. Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.

    Emerging Big Data applications in areas like e-commerce and energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V’s of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework, to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.
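
    As a minimal, hedged illustration of what "stateful CEP with hybrid queries" means in practice (not H2O's query algebra or architecture), the Python sketch below keeps per-key state while events stream through an online pattern query and then answers an ad hoc, on-demand query from that retained state; the smart-meter events and the rising-readings pattern are invented.

    ```python
    # Minimal sketch of stateful CEP in plain Python (not the H2O framework):
    # an online pattern query keeps per-meter state as events stream in, and an
    # on-demand query is later answered from the retained state.
    from collections import defaultdict, deque

    WINDOW = 3                                     # events of history kept per key
    state = defaultdict(lambda: deque(maxlen=WINDOW))
    alerts = []

    def on_event(event):
        """Online query: alert when a meter reports 3 consecutive rising readings."""
        key, value = event["meter"], event["kwh"]
        history = state[key]
        history.append(value)
        if len(history) == WINDOW and all(history[i] < history[i + 1]
                                          for i in range(WINDOW - 1)):
            alerts.append((key, list(history)))

    def on_demand(predicate):
        """Ad hoc query over the state the online queries have already built up."""
        return {k: list(v) for k, v in state.items() if predicate(list(v))}

    # Simulated stream of invented smart-meter events.
    stream = [{"meter": "m1", "kwh": v} for v in (5, 6, 8, 7)] + \
             [{"meter": "m2", "kwh": v} for v in (2, 2, 3)]
    for ev in stream:
        on_event(ev)

    print("online alerts:", alerts)
    print("on-demand (any reading > 7):", on_demand(lambda h: any(x > 7 for x in h)))
    ```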

  1. Evidence for metalloprotease involvement in the in vivo effects of big endothelin 1.

    PubMed

    Pollock, D M; Opgenorth, T J

    1991-07-01

    The potent vasoconstrictor endothelin 1 (ET-1) is thought to arise from the proteolytic processing of big endothelin 1 (Big ET) by a unique endothelin-converting enzyme, possibly a metalloprotease. Experiments were conducted to determine the effects of Big ET on cardiovascular and renal functions during inhibition of metalloprotease activity in vivo. Intravenous infusion of Big ET (0.1 nmol.kg-1.min-1) in anesthetized euvolemic rats produced a significant increase in mean arterial pressure (MAP; 39 +/- 8%) and a decrease in effective renal plasma flow (ERPF; -39 +/- 2%), whereas glomerular filtration rate (GFR) remained unchanged (-8 +/- 8%). Simultaneous intravenous infusion of phosphoramidon (0.25 mg.kg-1.min-1), an inhibitor of metalloprotease activity including neutral endopeptidase EC 3.4.24.11 (NEP), completely prevented these effects of Big ET. Thiorphan (0.1 mg.kg-1.min-1), also an inhibitor of NEP, had absolutely no effect on either the renal or cardiovascular response to Big ET. Similarly, the response to Big ET was unaffected by infusion of enalaprilat (0.1 mg.kg-1.min-1), an inhibitor of the angiotensin-converting enzyme, which is also a metalloprotease. To determine whether the effect of phosphoramidon was due to antagonism of ET-1, an identical series of experiments was performed using ET-1 infusion (0.02 nmol.kg-1.min-1). Although the increase in MAP (24 +/- 5%) produced by ET-1 was less than that observed for the given dose of Big ET, the renal vasoconstriction was much more severe; the smaller peptide changed ERPF and GFR by -66 +/- 7 and -54 +/- 9%, respectively.(ABSTRACT TRUNCATED AT 250 WORDS)

  2. Association of Big Endothelin-1 with Coronary Artery Calcification.

    PubMed

    Qing, Ping; Li, Xiao-Lin; Zhang, Yan; Li, Yi-Lin; Xu, Rui-Xia; Guo, Yuan-Lin; Li, Sha; Wu, Na-Qiong; Li, Jian-Jun

    2015-01-01

    The coronary artery calcification (CAC) is clinically considered as one of the important predictors of atherosclerosis. Several studies have confirmed that endothelin-1(ET-1) plays an important role in the process of atherosclerosis formation. The aim of this study was to investigate whether big ET-1 is associated with CAC. A total of 510 consecutively admitted patients from February 2011 to May 2012 in Fu Wai Hospital were analyzed. All patients had received coronary computed tomography angiography and then divided into two groups based on the results of coronary artery calcium score (CACS). The clinical characteristics including traditional and calcification-related risk factors were collected and plasma big ET-1 level was measured by ELISA. Patients with CAC had significantly elevated big ET-1 level compared with those without CAC (0.5 ± 0.4 vs. 0.2 ± 0.2, P<0.001). In the multivariate analysis, big ET-1 (Tertile 2, HR = 3.09, 95% CI 1.66-5.74, P <0.001, Tertile3 HR = 10.42, 95% CI 3.62-29.99, P<0.001) appeared as an independent predictive factor of the presence of CAC. There was a positive correlation of the big ET-1 level with CACS (r = 0.567, p<0.001). The 10-year Framingham risk (%) was higher in the group with CACS>0 and the highest tertile of big ET-1 (P<0.01). The area under the receiver operating characteristic curve for the big ET-1 level in predicting CAC was 0.83 (95% CI 0.79-0.87, p<0.001), with a sensitivity of 70.6% and specificity of 87.7%. The data firstly demonstrated that the plasma big ET-1 level was a valuable independent predictor for CAC in our study.

  3. Disproof of Big Bang's Foundational Expansion Redshift Assumption Overthrows the Big Bang and Its No-Center Universe and Is Replaced by a Spherically Symmetric Model with Nearby Center with the 2.73 K CMR Explained by Vacuum Gravity and Doppler Effects

    NASA Astrophysics Data System (ADS)

    Gentry, Robert

    2015-04-01

    Big bang theory holds its central expansion redshift assumption quickly reduced the theorized radiation flash to ~ 1010 K, and then over 13.8 billion years reduced it further to the present 2.73 K CMR. Weinberg claims this 2.73 K value agrees with big bang theory so well that ``...we can be sure that this radiation was indeed left over from a time about a million years after the `big bang.' '' (TF3M, p180, 1993 ed.) Actually his conclusion is all based on big bang's in-flight wavelength expansion being a valid physical process. In fact all his surmising is nothing but science fiction because our disproof of GR-induced in-flight wavelength expansion [1] definitely proves the 2.73 K CMR could never have been the wavelength-expanded relic of any radiation, much less the presumed big bang's. This disproof of big bang's premier prediction is a death blow to the big bang as it is also to the idea that the redshifts in Hubble's redshift relation are expansion shifts; this negates Friedmann's everywhere-the-same, no-center universe concept and proves it does have a nearby Center, a place which can be identified in Psalm 103:19 and in Revelation 20:11 as the location of God's eternal throne. Widely published (Science, Nature, ARNS) evidence of Earth's fiat creation will also be presented. The research is supported by the God of Creation. This paper [1] is in for publication.

  4. Predictive Big Data Analytics: A Study of Parkinson's Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations.

    PubMed

    Dinov, Ivo D; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W; Price, Nathan D; Van Horn, John D; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M; Dauer, William; Toga, Arthur W

    2016-01-01

    A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data-large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources-all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson's disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. 
The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer's, Huntington's, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.
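
    Because the PPMI data require registered access, the sketch below uses a synthetic stand-in purely to illustrate the reported analysis pattern: rebalance an imbalanced cohort, then compare model-free classifiers (AdaBoost, SVM) with n-fold cross-validation. Feature counts, class ratio, and hyperparameters are arbitrary assumptions, and nothing here should be read as reproducing the paper's accuracy figures.

    ```python
    # Illustrative sketch only: a synthetic cohort stands in for the PPMI data.
    # It shows the pattern the abstract describes -- rebalance an imbalanced
    # cohort, then compare model-free classifiers with n-fold cross-validation.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score
    from sklearn.utils import resample

    # Imbalanced synthetic cohort: ~10% "cases", 90% "controls".
    X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                               weights=[0.9, 0.1], random_state=0)

    # Simple rebalancing: oversample the minority class to match the majority.
    # (Caveat: in a real analysis the oversampling would be done inside each
    # training fold to avoid leaking duplicated minority samples into test folds.)
    X_min, y_min = X[y == 1], y[y == 1]
    X_maj, y_maj = X[y == 0], y[y == 0]
    X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                                  n_samples=len(y_maj), random_state=0)
    X_bal = np.vstack([X_maj, X_min_up])
    y_bal = np.concatenate([y_maj, y_min_up])

    # Model-free classifiers evaluated with 5-fold cross-validation.
    for name, clf in [("AdaBoost", AdaBoostClassifier(n_estimators=200, random_state=0)),
                      ("SVM (RBF)", SVC(kernel="rbf", C=1.0, gamma="scale"))]:
        acc = cross_val_score(clf, X_bal, y_bal, cv=5, scoring="accuracy")
        print(f"{name}: mean CV accuracy = {acc.mean():.3f}")
    ```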

  5. Carbon Anode Materials

    NASA Astrophysics Data System (ADS)

    Ogumi, Zempachi; Wang, Hongyu

    Accompanying the impressive progress of human society, energy storage technologies become evermore urgent. Among the broad categories of energy sources, batteries or cells are the devices that successfully convert chemical energy into electrical energy. Lithium-based batteries stand out in the big family of batteries mainly because of their high-energy density, which comes from the fact that lithium is the most electropositive as well as the lightest metal. However, lithium dendrite growth after repeated charge-discharge cycles easily will lead to short-circuit of the cells and an explosion hazard. Substituting lithium metal for alloys with aluminum, silicon, zinc, and so forth could solve the dendrite growth problem.1 Nevertheless, the lithium storage capacity of alloys drops down quickly after merely several charge-discharge cycles because the big volume change causes great stress in alloy crystal lattice, and thus gives rise to cracking and crumbling of the alloy particles. Alternatively, Sony Corporation succeeded in discovering the highly reversible, low-voltage anode, carbonaceous material and commercialized the C/LiCoO2 rocking chair cells in the early 1990s.2 Figure 3.1 schematically shows the charge-discharge process for reversible lithium storage in carbon. By the application of a lithiated carbon in place of a lithium metal electrode, any lithium metal plating process and the conditions for the growth of irregular dendritic lithium could be considerably eliminated, which shows promise for reducing the chances of shorting and overheating of the batteries. This kind of lithium-ion battery, which possessed a working voltage as high as 3.6 V and gravimetric energy densities between 120 and 150 Wh/kg, rapidly found applications in high-performance portable electronic devices. Thus the research on reversible lithium storage in carbonaceous materials became very popular in the battery community worldwide.

  6. [Traditional Chinese Medicine data management policy in big data environment].

    PubMed

    Liang, Yang; Ding, Chang-Song; Huang, Xin-di; Deng, Le

    2018-02-01

    As traditional data management models cannot effectively manage the massive data in traditional Chinese medicine (TCM), owing to the uncertainty of data object attributes as well as the diversity and abstraction of data representation, a management strategy for TCM data based on big data technology is proposed. Based on the true characteristics of TCM data, this strategy addresses the uncertainty of data object attributes in TCM information and the non-uniformity of data representation by exploiting the schema-less (modeless) storage of objects in big data technology. A hybrid indexing mode is also used to resolve the conflicts brought about by different storage modes during indexing, with powerful query processing over massive data through an efficient parallel MapReduce process. The theoretical analysis provides the management framework and its key technology, and its performance was tested on Hadoop using several common traditional Chinese medicines and prescriptions from a practical TCM data source. Results showed that this strategy can effectively solve the storage problem of TCM information, with good performance in query efficiency, completeness and robustness. Copyright© by the Chinese Pharmaceutical Association.
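
    The sketch below is a toy, single-process illustration of the two ideas in the abstract, schema-less storage of heterogeneous TCM records and a MapReduce-style aggregation over them; the record fields, herb names, and query are invented examples, not the paper's data or system.

    ```python
    # Toy illustration of the two ideas above (not the paper's system): store TCM
    # records schema-free as dicts whose attributes may differ, then answer a
    # query with a small map/reduce pass. Fields and records are made-up examples.
    from collections import Counter
    from functools import reduce

    records = [
        {"name": "Liuwei Dihuang Wan", "type": "prescription",
         "herbs": ["Rehmannia", "Cornus", "Dioscorea"]},
        {"name": "Rehmannia", "type": "herb", "nature": "cool", "meridian": ["kidney"]},
        {"name": "Cornus", "type": "herb", "nature": "warm"},      # no meridian field
        {"name": "Buzhong Yiqi Tang", "type": "prescription",
         "herbs": ["Astragalus", "Ginseng", "Rehmannia"]},
    ]

    # Map: emit (herb, 1) for every herb mentioned by a prescription record.
    def map_record(rec):
        if rec.get("type") == "prescription":
            return [(herb, 1) for herb in rec.get("herbs", [])]
        return []

    # Reduce: sum the per-herb counts.
    def reduce_counts(acc, pair):
        acc[pair[0]] += pair[1]
        return acc

    mapped = [pair for rec in records for pair in map_record(rec)]
    herb_frequency = reduce(reduce_counts, mapped, Counter())
    print(herb_frequency.most_common(3))  # which herbs appear in most prescriptions
    ```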

  7. Supercomputations and big-data analysis in strong-field ultrafast optical physics: filamentation of high-peak-power ultrashort laser pulses

    NASA Astrophysics Data System (ADS)

    Voronin, A. A.; Panchenko, V. Ya; Zheltikov, A. M.

    2016-06-01

    High-intensity ultrashort laser pulses propagating in gas media or in condensed matter undergo complex nonlinear spatiotemporal evolution where temporal transformations of optical field waveforms are strongly coupled to an intricate beam dynamics and ultrafast field-induced ionization processes. At the level of laser peak powers orders of magnitude above the critical power of self-focusing, the beam exhibits modulation instabilities, producing random field hot spots and breaking up into multiple noise-seeded filaments. This problem is described by a (3  +  1)-dimensional nonlinear field evolution equation, which needs to be solved jointly with the equation for ultrafast ionization of a medium. Analysis of this problem, which is equivalent to solving a billion-dimensional evolution problem, is only possible by means of supercomputer simulations augmented with coordinated big-data processing of large volumes of information acquired through theory-guiding experiments and supercomputations. Here, we review the main challenges of supercomputations and big-data processing encountered in strong-field ultrafast optical physics and discuss strategies to confront these challenges.

  8. A Tour of Big Data, Open Source Data Management Technologies from the Apache Software Foundation

    NASA Astrophysics Data System (ADS)

    Mattmann, C. A.

    2012-12-01

    The Apache Software Foundation, a non-profit foundation charged with dissemination of open source software for the public good, provides a suite of data management technologies for distributed archiving, data ingestion, data dissemination, processing, triage and a host of other functionalities that are becoming critical in the Big Data regime. Apache is the world's largest open source software organization, boasting over 3000 developers from around the world all contributing to some of the most pervasive technologies in use today, from the HTTPD web server that powers a majority of Internet web sites to the Hadoop technology that is now projected at over a $1B dollar industry. Apache data management technologies are emerging as de facto off-the-shelf components for searching, distributing, processing and archiving key science data sets both geophysical, space and planetary based, all the way to biomedicine. In this talk, I will give a virtual tour of the Apache Software Foundation, its meritocracy and governance structure, and also its key big data technologies that organizations can take advantage of today and use to save cost, schedule, and resources in implementing their Big Data needs. I'll illustrate the Apache technologies in the context of several national priority projects, including the U.S. National Climate Assessment (NCA), and in the International Square Kilometre Array (SKA) project that are stretching the boundaries of volume, velocity, complexity, and other key Big Data dimensions.

  9. Nano Revolution--Big Impact: How Emerging Nanotechnologies Will Change the Future of Education and Industry in America (and More Specifically in Oklahoma). An Abbreviated Account

    ERIC Educational Resources Information Center

    Holley, Steven E.

    2009-01-01

    Scientists are creating new and amazing materials by manipulating molecules at the ultra-small scale of 0.1 to 100 nanometers. Nanosize super particles demonstrate powerful and unprecedented electrical, chemical, and mechanical properties. This study examines how nanotechnology, as the multidisciplinary engineering of novel nanomaterials into…

  10. Gender Differences in Achievement in Calculating Reacting Masses from Chemical Equations among Secondary School Students in Makurdi Metropols

    ERIC Educational Resources Information Center

    Eriba, Joel O.; Ande, Sesugh

    2006-01-01

    Over the years there exists gender inequality in science achievement among senior secondary school students the world over. It is observed that the males score higher than the females in science and science- related examinations. This has created a big psychological alienation or depression in the minds of female students towards science and…

  11. Big Atoms for Small Children: Building Atomic Models from Common Materials to Better Visualize and Conceptualize Atomic Structure

    ERIC Educational Resources Information Center

    Cipolla, Laura; Ferrari, Lia A.

    2016-01-01

    A hands-on approach to introduce the chemical elements and the atomic structure to elementary/middle school students is described. The proposed classroom activity presents Bohr models of atoms using common and inexpensive materials, such as nested plastic balls, colored modeling clay, and small-sized pasta (or small plastic beads).

  12. Bio-Defense Now: 56 Suggestions for Immediate Improvements

    DTIC Science & Technology

    2005-05-01

    Air Education and Training Command HVAC Heating, Ventilation and Air Conditioning ICAM Improved Chemical Agent Monitor ICD-9-CM Internal...conditioning (HVAC) system capabilities, making a big difference in removal of many BW agents. High Efficiency Particulate Air (HEPA) filters are also...agents. This program has developed biological sensor-activated heating, ventilation, and air conditioning (HVAC) control systems, high efficiency

  13. Big Events in Greece and HIV Infection Among People Who Inject Drugs

    PubMed Central

    Nikolopoulos, Georgios K.; Sypsa, Vana; Bonovas, Stefanos; Paraskevis, Dimitrios; Malliori-Minerva, Melpomeni; Hatzakis, Angelos; Friedman, Samuel R.

    2015-01-01

    Big Events are processes like macroeconomic transitions that have lowered social well-being in various settings in the past. Greece has been hit by the global crisis and experienced an HIV outbreak among people who inject drugs. Since the crisis began (2008), Greece has seen population displacement, inter-communal violence, cuts in governmental expenditures, and social movements. These may have affected normative regulation, networks, and behaviors. However, most pathways to risk remain unknown or unmeasured. We use what is known and unknown about the Greek HIV outbreak to suggest modifications in Big Events models and the need for additional research. PMID:25723309

  14. Astrophysical S-factor for destructive reactions of lithium-7 in big bang nucleosynthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Komatsubara, Tetsuro; Kwon, YoungKwan; Moon, JunYoung

    One of the most prominent successes of the Big Bang models is the precise reproduction of the mass abundance ratio for 4He. In spite of this success, the abundances of lithium isotopes are still inconsistent between observations and calculations, which is known as the lithium abundance problem. Since the calculations are based on experimental reaction data together with theoretical estimates, more precise experimental measurements may improve our knowledge of Big Bang nucleosynthesis. As one of the destruction processes of lithium-7, we have measured the reaction cross sections of the 7Li(3He,p)9Be reaction.
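
    For reference, the astrophysical S-factor named in the title is the conventional way of factoring the steep Coulomb-barrier energy dependence out of a charged-particle cross section; the textbook definition (not specific to this measurement) is:

    ```latex
    % Conventional definition of the astrophysical S-factor for a charged-particle
    % reaction with center-of-mass energy E and cross section \sigma(E):
    S(E) \;=\; E\,\sigma(E)\,e^{2\pi\eta},
    \qquad
    \eta \;=\; \frac{Z_1 Z_2 e^2}{\hbar v}
    % \eta is the Sommerfeld parameter; Z_1, Z_2 are the charges of the interacting
    % nuclei and v is their relative velocity. S(E) varies slowly with E, which makes
    % extrapolation to astrophysical energies more reliable than extrapolating \sigma(E).
    ```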

  15. Responses of the crab Heterozius rotundifrons to heterospecific chemical alarm cues: phylogeny vs. ecological overlap.

    PubMed

    Hazlett, Brian A; McLay, Colin

    2005-03-01

    The big-handed brachyuran crab Heterozius rotundifrons extends the time spent in its anti-predator posture, limb extended posture, if exposed to chemical cues from crushed conspecifics. In this study, we tested whether crabs also respond to chemical cues from crushed heterospecific crabs, and if so, whether phylogenetic relations or ecological overlap is more important in influencing the duration of the anti-predator posture. Chemical cues from two other brachyuran crabs (Cyclograpsus lavauxi and Hemigrapsus sexdentatus), which do not overlap directly in ecological distribution with H. rotundifrons, elicited a duration of the anti-predator posture that was indistinguishable from that produced by conspecific chemical cues. In contrast, chemical cues from two anomuran crabs (Petrolisthes elongatus and Pagurus novizealandiae) that overlap in ecological distribution with H. rotundifrons, elicited durations of the antipredator posture that were significantly shorter than those of either conspecifics or more closely related crab species. Thus, phylogenetic relationship seems to be more important than ecological overlap in influencing anti-predator behavior in H. rotundifrons.

  16. The Big-Fish-Little-Pond Effect: Generalizability of Social Comparison Processes over Two Age Cohorts from Western, Asian, and Middle Eastern Islamic Countries

    ERIC Educational Resources Information Center

    Marsh, Herbert W.; Abduljabbar, Adel Salah; Morin, Alexandre J. S.; Parker, Philip; Abdelfattah, Faisal; Nagengast, Benjamin; Abu-Hilal, Maher M.

    2015-01-01

    Extensive support for the seemingly paradoxical negative effects of school- and class-average achievement on academic self-concept (ASC), known as the big-fish-little-pond effect (BFLPE), is based largely on secondary students in Western countries or on cross-cultural Program for International Student Assessment studies. There is little research testing the…

  17. Meeting the Challenge of Doing an RCT Evaluation of Youth Mentoring in Ireland: A Journey in Mixed Methods

    ERIC Educational Resources Information Center

    Brady, Bernadine; O'Regan, Connie

    2009-01-01

    The youth mentoring program Big Brothers Big Sisters is one of the first social interventions involving youth in Ireland to be evaluated using a randomized controlled trial methodology. This article sets out the design process undertaken, describing how the research team came to adopt a concurrent embedded mixed methods design as a means of…

  18. Rethinking climate change adaptation and place through a situated pathways framework: A case study from the Big Hole Valley, USA

    Treesearch

    Daniel J. Murphy; Laurie Yung; Carina Wyborn; Daniel R. Williams

    2017-01-01

    This paper critically examines the temporal and spatial dynamics of adaptation in climate change science and explores how dynamic notions of 'place' elucidate novel ways of understanding community vulnerability and adaptation. Using data gathered from a narrative scenario-building process carried out among communities of the Big Hole Valley in Montana, the...

  19. Quality of Big Data in health care.

    PubMed

    Sukumar, Sreenivas R; Natarajan, Ramachandran; Ferrell, Regina K

    2015-01-01

    The current trend in Big Data analytics, and in health information technology in particular, is toward building sophisticated models, methods and tools for business, operational and clinical intelligence. However, the critical issue of the data quality required for these models is not getting the attention it deserves. The purpose of this paper is to highlight the issues of data quality in the context of Big Data health care analytics. The insights presented in this paper are the results of analytics work that was done in different organizations on a variety of health data sets. The data sets include Medicare and Medicaid claims, provider enrollment data sets from both public and private sources, and electronic health records from regional health centers accessed through partnerships with health care claims processing entities under health privacy protected guidelines. Assessment of data quality in health care has to consider: first, the entire lifecycle of health data; second, problems arising from errors and inaccuracies in the data itself; third, the source(s) and the pedigree of the data; and fourth, how the underlying purpose of data collection impacts the analytic processing and the knowledge expected to be derived. Automation in the form of data handling, storage, entry and processing technologies is to be viewed as a double-edged sword. At one level, automation can be a good solution, while at another level it can create a different set of data quality issues. Implementation of health care analytics with Big Data is enabled by a road map that addresses the organizational and technological aspects of data quality assurance. The value derived from the use of analytics should be the primary determinant of data quality. Based on this premise, health care enterprises embracing Big Data should have a road map for a systematic approach to data quality. Health care data quality problems can be so specific that organizations might have to build their own custom software or data quality rule engines. Today, data quality issues are diagnosed and addressed in a piecemeal fashion. The authors recommend a data lifecycle approach and provide a road map that is better suited to the dimensions of Big Data and fits the different stages of the analytical workflow.

  20. A peek into the future of radiology using big data applications

    PubMed Central

    Kharat, Amit T.; Singhal, Shubham

    2017-01-01

    Big data is the extremely large amount of data which is available in the radiology department. Big data is identified by four Vs – Volume, Velocity, Variety, and Veracity. By applying different algorithmic tools and converting raw data to transformed data in such large datasets, there is a possibility of understanding and using radiology data for gaining new knowledge and insights. Big data analytics consists of 6Cs – Connection, Cloud, Cyber, Content, Community, and Customization. The global technological prowess and per-capita capacity to save digital information has roughly doubled every 40 months since the 1980s. By using big data, the planning and implementation of radiological procedures in radiology departments can be given a great boost. Potential applications of big data in the future are scheduling of scans, creating patient-specific personalized scanning protocols, radiologist decision support, emergency reporting, virtual quality assurance for the radiologist, etc. Targeted use of big data applications can be made for images by supporting the analytic process. Screening software tools designed on big data can be used to highlight a region of interest, such as subtle changes in parenchymal density, a solitary pulmonary nodule, or focal hepatic lesions, by plotting its multidimensional anatomy. Following this, we can run more complex applications such as three-dimensional multiplanar reconstructions (MPR), volumetric rendering (VR), and curved planar reconstruction, which consume higher system resources, on targeted data subsets rather than querying the complete cross-sectional imaging dataset. This pre-emptive selection of the dataset can substantially reduce system requirements such as memory and server load and provide prompt results. However, a word of caution: big data should not become "dump data" due to inadequate and poor analysis and non-structured, improperly stored data. In the near future, big data can ring in the era of personalized and individualized healthcare. PMID:28744087

  1. Refined scenario of standard Big Bang nucleosynthesis allowing for nonthermal nuclear reactions in the primordial plasma

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Voronchev, Victor T.; Nakao, Yasuyuki; Nakamura, Makoto

    The standard scenario of big bang nucleosynthesis (BBN) is generalized to take into account nonthermal nuclear reactions in the primordial plasma. These reactions are naturally triggered in the BBN epoch by fast particles generated in various exoergic processes. It is found that, although such particles can appreciably enhance the rates of some individual reactions, their influence on the whole process of element production is not significant. The nonthermal corrections to element abundances are found to be 0.1% ({sup 3}H), -0.03% ({sup 7}Li), and 0.34%-0.63% (CNO group).

  2. Multifunctional 2D Materials: Selenides and Halides

    NASA Technical Reports Server (NTRS)

    Singh, N. B.; Su, Ching Hua; Arnold, Brad; Choa, Fow-Sen; Bohorfous, Sara

    2016-01-01

    Materials are the key components controlling the performance of detectors, devices and sensors. The design, processing, growth and fabrication of bulk and nanocrystals, and their fabrication into devices and sensors, involve a multidisciplinary team of experts. This places a large burden on the cost of novel materials development. For this reason there is a big thrust toward predicting the multifunctionality of materials before design and development. Design can achieve certain properties only to some extent; in multinary materials, processing is also a big factor. In this presentation, examples of two classes of industrially important materials will be described.

  3. Growing and Educational Environment of College Students and Their Motivational and Self-regulated Learning

    NASA Astrophysics Data System (ADS)

    Peng, Cuixin

    Students growing up and being educated in different social backgrounds may perform differently in their learning process. These differences can be found in self-regulated behavior when fulfilling a certain task. This paper focuses on how students' various growing and educational environments differ in motivation and self-regulated learning. Results reveal that differences exist among students from big cities, middle and small towns, and the countryside in motivational and self-regulated learning. It also indicates that students from big cities gain more knowledge of cognitive strategies in their learning process.

  4. a New Initiative for Tiling, Stitching and Processing Geospatial Big Data in Distributed Computing Environments

    NASA Astrophysics Data System (ADS)

    Olasz, A.; Nguyen Thai, B.; Kristóf, D.

    2016-06-01

    Within recent years, several new approaches and solutions for Big Data processing have been developed. The geospatial world is still facing a lack of well-established distributed processing solutions tailored to the amount and heterogeneity of geodata, especially when fast data processing is a must. The goal of such systems is to improve processing time by distributing data transparently across processing (and/or storage) nodes. These methodologies are based on the divide-and-conquer concept. Nevertheless, in the context of geospatial processing, most of the distributed computing frameworks have important limitations regarding both data distribution and data partitioning methods. Moreover, flexibility and expandability for handling various data types (often in binary formats) are also strongly required. This paper presents a concept for tiling, stitching and processing of big geospatial data. The system is based on the IQLib concept (https://github.com/posseidon/IQLib/) developed in the frame of the IQmulus EU FP7 research and development project (http://www.iqmulus.eu). The data distribution framework has no limitations on programming language environment and can execute scripts (and workflows) written in different development frameworks (e.g. Python, R or C#). It is capable of processing raster, vector and point cloud data. The above-mentioned prototype is presented through a case study dealing with country-wide processing of raster imagery. Further investigation of algorithmic and implementation details is planned for the near future.
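
    As a minimal illustration of the tile/process/stitch pattern described in this record (a conceptual Python sketch, not the IQLib API; the raster, tile size and per-tile operation are hypothetical placeholders):

      # Minimal tile/process/stitch sketch (numpy only; the per-tile operation
      # is a stand-in for real geospatial work, not an IQLib component).
      import numpy as np
      from concurrent.futures import ProcessPoolExecutor

      TILE = 256  # tile edge length in pixels (arbitrary choice)

      def split_into_tiles(raster):
          """Yield ((row, col), tile_array) pairs covering the raster."""
          rows, cols = raster.shape
          for r in range(0, rows, TILE):
              for c in range(0, cols, TILE):
                  yield (r, c), raster[r:r + TILE, c:c + TILE]

      def process_tile(item):
          """Example per-tile operation: a simple threshold."""
          (r, c), tile = item
          return (r, c), (tile > tile.mean()).astype(np.uint8)

      def stitch(shape, results):
          """Reassemble processed tiles into a full-size output raster."""
          out = np.zeros(shape, dtype=np.uint8)
          for (r, c), tile in results:
              out[r:r + tile.shape[0], c:c + tile.shape[1]] = tile
          return out

      if __name__ == "__main__":
          raster = np.random.rand(1024, 2048)      # placeholder for country-wide imagery
          with ProcessPoolExecutor() as pool:      # distribute tiles across workers
              results = list(pool.map(process_tile, split_into_tiles(raster)))
          output = stitch(raster.shape, results)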

  5. Addressing the Big-Earth-Data Variety Challenge with the Hierarchical Triangular Mesh

    NASA Technical Reports Server (NTRS)

    Rilee, Michael L.; Kuo, Kwo-Sen; Clune, Thomas; Oloso, Amidu; Brown, Paul G.; Yu, Honfeng

    2016-01-01

    We have implemented an updated Hierarchical Triangular Mesh (HTM) as the basis for a unified data model and an indexing scheme for geoscience data to address the variety challenge of Big Earth Data. We observe that, in the absence of variety, the volume challenge of Big Data is relatively easy to address with parallel processing. The more important challenge in achieving optimal value with a Big Data solution for Earth Science (ES) data analysis, however, is being able to achieve good scalability with variety. With HTM unifying at least the three popular data models, i.e. Grid, Swath, and Point, used by current ES data products, data preparation time for integrative analysis of diverse datasets can be drastically reduced and better variety scaling can be achieved. In addition, since HTM is also an indexing scheme, when it is used to index all ES datasets, data placement alignment (or co-location) on the shared-nothing architecture, which most Big Data systems are based on, is guaranteed and better performance is ensured. Moreover, our updated HTM encoding turns most geospatial set operations into integer interval operations, gaining further performance advantages.
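
    The interval property mentioned at the end of this record can be illustrated with a much-simplified HTM lookup (a Python sketch under simplifying assumptions, not the authors' implementation; the base octahedron layout and subdivision depth are arbitrary choices): a point is assigned a trixel ID by recursive triangular subdivision, and a coarse trixel corresponds to a contiguous block of fine-level IDs, so spatial containment becomes an integer range test.

      # Simplified Hierarchical Triangular Mesh (HTM) lookup sketch.
      # Starts from the 8 faces of an octahedron and recursively picks the child
      # spherical triangle containing the point, appending 2 bits per level.
      import numpy as np

      def _norm(v):
          return v / np.linalg.norm(v)

      # Octahedron vertices on the unit sphere.
      V = [np.array(x, float) for x in
           [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]]

      # Initial 8 spherical triangles (indices into V), counter-clockwise from outside.
      FACES = [(1, 2, 0), (2, 3, 0), (3, 4, 0), (4, 1, 0),
               (2, 1, 5), (3, 2, 5), (4, 3, 5), (1, 4, 5)]

      def _inside(p, v0, v1, v2):
          """True if unit vector p lies in the spherical triangle (v0, v1, v2)."""
          return (np.dot(np.cross(v0, v1), p) >= 0 and
                  np.dot(np.cross(v1, v2), p) >= 0 and
                  np.dot(np.cross(v2, v0), p) >= 0)

      def htm_id(lat_deg, lon_deg, level=8):
          """Return (trixel_id, level) for a lat/lon point at the given depth."""
          lat, lon = np.radians(lat_deg), np.radians(lon_deg)
          p = np.array([np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)])
          for i, (a, b, c) in enumerate(FACES):       # find the base triangle
              v0, v1, v2 = V[a], V[b], V[c]
              if _inside(p, v0, v1, v2):
                  trixel = i
                  break
          for _ in range(level):                       # descend the hierarchy
              w0, w1, w2 = _norm(v1 + v2), _norm(v0 + v2), _norm(v0 + v1)
              children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
              for k, (c0, c1, c2) in enumerate(children):
                  if _inside(p, c0, c1, c2):
                      v0, v1, v2 = c0, c1, c2
                      trixel = trixel * 4 + k
                      break
          return trixel, level

      # A coarse trixel covers a contiguous ID interval at a finer level, so
      # "is this point inside that region?" becomes an integer range test.
      def id_interval(trixel, level, finer_level):
          span = 4 ** (finer_level - level)
          return trixel * span, (trixel + 1) * span    # half-open interval [lo, hi)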

  6. The Prospect of Internet of Things and Big Data Analytics in Transportation System

    NASA Astrophysics Data System (ADS)

    Noori Hussein, Waleed; Kamarudin, L. M.; Hussain, Haider N.; Zakaria, A.; Badlishah Ahmed, R.; Zahri, N. A. H.

    2018-05-01

    The Internet of Things (IoT), the new dawn technology that describes how data, people and interconnected physical objects act on communicated information, and big data analytics have been adopted by diverse domains for varying purposes. Manufacturing, agriculture, banking, oil and gas, healthcare, retail, hospitality, and food services are a few of the sectors that have adopted and massively utilized IoT and big data analytics. The transportation industry is also an early adopter, with significant attendant effects on its processes of shipment tracking, freight monitoring, and transparent warehousing. This is recorded in countries like England, Singapore, Portugal, and Germany, while Malaysia is currently assessing the potential and researching a purpose-driven adoption and implementation. This paper, based on a review of related literature, presents a summary of the inherent prospects of adopting IoT and big data analytics in the Malaysian transportation system. An efficient and safe port environment, predictive maintenance and remote management, and a boundary-less software platform and connected ecosystem, among others, are the inherent benefits of IoT and big data analytics for the Malaysian transportation system.

  7. Big Data Analytics in Healthcare

    PubMed Central

    Belle, Ashwin; Thiagarajan, Raghuram; Soroushmehr, S. M. Reza; Beard, Daniel A.

    2015-01-01

    The rapidly expanding field of big data analytics has started to play a pivotal role in the evolution of healthcare practices and research. It has provided tools to accumulate, manage, analyze, and assimilate large volumes of disparate, structured, and unstructured data produced by current healthcare systems. Big data analytics has been recently applied towards aiding the process of care delivery and disease exploration. However, the adoption rate and research development in this space is still hindered by some fundamental problems inherent within the big data paradigm. In this paper, we discuss some of these major challenges with a focus on three upcoming and promising areas of medical research: image, signal, and genomics based analytics. Recent research which targets utilization of large volumes of medical data while combining multimodal data from disparate sources is discussed. Potential areas of research within this field which have the ability to provide meaningful impact on healthcare delivery are also examined. PMID:26229957

  8. The Opportunity and Challenge of The Age of Big Data

    NASA Astrophysics Data System (ADS)

    Yunguo, Hong

    2017-11-01

    The arrival of the big data age has gradually expanded the scale of the information industry in China, creating favorable conditions for the expansion of information technology and computer networks. Based on big data, computer system services are becoming more and more complete, and the efficiency of data processing in these systems is improving, which provides an important guarantee for the implementation of production plans in various industries. At the same time, the rapid development of fields such as the Internet of Things, social tools and cloud computing, together with the widening of information channels, has increased the amount of data and expanded the influence of the big data age. We therefore need to face the opportunities and challenges of the big data age correctly and use data and information resources effectively. On this basis, this paper studies the opportunities and challenges of the big data era.

  9. Big Data Analytics in Healthcare.

    PubMed

    Belle, Ashwin; Thiagarajan, Raghuram; Soroushmehr, S M Reza; Navidi, Fatemeh; Beard, Daniel A; Najarian, Kayvan

    2015-01-01

    The rapidly expanding field of big data analytics has started to play a pivotal role in the evolution of healthcare practices and research. It has provided tools to accumulate, manage, analyze, and assimilate large volumes of disparate, structured, and unstructured data produced by current healthcare systems. Big data analytics has been recently applied towards aiding the process of care delivery and disease exploration. However, the adoption rate and research development in this space is still hindered by some fundamental problems inherent within the big data paradigm. In this paper, we discuss some of these major challenges with a focus on three upcoming and promising areas of medical research: image, signal, and genomics based analytics. Recent research which targets utilization of large volumes of medical data while combining multimodal data from disparate sources is discussed. Potential areas of research within this field which have the ability to provide meaningful impact on healthcare delivery are also examined.

  10. Questioning the "big assumptions". Part II: recognizing organizational contradictions that impede institutional change.

    PubMed

    Bowe, Constance M; Lahey, Lisa; Kegan, Robert; Armstrong, Elizabeth

    2003-08-01

    Well-designed medical curriculum reforms can fall short of their primary objectives during implementation when unanticipated or unaddressed organizational resistance surfaces. This typically occurs if the agents for change ignore faculty concerns during the planning stage or when the provision of essential institutional safeguards to support new behaviors is neglected. Disappointing outcomes in curriculum reforms then result in the perpetuation of or reversion to the status quo despite the loftiest of goals. Institutional resistance to change, much like that observed during personal development, does not necessarily indicate a communal lack of commitment to the organization's newly stated goals. It may reflect the existence of competing organizational objectives that must be addressed before substantive advances in a new direction can be accomplished. The authors describe how the Big Assumptions process (see previous article) was adapted and applied at the institutional level during a school of medicine's curriculum reform. Reform leaders encouraged faculty participants to articulate their reservations about the considered changes to provide insights into the organization's competing commitments. This line of discussion gave faculty an opportunity to appreciate the gridlock that existed until appropriate tests of the school's long-held Big Assumptions could be conducted. The Big Assumptions process proved useful in moving faculty groups to recognize and question the validity of unchallenged institutional beliefs that were likely to undermine efforts toward change. The process also allowed the organization to put essential institutional safeguards in place that ultimately ensured that substantive reforms could be sustained.

  11. BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark

    PubMed Central

    Gulzar, Muhammad Ali; Interlandi, Matteo; Yoo, Seunghyun; Tetali, Sai Deep; Condie, Tyson; Millstein, Todd; Kim, Miryung

    2016-01-01

    Developers use cloud computing platforms to process a large quantity of data in parallel when developing big data analytics. Debugging the massive parallel computations that run in today’s data-centers is time consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next generation data-intensive scalable cloud computing platform. This requires re-thinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay and naively inspecting millions of records using a watchpoint is too time consuming for an end user. First, BIGDEBUG’s simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time saving compared to the baseline replay debugger. The results show that BIGDEBUG supports debugging at interactive speeds with minimal performance impact. PMID:27390389
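
    BIGDEBUG itself extends Apache Spark's Scala runtime; purely as an illustration of what watchpoint-style inspection and crash-culprit isolation look like at the RDD level, a rough PySpark sketch follows (the input path, predicate and parse function are hypothetical, and this is not the BIGDEBUG API):

      # Conceptual PySpark sketch of watchpoint-style inspection and crash-culprit
      # isolation (illustrative only; not the BIGDEBUG primitives themselves).
      from pyspark import SparkContext

      sc = SparkContext(appName="watchpoint-sketch")

      records = sc.textFile("hdfs:///logs/input.txt")   # hypothetical input path

      def parse(line):
          """A parsing step that may fail on malformed records."""
          fields = line.split(",")
          return fields[0], float(fields[1])

      # "Watchpoint": pull back only records matching a guard predicate,
      # instead of collecting millions of intermediate records.
      suspicious = records.filter(lambda line: "ERROR" in line).take(20)

      # Crash-culprit isolation: wrap the transformation so failing records are
      # tagged rather than killing the whole job, then inspect the culprits.
      def safe_parse(line):
          try:
              return ("ok", parse(line))
          except Exception as e:
              return ("crash", (line, repr(e)))

      tagged = records.map(safe_parse).cache()
      culprits = tagged.filter(lambda t: t[0] == "crash").map(lambda t: t[1]).take(20)
      clean = tagged.filter(lambda t: t[0] == "ok").map(lambda t: t[1])

      print("sample culprit records:", culprits)
      print("parsed records:", clean.count())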

  12. Big Data: A Parallel Particle Swarm Optimization-Back-Propagation Neural Network Algorithm Based on MapReduce

    PubMed Central

    Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan

    2016-01-01

    A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform. The PSO algorithm was used to optimize the BP neural network's initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, which represents a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data. PMID:27304987
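
    A single-machine toy version of the core idea (PSO searching the weight vector of a small feed-forward network) is sketched below; the dataset and network size are placeholders, the BP fine-tuning stage is omitted, and the MapReduce parallelization of fitness evaluation described in the record is not shown:

      # Toy PSO-optimized feed-forward network (single machine; illustrative only).
      import numpy as np

      rng = np.random.default_rng(0)

      # Placeholder 2-class dataset (stands in for the SUN image features).
      X = rng.normal(size=(200, 4))
      y = (X[:, 0] + X[:, 1] > 0).astype(float)

      N_IN, N_HID = 4, 6
      DIM = N_IN * N_HID + N_HID + N_HID + 1   # weights + biases of a 4-6-1 network

      def unpack(w):
          i = 0
          W1 = w[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
          b1 = w[i:i + N_HID]; i += N_HID
          W2 = w[i:i + N_HID]; i += N_HID
          b2 = w[i]
          return W1, b1, W2, b2

      def forward(w, X):
          W1, b1, W2, b2 = unpack(w)
          h = np.tanh(X @ W1 + b1)
          return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

      def fitness(w):
          """Classification error; this evaluation is what a MapReduce design parallelizes."""
          pred = (forward(w, X) > 0.5).astype(float)
          return np.mean(pred != y)

      # Standard PSO loop over the flattened weight vector.
      N_PART, ITERS, W_INERTIA, C1, C2 = 30, 100, 0.7, 1.5, 1.5
      pos = rng.normal(size=(N_PART, DIM))
      vel = np.zeros_like(pos)
      pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
      gbest = pbest[np.argmin(pbest_f)].copy()

      for _ in range(ITERS):
          r1, r2 = rng.random((N_PART, DIM)), rng.random((N_PART, DIM))
          vel = W_INERTIA * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (gbest - pos)
          pos = pos + vel
          f = np.array([fitness(p) for p in pos])
          improved = f < pbest_f
          pbest[improved], pbest_f[improved] = pos[improved], f[improved]
          gbest = pbest[np.argmin(pbest_f)].copy()

      print("training error of PSO-found weights:", fitness(gbest))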

  13. CoLiTec software - detection of the near-zero apparent motion

    NASA Astrophysics Data System (ADS)

    Khlamov, Sergii V.; Savanevych, Vadym E.; Briukhovetskyi, Olexandr B.; Pohorelov, Artem V.

    2017-06-01

    In this article we describe the CoLiTec software for fully automated frame processing. The CoLiTec software allows processing of the Big Data of observation results as well as of data that is continuously formed during observation. The tasks addressed include frame brightness equalization, moving object detection, astrometry, photometry, etc. Along with highly efficient Big Data processing, the CoLiTec software also ensures high accuracy of data measurements. A comparative analysis of the functional characteristics and positional accuracy was performed between the CoLiTec and Astrometrica software. The benefits of CoLiTec were most evident for wide-field and low-quality frames. The efficiency of the CoLiTec software has been proved by about 700,000 observations and over 1,500 preliminary discoveries.

  14. Sideloading - Ingestion of Large Point Clouds Into the Apache Spark Big Data Engine

    NASA Astrophysics Data System (ADS)

    Boehm, J.; Liu, K.; Alis, C.

    2016-06-01

    In the geospatial domain we have now reached the point where data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore naturally lucrative to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not naturally supported by the existing big data frameworks. Instead such file formats are supported by software libraries that are restricted to single CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of a cluster. We test the capabilities of the proposed method to load billions of points into a commodity hardware compute cluster and we discuss the implications on scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.
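
    The essential pattern, distributing ingestion by mapping an existing single-threaded reader over file names, can be sketched in a few lines of PySpark (the file list and the read_point_cloud reader are hypothetical stand-ins for a real LAS/LAZ library; this is not the authors' implementation):

      # Sketch of distributing point-cloud ingestion with a map over file paths.
      # read_point_cloud() is a hypothetical wrapper around whatever single-threaded
      # point-cloud reader library is installed on the worker nodes.
      from pyspark import SparkContext

      sc = SparkContext(appName="pointcloud-ingest-sketch")

      files = ["hdfs:///lidar/tile_%04d.las" % i for i in range(1000)]  # hypothetical tiles

      def read_point_cloud(path):
          """Placeholder: parse one file with a single-CPU reader, return (x, y, z) tuples."""
          # e.g. open the file with an existing point-cloud library here
          return []

      # Each worker ingests a subset of the files; the union of all partitions is the cloud.
      points = (sc.parallelize(files, numSlices=64)
                  .flatMap(read_point_cloud)
                  .cache())

      print("points ingested:", points.count())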

  15. Big data for bipolar disorder.

    PubMed

    Monteith, Scott; Glenn, Tasha; Geddes, John; Whybrow, Peter C; Bauer, Michael

    2016-12-01

    The delivery of psychiatric care is changing with a new emphasis on integrated care, preventative measures, population health, and the biological basis of disease. Fundamental to this transformation are big data and advances in the ability to analyze these data. The impact of big data on the routine treatment of bipolar disorder today and in the near future is discussed, with examples that relate to health policy, the discovery of new associations, and the study of rare events. The primary sources of big data today are electronic medical records (EMR), claims, and registry data from providers and payers. In the near future, data created by patients from active monitoring, passive monitoring of Internet and smartphone activities, and from sensors may be integrated with the EMR. Diverse data sources from outside of medicine, such as government financial data, will be linked for research. Over the long term, genetic and imaging data will be integrated with the EMR, and there will be more emphasis on predictive models. Many technical challenges remain when analyzing big data that relates to size, heterogeneity, complexity, and unstructured text data in the EMR. Human judgement and subject matter expertise are critical parts of big data analysis, and the active participation of psychiatrists is needed throughout the analytical process.

  16. A Big Data Guide to Understanding Climate Change: The Case for Theory-Guided Data Science

    PubMed Central

    Kumar, Vipin

    2014-01-01

    Global climate change and its impact on human life has become one of our era's greatest challenges. Despite the urgency and the abundance of climate data, data science has had little impact on furthering our understanding of our planet. This is a stark contrast to other fields such as advertising or electronic commerce, where big data has been a great success story. This discrepancy stems from the complex nature of climate data as well as the scientific questions climate science brings forth. This article introduces a data science audience to the challenges and opportunities to mine large climate datasets, with an emphasis on the nuanced difference between mining climate data and traditional big data approaches. We focus on data, methods, and application challenges that must be addressed in order for big data to fulfill their promise with regard to climate science applications. More importantly, we highlight research showing that solely relying on traditional big data techniques results in dubious findings, and we instead propose a theory-guided data science paradigm that uses scientific theory to constrain both the big data techniques as well as the results-interpretation process to extract accurate insight from large climate data. PMID:25276499

  17. Microalgal process-monitoring based on high-selectivity spectroscopy tools: status and future perspectives.

    PubMed

    Podevin, Michael; Fotidis, Ioannis A; Angelidaki, Irini

    2018-08-01

    Microalgae are well known for their ability to accumulate lipids intracellularly, which can be used for biofuels and mitigate CO2 emissions. However, due to economic challenges, microalgae bioprocesses have maneuvered towards the simultaneous production of food, feed, fuel, and various high-value chemicals in a biorefinery concept. On-line and in-line monitoring of macromolecules such as lipids, proteins, carbohydrates, and high-value pigments will be more critical to maintain product quality and consistency for downstream processing in a biorefinery to maintain and valorize these markets. The main contribution of this review is to present current and prospective advances of on-line and in-line process analytical technology (PAT), with high selectivity - the capability of monitoring several analytes simultaneously - in the interest of improving product quality, productivity, and process automation of a microalgal biorefinery. The high-selectivity PAT under consideration are mid-infrared (MIR), near-infrared (NIR), and Raman vibrational spectroscopies. The current review contains a critical assessment of these technologies in the context of recent advances in software and hardware in order to move microalgae production towards process automation through multivariate process control (MVPC) and software sensors trained on "big data". The paper will also include a comprehensive overview of off-line implementations of vibrational spectroscopy in microalgal research as it pertains to spectral interpretation and process automation to aid and motivate development.

  18. New route for hollow materials

    NASA Astrophysics Data System (ADS)

    Rivaldo-Gómez, C. M.; Ferreira, F. F.; Landi, G. T.; Souza, J. A.

    2016-08-01

    Hollow micro/nano structures form an important family of functional materials. We have used the thermal oxidation process combined with the passage of electric current during a structural phase transition to disclose a colossal mass diffusion transfer of Ti ions. This combination points to a new route for the fabrication of hollow materials. A structural phase transition at high temperature prepares the stage by giving mobility to Ti ions and releasing vacancies to the system. The electric current then drives an inward delocalization of vacancies, condensing into voids and finally turning into a big hollow. This striking physical phenomenon, leading to a colossal mass transfer through ionic diffusion, is suggested to be driven by a combination of phase transition and electrical current followed by chemical reaction. We show this phenomenon for Ti, leading to TiO2 microtube formation, but we believe that it can be applied to other metals undergoing structural phase transitions at high temperatures.

  19. Impact of Aquifer Heterogeneities on Autotrophic Denitrification.

    NASA Astrophysics Data System (ADS)

    McCarthy, A.; Roques, C.; Selker, J. S.; Istok, J. D.; Pett-Ridge, J. C.

    2015-12-01

    Nitrate contamination in groundwater is a big challenge that will need to be addressed by hydrogeologists throughout the world. With a drinking water standard of 10 mg/L of NO3-, innovative techniques will need to be pursued to ensure a decrease in drinking water nitrate concentration. At the pumping site scale, the influence and relationship between heterogeneous flow, mixing, and reactivity is not well understood. The purpose of this project is to incorporate both physical and chemical modeling techniques to better understand the effect of aquifer heterogeneities on autotrophic denitrification. We will investigate the link between heterogeneous hydraulic properties, transport, and the rate of autotrophic denitrification. Data collected in previous laboratory and pumping-site-scale studies will be used to validate the models. The ultimate objective of this project is to develop a model in which such coupled processes are better understood, resulting in best management practices for groundwater.

  20. Acute kidney injury in the era of big data: the 15th Consensus Conference of the Acute Dialysis Quality Initiative (ADQI).

    PubMed

    Bagshaw, Sean M; Goldstein, Stuart L; Ronco, Claudio; Kellum, John A

    2016-01-01

    The world is immersed in "big data". Big data has brought about radical innovations in the methods used to capture, transfer, store and analyze the vast quantities of data generated every minute of every day. At the same time, however, it has also become far easier and relatively inexpensive to do so. Rapidly transforming, integrating and applying this large volume and variety of data is what underlies the future of big data. The application of big data and predictive analytics in healthcare holds great promise to drive innovation, reduce cost and improve patient outcomes, health services operations and value. Acute kidney injury (AKI) may be an ideal syndrome from which various dimensions and applications built within the context of big data may influence the structure of service delivery, care processes and outcomes for patients. The use of innovative forms of "information technology" was originally identified by the Acute Dialysis Quality Initiative (ADQI) in 2002 as a core concept in need of attention to improve the care and outcomes of patients with AKI. For this 15th ADQI consensus meeting, held on September 6-8, 2015 in Banff, Canada, five topics focused on AKI and acute renal replacement therapy were developed for which extensive applications of big data were recognized and/or foreseen. In this series of articles in the Canadian Journal of Kidney Health and Disease, we describe the output from these discussions.

  1. HAFNI-enabled largescale platform for neuroimaging informatics (HELPNI).

    PubMed

    Makkie, Milad; Zhao, Shijie; Jiang, Xi; Lv, Jinglei; Zhao, Yu; Ge, Bao; Li, Xiang; Han, Junwei; Liu, Tianming

    Tremendous efforts have thus been devoted to the establishment of functional MRI informatics systems that recruit a comprehensive collection of statistical/computational approaches for fMRI data analysis. However, state-of-the-art fMRI informatics systems are typically designed for specific fMRI sessions or studies whose data size is not really big, and thus have difficulty in handling fMRI 'big data.' Given that fMRI data sizes have recently been growing explosively due to the advancement of neuroimaging technologies, an effective and efficient fMRI informatics system which can process and analyze fMRI big data is much needed. To address this challenge, in this work, we introduce our newly developed informatics platform, namely, the 'HAFNI-enabled largescale platform for neuroimaging informatics (HELPNI).' HELPNI implements our recently developed computational framework of sparse representation of whole-brain fMRI signals, called holistic atlases of functional networks and interactions (HAFNI), for fMRI data analysis. HELPNI provides integrated solutions to archive and process large-scale fMRI data automatically and structurally, to extract and visualize meaningful results from raw fMRI data, and to share open-access processed and raw data with collaborators through the web. We tested the proposed HELPNI platform using the publicly available 1000 Functional Connectomes dataset, including over 1200 subjects. We identified consistent and meaningful functional brain networks across individuals and populations based on resting state fMRI (rsfMRI) big data. Using an efficient sampling module, the experimental results demonstrate that our HELPNI system has superior performance compared with other systems for large-scale fMRI data in terms of processing and storing the data and the associated results much faster.

  2. HAFNI-enabled largescale platform for neuroimaging informatics (HELPNI).

    PubMed

    Makkie, Milad; Zhao, Shijie; Jiang, Xi; Lv, Jinglei; Zhao, Yu; Ge, Bao; Li, Xiang; Han, Junwei; Liu, Tianming

    2015-12-01

    Tremendous efforts have thus been devoted to the establishment of functional MRI informatics systems that recruit a comprehensive collection of statistical/computational approaches for fMRI data analysis. However, state-of-the-art fMRI informatics systems are typically designed for specific fMRI sessions or studies whose data size is not really big, and thus have difficulty in handling fMRI 'big data.' Given that fMRI data sizes have recently been growing explosively due to the advancement of neuroimaging technologies, an effective and efficient fMRI informatics system which can process and analyze fMRI big data is much needed. To address this challenge, in this work, we introduce our newly developed informatics platform, namely, the 'HAFNI-enabled largescale platform for neuroimaging informatics (HELPNI).' HELPNI implements our recently developed computational framework of sparse representation of whole-brain fMRI signals, called holistic atlases of functional networks and interactions (HAFNI), for fMRI data analysis. HELPNI provides integrated solutions to archive and process large-scale fMRI data automatically and structurally, to extract and visualize meaningful results from raw fMRI data, and to share open-access processed and raw data with collaborators through the web. We tested the proposed HELPNI platform using the publicly available 1000 Functional Connectomes dataset, including over 1200 subjects. We identified consistent and meaningful functional brain networks across individuals and populations based on resting state fMRI (rsfMRI) big data. Using an efficient sampling module, the experimental results demonstrate that our HELPNI system has superior performance compared with other systems for large-scale fMRI data in terms of processing and storing the data and the associated results much faster.
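
    The HAFNI step at the core of HELPNI is a sparse, dictionary-learning style factorization of the whole-brain signal matrix into temporal atoms and sparse spatial coefficient maps. A small stand-alone sketch of that kind of decomposition (scikit-learn on synthetic data; conceptually similar to, but not identical with, the HELPNI pipeline):

      # Sparse-coding sketch of an fMRI signal matrix (synthetic data; conceptually
      # similar to, but not identical with, the HAFNI decomposition used by HELPNI).
      import numpy as np
      from sklearn.decomposition import MiniBatchDictionaryLearning

      rng = np.random.default_rng(0)

      n_timepoints, n_voxels, n_components = 150, 5000, 25
      signals = rng.normal(size=(n_voxels, n_timepoints))   # rows: voxel time series

      # Learn a temporal dictionary D and sparse codes A so that signals ~ A @ D.
      learner = MiniBatchDictionaryLearning(n_components=n_components, alpha=1.0,
                                            batch_size=256, random_state=0)
      codes = learner.fit_transform(signals)    # (n_voxels, n_components): sparse spatial maps
      dictionary = learner.components_          # (n_components, n_timepoints): temporal atoms

      # Each column of `codes`, reshaped back to brain space, would be interpreted as
      # the spatial map of one functional network; each dictionary row is its time course.
      print("non-zero fraction of the codes:", np.mean(codes != 0))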

  3. [Chapter 7. Big Data or the illusion of a synthesis by aggregation. Epistemological, ethical and political critics].

    PubMed

    Coutellec, Léo; Weil-Dubuc, Paul-Loup

    2017-10-27

    In this article, we propose a critical approach to the big data phenomenon by deconstructing the methodological principle that structures its logic: the principle of aggregation. Our hypothesis lies upstream of the critiques that see the use of big data as a new mode of government. Aggregation, as a mode of processing the heterogeneity of data, structures big data thinking; it is its very logic. Fragmenting in order to better aggregate, aggregating in order to better fragment: a dialectic based on a presumption of generalized aggregability and on the claim that aggregation is the preferred route for the production of new syntheses. We proceed in three steps to deconstruct this idea and undo the claim of aggregation to assert itself as a new way to produce knowledge, as a new synthesis of identity and, finally, as a new model of solidarity. Each time we show that these attempts at aggregation fail to produce their objects: no knowledge, no identity, no solidarity can result from a process of amalgamation. In all three cases, aggregation is always accompanied by a moment of fragmentation, of which dissociation, dislocation and separation are different figures. The wager we make, then, is to unsettle what presents itself as a new way of thinking about humanity and the world.

  4. Semantically-based priors and nuanced knowledge core for Big Data, Social AI, and language understanding.

    PubMed

    Olsher, Daniel

    2014-10-01

    Noise-resistant and nuanced, COGBASE makes 10 million pieces of commonsense data and a host of novel reasoning algorithms available via a family of semantically-driven prior probability distributions. Machine learning, Big Data, natural language understanding/processing, and social AI can draw on COGBASE to determine lexical semantics, infer goals and interests, simulate emotion and affect, calculate document gists and topic models, and link commonsense knowledge to domain models and social, spatial, cultural, and psychological data. COGBASE is especially ideal for social Big Data, which tends to involve highly implicit contexts, cognitive artifacts, difficult-to-parse texts, and deep domain knowledge dependencies. Copyright © 2014 Elsevier Ltd. All rights reserved.

  5. Perspective: Materials informatics and big data: Realization of the "fourth paradigm" of science in materials science

    NASA Astrophysics Data System (ADS)

    Agrawal, Ankit; Choudhary, Alok

    2016-05-01

    Our ability to collect "big data" has greatly surpassed our capability to analyze it, underscoring the emergence of the fourth paradigm of science, which is data-driven discovery. The need for data informatics is also emphasized by the Materials Genome Initiative (MGI), further boosting the emerging field of materials informatics. In this article, we look at how data-driven techniques are playing a big role in deciphering processing-structure-property-performance relationships in materials, with illustrative examples of both forward models (property prediction) and inverse models (materials discovery). Such analytics can significantly reduce time-to-insight and accelerate cost-effective materials discovery, which is the goal of MGI.
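
    As a toy illustration of the forward-model direction mentioned in this record (predicting a property from descriptor features), the following sketch uses synthetic descriptors and an off-the-shelf regressor; the features, data and model choice are arbitrary stand-ins, not drawn from the article:

      # Toy forward model: predict a material property from descriptor features.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import r2_score

      rng = np.random.default_rng(0)

      # Synthetic descriptors (e.g. composition fractions, processing temperature)
      # and a made-up target property; real work would use curated materials data.
      X = rng.random((500, 6))
      y = 3.0 * X[:, 0] - 2.0 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=500)

      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
      model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
      print("held-out R^2:", r2_score(y_test, model.predict(X_test)))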

  6. Why don’t you use Evolutionary Algorithms in Big Data?

    NASA Astrophysics Data System (ADS)

    Stanovov, Vladimir; Brester, Christina; Kolehmainen, Mikko; Semenkina, Olga

    2017-02-01

    In this paper we raise the question of using evolutionary algorithms in the area of Big Data processing. We show that evolutionary algorithms provide evident advantages due to their high scalability and flexibility and their ability to solve global optimization problems and to optimize several criteria at the same time for feature selection, instance selection and other data reduction problems. In particular, we consider the usage of evolutionary algorithms with all kinds of machine learning tools, such as neural networks and fuzzy systems. All our examples prove that Evolutionary Machine Learning is becoming more and more important in data analysis, and we expect to see the further development of this field, especially with respect to Big Data.
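
    As a concrete instance of the feature-selection use case named in this record, a minimal genetic-algorithm sketch follows (binary masks over features, fitness = cross-validated accuracy; the dataset, classifier and GA settings are arbitrary illustrations, not the authors'):

      # Minimal genetic algorithm for feature selection (illustrative settings only).
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X, y = make_classification(n_samples=400, n_features=40, n_informative=8, random_state=0)

      POP, GENS, MUT = 30, 25, 0.02

      def fitness(mask):
          """Cross-validated accuracy of a simple classifier on the selected features."""
          if not mask.any():
              return 0.0
          clf = LogisticRegression(max_iter=1000)
          return cross_val_score(clf, X[:, mask], y, cv=3).mean()

      pop = rng.random((POP, X.shape[1])) < 0.5          # random binary feature masks
      for _ in range(GENS):
          scores = np.array([fitness(ind) for ind in pop])
          order = np.argsort(scores)[::-1]
          parents = pop[order[:POP // 2]]                # truncation selection
          children = []
          while len(children) < POP - len(parents):
              a, b = parents[rng.integers(len(parents), size=2)]
              cut = rng.integers(1, X.shape[1])          # one-point crossover
              child = np.concatenate([a[:cut], b[cut:]])
              child ^= rng.random(X.shape[1]) < MUT      # bit-flip mutation
              children.append(child)
          pop = np.vstack([parents, children])

      best = pop[np.argmax([fitness(ind) for ind in pop])]
      print("selected features:", np.flatnonzero(best), "accuracy:", fitness(best))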

  7. Big data in pharmacy practice: current use, challenges, and the future.

    PubMed

    Ma, Carolyn; Smith, Helen Wong; Chu, Cherie; Juarez, Deborah T

    2015-01-01

    Pharmacy informatics is defined as the use and integration of data, information, knowledge, technology, and automation in the medication-use process for the purpose of improving health outcomes. The term "big data" has been coined and is often defined in three V's: volume, velocity, and variety. This paper describes three major areas in which pharmacy utilizes big data, including: 1) informed decision making (clinical pathways and clinical practice guidelines); 2) improved care delivery in health care settings such as hospitals and community pharmacy practice settings; and 3) quality performance measurement for the Centers for Medicare and Medicaid and medication management activities such as tracking medication adherence and medication reconciliation.

  8. Big data in pharmacy practice: current use, challenges, and the future

    PubMed Central

    Ma, Carolyn; Smith, Helen Wong; Chu, Cherie; Juarez, Deborah T

    2015-01-01

    Pharmacy informatics is defined as the use and integration of data, information, knowledge, technology, and automation in the medication-use process for the purpose of improving health outcomes. The term “big data” has been coined and is often defined in three V’s: volume, velocity, and variety. This paper describes three major areas in which pharmacy utilizes big data, including: 1) informed decision making (clinical pathways and clinical practice guidelines); 2) improved care delivery in health care settings such as hospitals and community pharmacy practice settings; and 3) quality performance measurement for the Centers for Medicare and Medicaid and medication management activities such as tracking medication adherence and medication reconciliation. PMID:29354523

  9. The Big Rust and the Red Queen: Long-Term Perspectives on Coffee Rust Research.

    PubMed

    McCook, Stuart; Vandermeer, John

    2015-09-01

    Since 2008, there has been a cluster of outbreaks of the coffee rust (Hemileia vastatrix) across the coffee-growing regions of the Americas, which have been collectively described as the Big Rust. These outbreaks have caused significant hardship to coffee producers and laborers. This essay situates the Big Rust in a broader historical context. Over the past two centuries, coffee farmers have had to deal with the "curse of the Red Queen"-the need to constantly innovate in the face of an increasing range of threats, which includes the rust. Over the 20th century, particularly after World War II, national governments and international organizations developed a network of national, regional, and international coffee research institutions. These public institutions played a vital role in helping coffee farmers manage the rust. Coffee farmers have pursued four major strategies for managing the rust: bioprospecting for resistant coffee plants, breeding resistant coffee plants, chemical control, and agroecological control. Currently, the main challenge for researchers is to develop rust control strategies that are both ecologically and economically viable for coffee farmers, in the context of a volatile, deregulated coffee industry and the emergent challenges of climate change.

  10. The Big Sky Model: A Regional Collaboration for Participatory Research on Environmental Health in the Rural West

    PubMed Central

    Ward, Tony J.; Vanek, Diana; Marra, Nancy; Holian, Andrij; Adams, Earle; Jones, David; Knuth, Randy

    2010-01-01

    The case for inquiry-based, hands-on, meaningful science education continues to gain credence as an effective and appropriate pedagogical approach (Karukstis 2005; NSF 2000). An innovative community-based framework for science learning, hereinafter referred to as the Big Sky Model, successfully addresses these educational aims, guiding high school and tribal college students from rural areas of Montana and Idaho in their understanding of chemical, physical, and environmental health concepts. Students participate in classroom lessons and continue with systematic inquiry through actual field research to investigate a pressing, real-world issue: understanding the complex links between poor air quality and respiratory health outcomes. This article provides background information, outlines the procedure for implementing the model, and discusses its effectiveness as demonstrated through various evaluation tools. PMID:20428505

  11. IBM Watson: How Cognitive Computing Can Be Applied to Big Data Challenges in Life Sciences Research.

    PubMed

    Chen, Ying; Elenee Argentinis, J D; Weber, Griff

    2016-04-01

    Life sciences researchers are under pressure to innovate faster than ever. Big data offer the promise of unlocking novel insights and accelerating breakthroughs. Ironically, although more data are available than ever, only a fraction is being integrated, understood, and analyzed. The challenge lies in harnessing volumes of data, integrating the data from hundreds of sources, and understanding their various formats. New technologies such as cognitive computing offer promise for addressing this challenge because cognitive solutions are specifically designed to integrate and analyze big datasets. Cognitive solutions can understand different types of data such as lab values in a structured database or the text of a scientific publication. Cognitive solutions are trained to understand technical, industry-specific content and use advanced reasoning, predictive modeling, and machine learning techniques to advance research faster. Watson, a cognitive computing technology, has been configured to support life sciences research. This version of Watson includes medical literature, patents, genomics, and chemical and pharmacological data that researchers would typically use in their work. Watson has also been developed with specific comprehension of scientific terminology so it can make novel connections in millions of pages of text. Watson has been applied to a few pilot studies in the areas of drug target identification and drug repurposing. The pilot results suggest that Watson can accelerate identification of novel drug candidates and novel drug targets by harnessing the potential of big data. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  12. Plasma level of big endothelin-1 predicts the prognosis in patients with hypertrophic cardiomyopathy.

    PubMed

    Wang, Yilu; Tang, Yida; Zou, Yubao; Wang, Dong; Zhu, Ling; Tian, Tao; Wang, Jizheng; Bao, Jingru; Hui, Rutai; Kang, Lianming; Song, Lei; Wang, Ji

    2017-09-15

    Cardiac remodeling is one of the major pathological processes in hypertrophic cardiomyopathy (HCM). Endothelin-1 has been linked to cardiac remodeling, and big endothelin-1 is the precursor of endothelin-1. A total of 245 patients with HCM were enrolled from 1999 to 2011 and partitioned into low-, middle- and high-level groups according to their plasma big endothelin-1 levels. At baseline, significant associations were found between a high level of big endothelin-1 and left atrium size, heart function and atrial fibrillation. Big endothelin-1 was positively correlated with N-terminal B-type natriuretic peptide (r=0.291, p<0.001) and late gadolinium enhancement (LGE) on magnetic resonance imaging (r=0.222, p=0.016). During a follow-up of 3 (range, 2-5) years, big endothelin-1 level was positively associated with the risks of all-cause mortality, cardiovascular death and progression to NYHA class 3 or 4 (p=0.020, 0.044 and 0.032, respectively). The rates of the above events in the highest tertile were 18.1%, 15.7%, and 24.2%, respectively. After adjusting for multiple factors related to survival and cardiac function, the association of big endothelin-1 with the risk of all-cause mortality (hazard ratio (HR)=4.94, 95% confidence interval (CI) 1.07-22.88; p=0.041) and progression to NYHA class 3 or 4 (HR=4.10, 95% CI 1.32-12.75, p=0.015) remained significant. Our study showed that a high level of plasma big endothelin-1 predicted prognosis for patients with HCM and that it can be added to the marker panel for stratifying HCM patients, giving treatment priority to those at high risk. Copyright © 2017. Published by Elsevier B.V.

  13. Involvement of a phosphoramidon-sensitive endopeptidase in the processing of big endothelin-1 in the guinea-pig.

    PubMed

    Pons, F; Touvay, C; Lagente, V; Mencia-Huerta, J M; Braquet, P

    1992-06-24

    In anaesthetized and ventilated guinea-pigs, i.v. injection of 1 nmol/kg big endothelin-1 (big ET-1) did not evoke significant changes in pulmonary inflation pressure (PIP) and mean arterial blood pressure (MBP), whereas injection of the same dose of endothelin-1 (ET-1) induced marked and rapid bronchoconstrictor and pressor responses. Administered at the dose of 10 nmol/kg, big ET-1 provoked significant increases in PIP and MBP, which developed slowly and were long-lasting as compared to those evoked by ET-1. When big ET-1 was incubated for 45 min at 37 degrees C with alpha-chymotrypsin (2 mU/nmol) or pepsin (1 microgram/nmol) and then injected into guinea-pigs at the dose of 1 nmol/kg, marked bronchoconstrictor and pressor responses were observed, with kinetics similar to those noted after administration of the same dose of ET-1. The magnitude of the alpha-chymotrypsin- or pepsin-treated big ET-1 responses was similar to that induced by ET-1, incubated or not with the enzymes. Injected i.v. at the dose of 5 mg/kg, 5 min before the challenge, phosphoramidon almost totally inhibited the bronchoconstrictor and pressor responses induced by 10 nmol/kg big ET-1, whereas thiorphan (5 mg/kg) partially reduced the increase in PIP and exerted a minimal effect on the changes in MBP. Administered at the dose of 20 mg/kg per os, 1 h before i.v. administration of 10 nmol/kg big ET-1, enalapril maleate and captopril did not significantly alter the bronchoconstriction and the hypertensive response evoked by the peptide.(ABSTRACT TRUNCATED AT 250 WORDS)

  14. Big Data Application in Biomedical Research and Health Care: A Literature Review.

    PubMed

    Luo, Jake; Wu, Min; Gopukumar, Deepika; Zhao, Yiqing

    2016-01-01

    Big data technologies are increasingly used for biomedical and health-care informatics research. Large amounts of biological and clinical data have been generated and collected at an unprecedented speed and scale. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence data per day, and the application of electronic health records (EHRs) is documenting large amounts of patient data. The cost of acquiring and analyzing biomedical data is expected to decrease dramatically with the help of technology upgrades, such as the emergence of new sequencing machines, the development of novel hardware and software for parallel computing, and the extensive expansion of EHRs. Big data applications present new opportunities to discover new knowledge and create novel methods to improve the quality of health care. The application of big data in health care is a fast-growing field, with many new discoveries and methodologies published in the last five years. In this paper, we review and discuss big data application in four major biomedical subdisciplines: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Specifically, in bioinformatics, high-throughput experiments facilitate the research of new genome-wide association studies of diseases, and with clinical informatics, the clinical field benefits from the vast amount of collected patient data for making intelligent decisions. Imaging informatics is now more rapidly integrated with cloud platforms to share medical image data and workflows, and public health informatics leverages big data techniques for predicting and monitoring infectious disease outbreaks, such as Ebola. In this paper, we review the recent progress and breakthroughs of big data applications in these health-care domains and summarize the challenges, gaps, and opportunities to improve and advance big data applications in health care.

  15. Big Data Application in Biomedical Research and Health Care: A Literature Review

    PubMed Central

    Luo, Jake; Wu, Min; Gopukumar, Deepika; Zhao, Yiqing

    2016-01-01

    Big data technologies are increasingly used for biomedical and health-care informatics research. Large amounts of biological and clinical data have been generated and collected at an unprecedented speed and scale. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence data per day, and the application of electronic health records (EHRs) is documenting large amounts of patient data. The cost of acquiring and analyzing biomedical data is expected to decrease dramatically with the help of technology upgrades, such as the emergence of new sequencing machines, the development of novel hardware and software for parallel computing, and the extensive expansion of EHRs. Big data applications present new opportunities to discover new knowledge and create novel methods to improve the quality of health care. The application of big data in health care is a fast-growing field, with many new discoveries and methodologies published in the last five years. In this paper, we review and discuss big data application in four major biomedical subdisciplines: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Specifically, in bioinformatics, high-throughput experiments facilitate the research of new genome-wide association studies of diseases, and with clinical informatics, the clinical field benefits from the vast amount of collected patient data for making intelligent decisions. Imaging informatics is now more rapidly integrated with cloud platforms to share medical image data and workflows, and public health informatics leverages big data techniques for predicting and monitoring infectious disease outbreaks, such as Ebola. In this paper, we review the recent progress and breakthroughs of big data applications in these health-care domains and summarize the challenges, gaps, and opportunities to improve and advance big data applications in health care. PMID:26843812

  16. Functional magnetic resonance imaging of divergent and convergent thinking in Big-C creativity.

    PubMed

    Japardi, Kevin; Bookheimer, Susan; Knudsen, Kendra; Ghahremani, Dara G; Bilder, Robert M

    2018-02-15

    The cognitive and physiological processes underlying creativity remain unclear, and very few studies to date have attempted to identify the behavioral and brain characteristics that distinguish exceptional ("Big-C") from everyday ("little-c") creativity. The Big-C Project examined functional brain responses during tasks demanding divergent and convergent thinking in 35 Big-C Visual Artists (VIS), 41 Big-C Scientists (SCI), and 31 individuals in a "smart comparison group" (SCG) matched to the Big-C groups on parental educational attainment and estimated IQ. Functional MRI (fMRI) scans included two activation paradigms widely used in prior creativity research, the Alternate Uses Task (AUT) and Remote Associates Task (RAT), to assess brain function during divergent and convergent thinking, respectively. Task performance did not differ between groups. Functional MRI activation in Big-C and SCG groups differed during the divergent thinking task. No differences in activation were seen during the convergent thinking task. Big-C groups had less activation than SCG in frontal pole, right frontal operculum, left middle frontal gyrus, and bilaterally in occipital cortex. SCI displayed lower frontal and parietal activation relative to the SCG when generating alternate uses in the AUT, while VIS displayed lower frontal activation than SCI and SCG when generating typical qualities (the control condition in the AUT). VIS showed more activation in right inferior frontal gyrus and left supramarginal gyrus relative to SCI. All groups displayed considerable overlapping activation during the RAT. The results confirm substantial overlap in functional activation across groups, but suggest that exceptionally creative individuals may depend less on task-positive networks during tasks that demand divergent thinking. Published by Elsevier Ltd.

  17. Predictive Big Data Analytics: A Study of Parkinson’s Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations

    PubMed Central

    Dinov, Ivo D.; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W.; Price, Nathan D.; Van Horn, John D.; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M.; Dauer, William; Toga, Arthur W.

    2016-01-01

    Background A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data–large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources–all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Methods and Findings Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson’s disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Conclusions Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson’s disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. 
The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer’s, Huntington’s, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications. PMID:27494614
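
    A minimal sketch of the kind of workflow described above (rebalance an imbalanced cohort, then cross-validate model-free classifiers such as adaptive boosting and support vector machines) is given below using scikit-learn on synthetic data. The feature matrix, the simple random-oversampling scheme, and the classifier settings are illustrative assumptions, not the PPMI protocol itself.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import AdaBoostClassifier
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC
      from sklearn.utils import resample

      # Synthetic stand-in for a heterogeneous cohort: 1000 subjects, 30 features,
      # with a 9:1 class imbalance (controls vs. PD-like cases).
      X, y = make_classification(n_samples=1000, n_features=30, weights=[0.9, 0.1], random_state=0)

      # Rebalance by random oversampling of the minority class (one simple option).
      # Note: in a real analysis, oversample inside each training fold to avoid leakage.
      X_min, y_min = X[y == 1], y[y == 1]
      X_over, y_over = resample(X_min, y_min, replace=True, n_samples=int((y == 0).sum()), random_state=0)
      X_bal = np.vstack([X[y == 0], X_over])
      y_bal = np.concatenate([y[y == 0], y_over])

      # Model-free classifiers evaluated with n-fold cross-validation.
      for name, clf in [("AdaBoost", AdaBoostClassifier(random_state=0)),
                        ("SVM", SVC(kernel="rbf", gamma="scale"))]:
          acc = cross_val_score(clf, X_bal, y_bal, cv=5, scoring="accuracy")
          print(f"{name}: mean accuracy = {acc.mean():.3f}")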

  18. Chemical weathering as a mechanism for the climatic control of bedrock river incision

    NASA Astrophysics Data System (ADS)

    Murphy, Brendan P.; Johnson, Joel P. L.; Gasparini, Nicole M.; Sklar, Leonard S.

    2016-04-01

    Feedbacks between climate, erosion and tectonics influence the rates of chemical weathering reactions, which can consume atmospheric CO2 and modulate global climate. However, quantitative predictions for the coupling of these feedbacks are limited because the specific mechanisms by which climate controls erosion are poorly understood. Here we show that climate-dependent chemical weathering controls the erodibility of bedrock-floored rivers across a rainfall gradient on the Big Island of Hawai‘i. Field data demonstrate that the physical strength of bedrock in streambeds varies with the degree of chemical weathering, which increases systematically with local rainfall rate. We find that incorporating the quantified relationships between local rainfall and erodibility into a commonly used river incision model is necessary to predict the rates and patterns of downcutting of these rivers. In contrast to using only precipitation-dependent river discharge to explain the climatic control of bedrock river incision, the mechanism of chemical weathering can explain strong coupling between local climate and river incision.
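
    To make the mechanism concrete, here is a minimal sketch of a commonly used stream-power incision law in which the erodibility K increases with local rainfall, standing in for weathering-weakened streambed rock. The exponential form of K(P) and all parameter values are illustrative assumptions, not the authors' calibrated relationships.

      import numpy as np

      # Stream-power incision law: E = K * A**m * S**n, with K made a function of
      # mean annual rainfall P to represent weathering-controlled rock weakening.
      # The exponential form and coefficients below are illustrative assumptions only.

      def erodibility(P_mm_per_yr, K0=1e-6, c=2e-4):
          """Erodibility rises with rainfall as weathering weakens streambed rock."""
          return K0 * np.exp(c * P_mm_per_yr)

      def incision_rate(drainage_area_m2, slope, P_mm_per_yr, m=0.5, n=1.0):
          """Vertical incision rate (m/yr) from the stream-power law with K = K(P)."""
          K = erodibility(P_mm_per_yr)
          return K * drainage_area_m2**m * slope**n

      # Compare reaches with identical geometry across a rainfall gradient.
      A, S = 1e7, 0.05                      # drainage area (m^2) and channel slope
      for P in (500.0, 2500.0, 5000.0):     # dry to very wet (mm/yr)
          print(f"P = {P:5.0f} mm/yr -> E = {incision_rate(A, S, P):.2e} m/yr")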

  19. Volcanology: Petit spots go big

    NASA Astrophysics Data System (ADS)

    Snow, Jonathan E.

    2016-12-01

    Mantle enrichment processes were thought to be limited to parts of oceanic plates influenced by plumes and to continental interiors. Analyses of mantle fragments of the Pacific Plate suggest that such enrichment processes may operate everywhere.

  20. Evaluation of the Exceedance Rate of a Stationary Stochastic Process by Statistical Extrapolation Using the Envelope Peaks over Threshold (EPOT) Method

    DTIC Science & Technology

    2011-04-01

    this limitation the length of the windows needs to be shortened. It also leads to a narrower confidence interval, see Figure 2.9. The "big ...least one event will occur within the window. The windows are then grouped in sets of two and the process is repeated for a window size twice as big ...

  1. It's Not a Big Sky After All: Justification for a Close Approach Prediction and Risk Assessment Process

    NASA Technical Reports Server (NTRS)

    Newman, Lauri Kraft; Frigm, Ryan; McKinley, David

    2009-01-01

    There is often skepticism about the need for Conjunction Assessment from mission operators that invest in the "big sky theory", which states that the likelihood of a collision is so small that it can be neglected. On 10 February 2009, the collision between Iridium 33 and Cosmos 2251 provided an indication that this theory is invalid and that a CA process should be considered for all missions. This paper presents statistics of the effect of the Iridium/Cosmos collision on NASA's Earth Science Constellation as well as results of analyses which characterize the debris environment for NASA's robotic missions.

  2. The Widening Gulf between Genomics Data Generation and Consumption: A Practical Guide to Big Data Transfer Technology.

    PubMed

    Feltus, Frank A; Breen, Joseph R; Deng, Juan; Izard, Ryan S; Konger, Christopher A; Ligon, Walter B; Preuss, Don; Wang, Kuang-Ching

    2015-01-01

    In the last decade, high-throughput DNA sequencing has become a disruptive technology and pushed the life sciences into a distributed ecosystem of sequence data producers and consumers. Given the power of genomics and declining sequencing costs, biology is an emerging "Big Data" discipline that will soon enter the exabyte data range when all subdisciplines are combined. These datasets must be transferred across commercial and research networks in creative ways since sending data without thought can have serious consequences on data processing time frames. Thus, it is imperative that biologists, bioinformaticians, and information technology engineers recalibrate data processing paradigms to fit this emerging reality. This review attempts to provide a snapshot of Big Data transfer across networks, which is often overlooked by many biologists. Specifically, we discuss four key areas: 1) data transfer networks, protocols, and applications; 2) data transfer security including encryption, access, firewalls, and the Science DMZ; 3) data flow control with software-defined networking; and 4) data storage, staging, archiving and access. A primary intention of this article is to orient the biologist in key aspects of the data transfer process in order to frame their genomics-oriented needs to enterprise IT professionals.

  3. BIG DATA ANALYTICS AND PRECISION ANIMAL AGRICULTURE SYMPOSIUM: Data to decisions.

    PubMed

    White, B J; Amrine, D E; Larson, R L

    2018-04-14

    Big data are frequently used in many facets of business and agronomy to enhance knowledge needed to improve operational decisions. Livestock operations collect data of sufficient quantity to perform predictive analytics. Predictive analytics can be defined as a methodology and suite of data evaluation techniques to generate a prediction for specific target outcomes. The objective of this manuscript is to describe the process of using big data and the predictive analytic framework to create tools to drive decisions in livestock production, health, and welfare. The predictive analytic process involves selecting a target variable, managing the data, partitioning the data, then creating algorithms, refining algorithms, and finally comparing accuracy of the created classifiers. The partitioning of the datasets allows model building and refining to occur prior to testing the predictive accuracy of the model with naive data to evaluate overall accuracy. Many different classification algorithms are available for predictive use and testing multiple algorithms can lead to optimal results. Application of a systematic process for predictive analytics using data that is currently collected or that could be collected on livestock operations will facilitate precision animal management through enhanced livestock operational decisions.
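
    A minimal sketch of the predictive analytic framework outlined above (choose a target, partition the data, train several algorithms, then compare their accuracy on naive held-out data) is given below with scikit-learn on synthetic records; the feature set and the three classifiers are illustrative assumptions.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import accuracy_score
      from sklearn.model_selection import train_test_split

      # Synthetic herd records: 5000 animals, 20 measurements, binary target outcome.
      X, y = make_classification(n_samples=5000, n_features=20, random_state=1)

      # Partition so the final accuracy estimate uses naive (held-out) data only.
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

      classifiers = {
          "logistic regression": LogisticRegression(max_iter=1000),
          "random forest": RandomForestClassifier(n_estimators=200, random_state=1),
          "gradient boosting": GradientBoostingClassifier(random_state=1),
      }

      # Build and refine on the training partition, then compare accuracy on naive data.
      for name, clf in classifiers.items():
          clf.fit(X_train, y_train)
          print(f"{name}: test accuracy = {accuracy_score(y_test, clf.predict(X_test)):.3f}")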

  4. The Path of the Blind Watchmaker: A Model of Evolution

    DTIC Science & Technology

    2011-04-06

    computational biology has now reached the point that astronomy reached when it began to look backward in time to the Big Bang. Our goal is to look backward in...treatment. We claim that computational biology has now reached the point that astronomy reached when it began to look backward in time to the Big...evolutionary process itself, in fact, created it. When astronomy reached a critical mass of theory, technology, and observational data, astronomers

  5. Big Data Challenges for Large Radio Arrays

    NASA Technical Reports Server (NTRS)

    Jones, Dayton L.; Wagstaff, Kiri; Thompson, David; D'Addario, Larry; Navarro, Robert; Mattmann, Chris; Majid, Walid; Lazio, Joseph; Preston, Robert; Rebbapragada, Umaa

    2012-01-01

    Future large radio astronomy arrays, particularly the Square Kilometre Array (SKA), will be able to generate data at rates far higher than can be analyzed or stored affordably with current practices. This is, by definition, a "big data" problem, and requires an end-to-end solution if future radio arrays are to reach their full scientific potential. Similar data processing, transport, storage, and management challenges face next-generation facilities in many other fields.

  6. Neutrino mixing and big bang nucleosynthesis

    NASA Astrophysics Data System (ADS)

    Bell, Nicole

    2003-04-01

    We analyse active-active neutrino mixing in the early universe and show that transformation of neutrino-antineutrino asymmetries between flavours is unavoidable when neutrino mixing angles are large. This process is a standard Mikheyev-Smirnov-Wolfenstein flavour transformation, modified by the synchronisation of momentum states which results from neutrino-neutrino forward scattering. The new constraints placed on neutrino asymmetries eliminate the possibility of degenerate big bang nucleosynthesis. Implications of active-sterile neutrino mixing will also be reviewed.

  7. A Rapid Turn-around, Scalable Big Data Processing Capability for the JPL Airborne Snow Observatory (ASO) Mission

    NASA Astrophysics Data System (ADS)

    Mattmann, C. A.

    2014-12-01

    The JPL Airborne Snow Observatory (ASO) is an integrated LIDAR and spectrometer measuring snow depth and rate of snow melt in the Sierra Nevadas, specifically, the Tuolumne River Basin, Sierra Nevada, California above the O'Shaughnessy Dam of the Hetch Hetchy reservoir, and the Uncompahgre Basin, Colorado, amongst other sites. The ASO data were delivered to water resource managers from the California Department of Water Resources in under 24 hours from the time the Twin Otter aircraft landed in Mammoth Lakes, CA to the time disks were plugged into the ASO Mobile Compute System (MCS) deployed at the Sierra Nevada Aquatic Research Laboratory (SNARL) near the airport. ASO performed weekly flights, and each flight produced between 500 GB and 1 TB of raw data, which was then processed from level 0 data products all the way to full level 4 maps of Snow Water Equivalent, albedo mosaics, and snow depth from LIDAR. These data were produced by Interactive Data Language (IDL) algorithms which were then unobtrusively and automatically integrated into an Apache OODT and Apache Tika based Big Data processing system. Data movement was both electronic and physical, including novel uses of LaCie 1 and 2 TeraByte (TB) data bricks and deployment in rugged terrain. The MCS was controlled remotely from the Jet Propulsion Laboratory, California Institute of Technology (JPL) in Pasadena, California on behalf of the National Aeronautics and Space Administration (NASA). Communication was aided through the use of novel Internet Relay Chat (IRC) command and control mechanisms and through the use of the Notifico open source communication tools. This talk will describe the high-powered, lightweight Big Data processing system that we developed for ASO and its implications more broadly for airborne missions at NASA and throughout the government. The lessons learned from ASO show the potential to have a large impact on the development of Big Data processing systems in the years to come.

  8. The rise of artificial intelligence and the uncertain future for physicians.

    PubMed

    Krittanawong, C

    2018-02-01

    Physicians in everyday clinical practice are under pressure to innovate faster than ever because of the rapid, exponential growth in healthcare data. "Big data" refers to extremely large data sets that cannot be analyzed or interpreted using traditional data processing methods. In fact, big data itself is meaningless, but processing it offers the promise of unlocking novel insights and accelerating breakthroughs in medicine-which in turn has the potential to transform current clinical practice. Physicians can analyze big data, but at present it requires a large amount of time and sophisticated analytic tools such as supercomputers. However, the rise of artificial intelligence (AI) in the era of big data could assist physicians in shortening processing times and improving the quality of patient care in clinical practice. This editorial provides a glimpse at the potential uses of AI technology in clinical practice and considers the possibility of AI replacing physicians, perhaps altogether. Physicians diagnose diseases based on personal medical histories, individual biomarkers, simple scores (e.g., CURB-65, MELD), and their physical examinations of individual patients. In contrast, AI can diagnose diseases based on a complex algorithm using hundreds of biomarkers, imaging results from millions of patients, aggregated published clinical research from PubMed, and thousands of physician's notes from electronic health records (EHRs). While AI could assist physicians in many ways, it is unlikely to replace physicians in the foreseeable future. Let us look at the emerging uses of AI in medicine. Copyright © 2017 European Federation of Internal Medicine. Published by Elsevier B.V. All rights reserved.

  9. Big data prediction of durations for online collective actions based on peak's timing

    NASA Astrophysics Data System (ADS)

    Nie, Shizhao; Wang, Zheng; Pujia, Wangmo; Nie, Yuan; Lu, Peng

    2018-02-01

    Peak Model states that each collective action has a life cycle, which contains four periods of "prepare", "outbreak", "peak", and "vanish"; and the peak determines the max energy and the whole process. The peak model's re-simulation indicates that there seems to be a stable ratio between the peak's timing (TP) and the total span (T) or duration of collective actions, which needs further validation through empirical data of collective actions. Therefore, the daily big data of online collective actions is applied to validate the model; and the key is to check the ratio between the peak's timing and the total span. The big data is obtained from online data recording & mining of websites. It is verified by the empirical big data that there is a stable ratio between TP and T; furthermore, it seems to be normally distributed. This rule holds for both the general cases and the sub-types of collective actions. Given the distribution of the ratio, an estimated probability density function can be obtained, and therefore the span can be predicted via the peak's timing. Under the scenario of big data, the instant span (how long the collective action lasts or when it ends) will be monitored and predicted in real-time. With denser data (Big Data), the estimation of the ratio's distribution gets more robust, and the prediction of collective actions' spans or durations will be more accurate.
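
    The prediction step described above can be sketched as follows: fit a normal distribution to the observed ratios r = TP/T from past events, then invert it to turn a newly observed peak time into a span estimate with a range. The synthetic data and the assumed mean ratio below are placeholders, not the paper's estimates.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)

      # Historical events: total span T (days) and peak timing TP, with TP/T roughly
      # normally distributed around 0.3 (illustrative value only).
      T_hist = rng.uniform(5, 60, size=500)
      ratio_hist = np.clip(rng.normal(0.3, 0.05, size=500), 0.05, 0.95)
      TP_hist = ratio_hist * T_hist

      # Estimate the distribution of the ratio r = TP / T from the historical data.
      r = TP_hist / T_hist
      mu, sigma = stats.norm.fit(r)

      def predict_span(tp_observed, q=(0.1, 0.5, 0.9)):
          """Predict total span T = TP / r using quantiles of the fitted ratio."""
          r_q = stats.norm.ppf(q, loc=mu, scale=sigma)
          return tp_observed / r_q[::-1]   # small ratio -> long span, so reverse order

      low, median, high = predict_span(tp_observed=9.0)
      print(f"fitted ratio: mu={mu:.3f}, sigma={sigma:.3f}")
      print(f"peak at day 9 -> predicted span ~{median:.1f} days (80% range {low:.1f}-{high:.1f})")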

  10. Scalability and Validation of Big Data Bioinformatics Software.

    PubMed

    Yang, Andrian; Troup, Michael; Ho, Joshua W K

    2017-01-01

    This review examines two important aspects that are central to modern big data bioinformatics analysis - software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
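
    As a concrete illustration of the multiple-execution idea, the sketch below applies two metamorphic relations to a toy read-counting pipeline: shuffling the input must not change the output, and duplicating every record must exactly double every count. The pipeline and the relations are illustrative, not taken from the review.

      import random
      from collections import Counter

      def count_per_gene(records):
          """Toy 'pipeline': count aligned reads per gene from (read_id, gene) pairs."""
          counts = Counter()
          for _read_id, gene in records:
              counts[gene] += 1
          return dict(counts)

      def metamorphic_check(records):
          base = count_per_gene(records)

          # Relation 1: shuffling the input order must not change the output.
          shuffled = records[:]
          random.shuffle(shuffled)
          assert count_per_gene(shuffled) == base, "order-invariance violated"

          # Relation 2: duplicating every record must exactly double every count.
          doubled = count_per_gene(records + records)
          assert doubled == {g: 2 * c for g, c in base.items()}, "scaling relation violated"
          return True

      records = [(f"read{i}", random.choice(["BRCA1", "TP53", "EGFR"])) for i in range(1000)]
      print("metamorphic relations hold:", metamorphic_check(records))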

  11. Making big sense from big data in toxicology by read-across.

    PubMed

    Hartung, Thomas

    2016-01-01

    Modern information technologies have made big data available in safety sciences, i.e., extremely large data sets that may be analyzed only computationally to reveal patterns, trends and associations. This happens by (1) compilation of large sets of existing data, e.g., as a result of the European REACH regulation, (2) the use of omics technologies and (3) systematic robotized testing in a high-throughput manner. All three approaches and some other high-content technologies leave us with big data; the challenge is now to make big sense of these data. Read-across, i.e., the local similarity-based intrapolation of properties, is gaining momentum with increasing data availability and consensus on how to process and report it. It is predominantly applied to in vivo test data as a gap-filling approach, but can similarly complement other incomplete datasets. Big data are first of all repositories for finding similar substances and for ensuring that the available data are fully exploited. High-content and high-throughput approaches similarly require focusing on clusters, in this case formed by underlying mechanisms such as pathways of toxicity. The closely connected properties, i.e., structural and biological similarity, create the confidence needed for predictions of toxic properties. Here, a new web-based tool under development called REACH-across, which aims to support and automate structure-based read-across, is presented among other approaches.
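
    A minimal sketch of similarity-based read-across is shown below: a query substance inherits a toxicity estimate from its most similar, data-rich analogues, weighted by fingerprint similarity. The random bit-vector fingerprints, the Tanimoto cutoff, and the k value are illustrative assumptions; this is not the REACH-across tool.

      import numpy as np

      def tanimoto(a, b):
          """Tanimoto similarity between two binary fingerprint vectors."""
          both = np.logical_and(a, b).sum()
          either = np.logical_or(a, b).sum()
          return both / either if either else 0.0

      def read_across(query_fp, fingerprints, labels, k=5, min_sim=0.3):
          """Predict a binary endpoint from the k most similar data-rich analogues."""
          sims = np.array([tanimoto(query_fp, fp) for fp in fingerprints])
          order = np.argsort(sims)[::-1][:k]
          order = [i for i in order if sims[i] >= min_sim]
          if not order:
              return None  # no acceptable analogues -> no read-across prediction
          weights = sims[order]
          votes = labels[order]
          return float(np.average(votes, weights=weights))  # fraction supporting 'toxic'

      rng = np.random.default_rng(1)
      fingerprints = rng.integers(0, 2, size=(200, 128))   # 200 substances, 128-bit fingerprints
      labels = rng.integers(0, 2, size=200)                # 0 = non-toxic, 1 = toxic
      query = rng.integers(0, 2, size=128)
      print("read-across toxicity score:", read_across(query, fingerprints, labels))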

  12. A Call to Investigate the Relationship Between Education and Health Outcomes Using Big Data.

    PubMed

    Chahine, Saad; Kulasegaram, Kulamakan Mahan; Wright, Sarah; Monteiro, Sandra; Grierson, Lawrence E M; Barber, Cassandra; Sebok-Syer, Stefanie S; McConnell, Meghan; Yen, Wendy; De Champlain, Andre; Touchie, Claire

    2018-06-01

    There exists an assumption that improving medical education will improve patient care. While seemingly logical, this premise has rarely been investigated. In this Invited Commentary, the authors propose the use of big data to test this assumption. The authors present a few example research studies linking education and patient care outcomes and argue that using big data may more easily facilitate the process needed to investigate this assumption. The authors also propose that collaboration is needed to link educational and health care data. They then introduce a grassroots initiative, inclusive of universities in one Canadian province and national licensing organizations that are working together to collect, organize, link, and analyze big data to study the relationship between pedagogical approaches to medical training and patient care outcomes. While the authors acknowledge the possible challenges and issues associated with harnessing big data, they believe that the benefits supersede these. There is a need for medical education research to go beyond the outcomes of training to study practice and clinical outcomes as well. Without a coordinated effort to harness big data, policy makers, regulators, medical educators, and researchers are left with sometimes costly guesses and assumptions about what works and what does not. As the social, time, and financial investments in medical education continue to increase, it is imperative to understand the relationship between education and health outcomes.

  13. A Columnar Storage Strategy with Spatiotemporal Index for Big Climate Data

    NASA Astrophysics Data System (ADS)

    Hu, F.; Bowen, M. K.; Li, Z.; Schnase, J. L.; Duffy, D.; Lee, T. J.; Yang, C. P.

    2015-12-01

    Large collections of observational, reanalysis, and climate model output data may grow to as large as 100 PB in the coming years, so climate datasets are in the Big Data domain, and various distributed computing frameworks have been utilized to address the challenges posed by big climate data analysis. However, because climate data are stored in binary formats (NetCDF, HDF) with high spatial and temporal dimensionality, the computing frameworks in the Apache Hadoop ecosystem are not originally suited for big climate data. In order to make the computing frameworks in the Hadoop ecosystem directly support big climate data, we propose a columnar storage format with a spatiotemporal index to store climate data, which will support any project in the Apache Hadoop ecosystem (e.g. MapReduce, Spark, Hive, Impala). With this approach, the climate data are converted into the binary Parquet format, a columnar storage format, and a spatial and temporal index is built and appended to the end of each Parquet file to enable real-time data query. Such climate data in Parquet format then become available to any computing framework in the Hadoop ecosystem. The proposed approach is evaluated using the NASA Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. Experimental results show that this approach could efficiently overcome the gap between big climate data and the distributed computing frameworks, and the spatiotemporal index could significantly accelerate data querying and processing.
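
    The conversion idea can be sketched as follows with xarray and pandas: flatten a gridded NetCDF variable into a table and write it as Parquet partitioned on coarse space-time keys so that Hadoop-ecosystem engines can prune irrelevant files. The file name, variable name, and partitioning scheme below are illustrative assumptions; the paper's own index is attached to the end of the Parquet files rather than expressed as directory partitions.

      import xarray as xr

      # Open a (hypothetical) reanalysis file with dims (time, lat, lon) and a
      # temperature variable "T2M"; any NetCDF file with this layout would work.
      ds = xr.open_dataset("merra_sample.nc4")

      # Flatten the gridded variable into a long table: one row per (time, lat, lon).
      df = ds["T2M"].to_dataframe().reset_index()

      # Coarse spatiotemporal keys used for partition pruning by SQL-on-Hadoop engines.
      df["year_month"] = df["time"].dt.strftime("%Y-%m")
      df["lat_band"] = (df["lat"] // 10 * 10).astype(int)   # 10-degree latitude bands

      # Columnar Parquet output, physically partitioned on the coarse keys so that a
      # query restricted in time/space only touches the relevant files.
      df.to_parquet(
          "t2m_parquet",
          engine="pyarrow",
          partition_cols=["year_month", "lat_band"],
          index=False,
      )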

  14. True Randomness from Big Data.

    PubMed

    Papakonstantinou, Periklis A; Woodruff, David P; Yang, Guang

    2016-09-26

    Generating random bits is a difficult task, which is important for physical systems simulation, cryptography, and many applications that rely on high-quality random bits. Our contribution is to show how to generate provably random bits from uncertain events whose outcomes are routinely recorded in the form of massive data sets. These include scientific data sets, such as in astronomics, genomics, as well as data produced by individuals, such as internet search logs, sensor networks, and social network feeds. We view the generation of such data as the sampling process from a big source, which is a random variable of size at least a few gigabytes. Our view initiates the study of big sources in the randomness extraction literature. Previous approaches for big sources rely on statistical assumptions about the samples. We introduce a general method that provably extracts almost-uniform random bits from big sources and extensively validate it empirically on real data sets. The experimental findings indicate that our method is efficient enough to handle large enough sources, while previous extractor constructions are not efficient enough to be practical. Quality-wise, our method at least matches quantum randomness expanders and classical world empirical extractors as measured by standardized tests.
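
    The construction in the paper is more involved, but the generic shape of a seeded extractor can be sketched as below: condense each block of a high-entropy source with a keyed hash into a short output, so the result is close to uniform provided each block really carries enough entropy. The block size, output length, and the stand-in "source" are illustrative assumptions, not the authors' method.

      import hashlib
      import os

      def extract_bits(source_bytes, block_size=4096, out_bytes_per_block=16):
          """Condense each block of a big source into a short keyed digest.

          Generic seeded sketch only: each block must carry enough entropy that a
          16-byte output is a safe compression.
          """
          seed = os.urandom(32)              # independent seed, as seeded extractors require
          out = bytearray()
          for i in range(0, len(source_bytes), block_size):
              block = source_bytes[i:i + block_size]
              digest = hashlib.blake2b(block, key=seed, digest_size=out_bytes_per_block)
              out.extend(digest.digest())
          return bytes(out)

      # Example: a deterministic byte stream standing in for a noisy "big source".
      noisy = bytes((i * 31 + (i ** 2 % 7)) % 256 for i in range(1 << 16))
      random_bits = extract_bits(noisy)
      print(len(random_bits), "output bytes; first 8:", random_bits[:8].hex())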

  15. A survey on platforms for big data analytics.

    PubMed

    Singh, Dilpreet; Reddy, Chandan K

    The primary purpose of this paper is to provide an in-depth analysis of different platforms available for performing big data analytics. This paper surveys different hardware platforms available for big data analytics and assesses the advantages and drawbacks of each of these platforms based on various metrics such as scalability, data I/O rate, fault tolerance, real-time processing, data size supported and iterative task support. In addition to the hardware, a detailed description of the software frameworks used within each of these platforms is also discussed along with their strengths and drawbacks. Some of the critical characteristics described here can potentially aid the readers in making an informed decision about the right choice of platforms depending on their computational needs. Using a star ratings table, a rigorous qualitative comparison between different platforms is also discussed for each of the six characteristics that are critical for the algorithms of big data analytics. In order to provide more insights into the effectiveness of each of the platforms in the context of big data analytics, specific implementation-level details of the widely used k-means clustering algorithm on various platforms are also described in the form of pseudocode.
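
    For reference, a single-node version of the k-means algorithm that the survey uses as its running example might look as follows (plain NumPy; the distributed variants mainly change where the assignment and update steps execute):

      import numpy as np

      def kmeans(X, k, n_iter=100, seed=0):
          """Plain Lloyd's algorithm: assign points to the nearest centroid, then update."""
          rng = np.random.default_rng(seed)
          centroids = X[rng.choice(len(X), size=k, replace=False)]
          for _ in range(n_iter):
              # Assignment step: distance from every point to every centroid.
              d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
              labels = d.argmin(axis=1)
              # Update step: mean of the points assigned to each centroid.
              new_centroids = np.array([
                  X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                  for j in range(k)
              ])
              if np.allclose(new_centroids, centroids):
                  break
              centroids = new_centroids
          return centroids, labels

      X = np.vstack([np.random.default_rng(1).normal(c, 0.3, size=(200, 2)) for c in (0, 3, 6)])
      centroids, labels = kmeans(X, k=3)
      print(np.round(centroids, 2))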

  16. True Randomness from Big Data

    NASA Astrophysics Data System (ADS)

    Papakonstantinou, Periklis A.; Woodruff, David P.; Yang, Guang

    2016-09-01

    Generating random bits is a difficult task, which is important for physical systems simulation, cryptography, and many applications that rely on high-quality random bits. Our contribution is to show how to generate provably random bits from uncertain events whose outcomes are routinely recorded in the form of massive data sets. These include scientific data sets, such as in astronomics, genomics, as well as data produced by individuals, such as internet search logs, sensor networks, and social network feeds. We view the generation of such data as the sampling process from a big source, which is a random variable of size at least a few gigabytes. Our view initiates the study of big sources in the randomness extraction literature. Previous approaches for big sources rely on statistical assumptions about the samples. We introduce a general method that provably extracts almost-uniform random bits from big sources and extensively validate it empirically on real data sets. The experimental findings indicate that our method is efficient enough to handle large enough sources, while previous extractor constructions are not efficient enough to be practical. Quality-wise, our method at least matches quantum randomness expanders and classical world empirical extractors as measured by standardized tests.

  17. From big data to deep insight in developmental science

    PubMed Central

    2016-01-01

    The use of the term ‘big data’ has grown substantially over the past several decades and is now widespread. In this review, I ask what makes data ‘big’ and what implications the size, density, or complexity of datasets have for the science of human development. A survey of existing datasets illustrates how existing large, complex, multilevel, and multimeasure data can reveal the complexities of developmental processes. At the same time, significant technical, policy, ethics, transparency, cultural, and conceptual issues associated with the use of big data must be addressed. Most big developmental science data are currently hard to find and cumbersome to access, the field lacks a culture of data sharing, and there is no consensus about who owns or should control research data. But, these barriers are dissolving. Developmental researchers are finding new ways to collect, manage, store, share, and enable others to reuse data. This promises a future in which big data can lead to deeper insights about some of the most profound questions in behavioral science. WIREs Cogn Sci 2016, 7:112–126. doi: 10.1002/wcs.1379 PMID:26805777

  18. True Randomness from Big Data

    PubMed Central

    Papakonstantinou, Periklis A.; Woodruff, David P.; Yang, Guang

    2016-01-01

    Generating random bits is a difficult task, which is important for physical systems simulation, cryptography, and many applications that rely on high-quality random bits. Our contribution is to show how to generate provably random bits from uncertain events whose outcomes are routinely recorded in the form of massive data sets. These include scientific data sets, such as in astronomics, genomics, as well as data produced by individuals, such as internet search logs, sensor networks, and social network feeds. We view the generation of such data as the sampling process from a big source, which is a random variable of size at least a few gigabytes. Our view initiates the study of big sources in the randomness extraction literature. Previous approaches for big sources rely on statistical assumptions about the samples. We introduce a general method that provably extracts almost-uniform random bits from big sources and extensively validate it empirically on real data sets. The experimental findings indicate that our method is efficient enough to handle large enough sources, while previous extractor constructions are not efficient enough to be practical. Quality-wise, our method at least matches quantum randomness expanders and classical world empirical extractors as measured by standardized tests. PMID:27666514

  19. A Big Data and Learning Analytics Approach to Process-Level Feedback in Cognitive Simulations.

    PubMed

    Pecaric, Martin; Boutis, Kathy; Beckstead, Jason; Pusic, Martin

    2017-02-01

    Collecting and analyzing large amounts of process data for the purposes of education can be considered a big data/learning analytics (BD/LA) approach to improving learning. However, in the education of health care professionals, the application of BD/LA is limited to date. The authors discuss the potential advantages of the BD/LA approach for the process of learning via cognitive simulations. Using the lens of a cognitive model of radiograph interpretation with four phases (orientation, searching/scanning, feature detection, and decision making), they reanalyzed process data from a cognitive simulation of pediatric ankle radiography where 46 practitioners from three expertise levels classified 234 cases online. To illustrate the big data component, they highlight the data available in a digital environment (time-stamped, click-level process data). Learning analytics were illustrated using algorithmic computer-enabled approaches to process-level feedback. For each phase, the authors were able to identify examples of potentially useful BD/LA measures. For orientation, the trackable behavior of re-reviewing the clinical history was associated with increased diagnostic accuracy. For searching/scanning, evidence of skipping views was associated with an increased false-negative rate. For feature detection, heat maps overlaid on the radiograph can provide a metacognitive visualization of common novice errors. For decision making, the measured influence of sequence effects can reflect susceptibility to bias, whereas computer-generated path maps can provide insights into learners' diagnostic strategies. In conclusion, the augmented collection and dynamic analysis of learning process data within a cognitive simulation can improve feedback and prompt more precise reflection on a novice clinician's skill development.
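
    A minimal sketch of how click-level process data can be turned into phase-level feedback is given below with pandas; the log schema, shortened phase names, and metrics are hypothetical stand-ins for the simulation's actual records.

      import pandas as pd

      # Hypothetical click-level log: one row per recorded phase of a case attempt.
      log = pd.DataFrame({
          "learner": ["a", "a", "a", "a", "b", "b", "b", "b"],
          "case_id": [1, 1, 1, 1, 1, 1, 1, 1],
          "phase":   ["orientation", "searching", "feature", "decision"] * 2,
          "t_start": [0, 12, 40, 75, 0, 8, 30, 50],
          "t_end":   [12, 40, 75, 90, 8, 30, 50, 70],
          "correct": [None, None, None, True, None, None, None, False],
      })

      log["seconds"] = log["t_end"] - log["t_start"]

      # Process-level feedback: time spent per phase per learner, and how time in the
      # searching/scanning phase relates to getting the case right.
      time_per_phase = log.pivot_table(index="learner", columns="phase", values="seconds", aggfunc="sum")
      outcome = log.dropna(subset=["correct"]).set_index("learner")["correct"]
      print(time_per_phase)
      print(time_per_phase["searching"].groupby(outcome).mean())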

  20. Complex wet-environments in electronic-structure calculations

    NASA Astrophysics Data System (ADS)

    Fisicaro, Giuseppe; Genovese, Luigi; Andreussi, Oliviero; Marzari, Nicola; Goedecker, Stefan

    The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions in the presence of an applied electrochemical potential, including the complex electrostatic screening coming from the solvent. In the present work we present a solver to handle both the Generalized Poisson and the Poisson-Boltzmann equations. A preconditioned conjugate gradient (PCG) method has been implemented for the Generalized Poisson equation and the linear regime of the Poisson-Boltzmann equation, allowing the minimization problem to be solved iteratively in some ten iterations. A self-consistent procedure enables us to solve the full Poisson-Boltzmann problem. The algorithms take advantage of a preconditioning procedure based on the BigDFT Poisson solver for the standard Poisson equation. They exhibit very high accuracy and parallel efficiency, and allow different boundary conditions, including surfaces. The solver has been integrated into the BigDFT and Quantum-ESPRESSO electronic-structure packages and will be released as an independent program, suitable for integration in other codes. We present test calculations for large proteins to demonstrate efficiency and performance. This work was done within the PASC and NCCR MARVEL projects. Computer resources were provided by the Swiss National Supercomputing Centre (CSCS) under Project ID s499. LG also acknowledges support from the EXTMOS EU project.
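
    To illustrate the iteration itself, here is a minimal preconditioned conjugate gradient sketch applied to a 1-D finite-difference Poisson matrix with a simple Jacobi preconditioner; the real solver's preconditioning (based on the BigDFT Poisson solver) and its treatment of boundary conditions are far more elaborate.

      import numpy as np

      def pcg(A, b, M_inv, tol=1e-8, max_iter=1000):
          """Preconditioned conjugate gradient for a symmetric positive-definite A."""
          x = np.zeros_like(b)
          b_norm = np.linalg.norm(b)
          r = b - A @ x
          z = M_inv @ r
          p = z.copy()
          rz = r @ z
          for it in range(max_iter):
              Ap = A @ p
              alpha = rz / (p @ Ap)
              x += alpha * p
              r -= alpha * Ap
              if np.linalg.norm(r) < tol * b_norm:
                  return x, it + 1
              z = M_inv @ r
              rz_new = r @ z
              p = z + (rz_new / rz) * p
              rz = rz_new
          return x, max_iter

      # 1-D Poisson problem -u'' = f with Dirichlet boundaries, discretized on n points.
      n, h = 200, 1.0 / 201
      A = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / h**2
      b = np.ones(n)
      M_inv = np.diag(1.0 / np.diag(A))           # Jacobi preconditioner
      x, iters = pcg(A, b, M_inv)
      print(f"finished after {iters} iterations, residual = {np.linalg.norm(b - A @ x):.2e}")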

  1. ANTIOXIDANT CAPACITY OF WYOMING BIG SAGEBRUSH (ARTEMISIA TRIDENTATA SSP. WYOMINGENSIS) VARIES SPATIALLY AND IS NOT RELATED TO THE PRESENCE OF A SAGEBRUSH DIETARY SPECIALIST

    PubMed Central

    Pu, Xinzhu; Lam, Lisa; Gehlken, Kristina; Ulappa, Amy C.; Rachlow, Janet L.; Forbey, Jennifer Sorensen

    2015-01-01

    Sagebrush (Artemisia spp.) in North America is an abundant native plant species that is ecologically and evolutionarily adapted to have a diverse array of biologically active chemicals. Several of these chemicals, specifically polyphenols, have antioxidant activity that may act as biomarkers of biotic or abiotic stress. This study investigated the spatial variation of antioxidant capacity, as well as the relationship between a mammalian herbivore and antioxidant capacity in Wyoming big sagebrush (Artemisia tridentata wyomingensis). We quantified and compared total polyphenols and antioxidant capacity of leaf extracts from sagebrush plants from different spatial scales and at different levels of browsing by a specialist mammalian herbivore, the pygmy rabbit (Brachylagus idahoensis). We found that antioxidant capacity of sagebrush extracts was positively correlated with total polyphenol content. Antioxidant capacity varied spatially within and among plants. Antioxidant capacity in sagebrush was not related to either browsing intensity or duration of association with rabbits. We propose that the patterns of antioxidant capacity observed in sagebrush may be a result of spatial variation in abiotic stress experienced by sagebrush. Antioxidants could therefore provide a biomarker of environmental stress for sagebrush that could aid in management and conservation of this plant in the threatened sagebrush steppe. PMID:26582971

  2. What is big data? A consensual definition and a review of key research topics

    NASA Astrophysics Data System (ADS)

    De Mauro, Andrea; Greco, Marco; Grimaldi, Michele

    2015-02-01

    Although Big Data is a trending buzzword in both academia and the industry, its meaning is still shrouded by much conceptual vagueness. The term is used to describe a wide range of concepts: from the technological ability to store, aggregate, and process data, to the cultural shift that is pervasively invading business and society, both drowning in information overload. The lack of a formal definition has led research to evolve into multiple and inconsistent paths. Furthermore, the existing ambiguity among researchers and practitioners undermines an efficient development of the subject. In this paper we have reviewed the existing literature on Big Data and analyzed its previous definitions in order to pursue two results: first, to provide a summary of the key research areas related to the phenomenon, identifying emerging trends and suggesting opportunities for future development; second, to provide a consensual definition for Big Data, by synthesizing common themes of existing works and patterns in previous definitions.

  3. Knowledge Discovery for Smart Grid Operation, Control, and Situation Awareness -- A Big Data Visualization Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gu, Yi; Jiang, Huaiguang; Zhang, Yingchen

    In this paper, a big data visualization platform is designed to discover the hidden useful knowledge for smart grid (SG) operation, control and situation awareness. The spawn of smart sensors at both the grid side and the customer side can provide large volumes of heterogeneous data that collect information in all time spectrums. Extracting useful knowledge from this big-data pool is still challenging. In this paper, the Apache Spark, an open source cluster computing framework, is used to process the big data to effectively discover the hidden knowledge. A high-speed communication architecture utilizing the Open System Interconnection (OSI) model is designed to transmit the data to a visualization platform. This visualization platform uses Google Earth, a global geographic information system (GIS), to link the geological information with the SG knowledge and visualize the information in a user-defined fashion. The University of Denver's campus grid is used as an SG test bench and several demonstrations are presented for the proposed platform.

  4. Big Data Technologies: New Opportunities for Diabetes Management.

    PubMed

    Bellazzi, Riccardo; Dagliati, Arianna; Sacchi, Lucia; Segagni, Daniele

    2015-04-24

    The so-called big data revolution provides substantial opportunities to diabetes management. At least 3 important directions are currently of great interest. First, the integration of different sources of information, from primary and secondary care to administrative information, may allow depicting a novel view of patient's care processes and of single patient's behaviors, taking into account the multifaceted nature of chronic care. Second, the availability of novel diabetes technologies, able to gather large amounts of real-time data, requires the implementation of distributed platforms for data analysis and decision support. Finally, the inclusion of geographical and environmental information into such complex IT systems may further increase the capability of interpreting the data gathered and extracting new knowledge from them. This article reviews the main concepts and definitions related to big data, presents some efforts in health care, and discusses the potential role of big data in diabetes care. Finally, as an example, it describes the research efforts carried out in the MOSAIC project, funded by the European Commission. © 2015 Diabetes Technology Society.

  5. Big Data Technologies

    PubMed Central

    Bellazzi, Riccardo; Dagliati, Arianna; Sacchi, Lucia; Segagni, Daniele

    2015-01-01

    The so-called big data revolution provides substantial opportunities to diabetes management. At least 3 important directions are currently of great interest. First, the integration of different sources of information, from primary and secondary care to administrative information, may allow depicting a novel view of patient’s care processes and of single patient’s behaviors, taking into account the multifaceted nature of chronic care. Second, the availability of novel diabetes technologies, able to gather large amounts of real-time data, requires the implementation of distributed platforms for data analysis and decision support. Finally, the inclusion of geographical and environmental information into such complex IT systems may further increase the capability of interpreting the data gathered and extracting new knowledge from them. This article reviews the main concepts and definitions related to big data, presents some efforts in health care, and discusses the potential role of big data in diabetes care. Finally, as an example, it describes the research efforts carried out in the MOSAIC project, funded by the European Commission. PMID:25910540

  6. Could Blobs Fuel Storage-Based Convergence between HPC and Big Data?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matri, Pierre; Alforov, Yevhen; Brandon, Alvaro

    The increasingly growing data sets processed on HPC platforms raise major challenges for the underlying storage layer. A promising alternative to POSIX-IO-compliant file systems are simpler blobs (binary large objects), or object storage systems. Such systems offer lower overhead and better performance at the cost of largely unused features such as file hierarchies or permissions. Similarly, blobs are increasingly considered for replacing distributed file systems for big data analytics or as a base for storage abstractions such as key-value stores or time-series databases. This growing interest in such object storage on HPC and big data platforms raises the question: Are blobs the right level of abstraction to enable storage-based convergence between HPC and Big Data? In this paper we study the impact of blob-based storage for real-world applications on HPC and cloud environments. The results show that blob-based storage convergence is possible, leading to a significant performance improvement on both platforms.

  7. Hydrologic, vegetation, and soil data collected in selected wetlands of the Big River Management area, Rhode Island, from 2008 through 2010

    USGS Publications Warehouse

    Borenstein, Meredith S.; Golet, Francis C.; Armstrong, David S.; Breault, Robert F.; McCobb, Timothy D.; Weiskel, Peter K.

    2012-01-01

    The Rhode Island Water Resources Board planned to develop public water-supply wells in the Big River Management Area in Kent County, Rhode Island. Research in the United States and abroad indicates that groundwater withdrawal has the potential to affect wetland hydrology and related processes. In May 2008, the Rhode Island Water Resources Board, the U.S. Geological Survey, and the University of Rhode Island formed a partnership to establish baseline conditions at selected Big River wetland study sites and to develop an approach for monitoring potential impacts once pumping begins. In 2008 and 2009, baseline data were collected on the hydrology, vegetation, and soil characteristics at five forested wetland study sites in the Big River Management Area. Four of the sites were located in areas of potential drawdown associated with the projected withdrawals. The fifth site was located outside the area of projected drawdown and served as a control site. The data collected during this study are presented in this report.

  8. Lighting innovations in concept cars

    NASA Astrophysics Data System (ADS)

    Berlitz, Stephan; Huhn, Wolfgang

    2005-02-01

    Concept cars have their own styling process. Because of the big media interest, they offer a big opportunity to bring the newest technology, together with styling ideas, to different trade fairs. The LED technology in the concept cars Audi Pikes Peak, Nuvolari and Le Mans will be explained. A further outlook on the Audi LED strategy, starting with the LED daytime running lamp, will be given. The close collaboration between styling and technical engineers results in these concept cars and in further technical innovations based on LED technologies.

  9. Solar Data Mining at Georgia State University

    NASA Astrophysics Data System (ADS)

    Angryk, R.; Martens, P. C.; Schuh, M.; Aydin, B.; Kempton, D.; Banda, J.; Ma, R.; Naduvil-Vadukootu, S.; Akkineni, V.; Küçük, A.; Filali Boubrahimi, S.; Hamdi, S. M.

    2016-12-01

    In this talk we give an overview of research projects related to solar data analysis that are conducted at Georgia State University. We will provide an update on multiple advances made by our research team in the analysis of image parameters, spatio-temporal pattern mining, and temporal data analysis, as well as our experiences with big, heterogeneous solar data visualization, analysis, processing and storage. We will talk about up-to-date data mining methodologies, and their importance for big data-driven solar physics research.

  10. Reliable, Memory Speed Storage for Cluster Computing Frameworks

    DTIC Science & Technology

    2014-06-16

    specification API that can capture computations in many of today’s popular data-parallel computing models, e.g., MapReduce and SQL. We also ported the Hadoop ...today’s big data workloads: • Immutable data: Data is immutable once written, since dominant underlying storage systems, such as HDFS [3], only support...network transfers, so reads can be data-local. • Program size vs. data size: In big data processing, the same operation is repeatedly applied on massive

  11. EHR Big Data Deep Phenotyping

    PubMed Central

    Lenert, L.; Lopez-Campos, G.

    2014-01-01

    Summary Objectives Given the quickening speed of discovery of variant disease drivers from combined patient genotype and phenotype data, the objective is to provide methodology using big data technology to support the definition of deep phenotypes in medical records. Methods As the vast stores of genomic information increase with next generation sequencing, the importance of deep phenotyping increases. The growth of genomic data and adoption of Electronic Health Records (EHR) in medicine provides a unique opportunity to integrate phenotype and genotype data into medical records. The method by which collections of clinical findings and other health-related data are leveraged to form meaningful phenotypes is an active area of research. Longitudinal data stored in EHRs provide a wealth of information that can be used to construct phenotypes of patients. We focus on a practical problem around data integration for deep phenotype identification within EHR data. Big data approaches are described that enable scalable markup of EHR events that can be used for semantic and temporal similarity analysis to support the identification of phenotype and genotype relationships. Conclusions Stead and colleagues’ 2005 concept of using light standards to increase the productivity of software systems by riding on the wave of hardware/processing power is described as a harbinger for designing future healthcare systems. The big data solution, using flexible markup, provides a route to improved utilization of processing power for organizing patient records in genotype and phenotype research. PMID:25123744

  12. Role of endothelin-converting enzyme, chymase and neutral endopeptidase in the processing of big ET-1, ET-1(1-21) and ET-1(1-31) in the trachea of allergic mice.

    PubMed

    De Campo, Benjamin A; Goldie, Roy G; Jeng, Arco Y; Henry, Peter J

    2002-08-01

    The present study examined the roles of endothelin-converting enzyme (ECE), neutral endopeptidase (NEP) and mast cell chymase as processors of the endothelin (ET) analogues ET-1(1-21), ET-1(1-31) and big ET-1 in the trachea of allergic mice. Male CBA/CaH mice were sensitized with ovalbumin (10 microg) delivered intraperitoneal on days 1 and 14, and exposed to aerosolized ovalbumin on days 14, 25, 26 and 27 (OVA mice). Mice were killed and the trachea excised for histological analysis and contraction studies on day 28. Tracheae from OVA mice had 40% more mast cells than vehicle-sensitized mice (sham mice). Ovalbumin (10 microg/ml) induced transient contractions (15+/-3% of the C(max)) in tracheae from OVA mice. The ECE inhibitor CGS35066 (10 microM) inhibited contractions induced by big ET-1 (4.8-fold rightward shift of dose-response curve; P<0.05), but not those induced by either ET-1(1-21) or ET-1(1-31). The chymase inhibitors chymostatin (10 microM) and Bowman-Birk inhibitor (10 microM) had no effect on contractions induced by any of the ET analogues used. The NEP inhibitor CGS24592 (10 microM) inhibited contractions induced by ET-1(1-31) (6.2-fold rightward shift; P<0.05) but not ET-1(1-21) or big ET-1. These data suggest that big ET-1 is processed predominantly by a CGS35066-sensitive ECE within allergic airways rather than by mast cell-derived proteases such as chymase. If endogenous ET-1(1-31) is formed within allergic airways, it is likely to undergo further conversion by NEP to more active products.

  13. Big Bang Day : The Great Big Particle Adventure - 1. Atom

    ScienceCinema

    None

    2017-12-09

    In this series, comedian and physicist Ben Miller asks the CERN scientists what they hope to find. The notion of atoms dates back to Greek philosophers who sought a natural mechanical explanation of the Universe, as opposed to a divine one. The existence of what we call chemical atoms, the constituents of all we see around us, wasn't proved until a hundred years ago, but almost simultaneously it was realised these weren't the indivisible constituents the Greeks envisaged. Much of the story of physics since then has been the ever-deeper probing of matter until, at the end of the 20th century, a complete list of fundamental ingredients had been identified, apart from one, the much discussed Higgs particle. In this programme, Ben finds out why this last particle is so pivotal, not just to atomic theory, but to our very existence - and how hopeful the scientists are of proving its existence.

  14. Morphological Changes in Skin Glands During Development in Rhinella Arenarum (Anura: Bufonidae).

    PubMed

    Regueira, Eleonora; Dávila, Camila; Hermida, Gladys N

    2016-01-01

    Avoiding predation is critical to survival of animals; chemical defenses represent a common strategy among amphibians. In this study, we examined histologically the morphology of skin glands and types of secretions related to chemical skin defense during ontogeny of Rhinella arenarum. Prior to metamorphic climax the epidermis contains typical bufonid giant cells producing a mucous substance supposedly involved in triggering a flight reaction of the tadpole school. An apical layer of alcianophilic mucus covers the epidermis, which could produce the unpleasant taste of bufonid tadpoles. Giant cells disappear by onset of metamorphic climax, when multicellular glands start developing, but the apical mucous layer remains. By the end of climax, neither the granular glands of the dorsum nor the parotoid regions are completely developed. Conversely, by the end of metamorphosis the mucous glands are partially developed and secrete mucus. Adults have at least three types of granular glands, which we designate type A (acidophilic), type B (basophilic) and ventral (mucous). Polymorphic granular glands distribute differently in the body: dorsal granular glands between warts and in the periphery of parotoids contain protein; granular glands of big warts and in the central region of parotoids contain catecholamines, lipids, and glycoconjugates, whereas ventral granular glands produce acidic glycoconjugates. Mucous glands produce both mucus and proteins. Results suggest that in early juveniles the chemical skin defense mechanisms are not functional. Topographical differences in adult skin secretions suggest that granular glands from the big warts in the skin produce similar toxins to the parotoid glands. © 2015 Wiley Periodicals, Inc.

  15. Genetic structure, nestmate recognition and behaviour of two cryptic species of the invasive big-headed ant Pheidole megacephala.

    PubMed

    Fournier, Denis; Tindo, Maurice; Kenne, Martin; Mbenoun Masse, Paul Serge; Van Bossche, Vanessa; De Coninck, Eliane; Aron, Serge

    2012-01-01

    Biological invasions are recognized as a major cause of biodiversity decline and have considerable impact on the economy and human health. The African big-headed ant Pheidole megacephala is considered one of the world's most harmful invasive species. To better understand its ecological and demographic features, we combined behavioural (aggression tests), chemical (quantitative and qualitative analyses of cuticular lipids) and genetic (mitochondrial divergence and polymorphism of DNA microsatellite markers) data obtained for eight populations in Cameroon. Molecular data revealed two cryptic species of P. megacephala, one inhabiting urban areas and the other rainforests. Urban populations belong to the same phylogenetic group than those introduced in Australia and in other parts of the world. Behavioural analyses show that the eight populations sampled make up four mutually aggressive supercolonies. The maximum distance between nests from the same supercolony was 49 km and the closest distance between two nests belonging to two different supercolonies was 46 m. The genetic data and chemical analyses confirmed the behavioural tests as all of the nests were correctly assigned to their supercolony. Genetic diversity appears significantly greater in Africa than in introduced populations in Australia; by contrast, urban and Australian populations are characterized by a higher chemical diversity than rainforest ones. Overall, our study shows that populations of P. megacephala in Cameroon adopt a unicolonial social structure, like invasive populations in Australia. However, the size of the supercolonies appears several orders of magnitude smaller in Africa. This implies competition between African supercolonies and explains why they persist over evolutionary time scales.

  16. MMX-I: A data-processing software for multi-modal X-ray imaging and tomography

    NASA Astrophysics Data System (ADS)

    Bergamaschi, A.; Medjoubi, K.; Messaoudi, C.; Marco, S.; Somogyi, A.

    2017-06-01

    Scanning hard X-ray imaging allows simultaneous acquisition of multimodal information, including X-ray fluorescence, absorption, phase and dark-field contrasts, providing structural and chemical details of the samples. Combining these scanning techniques with the infrastructure developed for fast data acquisition at Synchrotron Soleil makes it possible to perform multimodal imaging and tomography during routine user experiments at the Nanoscopium beamline. A main challenge of such imaging techniques is the online processing and analysis of the very large (several hundred gigabytes) multimodal data-sets they generate. This is especially important for the wide user community foreseen at the user-oriented Nanoscopium beamline (e.g., from the fields of biology, life sciences, geology and geobiology), much of which has no experience in such data handling. MMX-I is a new multi-platform open-source freeware for the processing and reconstruction of scanning multi-technique X-ray imaging and tomographic datasets. The MMX-I project aims to offer both expert users and beginners the possibility of processing and analysing raw data, either on-site or off-site. We have therefore developed a multi-platform (Mac, Windows and Linux 64-bit) data processing tool that is easy to install, comprehensive, intuitive, extendable and user-friendly. MMX-I is now routinely used by the Nanoscopium user community and has demonstrated its performance in treating big data.

  17. Accelerating Biomedical Signal Processing Using GPU: A Case Study of Snore Sound Feature Extraction.

    PubMed

    Guo, Jian; Qian, Kun; Zhang, Gongxuan; Xu, Huijie; Schuller, Björn

    2017-12-01

    The advent of 'Big Data' and 'Deep Learning' offers both a great challenge and a huge opportunity for personalised health-care. In machine learning-based biomedical data analysis, feature extraction is a key step for 'feeding' the subsequent classifiers. With increasing volumes of biomedical data, extracting features from these 'big' data is an intensive and time-consuming task. In this case study, we employ a Graphics Processing Unit (GPU) via Python to extract features from a large corpus of snore sound data. Those features can subsequently be imported into many well-known deep learning training frameworks without any format processing. The snore sound data were collected from several hospitals (20 subjects, with 770-990 MB per subject - in total 17.20 GB). Experimental results show that our GPU-based processing significantly speeds up the feature extraction phase, by up to seven times compared with the previous CPU-based system.
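
    The record above does not include the authors' code. As a rough, hedged illustration of GPU-accelerated audio feature extraction in Python, the sketch below uses CuPy to compute simple per-frame spectral features; CuPy, the 16 kHz sampling rate, and the chosen features are assumptions, not the paper's actual toolchain.

```python
# Illustrative GPU-based spectral feature extraction (NOT the paper's pipeline).
# Assumes CuPy is installed and a 16 kHz mono signal; both are assumptions.
import numpy as np
import cupy as cp

def spectral_features(signal, frame_len=2048, hop=512, fs=16000):
    """Frame the signal and compute per-frame spectral energy and centroid on the GPU."""
    x = cp.asarray(signal, dtype=cp.float32)
    n_frames = 1 + (x.size - frame_len) // hop
    idx = cp.arange(frame_len)[None, :] + hop * cp.arange(n_frames)[:, None]
    frames = x[idx] * cp.hanning(frame_len)[None, :]          # windowed frames
    spec = cp.abs(cp.fft.rfft(frames, axis=1))                # magnitude spectra
    energy = cp.sum(spec ** 2, axis=1)                        # spectral energy per frame
    freqs = cp.fft.rfftfreq(frame_len, d=1.0 / fs)
    centroid = cp.sum(spec * freqs[None, :], axis=1) / (cp.sum(spec, axis=1) + 1e-12)
    return cp.asnumpy(cp.stack([energy, centroid], axis=1))   # copy results back to host

# Synthetic stand-in for a 10 s snore-sound recording:
features = spectral_features(np.random.randn(16000 * 10))
print(features.shape)   # (n_frames, 2)
```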

  18. Spectroscopic Analysis of Temporal Changes in Leaf Moisture and Dry Matter Content

    NASA Astrophysics Data System (ADS)

    Qi, Y.; Dennison, P. E.; Brewer, S.; Jolly, W. M.; Kropp, R.

    2013-12-01

    Live fuel moisture (LFM), the ratio of water content to dry matter content (DMC) in live fuel, is critical for determining fire danger and behavior. Remote sensing estimation of LFM often relies on an assumption of changing water content and stable DMC over time. In order to advance understanding of temporal variation in LFM and DMC, we collected field samples and spectroscopic data for two species, lodgepole pine (Pinus contorta) and big sagebrush (Artemisia tridentata), to explore seasonal trends and the spectral expression of these trends. New and old needles were measured separately for lodgepole pine. All samples were measured using a visible/NIR/SWIR spectrometer, and coincident samples were processed to provide LFM, DMC, water content and chemical components including structural and non-structural carbohydrates. New needles initially exhibited higher LFM and a smaller proportion of DMC, but differences between new and old needles converged as the new needles hardened. DMC explained more variation in LFM than water content for new pine needles and sagebrush leaves. Old pine needles transported non-structural carbohydrates to new needles to accumulate DMC during the growth season, resulting in decreasing LFM in the new needles. DMC and water content co-varied with vegetation chemical components and physical structure. Spectral variation in response to changing DMC is difficult to isolate from the spectral signatures of multiple chemical components. Partial least squares regression combined with hyperspectral data may increase modeling performance in LFM estimation.
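
    The last sentence points to partial least squares (PLS) regression on hyperspectral data; the sketch below is a hedged, generic illustration of that modeling step with scikit-learn, not the authors' pipeline. The array shapes, component count, and synthetic data are assumptions.

```python
# Generic PLS regression of live fuel moisture (LFM) on reflectance spectra
# (illustrative only; shapes and hyperparameters are assumptions).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((120, 500))        # 120 samples x 500 spectral bands (synthetic stand-in)
y = rng.random(120) * 200.0       # LFM in percent (synthetic stand-in)

pls = PLSRegression(n_components=10)   # in practice, choose components by cross-validation
cv_r2 = cross_val_score(pls, X, y, cv=5, scoring="r2")
pls.fit(X, y)
lfm_pred = pls.predict(X).ravel()
print(f"mean cross-validated R^2: {cv_r2.mean():.2f}")
```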

  19. SraTailor: graphical user interface software for processing and visualizing ChIP-seq data.

    PubMed

    Oki, Shinya; Maehara, Kazumitsu; Ohkawa, Yasuyuki; Meno, Chikara

    2014-12-01

    Raw data from ChIP-seq (chromatin immunoprecipitation combined with massively parallel DNA sequencing) experiments are deposited in public databases as SRAs (Sequence Read Archives) that are publicly available to all researchers. However, to graphically visualize ChIP-seq data of interest, the corresponding SRAs must be downloaded and converted into BigWig format, a process that involves complicated command-line processing. This task requires users to possess skill with script languages and sequence data processing, a requirement that prevents a wide range of biologists from exploiting SRAs. To address these challenges, we developed SraTailor, a GUI (Graphical User Interface) software package that automatically converts an SRA into a BigWig-formatted file. Simplicity of use is one of the most notable features of SraTailor: entering the accession number of an SRA and clicking the mouse are the only steps required to obtain BigWig-formatted files and to graphically visualize the extents of reads at given loci. SraTailor is also able to make peak calls, generate files of other formats, process users' own data, and accept various command-line-like options. Therefore, this software makes ChIP-seq data fully exploitable by a wide range of biologists. SraTailor is freely available at http://www.devbio.med.kyushu-u.ac.jp/sra_tailor/, and runs on both Mac and Windows machines. © 2014 The Authors Genes to Cells © 2014 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.
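
    For readers unfamiliar with the command-line steps that a tool like SraTailor automates, the sketch below outlines one common SRA-to-BigWig route using sra-tools, bowtie2, samtools and deepTools. It is illustrative only: the accession, index name and single-end assumption are hypothetical, and these are not necessarily the commands SraTailor runs internally.

```python
# Illustrative SRA -> BigWig pipeline (NOT SraTailor's internal commands).
# Assumes sra-tools, bowtie2, samtools and deepTools are on PATH, a bowtie2 index
# named "genome_index" exists, and the run is single-end (all assumptions).
import subprocess

acc = "SRR000001"        # hypothetical accession
index = "genome_index"   # hypothetical bowtie2 index prefix

steps = [
    ["fasterq-dump", acc, "-O", "."],                                    # download reads as FASTQ
    ["bowtie2", "-x", index, "-U", f"{acc}.fastq", "-S", f"{acc}.sam"],  # align reads
    ["samtools", "sort", "-o", f"{acc}.bam", f"{acc}.sam"],              # coordinate-sort alignments
    ["samtools", "index", f"{acc}.bam"],                                 # index the BAM
    ["bamCoverage", "-b", f"{acc}.bam", "-o", f"{acc}.bw"],              # write BigWig coverage track
]
for cmd in steps:
    subprocess.run(cmd, check=True)
```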

  20. Searching for the Prosocial Personality: A Big Five Approach to Linking Personality and Prosocial Behavior.

    PubMed

    Habashi, Meara M; Graziano, William G; Hoover, Ann E

    2016-09-01

    The search for the prosocial personality has been long and controversial. The current research explores the general patterns underlying prosocial decisions, linking personality, emotion, and overt prosocial behavior. Using a multimethod approach, we explored the links between the Big Five dimensions of personality and prosocial responding. Across three studies, we found that agreeableness was the dimension of personality most closely associated with emotional reactions to victims in need of help, and subsequent decisions to help those individuals. Results suggest that prosocial processes, including emotions, cognitions, and behaviors, may be part of a more general motivational process linked to personality. © 2016 by the Society for Personality and Social Psychology, Inc.

  1. Astrophysics and Big Data: Challenges, Methods, and Tools

    NASA Astrophysics Data System (ADS)

    Garofalo, Mauro; Botta, Alessio; Ventre, Giorgio

    2017-06-01

    Nowadays there is no field of research that is not flooded with data. Among the sciences, astrophysics has always been driven by the analysis of massive amounts of data. The development of new and more sophisticated observation facilities, both ground-based and spaceborne, has made data more and more complex (Variety) and has driven an exponential growth in both data Volume (i.e., of the order of petabytes) and Velocity, in terms of production and transmission. Therefore, new and advanced processing solutions will be needed to handle this huge amount of data. We investigate some of these solutions, based on machine learning models as well as tools and architectures for Big Data analysis, that can be exploited in the astrophysical context.

  2. Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine.

    PubMed

    Bao, Shunxing; Weitendorf, Frederick D; Plassard, Andrew J; Huo, Yuankai; Gokhale, Aniruddha; Landman, Bennett A

    2017-02-11

    The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit that is controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded/retrieved from the NFS, e.g., "short" processing times and/or "large" datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of using such a framework must be formally evaluated against a traditional approach to characterize the point at which simply "large scale" processing transitions into "big data" and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab-cluster alongside a standard Sun Grid Engine (SGE). Theoretical models for wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a university secure, shared web database and used to empirically assess computational performance under three configurations of cluster hardware (using 72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented for when the Hadoop framework will be relevant and non-relevant for medical imaging.

  3. Theoretical and empirical comparison of big data image processing with Apache Hadoop and Sun Grid Engine

    NASA Astrophysics Data System (ADS)

    Bao, Shunxing; Weitendorf, Frederick D.; Plassard, Andrew J.; Huo, Yuankai; Gokhale, Aniruddha; Landman, Bennett A.

    2017-03-01

    The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit that is controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded/retrieved from the NFS, e.g., "short" processing times and/or "large" datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of using such a framework must be formally evaluated against a traditional approach to characterize the point at which simply "large scale" processing transitions into "big data" and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab-cluster alongside a standard Sun Grid Engine (SGE). Theoretical models for wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a university secure, shared web database and used to empirically assess computational performance under three configurations of cluster hardware (using 72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented for when the Hadoop framework will be relevant and nonrelevant for medical imaging.

  4. Can EO afford big data - an assessment of the temporal and monetary costs of existing and emerging big data workflows

    NASA Astrophysics Data System (ADS)

    Clements, Oliver; Walker, Peter

    2014-05-01

    The cost of working with extremely large data sets is an increasingly important issue within the Earth Observation community. From global coverage data at any resolution to small coverage data at extremely high resolution, the community has always produced big data. This will only increase as new sensors are deployed and their data made available. Over time standard workflows have emerged. These have been facilitated by the production and adoption of standard technologies. Groups such as the International Organisation for Standardisation (ISO) and the Open Geospatial Consortium (OGC) have been a driving force in this area for many years. The production of standard protocols and interfaces such as OPeNDAP, Web Coverage Service (WCS), Web Processing Service (WPS) and newer emerging standards such as Web Coverage Processing Service (WCPS) has helped to galvanise these workflows. As an example of a traditional workflow, assume a researcher wants to assess the temporal trend in chlorophyll concentration. This would involve a discovery phase, an acquisition phase, a processing phase and finally a derived product or analysis phase. Each element of this workflow has an associated temporal and monetary cost. Firstly, the researcher would require a high bandwidth connection or the acquisition phase would take too long. Secondly, the researcher must have their own expensive equipment for use in the processing phase. Both of these elements cost money and time. This can make the whole process prohibitive to scientists from the developing world or "citizen scientists" who do not have the necessary processing infrastructure. The use of emerging technologies can help reduce both the monetary and time costs associated with these existing workflows. By utilising a WPS that is hosted at the same location as the data, a user is able to apply processing to the data without needing their own processing infrastructure. This, however, limits the user to the predefined processes made available by the data provider. The emerging OGC WCPS standard combined with big data analytics engines may provide a mechanism to improve this situation. The technology allows users to create their own queries using an SQL-like query language and apply them over the available large data archives, once again at the data provider's end. This not only removes the processing cost while still allowing user-defined processes, it also reduces the bandwidth required, as only the final analysis or derived product needs to be downloaded. The new technologies are now mature enough that their use should be justified by a quantitative assessment rather than simply by the fact that they are new developments. We will present a study of the time and cost requirements for a selection of existing workflows and then show how new and emerging standards and technologies can help both to reduce the cost to the user by shifting processing to the data and to reduce the bandwidth required for analysing large datasets, making analysis of big-data archives possible for a greater and more diverse audience.
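
    To make the "processing at the data provider's end" idea concrete, the sketch below sends a WCPS-style query over HTTP and downloads only the derived product. The endpoint URL, coverage name and query text are illustrative assumptions, not part of the abstract above.

```python
# Illustrative client for a server-side WCPS endpoint: the yearly mean is computed
# on the server and only the small result is downloaded. URL, coverage name and
# query text are hypothetical.
import requests

endpoint = "https://example.org/rasdaman/ows"   # hypothetical WCPS service
query = """
for $c in (chlorophyll_archive)
return encode(avg($c[ansi("2010-01-01":"2010-12-31")]), "text/csv")
"""

resp = requests.post(endpoint, data={"query": query}, timeout=300)
resp.raise_for_status()
print(resp.text)   # derived product only; no bulk archive transfer
```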

  5. Self-Assembly of ZnO Nanoplatelets into Hierarchical Mesocrystals and Their Photocatalytic Property

    NASA Astrophysics Data System (ADS)

    Yang, Yongqiang; Wang, Qinsheng; Liu, Zheng; Jin, Ling; Ou, Bingxian; Han, Pengju; Wang, Qun; Cheng, Xiaobao; Liu, Wenjun; Wen, Yu; Liu, Yuan; Zhao, Weifang

    2018-03-01

    In this work, a simple chemical procedure was developed for the preparation of mesocrystals consisting of ZnO nanoplatelets. By simply mixing aqueous solutions of Zn(NO3)2 and NaOH with ethanol at certain temperatures, hierarchical mesocrystals that are big at both ends and small in the middle were obtained. After being annealed in air at certain temperatures, ZnO mesocrystals with the same structure were generated. The morphology, crystalline structure and chemical composition were characterized using SEM, XRD, FT-IR and Raman spectroscopy. The photocatalytic properties of the ZnO mesocrystals were also investigated. The ZnO mesocrystals showed decent photocatalytic performance for the photodegradation of methyl blue.

  6. A new paradigm of quantifying ecosystem stress through chemical signatures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kravitz, Ben; Guenther, Alex B.; Gu, Lianhong

    Stress-induced emissions of biogenic volatile organic compounds (VOCs) from terrestrial ecosystems may be one of the dominant sources of VOC emissions world-wide. Understanding the ecosystem stress response could reveal how ecosystems will respond and adapt to climate change and, in turn, quantify changes in the atmospheric burden of VOC oxidants and secondary organic aerosols. Here we argue, based on preliminary evidence from several opportunistic measurement sources, that chemical signatures of stress can be identified and quantified at the ecosystem scale. We also outline future endeavors that we see as next steps toward uncovering quantitative signatures of stress, including new advances in both VOC data collection and analysis of "big data."

  7. Stochastic differential game formulation on the reinsurance and investment problem

    NASA Astrophysics Data System (ADS)

    Li, Danping; Rong, Ximin; Zhao, Hui

    2015-09-01

    This paper focuses on a stochastic differential game between two insurance companies, a big one and a small one. The big company has sufficient assets to invest in a risk-free asset and a risky asset and is allowed to purchase proportional reinsurance or acquire new business, while the small company can transfer part of its risk to a reinsurer via proportional reinsurance. The game studied here is zero-sum: the big company tries to maximise the expected exponential utility of the difference between the two companies' surpluses at the terminal time, to keep its advantage in surplus, while the small company simultaneously tries to minimise the same quantity to reduce its disadvantage. In particular, the relationships between the surplus processes and the price process of the risky asset are considered. By applying stochastic control theory, we provide and prove the verification theorem and obtain the Nash equilibrium strategy of the game explicitly. Furthermore, numerical simulations are presented to illustrate the effects of the parameters on the equilibrium strategy as well as the economic meanings behind them.

  8. Semantic size of abstract concepts: it gets emotional when you can't see it.

    PubMed

    Yao, Bo; Vasiljevic, Milica; Weick, Mario; Sereno, Margaret E; O'Donnell, Patrick J; Sereno, Sara C

    2013-01-01

    Size is an important visuo-spatial characteristic of the physical world. In language processing, previous research has demonstrated a processing advantage for words denoting semantically "big" (e.g., jungle) versus "small" (e.g., needle) concrete objects. We investigated whether semantic size plays a role in the recognition of words expressing abstract concepts (e.g., truth). Semantically "big" and "small" concrete and abstract words were presented in a lexical decision task. Responses to "big" words, regardless of their concreteness, were faster than those to "small" words. Critically, we explored the relationship between semantic size and affective characteristics of words as well as their influence on lexical access. Although a word's semantic size was correlated with its emotional arousal, the temporal locus of arousal effects may depend on the level of concreteness. That is, arousal seemed to have an earlier (lexical) effect on abstract words, but a later (post-lexical) effect on concrete words. Our findings provide novel insights into the semantic representations of size in abstract concepts and highlight that affective attributes of words may not always index lexical access.

  9. Construction of a groundwater-flow model for the Big Sioux Aquifer using airborne electromagnetic methods, Sioux Falls, South Dakota

    USGS Publications Warehouse

    Valder, Joshua F.; Delzer, Gregory C.; Carter, Janet M.; Smith, Bruce D.; Smith, David V.

    2016-09-28

    The city of Sioux Falls is the fastest growing community in South Dakota. In response to this continued growth and planning for future development, Sioux Falls requires a sustainable supply of municipal water. Planning and managing sustainable groundwater supplies requires a thorough understanding of local groundwater resources. The Big Sioux aquifer consists of glacial outwash sands and gravels and is hydraulically connected to the Big Sioux River, which provided about 90 percent of the city’s source-water production in 2015. Managing sustainable groundwater supplies also requires an understanding of groundwater availability. An effective mechanism to inform water management decisions is the development and utilization of a groundwater-flow model. A groundwater-flow model provides a quantitative framework for synthesizing field information and conceptualizing hydrogeologic processes. These groundwater-flow models can support decision making processes by mapping and characterizing the aquifer. Accordingly, the city of Sioux Falls partnered with the U.S. Geological Survey to construct a groundwater-flow model. Model inputs will include data from advanced geophysical techniques, specifically airborne electromagnetic methods.

  10. [Algorithms, machine intelligence, big data : general considerations].

    PubMed

    Radermacher, F J

    2015-08-01

    We are experiencing astonishing developments in the areas of big data and artificial intelligence. They follow a pattern that we have now been observing for decades: according to Moore's Law, the performance and efficiency in the area of elementary arithmetic operations increases a thousand-fold every 20 years. Although we have not yet reached the point, in the sense of the singularity, where machines have become as "intelligent" as people, machines are becoming increasingly better. The Internet of Things has again helped to massively increase the efficiency of machines. Big data and suitable analytics do the same. If we let these processes simply continue, our civilization may be endangered in many instances. If the "containment" of these processes succeeds in the context of a reasonable political global governance, a worldwide eco-social market economy, and an economy of green and inclusive markets, many desirable developments that are advantageous for our future may result. Then, at some point in time, the constant need for more and faster innovation may even stop. However, this is anything but certain. We are facing huge challenges.

  11. Towards adaptive, streaming analysis of x-ray tomography data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thomas, Mathew; Kleese van Dam, Kerstin; Marshall, Matthew J.

    2015-03-04

    Temporal and spatial resolution of chemical imaging methodologies such as x-ray tomography are rapidly increasing, leading to more complex experimental procedures and fast-growing data volumes. Automated analysis pipelines and big data analytics are becoming essential to effectively evaluate the results of such experiments. Offering those data techniques in an adaptive, streaming environment can further substantially improve the scientific discovery process, by enabling experimental control and steering based on the evaluation of emerging phenomena as they are observed by the experiment. Pacific Northwest National Laboratory's (PNNL) Chemical Imaging Initiative (CII - http://imaging.pnnl.gov/ ) has worked since 2011 towards developing a framework that allows users to rapidly compose and customize high-throughput experimental analysis pipelines for multiple instrument types. The framework, named the 'Rapid Experimental Analysis' (REXAN) Framework [1], is based on the idea of reusable component libraries and utilizes the PNNL-developed collaborative data management and analysis environment 'Velo' to provide a user-friendly analysis and data management environment for experimental facilities. This article discusses the capabilities established for X-ray tomography and the lessons learned, and provides an overview of our more recent work in the Analysis in Motion Initiative (AIM - http://aim.pnnl.gov/ ) at PNNL to provide REXAN capabilities in a streaming environment.

  12. Early Probe and Drug Discovery in Academia: A Minireview.

    PubMed

    Roy, Anuradha

    2018-02-09

    Drug discovery encompasses processes ranging from target selection and validation to the selection of a development candidate. While comprehensive drug discovery workflows are implemented predominantly in the big pharma domain, the early discovery focus in academia serves to identify probe molecules that can serve as tools to study targets or pathways. Despite differences in the ultimate goals of the private and academic sectors, the same basic principles define the best practices in early discovery research. A successful early discovery program is built on strong target definition and validation using a diverse set of biochemical and cell-based assays with functional relevance to the biological system being studied. The chemicals identified as hits undergo extensive scaffold optimization and are characterized for their target specificity and off-target effects in vitro and in animal models. While the active compounds from screening campaigns pass through highly stringent chemical and Absorption, Distribution, Metabolism, and Excretion (ADME) filters for lead identification, probe discovery involves limited medicinal chemistry optimization. The goal of probe discovery is the identification of a compound with sub-µM activity and reasonable selectivity in the context of the target being studied. The compounds identified from probe discovery can also serve as starting scaffolds for lead optimization studies.

  13. Thermal Stability of Oil Palm Empty Fruit Bunch (OPEFB) Nanocrystalline Cellulose: Effects of post-treatment of oven drying and solvent exchange techniques

    NASA Astrophysics Data System (ADS)

    Indarti, E.; Marwan; Wanrosli, W. D.

    2015-06-01

    Nanocrystalline cellulose (NCC) from biomass is a promising material with huge potential in various applications. A big challenge in its utilization is the agglomeration of the NCCs during processing, due to hydrogen bonding among the cellulose chains when they are in close proximity to each other. Obtaining NCCs in a non-agglomerated and non-aqueous condition is challenging. In the present work, NCCs were isolated from oil palm empty fruit bunch (OPEFB) using a TEMPO-oxidation reaction method. To obtain non-agglomerated and non-aqueous products, the NCCs underwent post-treatment using oven drying (OD) and solvent exchange (SE) techniques. The thermal stability of all samples was determined from TGA and DTG profiles, whilst FTIR was used to analyze the chemical modifications that occurred under these conditions. NCC-SE has better thermal stability than NCC-OD, and its onset degradation temperature and residue are also higher. FTIR analysis shows that NCC-SE has a slightly different chemical composition, whereby the absorption band at 1300 cm-1 (due to C-O symmetric stretching) is absent as compared to NCC-OD, indicating that in NCC-SE the carboxylate group is in acid form, which contributes to its thermal stability.

  14. Disk mass determination through CO isotopologues

    NASA Astrophysics Data System (ADS)

    Miotello, Anna; Kama, Mihkel; van Dishoeck, Ewine

    2015-08-01

    One of the key properties for understanding how disks evolve to planetary systems is their overall mass, combined with their surface density distribution. So far, virtually all disk mass determinations are based on observations of the millimeter continuum dust emission. Deriving the total gas + dust disk mass from these data, however, involves several big assumptions. The alternative method is to directly derive the gas mass through the detection of carbon monoxide (CO) and its less abundant isotopologues. CO chemistry is well studied and easily implemented in chemical models, provided that isotope-selective processes are properly accounted for. CO isotope-selective photodissociation was implemented for the first time in a full physical-chemical code in Miotello et al. (2014). The main result is that if isotope-selective effects are not considered in the data analysis, disk masses can be underestimated by an order of magnitude or more. For example, the mass discrepancy found for the renowned TW Hya disk may be explained or at least mitigated by this implementation. In this poster, we present new results for a large grid of disk models. We derive mass correction factors for different disk, stellar and grain properties in order to account for isotope-selective effects in analyzing ALMA data of CO isotopologues (Miotello et al., in prep.).

  15. Identifying Military Impacts on Archaeological Deposits Based on Differences in Soil Organic Carbon and Chemical Elements at Soil Horizon Interfaces

    DTIC Science & Technology

    2012-03-01

    between disturbed and undisturbed sites, resulting in plant communities dominated by annual species and perennial species or grass/forb and shrub/tree ...serve as non-habitation site controls. Each archaeological site and adjacent non-site area was then surveyed to provide a floristic species and...native tallgrass prairie species such as Indiangrass (Sorghastrum nutans), big bluestem (Andropogon gerardii), switchgrass (Panicum virgatum), and

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Song, Jong-Won; Hirao, Kimihiko

    Long-range corrected density functional theory (LC-DFT) attracts many chemists' attention as a quantum chemical method applicable to large molecular systems and their property calculations. However, the expensive time cost of evaluating the long-range HF exchange is a big obstacle that must be overcome before the method can be applied to large molecular systems and solid-state materials. To address this problem, we propose a linear-scaling method for the HF exchange integration, in particular for the LC-DFT hybrid functional.

  17. A primer on theory-driven web scraping: Automatic extraction of big data from the Internet for use in psychological research.

    PubMed

    Landers, Richard N; Brusso, Robert C; Cavanaugh, Katelyn J; Collmus, Andrew B

    2016-12-01

    The term big data encompasses a wide range of approaches of collecting and analyzing data in ways that were not possible before the era of modern personal computing. One approach to big data of great potential to psychologists is web scraping, which involves the automated collection of information from webpages. Although web scraping can create massive big datasets with tens of thousands of variables, it can also be used to create modestly sized, more manageable datasets with tens of variables but hundreds of thousands of cases, well within the skillset of most psychologists to analyze, in a matter of hours. In this article, we demystify web scraping methods as currently used to examine research questions of interest to psychologists. First, we introduce an approach called theory-driven web scraping in which the choice to use web-based big data must follow substantive theory. Second, we introduce data source theories, a term used to describe the assumptions a researcher must make about a prospective big data source in order to meaningfully scrape data from it. Critically, researchers must derive specific hypotheses to be tested based upon their data source theory, and if these hypotheses are not empirically supported, plans to use that data source should be changed or eliminated. Third, we provide a case study and sample code in Python demonstrating how web scraping can be conducted to collect big data along with links to a web tutorial designed for psychologists. Fourth, we describe a 4-step process to be followed in web scraping projects. Fifth and finally, we discuss legal, practical and ethical concerns faced when conducting web scraping projects. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
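
    The article's own Python tutorial code is not reproduced in this record. As a generic, hedged illustration of the kind of scraping step it describes, the sketch below collects post titles from a hypothetical forum page with requests and BeautifulSoup; the URL and CSS selector are assumptions.

```python
# Generic web-scraping sketch (not the article's tutorial code).
# The URL and the CSS selector for post titles are hypothetical.
import csv
import requests
from bs4 import BeautifulSoup

url = "https://example.org/forum/page/1"      # hypothetical page
html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

rows = [{"title": node.get_text(strip=True)}
        for node in soup.select("h2.post-title")]   # hypothetical selector

with open("posts.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=["title"])
    writer.writeheader()
    writer.writerows(rows)
```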

  18. Image Mosaicking Approach for a Double-Camera System in the GaoFen2 Optical Remote Sensing Satellite Based on the Big Virtual Camera.

    PubMed

    Cheng, Yufeng; Jin, Shuying; Wang, Mi; Zhu, Ying; Dong, Zhipeng

    2017-06-20

    The linear array push broom imaging mode is widely used for high resolution optical satellites (HROS). Using double cameras attached by a high-rigidity support along with push broom imaging is one method to enlarge the field of view while ensuring high resolution. High accuracy image mosaicking is a key factor in the geometric quality of the complete stitched satellite imagery. This paper proposes a high accuracy image mosaicking approach based on the big virtual camera (BVC) in the double-camera system on the GaoFen2 optical remote sensing satellite (GF2). A big virtual camera can be built according to the rigorous imaging model of a single camera; then, each single image strip obtained by each TDI-CCD detector can be re-projected to the virtual detector of the big virtual camera coordinate system using forward-projection and backward-projection to obtain the corresponding single virtual image. After an on-orbit calibration and relative orientation, the complete final virtual image can be obtained by stitching the single virtual images together based on their coordinate information on the big virtual detector image plane. The paper uses the concept of the big virtual camera to obtain a stitched image and the corresponding high accuracy rational function model (RFM) for concurrent post-processing. Experiments verified that the proposed method can achieve seamless mosaicking while maintaining the geometric accuracy.

  19. Trojan Horse Method for neutrons-induced reaction studies

    NASA Astrophysics Data System (ADS)

    Gulino, M.; Asfin Collaboration

    2017-09-01

    Neutron-induced reactions play an important role in nuclear astrophysics in several scenarios, such as primordial Big Bang Nucleosynthesis, Inhomogeneous Big Bang Nucleosynthesis, heavy-element production during the weak component of the s-process, and explosive stellar nucleosynthesis. To overcome the experimental problems arising from the production of a neutron beam, the possibility of using the Trojan Horse Method to study neutron-induced reactions has been investigated. The application is of particular interest for reactions involving radioactive nuclei with short lifetimes.

  20. Translating the Science of Measuring Ecosystems at a National Scale: Developing NEON's Online Learning Portal

    NASA Astrophysics Data System (ADS)

    Wasser, L. A.; Gram, W.; Goehring, L.

    2014-12-01

    "Big Data" are becoming increasingly common in many fields. The National Ecological Observatory Network (NEON) will be collecting data over the 30 years, using consistent, standardized methods across the United States. These freely available new data provide an opportunity for increased understanding of continental- and global scale processes such as changes in vegetation structure and condition, biodiversity and landuse. However, while "big data" are becoming more accessible and available, integrating big data into the university courses is challenging. New and potentially unfamiliar data types and associated processing methods, required to work with a growing diversity of available data, may warrant time and resources that present a barrier to classroom integration. Analysis of these big datasets may further present a challenge given large file sizes, and uncertainty regarding best methods to properly statistically summarize and analyze results. Finally, teaching resources, in the form of demonstrative illustrations, and other supporting media that might help teach key data concepts, take time to find and more time to develop. Available resources are often spread widely across multi-online spaces. This presentation will overview the development of NEON's collaborative University-focused online education portal. Portal content will include 1) videos and supporting graphics that explain key concepts related to NEON data products including collection methods, key metadata to consider and consideration of potential error and uncertainty surrounding data analysis; and 2) packaged "lab" activities that include supporting data to be used in an ecology, biology or earth science classroom. To facilitate broad use in classrooms, lab activities will take advantage of freely and commonly available processing tools, techniques and scripts. All NEON materials are being developed in collaboration with existing labs and organizations.

  1. The Widening Gulf between Genomics Data Generation and Consumption: A Practical Guide to Big Data Transfer Technology

    PubMed Central

    Feltus, Frank A.; Breen, Joseph R.; Deng, Juan; Izard, Ryan S.; Konger, Christopher A.; Ligon, Walter B.; Preuss, Don; Wang, Kuang-Ching

    2015-01-01

    In the last decade, high-throughput DNA sequencing has become a disruptive technology and pushed the life sciences into a distributed ecosystem of sequence data producers and consumers. Given the power of genomics and declining sequencing costs, biology is an emerging “Big Data” discipline that will soon enter the exabyte data range when all subdisciplines are combined. These datasets must be transferred across commercial and research networks in creative ways since sending data without thought can have serious consequences on data processing time frames. Thus, it is imperative that biologists, bioinformaticians, and information technology engineers recalibrate data processing paradigms to fit this emerging reality. This review attempts to provide a snapshot of Big Data transfer across networks, which is often overlooked by many biologists. Specifically, we discuss four key areas: 1) data transfer networks, protocols, and applications; 2) data transfer security including encryption, access, firewalls, and the Science DMZ; 3) data flow control with software-defined networking; and 4) data storage, staging, archiving and access. A primary intention of this article is to orient the biologist in key aspects of the data transfer process in order to frame their genomics-oriented needs to enterprise IT professionals. PMID:26568680

  2. Big data and high-performance analytics in structural health monitoring for bridge management

    NASA Astrophysics Data System (ADS)

    Alampalli, Sharada; Alampalli, Sandeep; Ettouney, Mohammed

    2016-04-01

    Structural Health Monitoring (SHM) can be a vital tool for effective bridge management. Combining large data sets from multiple sources to create a data-driven decision-making framework is crucial for the success of SHM. This paper presents a big data analytics framework that combines multiple data sets correlated with functional relatedness to convert data into actionable information that empowers risk-based decision-making. The integrated data environment incorporates near real-time streams of semi-structured data from remote sensors, historical visual inspection data, and observations from structural analysis models to monitor, assess, and manage risks associated with aging bridge inventories. Accelerated processing of datasets is made possible by four technologies: cloud computing, relational database processing, NoSQL database support, and in-memory analytics. The framework is being validated on a railroad corridor that can be subjected to multiple hazards. The framework enables computation of reliability indices for critical bridge components and individual bridge spans. In addition, the framework includes a risk-based decision-making process that enumerates the costs and consequences of poor bridge performance at span- and network-levels when rail networks are exposed to natural hazard events such as floods and earthquakes. Big data and high-performance analytics enable insights to assist bridge owners in addressing problems faster.

  3. Using Big Data to Discover Diagnostics and Therapeutics for Gastrointestinal and Liver Diseases

    PubMed Central

    Wooden, Benjamin; Goossens, Nicolas; Hoshida, Yujin; Friedman, Scott L.

    2016-01-01

    Technologies such as genome sequencing, gene expression profiling, proteomic and metabolomic analyses, electronic medical records, and patient-reported health information have produced large amounts of data, from various populations, cell types, and disorders (big data). However, these data must be integrated and analyzed if they are to produce models or concepts about physiologic function or mechanisms of pathogenesis. Many of these data are available to the public, allowing researchers anywhere to search for markers of specific biologic processes or therapeutic targets for specific diseases or patient types. We review recent advances in the fields of computational and systems biology, and highlight opportunities for researchers to use big data sets in the fields of gastroenterology and hepatology, to complement traditional means of diagnostic and therapeutic discovery. PMID:27773806

  4. Visualizing Big Data Outliers through Distributed Aggregation.

    PubMed

    Wilkinson, Leland

    2017-08-29

    Visualizing outliers in massive datasets requires statistical pre-processing in order to reduce the scale of the problem to a size amenable to rendering systems like D3, Plotly or analytic systems like R or SAS. This paper presents a new algorithm, called hdoutliers, for detecting multidimensional outliers. It is unique for a) dealing with a mixture of categorical and continuous variables, b) dealing with big-p (many columns of data), c) dealing with big-n (many rows of data), d) dealing with outliers that mask other outliers, and e) dealing consistently with unidimensional and multidimensional datasets. Unlike ad hoc methods found in many machine learning papers, hdoutliers is based on a distributional model that allows outliers to be tagged with a probability. This critical feature reduces the likelihood of false discoveries.
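
    The published hdoutliers algorithm is not reproduced here. As a simplified stand-in that conveys the core idea of tagging outliers with a probability derived from nearest-neighbor distances, the sketch below fits an exponential model to those distances; this is a deliberate simplification, not the paper's method.

```python
# Simplified nearest-neighbor-distance outlier test in the spirit of hdoutliers
# (NOT the published algorithm): a point is flagged when its gap to the nearest
# neighbor is improbably large under an exponential model of those gaps.
import numpy as np
from scipy.spatial import cKDTree

def nn_distance_outliers(X, alpha=0.05):
    """Return a boolean mask of rows flagged as outliers."""
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # column standardization
    d, _ = cKDTree(X).query(X, k=2)                      # k=2: self plus nearest neighbor
    nn = d[:, 1]
    rate = np.log(2) / np.median(nn)                     # median-based exponential rate estimate
    p = np.exp(-rate * nn)                               # P(gap >= observed) under the model
    return p < alpha

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(200, 3)), [[8.0, 8.0, 8.0]]])   # one gross outlier
print(np.where(nn_distance_outliers(X))[0])   # expected to include index 200
```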

  5. EarthServer: a Summary of Achievements in Technology, Services, and Standards

    NASA Astrophysics Data System (ADS)

    Baumann, Peter

    2015-04-01

    Big Data in the Earth sciences, the Tera- to Exabyte archives, mostly are made up of coverage data, according to ISO and OGC defined as the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3-D x/y/t image timeseries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as getting the Fourier transform of satellite images. As network bandwidth limits prohibit transfer of such Big Data, it is indispensable to devise protocols allowing clients to task flexible and fast processing on the server. The transatlantic EarthServer initiative, running from 2011 through 2014, has united 11 partners to establish Big Earth Data Analytics. A key ingredient has been flexibility for users to ask whatever they want, not impeded and complicated by system internals. The EarthServer answer to this is to use high-level, standards-based query languages which unify data and metadata search in a simple, yet powerful way. A second key ingredient is scalability. Without any doubt, scalability ultimately can only be achieved through parallelization. In the past, parallelizing code has been done at compile time and usually with manual intervention. The EarthServer approach is to perform a semantics-based dynamic distribution of query fragments based on network optimization and further criteria. The EarthServer platform is comprised of rasdaman, the pioneering and leading Array DBMS built for any-size multi-dimensional raster data, which is being extended with support for irregular grids and general meshes; in-situ retrieval (evaluation of database queries on existing archive structures, avoiding data import and, hence, duplication); and the aforementioned distributed query processing. Additionally, Web clients for multi-dimensional data visualization are being established. Client/server interfaces are strictly based on OGC and W3C standards, in particular the Web Coverage Processing Service (WCPS), which defines a high-level coverage query language. Reviewers have attested that "With no doubt the project has been shaping the Big Earth Data landscape through the standardization activities within OGC, ISO and beyond". We present the project approach, its outcomes and impact on standardization and Big Data technology, and vistas for the future.

  6. A Big Spatial Data Processing Framework Applying to National Geographic Conditions Monitoring

    NASA Astrophysics Data System (ADS)

    Xiao, F.

    2018-04-01

    In this paper, a novel framework for spatial data processing is proposed, which is applied to the National Geographic Conditions Monitoring project of China. It includes 4 layers: spatial data storage, spatial RDDs, spatial operations, and a spatial query language. The spatial data storage layer uses HDFS to store large volumes of spatial vector/raster data in a distributed cluster. The spatial RDDs are the abstract logical datasets of spatial data types and can be transferred to the Spark cluster to run Spark transformations and actions. The spatial operations layer provides a series of operations on spatial RDDs, such as range query, k nearest neighbors and spatial join. The spatial query language is a user-friendly interface which provides people not familiar with Spark with a comfortable way to run spatial operations. Compared with other spatial frameworks, this one brings together a comprehensive set of technologies for big spatial data processing. Extensive experiments on real datasets show that the framework achieves better performance than traditional processing methods.
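
    As a hedged illustration of the "spatial operations on spatial RDDs" layer (not the paper's framework code), the sketch below runs a bounding-box range query over a point RDD with plain PySpark; the point data and box are synthetic stand-ins.

```python
# Minimal bounding-box range query over a point RDD in PySpark (illustrative only;
# the paper's framework, data model and query language are not reproduced here).
from pyspark import SparkContext

sc = SparkContext(appName="range-query-sketch")

# Points as (id, longitude, latitude); synthetic stand-ins for monitoring data.
points = sc.parallelize([
    (1, 116.39, 39.91),
    (2, 121.47, 31.23),
    (3, 113.26, 23.13),
])

# Keep points inside a bounding box given as (min_lon, min_lat, max_lon, max_lat).
bbox = (110.0, 20.0, 118.0, 40.0)

def in_bbox(p, box=bbox):
    _, lon, lat = p
    return box[0] <= lon <= box[2] and box[1] <= lat <= box[3]

print(points.filter(in_bbox).collect())   # [(1, 116.39, 39.91), (3, 113.26, 23.13)]
sc.stop()
```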

  7. The Big Five personality dimensions and mental health: The mediating role of alexithymia.

    PubMed

    Atari, Mohammad; Yaghoubirad, Mahsa

    2016-12-01

    The role of personality constructs on mental health has attracted research attention in the last few decades. The Big Five personality traits have been introduced as parsimonious dimensions of non-pathological traits. The five-factor model of personality includes neuroticism, agreeableness, conscientiousness, extraversion, and openness to experience. The present study aimed to examine the relationship between the Big Five dimensions and mental health considering the mediating role of alexithymia as an important emotional-processing construct. A total of 257 participants were recruited from non-clinical settings in the general population. All participants completed the Ten-Item Personality Inventory (TIPI), 20-item Toronto Alexithymia Scale (TAS-20), and General Health Questionnaire-28 (GHQ-28). Structural equation modeling was utilized to examine the hypothesized mediated model. Findings indicated that the Big Five personality dimensions could significantly predict scores of alexithymia. Moreover, alexithymia could predict mental health scores as measured by indices of depression, anxiety, social functioning, and somatic symptoms. The fit indices (GFI=0.94; CFI=0.91; TLI=0.90; RMSEA=0.071; CMIN/df=2.29) indicated that the model fits the data. Therefore, the relationship between the Big Five personality dimensions and mental health is mediated by alexithymia. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies.

    PubMed

    de Brevern, Alexandre G; Meyniel, Jean-Philippe; Fairhead, Cécile; Neuvéglise, Cécile; Malpertuy, Alain

    2015-01-01

    Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from the web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity regardless of the volume of data. We illustrate this point with a focus on the Amadea platform. Finally, we discuss the visualization challenges posed by Big Data and present the latest innovations in JavaScript graphic libraries.
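
    As a small, hedged illustration of the NoSQL point (not the Amadea platform or the authors' setup), the sketch below stores and queries variant-like records in MongoDB with pymongo; the database, collection and field names are hypothetical, and a local MongoDB instance is assumed.

```python
# Illustrative NoSQL storage of sequencing-derived records with pymongo
# (hypothetical database/collection/field names; assumes a local MongoDB server).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["ngs_demo"]["variants"]

coll.insert_many([
    {"sample": "S1", "chrom": "chr1", "pos": 12345, "ref": "A", "alt": "G", "depth": 87},
    {"sample": "S1", "chrom": "chr2", "pos": 67890, "ref": "C", "alt": "T", "depth": 34},
])

# Schema-less documents make it easy to add fields later and to query flexibly.
for doc in coll.find({"depth": {"$gt": 50}}, {"_id": 0}):
    print(doc)
```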

  9. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies

    PubMed Central

    de Brevern, Alexandre G.; Meyniel, Jean-Philippe; Fairhead, Cécile; Neuvéglise, Cécile; Malpertuy, Alain

    2015-01-01

    Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from the web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity regardless of the volume of data. We illustrate this point with a focus on the Amadea platform. Finally, we discuss the visualization challenges posed by Big Data and present the latest innovations in JavaScript graphic libraries. PMID:26125026

  10. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods, as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.

  11. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth

    PubMed Central

    Zhang, Zhaoyang; Wang, Honggang

    2016-01-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods, as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  12. Simulated big sagebrush regeneration supports predicted changes at the trailing and leading edges of distribution shifts

    USGS Publications Warehouse

    Schlaepfer, Daniel R.; Taylor, Kyle A.; Pennington, Victoria E.; Nelson, Kellen N.; Martin, Trace E.; Rottler, Caitlin M.; Lauenroth, William K.; Bradford, John B.

    2015-01-01

    Many semi-arid plant communities in western North America are dominated by big sagebrush. These ecosystems are being reduced in extent and quality due to economic development, invasive species, and climate change. These pervasive modifications have generated concern about the long-term viability of sagebrush habitat and sagebrush-obligate wildlife species (notably greater sage-grouse), highlighting the need for better understanding of the future big sagebrush distribution, particularly at the species' range margins. These leading and trailing edges of potential climate-driven sagebrush distribution shifts are likely to be the areas most sensitive to climate change. We used a process-based regeneration model for big sagebrush, which simulates potential germination and seedling survival in response to climatic and edaphic conditions, and tested expectations about current and future regeneration responses at trailing and leading edges that were previously identified using traditional species distribution models. Our results confirmed expectations of increased probability of regeneration at the leading edge and decreased probability of regeneration at the trailing edge below current levels. Our simulations indicated that soil water dynamics at the leading edge became more similar to the typical seasonal ecohydrological conditions observed within the current range of big sagebrush ecosystems. At the trailing edge, increased winter and spring dryness represented a departure from conditions typically supportive of big sagebrush. Our results highlighted that minimum and maximum daily temperatures as well as soil water recharge and summer dry periods are important constraints for big sagebrush regeneration. Overall, our results confirmed previous predictions, i.e., we see consistent changes in areas identified as trailing and leading edges; however, we also identified potential local refugia within the trailing edge, mostly at sites at higher elevation. Decreasing regeneration probability at the trailing edge underscores the potential futility of efforts to preserve and/or restore big sagebrush in these areas. Conversely, increasing regeneration probability at the leading edge suggests a growing potential for conflicts in management goals between maintaining existing grasslands by preventing sagebrush expansion versus accepting a shift in plant community composition to sagebrush dominance.

  13. Light element production in the big bang and the synthesis of heavy elements in 3D MHD jets from core-collapse supernovae

    NASA Astrophysics Data System (ADS)

    Winteler, Christian

    2014-02-01

    In this dissertation we present the main features of a new nuclear reaction network evolution code, which allows nucleosynthesis calculations for large numbers of nuclides; the main results in this dissertation are all obtained using this new code. The strength of standard big bang nucleosynthesis is that all primordial abundances are determined by only one free parameter, the baryon-to-photon ratio η. We perform self-consistent nucleosynthesis calculations for the latest WMAP value η = (6.16±0.15)×10^-10. We predict primordial light element abundances: D/H = (2.84 ± 0.23)×10^-5, 3He/H = (1.07 ± 0.09)×10^-5, Yp = 0.2490±0.0005 and 7Li/H = (4.57 ± 0.55)×10^-10, in agreement with current observations and other predictions. We investigate the influence of the main production rate on the 6Li abundance, but find no significant increase of the predicted value, which is known to be orders of magnitude lower than the observed one. The r-process is responsible for the formation of about half of the elements heavier than iron in our solar system. This neutron capture process requires explosive environments with large neutron densities. The exact astrophysical site where the r-process occurs has not yet been identified. We explore jets from magnetorotational core-collapse supernovae (MHD jets) as a possible r-process site. In a parametric study, assuming adiabatic expansion, we find good agreement with solar system abundances for a superposition of components with different electron fractions (Ye), ranging from Ye = 0.1 to Ye = 0.3. Fission is found to be important only for Ye ≤ 0.17. The first postprocessing calculations with data from 3D MHD core-collapse supernova simulations are performed for two different simulations, using two different methods to extract data from the simulations: tracer particles and a two-dimensional, mass-weighted histogram. Both methods yield almost identical results. We find that both simulations can reproduce the global solar r-process abundance pattern. The ejected mass is found to be in agreement with galactic chemical evolution for a rare event rate of one MHD jet per hundred to one thousand supernovae.
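
    To make the kind of calculation a reaction network evolution code performs more tangible, here is a toy two-reaction network integrated with a stiff ODE solver; the species, rates, and structure are purely illustrative assumptions and are unrelated to the dissertation's actual code or nuclear data.

    ```python
    from scipy.integrate import solve_ivp

    # Toy network: A -> B (rate k1), B -> C (rate k2); Y holds the abundances.
    k1, k2 = 1.0e3, 5.0e1  # illustrative rate constants (1/s)

    def rhs(t, Y):
        A, B, C = Y
        return [-k1 * A, k1 * A - k2 * B, k2 * B]

    # Stiff integration (BDF), as is typical for nuclear reaction networks
    # whose rates span many orders of magnitude.
    sol = solve_ivp(rhs, (0.0, 1.0), [1.0, 0.0, 0.0],
                    method="BDF", rtol=1e-8, atol=1e-12)
    print(sol.y[:, -1], "sum =", sol.y[:, -1].sum())  # abundances should sum to 1
    ```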

  14. Big Memory Elegance: HyperCard Information Processing and Desktop Publishing.

    ERIC Educational Resources Information Center

    Bitter, Gary G.; Gerson, Charles W., Jr.

    1991-01-01

    Discusses hardware requirements, functions, and applications of five information processing and desktop publishing software packages for the Macintosh: HyperCard, PageMaker, Cricket Presents, Power Point, and Adobe Illustrator. Benefits of these programs for schools are considered. (MES)

  15. Protocols for sagebrush seed processing and seedling production at the Lucky Peak Nursery

    Treesearch

    Clark D. Fleege

    2010-01-01

    This paper presents the production protocols currently practiced at the USDA Forest Service Lucky Peak Nursery (Boise, ID) for seed processing and bareroot and container seedling production for three subspecies of big sagebrush (Artemisia tridentata).

  16. Pollutant removal characteristics of a two-influent-line BNR process performing denitrifying phosphorus removal: role of sludge recycling ratios.

    PubMed

    Liu, Hongbo; Leng, Feng; Chen, Piao; Kueppers, Stephan

    2016-11-01

    This paper studied denitrifying phosphorus removal in a novel two-line biological nutrient removal process treating low-strength domestic wastewater under different sludge recycling ratios. Mass balances of intracellular compounds, including polyhydroxyvalerate, polyhydroxybutyrate and glycogen, were investigated together with total nitrogen (TN) and total phosphorus (TP). Results showed that sludge recycling ratios had a significant influence on the use of organics along the bioreactors, and an average removal efficiency of 73.6% was obtained when the influent chemical oxygen demand (COD) ranged from 175.9 mg/L to 189.9 mg/L. The process performed better under a sludge recycling ratio of 100% compared to 25% and 50% in terms of ammonia and COD removal rates. Overall, TN removal efficiencies for the 50% and 100% sludge recycling ratios were 56.4% and 61.9%, respectively, a much smaller difference than that observed for carbon utilization and TP removal, indicating that the effect of the sludge recycling ratio on the anaerobic compartments had been counteracted by changes in the efficiency of other compartments. A higher sludge recycling ratio was conducive to TN removal, unfavorable for TP removal, and had little influence on COD removal. Thus, 25% was considered to be the optimal sludge recycling ratio.

  17. Evaluation of the effects of coal-mine reclamation on water quality in Big Four Hollow near Lake Hope, southeastern Ohio

    USGS Publications Warehouse

    Nichols, V.E.

    1985-01-01

    A subsurface clay dike and mine-entrance hydraulic seals were constructed from July 1979 through May 1980 by the Ohio Department of Natural Resources, Division of Reclamation, to reduce acidic mine drainage from abandoned drift-mine complex 88 into Big Four Hollow Creek. Big Four Hollow Creek flows into Sandy Run--the major tributary to Lake Hope. A data-collection program was established in 1979 by the U.S. Geological Survey to evaluate effects of drift-mine sealing on surface-water systems of the Big Four Hollow Creek and Sandy Run area just below the mine. Data collected by private consultants from 1970 through 1971 near the mouth of Big Four Hollow Creek (U.S. Geological Survey station 03201700) show that pH ranged from 2.7 to 4.8, with a median of 3.1. The calculated iron load was 50 pounds per day. Data collected near the mouth of Big Four Hollow Creek (station 03201700) from 1971 through 1979 (before dike construction) show the daily pH ranged from 2.1 to 6.7; the median was 3.6. The daily specific conductance ranged from 72 to 3,500 microsiemens per centimeter at 25 degrees Celsius and averaged 770. The estimated loads of chemical constituents were: sulfate, 1,100 pounds per day; iron, 54 pounds per day; and manganese, 12 pounds per day. All postconstruction data collected at station 03201700 through the end of the project, May 1980 through June 30, 1983, show that the daily pH ranged from 2.4 to 7.7, with a median of 3.7. Daily specific conductance ranged from 87 to 3,200 microsiemens per centimeter and averaged 1,200. The estimated loads of chemical constituents for this period were: sulfate, 1,000 pounds per day; iron, 44 pounds per day; and manganese, 16 pounds per day. Standard nonparametric statistical tests were performed on the data collected before and after reclamation. Differences at the 95-percent confidence level were found in the before- and after-reclamation data sets for specific conductance, aluminum, and manganese at station 03201700. Data collected during the first 6 months after reclamation indicated moderate improvement in water quality only because no highly mineralized water was leaking from the closed mine. Later, perhaps in September 1980, increased hydraulic head behind the clay dike caused the mine water to seep out and degrade the stream-water quality. In order to investigate leakages, dye was injected into two wells that penetrated the closed mine complex 88. One injection revealed that the dye moved to a discharge point at a nearby mine entrance known to be connected to complex 88. No discharge of dye was detected as a result of dye injection into the other well during the project. Acidic mine water continues to seep from the closed mine complex 88. A definitive evaluation of the effects of reclamation on the area's water quality cannot be made until the hydrologic system stabilizes.

  18. What does it mean to be green?

    PubMed

    Kleiner, A

    1991-01-01

    Today a company is not considered environmentalist unless it moves beyond mere compliance with government regulations to behavior its competitors, and even customers, do not expect. How should it set its agenda? Author Art Kleiner proposes that, to be green, a company must ask three questions: What products should we bring to market? How much disclosure of pollution information should we support? And how can we reduce waste at its source? These questions can't be answered, Kleiner says, unless managers insist on sustainable growth. In this sense, a big investment in environmentalism is like a big one in R&D--both presuppose patient capital and managerial maturity. What are green products? Kleiner cautions against giving in to misinformed public opinion--as McDonald's did in giving up its styrene "clamshells," which were more recyclable than the composite papers it switched to. Rather, companies should rely on literature that analyzes the product life cycle. As for public disclosure, the benefits may be unexpected. Federal legislation requiring companies to report the emission of potentially hazardous waste to a central data bank has not made environmentalists attack them. Rather, it has forced companies to learn what chemicals they inadvertently produce and how much--knowledge that helps them improve production processes. Sharing it helps ecological researchers study the combined effects of plant emissions. As for pollution prevention, Kleiner notes the analogy to quality and observes that it is better to design harmful waste products out of the system than catch them at the end of the line.(ABSTRACT TRUNCATED AT 250 WORDS)

  19. Mid-level perceptual features distinguish objects of different real-world sizes.

    PubMed

    Long, Bria; Konkle, Talia; Cohen, Michael A; Alvarez, George A

    2016-01-01

    Understanding how perceptual and conceptual representations are connected is a fundamental goal of cognitive science. Here, we focus on a broad conceptual distinction that constrains how we interact with objects--real-world size. Although there appear to be clear perceptual correlates for basic-level categories (apples look like other apples, oranges look like other oranges), the perceptual correlates of broader categorical distinctions are largely unexplored, i.e., do small objects look like other small objects? Because there are many kinds of small objects (e.g., cups, keys), there may be no reliable perceptual features that distinguish them from big objects (e.g., cars, tables). Contrary to this intuition, we demonstrated that big and small objects have reliable perceptual differences that can be extracted by early stages of visual processing. In a series of visual search studies, participants found target objects faster when the distractor objects differed in real-world size. These results held when we broadly sampled big and small objects, when we controlled for low-level features and image statistics, and when we reduced objects to texforms--unrecognizable textures that loosely preserve an object's form. However, this effect was absent when we used more basic textures. These results demonstrate that big and small objects have reliably different mid-level perceptual features, and suggest that early perceptual information about broad-category membership may influence downstream object perception, recognition, and categorization processes. (c) 2015 APA, all rights reserved).

  20. The Sociology of Traditional, Complementary and Alternative Medicine

    PubMed Central

    Gale, Nicola

    2014-01-01

    Complementary and alternative medicine (CAM) and traditional medicine (TM) are important social phenomena. This article reviews the sociological literature on the topic. First, it addresses the question of terminology, arguing that the naming process is a glimpse into the complexities of power and history that characterize the field. Second, focusing on the last 15 years of scholarship, it considers how sociological research on users and practitioners of TM/CAM has developed in that time. Third, it addresses two newer strands of work termed here the ‘big picture’ and the ‘big question’. The big picture includes concepts that offer interpretation of what is happening at a societal level to constrain and enable observed patterns of social practice (pluralism, integration, hybridity and activism). The big question, ‘Does it work?’, is one of epistemology and focuses on two developing fields of critical enquiry – first, social critiques of medical science knowledge production and, second, attempts to explain the nature of interventions, i.e. how they work. Finally, the article examines the role of sociology moving forward. PMID:25177359

  1. Finding the traces of behavioral and cognitive processes in big data and naturally occurring datasets.

    PubMed

    Paxton, Alexandra; Griffiths, Thomas L

    2017-10-01

    Today, people generate and store more data than ever before as they interact with both real and virtual environments. These digital traces of behavior and cognition offer cognitive scientists and psychologists an unprecedented opportunity to test theories outside the laboratory. Despite general excitement about big data and naturally occurring datasets among researchers, three "gaps" stand in the way of their wider adoption in theory-driven research: the imagination gap, the skills gap, and the culture gap. We outline an approach to bridging these three gaps while respecting our responsibilities to the public as participants in and consumers of the resulting research. To that end, we introduce Data on the Mind ( http://www.dataonthemind.org ), a community-focused initiative aimed at meeting the unprecedented challenges and opportunities of theory-driven research with big data and naturally occurring datasets. We argue that big data and naturally occurring datasets are most powerfully used to supplement-not supplant-traditional experimental paradigms in order to understand human behavior and cognition, and we highlight emerging ethical issues related to the collection, sharing, and use of these powerful datasets.

  2. Unsupervised learning in probabilistic neural networks with multi-state metal-oxide memristive synapses

    NASA Astrophysics Data System (ADS)

    Serb, Alexander; Bill, Johannes; Khiat, Ali; Berdan, Radu; Legenstein, Robert; Prodromakis, Themis

    2016-09-01

    In an increasingly data-rich world the need for developing computing systems that can not only process, but ideally also interpret, big data is becoming continuously more pressing. Brain-inspired concepts have shown great promise towards addressing this need. Here we demonstrate unsupervised learning in a probabilistic neural network that utilizes metal-oxide memristive devices as multi-state synapses. Our approach can be exploited for processing unlabelled data and can adapt to time-varying clusters that underlie incoming data by supporting the capability of reversible unsupervised learning. The potential of this work is showcased through the demonstration of successful learning in the presence of corrupted input data and probabilistic neurons, thus paving the way towards robust big-data processors.

  3. [Degradation of p-nitrophenol by high voltage pulsed discharge and ozone processes].

    PubMed

    Pan, Li-li; Yan, Guo-qi; Zheng, Fei-yan; Liang, Guo-wei; Fu, Jian-jun

    2005-11-01

    The vigorous oxidation provided by ozone and the high energy of pulsed discharge are used together to degrade large hazardous molecules into smaller, less hazardous ones, thereby improving biodegradability. At a pH of 8-9, the concentration of a p-nitrophenol solution was reduced by 96.8% and the TOC degradation efficiency was 38.6% after 30 min of combined ozone and pulsed discharge treatment. Comparison shows that the combined treatment is more efficient than either treatment alone, indicating a strong synergism between ozone and pulsed discharge. The degradation efficiency of the phenyl ring was high, whereas the degradation efficiency of linear molecules was comparatively low.

  4. Mapping of Drug-like Chemical Universe with Reduced Complexity Molecular Frameworks.

    PubMed

    Kontijevskis, Aleksejs

    2017-04-24

    The emergence of the DNA-encoded chemical libraries (DEL) field in the past decade has attracted the attention of the pharmaceutical industry as a powerful mechanism for the discovery of novel drug-like hits for various biological targets. Nuevolution Chemetics technology enables DNA-encoded synthesis of billions of chemically diverse drug-like small molecule compounds, and the efficient screening and optimization of these, facilitating effective identification of drug candidates at an unprecedented speed and scale. Although many approaches have been developed by the cheminformatics community for the analysis and visualization of drug-like chemical space, most of them are restricted to the analysis of a maximum of a few millions of compounds and cannot handle collections of 10^8-10^12 compounds typical for DELs. To address this big chemical data challenge, we developed the Reduced Complexity Molecular Frameworks (RCMF) methodology as an abstract and very general way of representing chemical structures. By further introducing RCMF descriptors, we constructed a global framework map of drug-like chemical space and demonstrated how chemical space occupied by multi-million-member drug-like Chemetics DNA-encoded libraries and virtual combinatorial libraries with >10^12 members could be analyzed and mapped without a need for library enumeration. We further validate the approach by performing RCMF-based searches in a drug-like chemical universe and mapping Chemetics library selection outputs for LSD1 targets on a global framework chemical space map.
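
    For readers who want a feel for framework-based abstraction of structures, the sketch below reduces molecules to generic Bemis-Murcko scaffolds with RDKit; this is only a stand-in illustration, since the paper's RCMF representation and descriptors are not reproduced here, and the example SMILES strings are arbitrary.

    ```python
    from rdkit import Chem
    from rdkit.Chem.Scaffolds import MurckoScaffold

    def generic_framework(smiles: str) -> str:
        """Reduce a molecule to a generic ring/linker framework (an illustrative
        proxy for a reduced-complexity framework, not the paper's RCMF)."""
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return ""
        scaffold = MurckoScaffold.GetScaffoldForMol(mol)        # keep rings and linkers
        generic = MurckoScaffold.MakeScaffoldGeneric(scaffold)  # drop atom/bond types
        return Chem.MolToSmiles(generic)

    # Arbitrary example inputs: two different molecules can share one framework.
    for smi in ["c1ccccc1CCN", "C1CCCCC1CCO"]:
        print(smi, "->", generic_framework(smi))
    ```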

  5. A Spatiotemporal Indexing Approach for Efficient Processing of Big Array-Based Climate Data with MapReduce

    NASA Technical Reports Server (NTRS)

    Li, Zhenlong; Hu, Fei; Schnase, John L.; Duffy, Daniel Q.; Lee, Tsengdar; Bowen, Michael K.; Yang, Chaowei

    2016-01-01

    Climate observations and model simulations are producing vast amounts of array-based spatiotemporal data. Efficient processing of these data is essential for assessing global challenges such as climate change, natural disasters, and diseases. This is challenging not only because of the large data volume, but also because of the intrinsic high-dimensional nature of geoscience data. To tackle this challenge, we propose a spatiotemporal indexing approach to efficiently manage and process big climate data with MapReduce in a highly scalable environment. Using this approach, big climate data are directly stored in a Hadoop Distributed File System in its original, native file format. A spatiotemporal index is built to bridge the logical array-based data model and the physical data layout, which enables fast data retrieval when performing spatiotemporal queries. Based on the index, a data-partitioning algorithm is applied to enable MapReduce to achieve high data locality, as well as balancing the workload. The proposed indexing approach is evaluated using the National Aeronautics and Space Administration (NASA) Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. The experimental results show that the index can significantly accelerate querying and processing (a 10x speedup compared to the baseline test using the same computing cluster), while keeping the index-to-data ratio small (0.0328). The applicability of the indexing approach is demonstrated by a climate anomaly detection deployed on a NASA Hadoop cluster. This approach is also able to support efficient processing of general array-based spatiotemporal data in various geoscience domains without special configuration on a Hadoop cluster.
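
    As an illustration of the general idea of bridging a logical array model and the physical data layout, here is a minimal sketch that keys array chunks by a composite space-time tuple and uses it to prune chunks before any bytes are read; the key layout, the 10-degree tiling, and the function names are assumptions for illustration and do not reproduce the paper's index structure.

    ```python
    from typing import Dict, List, Tuple

    # Index entry: (variable, time step, lat tile, lon tile) -> (file, byte offset, length)
    Key = Tuple[str, int, int, int]
    Location = Tuple[str, int, int]

    def tile_of(lat: float, lon: float, tile_deg: float = 10.0) -> Tuple[int, int]:
        """Map a coordinate to a coarse grid tile (illustrative 10-degree tiles)."""
        return int((lat + 90.0) // tile_deg), int((lon + 180.0) // tile_deg)

    def query(index: Dict[Key, Location], variable: str, t0: int, t1: int,
              bbox: Tuple[float, float, float, float]) -> List[Location]:
        """Return only chunk locations intersecting the space-time query, so that
        map tasks can be scheduled on the nodes holding those chunks."""
        lat_min, lon_min, lat_max, lon_max = bbox
        ty0, tx0 = tile_of(lat_min, lon_min)
        ty1, tx1 = tile_of(lat_max, lon_max)
        return [loc for (var, t, ty, tx), loc in index.items()
                if var == variable and t0 <= t <= t1
                and ty0 <= ty <= ty1 and tx0 <= tx <= tx1]
    ```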

  6. A target sample of adolescents and reward processing: same neural and behavioral correlates engaged in common paradigms?

    PubMed

    Nees, Frauke; Vollstädt-Klein, Sabine; Fauth-Bühler, Mira; Steiner, Sabina; Mann, Karl; Poustka, Luise; Banaschewski, Tobias; Büchel, Christian; Conrod, Patricia J; Garavan, Hugh; Heinz, Andreas; Ittermann, Bernd; Artiges, Eric; Paus, Tomas; Pausova, Zdenka; Rietschel, Marcella; Smolka, Michael N; Struve, Maren; Loth, Eva; Schumann, Gunter; Flor, Herta

    2012-11-01

    Adolescence is a transition period that is assumed to be characterized by increased sensitivity to reward. While there is growing research on reward processing in adolescents, investigations into the engagement of brain regions under different reward-related conditions in one sample of healthy adolescents, especially in a target age group, are missing. We aimed to identify brain regions preferentially activated in a reaction time task (monetary incentive delay (MID) task) and a simple guessing task (SGT) in a sample of 14-year-old adolescents (N = 54) using two commonly used reward paradigms. Functional magnetic resonance imaging was employed during the MID with big versus small versus no win conditions and the SGT with big versus small win and big versus small loss conditions. Analyses focused on changes in blood oxygen level-dependent contrasts during reward and punishment processing in anticipation and feedback phases. We found clear magnitude-sensitive response in reward-related brain regions such as the ventral striatum during anticipation in the MID task, but not in the SGT. This was also true for reaction times. The feedback phase showed clear reward-related, but magnitude-independent, response patterns, for example in the anterior cingulate cortex, in both tasks. Our findings highlight neural and behavioral response patterns engaged in two different reward paradigms in one sample of 14-year-old healthy adolescents and might be important for reference in future studies investigating reward and punishment processing in a target age group.

  7. New CVD-based method for the growth of high-quality crystalline zinc oxide layers

    NASA Astrophysics Data System (ADS)

    Huber, Florian; Madel, Manfred; Reiser, Anton; Bauer, Sebastian; Thonke, Klaus

    2016-07-01

    High-quality zinc oxide (ZnO) layers were grown using a new chemical vapour deposition (CVD)-based low-cost growth method. The process is characterized by total simplicity, high growth rates, and cheap, less hazardous precursors. To produce elementary zinc vapour, methane (CH4) is used to reduce a ZnO powder. By re-oxidizing the zinc with pure oxygen, highly crystalline ZnO layers were grown on gallium nitride (GaN) layers and on sapphire substrates with an aluminum nitride (AlN) nucleation layer. Using simple CH4 as precursor has the big advantage of good controllability and the avoidance of highly toxic gases like nitrogen oxides. In photoluminescence (PL) measurements the samples show a strong near-band-edge emission and a sharp line width at 5 K. The good crystal quality has been confirmed in high resolution X-ray diffraction (HRXRD) measurements. This new growth method has great potential for industrial large-scale production of high-quality single crystal ZnO layers.

  8. Chlorine doped graphene quantum dots: Preparation, properties, and photovoltaic detectors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Jianhong; Xiang, Jinzhong, E-mail: jzhxiang@ynu.edu.cn; Tang, Libin, E-mail: scitang@163.com

    Graphene quantum dots (GQDs) are becoming one of the hottest advanced functional materials because of the opening of a bandgap due to the quantum confinement effect, which gives rise to unique optical and electrical properties. Chlorine-doped GQDs (Cl-GQDs) have been fabricated by chemical exfoliation of HCl-treated carbon fibers (CFs), which were prepared from degreasing cotton through an annealing process at 1000 °C for 30 min. Raman study shows that both the G and 2D peaks of GQDs may be redshifted (softened) by chlorine doping, leading to n-type doping. The first vertical (Cl)-GQD-based photovoltaic detectors have been demonstrated, and both light-absorbing and electron-accepting roles for the (Cl)-GQDs in photodetection have been found, resulting in an exceptionally large ratio of photocurrent to dark current, as high as ∼10^5 at room temperature under 405 nm laser irradiation and reverse bias voltage. The study expands the application of (Cl)-GQDs to important optoelectronic detection devices.

  9. Nuclear power plant 5,000 to 10,000 kilowatts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    The purpose of this proposal is to present a suggested program for the development of an Aqueous Homogeneous Reactor Power Plant for the production of power in the 5,000 to 10,000 kilowatt range under the terms of the Atomic Energy Commission's invitation of September 21, 1955. It envisions a research and development program prior to finalizing fabricating commitments of full-scale components, for the purpose of proving mechanical and hydraulic operating and chemical processing feasibility, with the expectation that such preliminary effort will assure the construction of the reactor at the lowest cost and successful operation at the earliest date. It proposes the construction of a reactor for an eventual net electrical output of ten megawatts, but initially in conjunction with a five-megawatt turbo-generating unit. This unit would be constructed at the site of the existing Hersey diesel generating plant of the Wolverine Electric Cooperative, approximately ten miles north of Big Rapids, Michigan.

  10. Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine

    PubMed Central

    Bao, Shunxing; Weitendorf, Frederick D.; Plassard, Andrew J.; Huo, Yuankai; Gokhale, Aniruddha; Landman, Bennett A.

    2016-01-01

    The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit that is controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded/retrieved from the NFS, e.g., “short” processing times and/or “large” datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of using such a framework must be formally evaluated against a traditional approach to characterize the point at which simply “large scale” processing transitions into “big data” and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab-cluster alongside a standard Sun Grid Engine (SGE). Theoretical models for wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a university secure, shared web database and used to empirically assess computational performance under three configurations of cluster hardware (using 72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented for when the Hadoop framework will be relevant and non-relevant for medical imaging. PMID:28736473
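
    The abstract's central comparison can be made concrete with a deliberately simple toy model contrasting a shared-NFS cluster, where every job pays a data-transfer cost, with a data-local Hadoop-style cluster, where transfer is largely avoided at the price of framework overhead; the formulas and numbers below are illustrative assumptions and are not the theoretical models validated in the paper.

    ```python
    def wall_clock_nfs(n_jobs, t_compute_s, t_transfer_s, cores):
        # Every job must pull its data over the shared NFS link before computing.
        return n_jobs * (t_transfer_s + t_compute_s) / cores

    def wall_clock_datalocal(n_jobs, t_compute_s, framework_overhead_s, cores):
        # Data is co-located with computation, so transfer is mostly avoided,
        # but each job pays a scheduling/serialization overhead.
        return n_jobs * (t_compute_s + framework_overhead_s) / cores

    # Illustrative scenario: many short jobs on a large archive favor data locality.
    args = dict(n_jobs=10_000, t_compute_s=30, cores=209)
    print("SGE + NFS:", wall_clock_nfs(t_transfer_s=45, **args), "s")
    print("Hadoop   :", wall_clock_datalocal(framework_overhead_s=5, **args), "s")
    ```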

  11. A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data

    NASA Astrophysics Data System (ADS)

    Li, Z.; Hodgson, M.; Li, W.

    2016-12-01

    Light detection and ranging (LiDAR) technologies have proven efficient at quickly obtaining very detailed Earth surface data over large spatial extents. Such data are important for scientific discovery in the Earth and ecological sciences and for natural disaster and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to both data intensity and computational intensity. Previous studies achieved notable success in parallel processing of LiDAR data to address these challenges; however, they either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for specific algorithms. We developed a general-purpose scalable framework coupled with a sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerant Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework are evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides valuable references for developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
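
    Below is a minimal sketch of what a tile-based spatial index for point data can look like, assuming square tiles keyed by integer row/column; the tile size, key format, and class names are illustrative and are not taken from the paper.

    ```python
    from collections import defaultdict
    from typing import Dict, Iterable, List, Tuple

    Point = Tuple[float, float, float]  # (x, y, z) in a projected coordinate system

    class TileIndex:
        """Group LiDAR points into fixed-size square tiles so that each tile can
        be processed independently (for example, as one map task)."""

        def __init__(self, tile_size: float = 500.0):
            self.tile_size = tile_size
            self.tiles: Dict[Tuple[int, int], List[Point]] = defaultdict(list)

        def key(self, x: float, y: float) -> Tuple[int, int]:
            return int(x // self.tile_size), int(y // self.tile_size)

        def insert(self, points: Iterable[Point]) -> None:
            for x, y, z in points:
                self.tiles[self.key(x, y)].append((x, y, z))

        def query_bbox(self, xmin, ymin, xmax, ymax) -> List[Point]:
            """Return points whose tiles intersect the bounding box."""
            kx0, ky0 = self.key(xmin, ymin)
            kx1, ky1 = self.key(xmax, ymax)
            hits: List[Point] = []
            for kx in range(kx0, kx1 + 1):
                for ky in range(ky0, ky1 + 1):
                    hits.extend(p for p in self.tiles.get((kx, ky), [])
                                if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax)
            return hits
    ```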

  12. Using Big Data to Discover Diagnostics and Therapeutics for Gastrointestinal and Liver Diseases.

    PubMed

    Wooden, Benjamin; Goossens, Nicolas; Hoshida, Yujin; Friedman, Scott L

    2017-01-01

    Technologies such as genome sequencing, gene expression profiling, proteomic and metabolomic analyses, electronic medical records, and patient-reported health information have produced large amounts of data from various populations, cell types, and disorders (big data). However, these data must be integrated and analyzed if they are to produce models or concepts about physiological function or mechanisms of pathogenesis. Many of these data are available to the public, allowing researchers anywhere to search for markers of specific biological processes or therapeutic targets for specific diseases or patient types. We review recent advances in the fields of computational and systems biology and highlight opportunities for researchers to use big data sets in the fields of gastroenterology and hepatology to complement traditional means of diagnostic and therapeutic discovery. Copyright © 2017 AGA Institute. Published by Elsevier Inc. All rights reserved.

  13. Interpersonal conflict, agreeableness, and personality development.

    PubMed

    Jensen-Campbell, Lauri A; Gleason, Katie A; Adams, Ryan; Malcolm, Kenya T

    2003-12-01

    This multimethod research linked the Big-Five personality dimensions to interpersonal conflict in childhood. Agreeableness was the personality dimension of focus because this dimension has been associated with maintaining positive interpersonal relations in adolescents and adults. In two studies, elementary school children were assessed on the Big-Five domains of personality. Study 1 (n=276) showed that agreeableness was uniquely associated with endorsements of conflict resolution tactics in children as well as parent and teacher reports of coping and adjustment. Study 2 (n=234) revealed that children's perceptions of themselves and others during conflict were influenced by their agreeableness regardless of their partner's agreeableness. Observers also reported that pairs higher in agreeableness had more harmonious, constructive conflicts. Overall findings suggest that of the Big-Five dimensions, agreeableness is most closely associated with processes and outcomes related to interpersonal conflict and adjustment in children.

  14. Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure

    PubMed Central

    Badger, Jonathan; LaRose, Eric; Shirzadi, Ehsan; Mahnke, Andrea; Mayer, John; Ye, Zhan; Page, David; Peissig, Peggy

    2017-01-01

    Background The study of adverse drug events (ADEs) is a tenured topic in medical literature. In recent years, increasing numbers of scientific articles and health-related social media posts have been generated and shared daily, albeit with very limited use for ADE study and with little known about the content with respect to ADEs. Objective The aim of this study was to develop a big data analytics strategy that mines the content of scientific articles and health-related Web-based social media to detect and identify ADEs. Methods We analyzed the following two data sources: (1) biomedical articles and (2) health-related social media blog posts. We developed an intelligent and scalable text mining solution on big data infrastructures composed of Apache Spark, natural language processing, and machine learning. This was combined with an Elasticsearch No-SQL distributed database to explore and visualize ADEs. Results The accuracy, precision, recall, and area under receiver operating characteristic of the system were 92.7%, 93.6%, 93.0%, and 0.905, respectively, and showed better results in comparison with traditional approaches in the literature. This work not only detected and classified ADE sentences from big data biomedical literature but also scientifically visualized ADE interactions. Conclusions To the best of our knowledge, this work is the first to investigate a big data machine learning strategy for ADE discovery on massive datasets downloaded from PubMed Central and social media. This contribution illustrates possible capacities in big data biomedical text analysis using advanced computational methods with real-time update from new data published on a daily basis. PMID:29222076
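
    To give a flavor of the kind of Spark-based text mining pipeline the abstract describes, here is a compact sketch that classifies sentences as ADE-related or not; logistic regression is used as a stand-in for the authors' neural network, and the tiny inline dataset, column names, and parameters are purely illustrative.

    ```python
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, HashingTF, IDF
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("ade-sketch").getOrCreate()

    # Tiny illustrative training set: 1.0 = sentence mentions an adverse drug event.
    train = spark.createDataFrame([
        ("patient developed severe rash after starting amoxicillin", 1.0),
        ("the drug was well tolerated with no complaints", 0.0),
        ("nausea and vomiting were reported following the infusion", 1.0),
        ("follow-up visit scheduled in two weeks", 0.0),
    ], ["sentence", "label"])

    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="sentence", outputCol="words"),
        HashingTF(inputCol="words", outputCol="tf", numFeatures=1 << 18),
        IDF(inputCol="tf", outputCol="features"),
        LogisticRegression(featuresCol="features", labelCol="label"),  # stand-in classifier
    ])

    model = pipeline.fit(train)
    test = spark.createDataFrame([("headache reported after the second dose",)], ["sentence"])
    model.transform(test).select("sentence", "prediction").show(truncate=False)
    ```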

  15. The Economics of Big Area Additive Manufacturing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Post, Brian; Lloyd, Peter D; Lindahl, John

    Case studies on the economics of Additive Manufacturing (AM) suggest that processing time is the dominant cost in manufacturing. Most additive processes have similar performance metrics: small part sizes, low production rates and expensive feedstocks. Big Area Additive Manufacturing is based on transitioning polymer extrusion technology from a wire to a pellet feedstock. Utilizing pellets significantly increases deposition speed and lowers material cost by utilizing low cost injection molding feedstock. The use of carbon fiber reinforced polymers eliminates the need for a heated chamber, significantly reducing machine power requirements and size constraints. We hypothesize that the increase in productivity coupled with the decrease in feedstock and energy costs will enable AM to become more competitive with conventional manufacturing processes for many applications. As a test case, we compare the cost of using traditional fused deposition modeling (FDM) with BAAM for additively manufacturing composite tooling.
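
    To illustrate the kind of cost comparison the abstract describes, here is a toy part-cost model with machine time and feedstock as the two drivers; every number and parameter name is a made-up placeholder, not data from the study.

    ```python
    from dataclasses import dataclass

    @dataclass
    class AMProcess:
        name: str
        deposition_rate_kg_h: float   # how fast material is laid down
        feedstock_cost_per_kg: float  # wire vs. pellet feedstock price
        machine_rate_per_h: float     # amortized machine and energy cost

        def part_cost(self, part_mass_kg: float) -> float:
            hours = part_mass_kg / self.deposition_rate_kg_h
            return hours * self.machine_rate_per_h + part_mass_kg * self.feedstock_cost_per_kg

    # Placeholder numbers for illustration only.
    fdm = AMProcess("wire-fed FDM", deposition_rate_kg_h=0.05,
                    feedstock_cost_per_kg=60.0, machine_rate_per_h=25.0)
    baam = AMProcess("pellet-fed BAAM", deposition_rate_kg_h=20.0,
                     feedstock_cost_per_kg=5.0, machine_rate_per_h=120.0)

    for proc in (fdm, baam):
        print(f"{proc.name}: ${proc.part_cost(50.0):,.0f} for a 50 kg tool")
    ```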

  16. Studies of Big Data metadata segmentation between relational and non-relational databases

    NASA Astrophysics Data System (ADS)

    Golosova, M. V.; Grigorieva, M. A.; Klimentov, A. A.; Ryabinkin, E. A.; Dimitrov, G.; Potekhin, M.

    2015-12-01

    In recent years the concept of Big Data has become well established in IT. Systems managing large data volumes produce metadata that describe data and workflows. These metadata are used to obtain information about the current system state and for statistical and trend analysis of the processes these systems drive. Over time, the amount of stored metadata can grow dramatically. In this article we present our studies demonstrating how metadata storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.
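
    The sketch below illustrates one generic way to segment a metadata record between a relational core and a schemaless document store; the field split, table layout, and in-memory stand-ins are assumptions for illustration, not the architecture studied in the paper.

    ```python
    import json
    import sqlite3
    from typing import Any, Dict

    # Fixed, frequently queried fields go to the RDBMS; everything else is kept
    # as a schemaless JSON document keyed by the same identifier (NoSQL stand-in).
    CORE_FIELDS = ("task_id", "status", "created_at")

    rdbms = sqlite3.connect(":memory:")
    rdbms.execute("CREATE TABLE tasks (task_id TEXT PRIMARY KEY, status TEXT, created_at TEXT)")
    document_store: Dict[str, str] = {}

    def save_metadata(record: Dict[str, Any]) -> None:
        core = {k: record[k] for k in CORE_FIELDS}
        extra = {k: v for k, v in record.items() if k not in CORE_FIELDS}
        rdbms.execute("INSERT INTO tasks VALUES (?, ?, ?)",
                      (core["task_id"], core["status"], core["created_at"]))
        document_store[core["task_id"]] = json.dumps(extra)

    save_metadata({"task_id": "t-001", "status": "done", "created_at": "2015-01-01",
                   "site": "site-A", "events": 123456, "params": {"cuts": "loose"}})
    print(rdbms.execute("SELECT * FROM tasks").fetchall())
    print(document_store["t-001"])
    ```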

  17. Occurrence and transport of nitrogen in the Big Sunflower River, northwestern Mississippi, October 2009-June 2011

    USGS Publications Warehouse

    Barlow, Jeannie R.B.; Coupe, Richard H.

    2014-01-01

    The Big Sunflower River Basin, located within the Yazoo River Basin, is subject to large annual inputs of nitrogen from agriculture, atmospheric deposition, and point sources. Understanding how nutrients are transported in, and downstream from, the Big Sunflower River is key to quantifying their eutrophying effects on the Gulf. Recent results from two Spatially Referenced Regressions on Watershed attributes (SPARROW models), which include the Big Sunflower River, indicate minimal losses of nitrogen in stream reaches typical of the main channels of major river systems. If SPARROW assumptions of relatively conservative transport of nitrogen are correct and surface-water losses through the bed of the Big Sunflower River are negligible, then options for managing nutrient loads to the Gulf of Mexico may be limited. Simply put, if every pound of nitrogen entering the Delta is eventually delivered to the Gulf, then the only effective nutrient management option in the Delta is to reduce inputs. If, on the other hand, it can be shown that processes within river channels of the Mississippi Delta act to reduce the mass of nitrogen in transport, other hydrologic approaches may be designed to further limit nitrogen transport. Direct validation of existing SPARROW models for the Delta is a first step in assessing the assumptions underlying those models. In order to characterize spatial and temporal variability of nitrogen in the Big Sunflower River Basin, water samples were collected at four U.S. Geological Survey gaging stations located on the Big Sunflower River between October 1, 2009, and June 30, 2011. Nitrogen concentrations were generally highest at each site during the spring of the 2010 water year and the fall and winter of the 2011 water year. Additionally, the dominant form of nitrogen varied between sites. For example, in samples collected from the most upstream site (Clarksdale), the concentration of organic nitrogen was generally higher than the concentrations of ammonia and nitrate plus nitrite; conversely, at sites farther downstream (that is, at Sunflower and Anguilla), nitrate plus nitrite concentrations were generally higher than concentrations of organic nitrogen and ammonia. In addition to the routinely collected samples, water samples from the Big Sunflower River Basin were collected using a Lagrangian sampling scheme, which attempts to follow a single mass of water through time in order to determine how it changes through processing or other pathways as the water moves downstream. Lagrangian sampling was conducted five times during the study period: (1) April 8–21, 2010, (2) May 12–June 3, 2010, (3) June 15–July 1, 2010, (4) August 23–30, 2010, and (5) May 16–20, 2011. Streamflow conditions were variable for each sampling event because of input from local precipitation and irrigation return flow, and streamflow losses through the streambed. Streamflow and total nitrogen flux increased with drainage area, and the dominant form of nitrogen varied with drainage area size and temporally across sampling events. Results from each method indicate relatively conservative transport of nitrogen within the 160 miles between Clarksdale and Anguilla, providing further validation of the SPARROW models. Furthermore, these results suggest relatively conservative transport of nitrogen from the Big Sunflower River to the Gulf of Mexico and, therefore, imply a fairly close association of nutrient application and export from the Big Sunflower River Basin to the Mississippi River. 
However, within the Big Sunflower River Basin, two potential nitrogen sinks were identified and include the transport and potential transformation of nitrogen through the streambed and the sequestration and potential transformation of nitrogen above the drainage control structures downstream of Anguilla. By coupling these potential loss mechanisms with nitrogen transport dynamics, it may be possible to further reduce the amount of nitrogen leaving the Big Sunflower River Basin and ultimately arriving at the Gulf of Mexico.

  18. Multi proxy chemical properties of freshwater sapropel

    NASA Astrophysics Data System (ADS)

    Stankevica, Karina; Rutina, Liga; Burlakovs, Juris; Klavins, Maris

    2014-05-01

    Freshwater sapropel is an organic-rich lake sediment, first named "gyttja" by Hampus von Post in 1862. It is composed of organic remains, such as shell detritus, plankton, insect chitin and spores of higher plants, together with a mineral fraction, and it forms in eutrophic lake environments. The most favourable environments for the formation of sapropel are shallow, overgrown post-glacial lakes and the valleys of big rivers in the boreal zone, whereas thick deposits of such organic sediments are rarely found in lakes on permafrost, in mountainous regions or in areas of increased aridity. Organic lake sediments are divided into three classes according to the content of organic matter and mineral material: biogenic, clastic and mixed. The value of sapropel as a natural resource increases with its organic matter content, and its main applications are in agriculture, medicine, cosmetics and the chemical industry. Research on sapropel in Latvia has shown that the total amount of this natural resource is close to 2 billion m3, or about 500 million tons. Sapropel has a fine, dispersed structure and is plastic; because of its naturally high phosphorus content its colour is usually dark blue, becoming light blue after drying. Current research on sapropel focuses on the interactions of its organic and mineral components with living organisms, providing insight into the processes and biological activity involved in its formation. Chemically, sapropel contains lipids (bitumen), readily hydrolyzed water-soluble substances including humic and fulvic acids, cellulose, and a residual fraction that does not hydrolyze. In this work we analyzed organic sapropel classes (peaty, cyanobacterial and green algal types) as well as siliceous sapropel in order to determine the presence of biologically active substances, including humic substances, proteins and enzymes, and to assess free radical scavenging activity. Samples were collected from lakes recognized as promising for sapropel extraction, and the study may benefit the use of sapropel for soil amendments, feed additives and chemical processing.

  19. "Big data" and the electronic health record.

    PubMed

    Ross, M K; Wei, W; Ohno-Machado, L

    2014-08-15

    Implementation of Electronic Health Record (EHR) systems continues to expand. The massive number of patient encounters results in high amounts of stored data. Transforming clinical data into knowledge to improve patient care has been the goal of biomedical informatics professionals for many decades, and this work is now increasingly recognized outside our field. In reviewing the literature for the past three years, we focus on "big data" in the context of EHR systems and we report on some examples of how secondary use of data has been put into practice. We searched PubMed database for articles from January 1, 2011 to November 1, 2013. We initiated the search with keywords related to "big data" and EHR. We identified relevant articles and additional keywords from the retrieved articles were added. Based on the new keywords, more articles were retrieved and we manually narrowed down the set utilizing predefined inclusion and exclusion criteria. Our final review includes articles categorized into the themes of data mining (pharmacovigilance, phenotyping, natural language processing), data application and integration (clinical decision support, personal monitoring, social media), and privacy and security. The increasing adoption of EHR systems worldwide makes it possible to capture large amounts of clinical data. There is an increasing number of articles addressing the theme of "big data", and the concepts associated with these articles vary. The next step is to transform healthcare big data into actionable knowledge.

  20. Making sense of metacommunities: dispelling the mythology of a metacommunity typology.

    PubMed

    Brown, Bryan L; Sokol, Eric R; Skelton, James; Tornwall, Brett

    2017-03-01

    Metacommunity ecology has rapidly become a dominant framework through which ecologists understand the natural world. Unfortunately, persistent misunderstandings regarding metacommunity theory and the methods for evaluating hypotheses based on the theory are common in the ecological literature. Since its beginnings, four major paradigms-species sorting, mass effects, neutrality, and patch dynamics-have been associated with metacommunity ecology. The Big 4 have been misconstrued to represent the complete set of metacommunity dynamics. As a result, many investigators attempt to evaluate community assembly processes as strictly belonging to one of the Big 4 types, rather than embracing the full scope of metacommunity theory. The Big 4 were never intended to represent the entire spectrum of metacommunity dynamics and were rather examples of historical paradigms that fit within the new framework. We argue that perpetuation of the Big 4 typology hurts community ecology and we encourage researchers to embrace the full inference space of metacommunity theory. A related, but distinct issue is that the technique of variation partitioning is often used to evaluate the dynamics of metacommunities. This methodology has produced its own set of misunderstandings, some of which are directly a product of the Big 4 typology and others which are simply the product of poor study design or statistical artefacts. However, variation partitioning is a potentially powerful technique when used appropriately and we identify several strategies for successful utilization of variation partitioning.

  1. A Big Five approach to self-regulation: personality traits and health trajectories in the Hawaii longitudinal study of personality and health.

    PubMed

    Hampson, Sarah E; Edmonds, Grant W; Barckley, Maureen; Goldberg, Lewis R; Dubanoski, Joan P; Hillier, Teresa A

    2016-01-01

    Self-regulatory processes influencing health outcomes may have their origins in childhood personality traits. The Big Five approach to personality was used here to investigate the associations between childhood traits, trait-related regulatory processes and changes in health across middle age. Participants (N = 1176) were members of the Hawaii longitudinal study of personality and health. Teacher assessments of the participants' traits when they were in elementary school were related to trajectories of self-rated health measured on 6 occasions over 14 years in middle age. Five trajectories of self-rated health were identified by latent class growth analysis: Stable Excellent, Stable Very Good, Good, Decreasing and Poor. Childhood Conscientiousness was the only childhood trait to predict membership in the Decreasing class vs. the combined healthy classes (Stable Excellent, Stable Very Good and Good), even after controlling for adult Conscientiousness and the other adult Big Five traits. The Decreasing class had poorer objectively assessed clinical health measured on one occasion in middle age, was less well-educated, and had a history of more lifespan health-damaging behaviors compared to the combined healthy classes. These findings suggest that higher levels of childhood Conscientiousness (i.e. greater self-discipline and goal-directedness) may prevent subsequent health decline decades later through self-regulatory processes involving the acquisition of lifelong healthful behavior patterns and higher educational attainment.

  2. Geology and ground-water resources of the Big Sandy Creek Valley, Lincoln, Cheyenne, and Kiowa Counties, Colorado; with a section on Chemical quality of the ground water

    USGS Publications Warehouse

    Coffin, Donald L.; Horr, Clarence Albert

    1967-01-01

    This report describes the geology and ground-water resources of that part of the Big Sandy Creek valley from about 6 miles east of Limon, Colo., downstream to the Kiowa County and Prowers County line, an area of about 1,400 square miles. The valley is drained by Big Sandy Creek and its principal tributary, Rush Creek. The land surface ranges from flat to rolling; the most irregular topography is in the sandhills south and west of Big Sandy Creek. Farming and livestock raising are the principal occupations. Irrigated lands constitute only a small part of the project area, but during the last 15 years irrigation has expanded. Exposed rocks range in age from Late Cretaceous to Recent. They comprise the Carlile Shale, Niobrara Formation, Pierre Shale (all Late Cretaceous), upland deposits (Pleistocene), valley-fill deposits (Pleistocene and Recent), and dune sand (Pleistocene and Recent). Because the Upper Cretaceous formations are relatively impermeable and inhibit water movement, they allow ground water to accumulate in the overlying unconsolidated Pleistocene and Recent deposits. The valley-fill deposits constitute the major aquifer and yield as much as 800 gpm (gallons per minute) to wells along Big Sandy and Rush Creeks. Transmissibilities average about 45,000 gallons per day per foot. Maximum well yields in the tributary valleys are about 200 gpm and average 5 to 10 gpm. The dune sand and upland deposits generally are drained and yield water to wells in only a few places. The ground-water reservoir is recharged only from direct infiltration of precipitation, which annually averages about 12 inches for the entire basin, and from infiltration of floodwater. Floods in the ephemeral Big Sandy Creek are a major source of recharge to ground-water reservoirs. Observations of a flood near Kit Carson indicated that about 3 acre-feet of runoff percolated into the ground-water reservoir through each acre of the wetted stream channel. The downstream decrease in channel and flood-plain width indicates that floodflows percolate to the ground-water reservoir. In the project area at least 94,000 acre-feet of water is evaporated and transpired from the valley fill along Big Sandy Creek, 1,500 acre-feet is pumped, 250 acre-feet leaves the area as underflow, and 10,000 acre-feet leaves as surface flow. Surface-water irrigation has been unsuccessful because of the failure of diversion dams and because of excessive seepage from reservoirs. Ground-water irrigation dates from about World War I; most of the 30 irrigation wells now in use, however, were drilled after 1937. In 1960 less than 1,000 acre-feet of water was pumped for irrigation, about 500 acre-feet was pumped for municipal use, and less than 10 acre-feet was pumped for rural use (stock and domestic). Although additional water is available in the valley-fill deposits of Big Sandy and Rush Creeks, large-scale irrigation probably will not develop in the immediate future; soils are unsuitable for crops in many places, and large water supplies are not available from individual wells. The dissolved-solids content of the ground water in the valley-fill deposits ranges from 507 to 5,420 parts per million. In the Big Sandy Creek valley the dissolved-solids content generally increases downstream, whereas in the Rush Creek valley the dissolved-solids content decreases downstream. Ground water in the Big Sandy Creek valley is suitable for most uses.

  3. Using OPC and HL7 Standards to Incorporate an Industrial Big Data Historian in a Health IT Environment.

    PubMed

    Cruz, Márcio Freire; Cavalcante, Carlos Arthur Mattos Teixeira; Sá Barretto, Sérgio Torres

    2018-05-30

    Health Level Seven (HL7) is one of the standards most commonly used to centralize data from different vital sign monitoring systems. This solution significantly limits the data available for historical analysis, because it typically relies on databases that are not effective at storing large volumes of data. In industry, a specific Big Data historian, known as a Process Information Management System (PIMS), solves this problem. This work proposes the same solution to overcome the restriction on storing vital sign data. The PIMS needs a compatible communication standard to allow storage, and the one most commonly used is OLE for Process Control (OPC). This paper presents an HL7-OPC server that permits communication between vital sign monitoring systems and a PIMS, thus allowing the storage of long historical series of vital signs. In addition, it reviews local and cloud-based Big Medical Data research, followed by an analysis of the PIMS in a health IT environment. It then presents the architecture of the HL7 and OPC standards. Finally, it describes the HL7-OPC server and a sequence of tests that demonstrated its full operation and performance.
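
    As a rough illustration of the bridging such a server performs, the sketch below parses the pipe-delimited OBX (observation) segments of an HL7 v2 message and turns them into tag/value/timestamp tuples that a historian-style write could consume; the message content, the write_tag stand-in, and the field positions shown are simplified assumptions rather than the authors' implementation.

    ```python
    from datetime import datetime
    from typing import List, Tuple

    # A tiny HL7 v2 fragment with two vital-sign observations (illustrative only).
    HL7_MESSAGE = "\r".join([
        "MSH|^~\\&|MONITOR|ICU||HIS|20180530120000||ORU^R01|123|P|2.5",
        "PID|1||PAT001",
        "OBX|1|NM|8867-4^Heart rate||72|/min|||||F",
        "OBX|2|NM|8480-6^Systolic BP||118|mm[Hg]|||||F",
    ])

    def hl7_obx_to_tags(message: str, ts: datetime) -> List[Tuple[str, float, datetime]]:
        """Extract (tag, value, timestamp) tuples from OBX segments."""
        tags = []
        for segment in message.split("\r"):
            fields = segment.split("|")
            if fields[0] != "OBX":
                continue
            identifier = fields[3].split("^")[0]  # e.g. the code of the vital sign
            tags.append((identifier, float(fields[5]), ts))
        return tags

    def write_tag(tag: str, value: float, ts: datetime) -> None:
        # Stand-in for a historian write; a real server would call an OPC client here.
        print(f"{ts.isoformat()} {tag} = {value}")

    for tag, value, ts in hl7_obx_to_tags(HL7_MESSAGE, datetime(2018, 5, 30, 12, 0)):
        write_tag(tag, value, ts)
    ```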

  4. Enhanced K-means clustering with encryption on cloud

    NASA Astrophysics Data System (ADS)

    Singh, Iqjot; Dwivedi, Prerna; Gupta, Taru; Shynu, P. G.

    2017-11-01

    This paper addresses the problem of storing and managing big files on the cloud by implementing hashing on Hadoop and ensuring security while uploading and downloading files. Cloud computing emphasizes data sharing and facilitates the sharing of infrastructure and resources.[10] Hadoop is open source software that allows big files to be stored and managed on the cloud according to our needs. The K-means clustering algorithm is used to calculate the distance between cluster centroids and data points. Hashing stores and retrieves data with hash keys; the hashing algorithm, called a hash function, is used to represent the original data and later to fetch the data stored at a specific key.[17] Encryption is a process that transforms electronic data into a non-readable form known as ciphertext. Decryption is the opposite process: it transforms the ciphertext back into plain text that the end user can read and understand. For encryption and decryption we use a symmetric-key cryptographic algorithm; in particular, the DES algorithm is used for secure storage of the files. [3
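
    Below is a minimal sketch of the store-by-hash-key plus symmetric encryption idea described above, using SHA-256 for the key and the cryptography library's Fernet (an AES-based scheme substituted here for the obsolete DES named in the abstract); the in-memory dictionary stands in for cloud/HDFS storage and all names are illustrative.

    ```python
    import hashlib
    from typing import Dict
    from cryptography.fernet import Fernet

    storage: Dict[str, bytes] = {}   # stand-in for cloud / HDFS object storage
    key = Fernet.generate_key()      # symmetric key shared by uploader and downloader
    cipher = Fernet(key)

    def upload(data: bytes) -> str:
        """Encrypt the file content and store it under its content hash."""
        hash_key = hashlib.sha256(data).hexdigest()
        storage[hash_key] = cipher.encrypt(data)
        return hash_key

    def download(hash_key: str) -> bytes:
        """Fetch the ciphertext by its hash key and decrypt it."""
        return cipher.decrypt(storage[hash_key])

    k = upload(b"contents of a big file ...")
    assert download(k) == b"contents of a big file ..."
    print("stored under key:", k)
    ```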

  5. A multidisciplinary investigation of groundwater fluctuations and their control on river chemistry - Insights from river dissolved concentrations and Li isotopes during flood events

    NASA Astrophysics Data System (ADS)

    Kuessner, M.; Bouchez, J.; Dangeard, M.; Bodet, L.; Thiesson, J.; Didon-Lescot, J. F.; Frick, D. A.; Grard, N.; Guérin, R.; Domergue, J. M.; Gaillardet, J.

    2017-12-01

    Water flow exerts a strong control on weathering reactions in the Critical Zone (CZ). The relationships between hydrology and river chemistry have been widely studied for the past decades [1]. Solute export responds strongly to storm events [2] and investigating the concentration and isotope composition of trace elements in river catchments can advance our understanding of the processes governing water-rock interactions and provide information on the water flow paths during these "hot moments". Especially, lithium (Li) and its isotopes are sensitive to the balance between mineral dissolution and precipitation in the subsurface and therefore, a powerful tool to characterize the response of chemical weathering to hydrology [3]. Hence, high-frequency stream chemistry yields valuable insight into the hydrological processes within the catchment during "hot moments". This study focuses on a CZ Observatory (OHMCV, part of French Research Infrastructure OZCAR). The granitic catchment Sapine (0.54 km2, southern France) is afflicted by big rain events and therefore, it is an appropriate location to study stormflows. Here we combine results from high-frequency stream water sampling during rain events with time-lapse seismic imaging to monitor the changes in aquifer properties [4]. The relationships between concentrations and discharge indicate differential responses of dissolved elements to the hydrological forcing. Especially, systematic changes are observed for Li and its isotopes as a function of water discharge, suggesting maximum secondary mineral formation at intermediate discharge. We suggest that Li dynamics are chiefly influenced by the depth at which water is flowing with, e.g. dissolution of primary minerals in deeper groundwater flows, and water-secondary mineral interaction at shallower depths. The combination of elemental concentrations and Li isotopes in river dissolved load tracing chemical weathering, with hydrogeophysical methods mapping water flows and pools, provides us with a time-resolved image of the CZ, improving our knowledge of the impact of hydrological changes on the chemical mass budgets in catchments. [1] Maher et al. (2011), Earth Planet. Sci. Lett. [2] Kirchner et al. (2010), Hydrol. Processes. [3] Liu et al. (2015), Earth Planet. Sci. Lett. [4] see poster by M. Dangeard et al.

  6. Mash-up of techniques between data crawling/transfer, data preservation/stewardship and data processing/visualization technologies on a science cloud system designed for Earth and space science: a report of successful operation and science projects of the NICT Science Cloud

    NASA Astrophysics Data System (ADS)

    Murata, K. T.

    2014-12-01

    Data-intensive, or data-centric, science is the 4th paradigm, after observational and/or experimental science (1st paradigm), theoretical science (2nd paradigm) and numerical science (3rd paradigm). A science cloud is an infrastructure for this 4th methodology. The NICT Science Cloud is designed for the big data sciences of Earth, space and other fields, based on modern informatics and information technologies [1]. Data flow on the cloud through three classes of techniques: (1) data crawling and transfer, (2) data preservation and stewardship, and (3) data processing and visualization. Original tools and applications for these techniques have been designed and implemented, and we mash them up on the NICT Science Cloud to build customized systems for each project. In this paper, we discuss science data processing through these three steps. For big data science, data file deployment on a distributed storage system should be well designed in order to save storage cost and transfer time. We developed a high-bandwidth virtual remote storage system (HbVRS) and data crawling tools, NICTY/DLA and the Wide-area Observation Network Monitoring (WONM) system, respectively. Data files are saved on the cloud storage system according to both the data preservation policy and the data processing plan. The storage system is built on distributed file system middleware (Gfarm: GRID datafarm). This is effective because disaster recovery (DR) and parallel data processing are carried out simultaneously without moving the big data from storage to storage. Data files are managed through our Web application, WSDBank (World Science Data Bank). The big data on the cloud are processed via Pwrake, a workflow tool with high-bandwidth I/O. Several visualization tools run on the cloud: VirtualAurora for the magnetosphere and ionosphere, VDVGE for Google Earth, STICKER for urban environment data and STARStouch for multi-disciplinary data. There are 30 projects running on the NICT Science Cloud for Earth and space science; in 2003, 56 refereed papers were published. Finally, we introduce a couple of successful Earth and space science results obtained with these three techniques on the NICT Science Cloud. [1] http://sc-web.nict.go.jp
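
    As a generic illustration of the three-step data flow described above (crawling/transfer, preservation, processing), the sketch below chains three placeholder stages in plain Python. The function names and the checksum "processing" step are assumptions for illustration only and do not reflect the NICTY/DLA, Gfarm, WSDBank or Pwrake tools themselves.

      from pathlib import Path
      import shutil, hashlib

      def crawl(src_dir: Path) -> list[Path]:
          # Stage 1: discover data files to transfer.
          return sorted(src_dir.glob("*.dat"))

      def preserve(files: list[Path], storage: Path) -> list[Path]:
          # Stage 2: copy files onto the (distributed) storage area.
          storage.mkdir(parents=True, exist_ok=True)
          return [Path(shutil.copy2(f, storage / f.name)) for f in files]

      def process(files: list[Path]) -> dict[str, str]:
          # Stage 3: run an analysis step per file (a checksum as a stand-in for real processing).
          return {f.name: hashlib.md5(f.read_bytes()).hexdigest() for f in files}

      src = Path("observations")
      src.mkdir(exist_ok=True)
      (src / "run1.dat").write_bytes(b"example payload")   # dummy file so the sketch runs end to end
      print(process(preserve(crawl(src), Path("cloud_storage"))))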

  7. Big Data and Deep data in scanning and electron microscopies: functionality from multidimensional data sets

    DOE PAGES

    Belianinov, Alex; Vasudevan, Rama K; Strelcov, Evgheni; ...

    2015-05-13

    The development of electron and scanning probe microscopies in the second half of the twentieth century produced spectacular images of the internal structure and composition of matter at nanometer, molecular, and atomic resolution. Largely, this progress was enabled by computer-assisted methods of microscope operation, data acquisition and analysis. Progress in imaging technologies at the beginning of the twenty-first century has opened the proverbial floodgates of high-veracity information on structure and functionality. High-resolution imaging now provides information on atomic positions with picometer precision, allowing quantitative measurements of individual bond lengths and angles. Functional imaging often leads to multidimensional data sets containing partial or full information on properties of interest, acquired as a function of multiple parameters (time, temperature, or other external stimuli). Here, we review several recent applications of big and deep data analysis methods to visualize, compress, and translate this imaging data into physically and chemically relevant information.
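
    As a minimal illustration of the kind of multidimensional ("deep data") analysis reviewed here, the sketch below unfolds a synthetic hyperspectral image cube and reduces it with PCA. The cube shape, component count, and use of scikit-learn are assumptions for demonstration, not the methods of this particular review.

      import numpy as np
      from sklearn.decomposition import PCA

      # Synthetic 64 x 64 image with a 256-point spectrum at every pixel.
      cube = np.random.rand(64, 64, 256)
      X = cube.reshape(-1, 256)            # pixels x spectral channels

      pca = PCA(n_components=4)
      scores = pca.fit_transform(X)        # per-pixel loadings of the leading components
      maps = scores.reshape(64, 64, 4)     # spatial maps of each component
      print(pca.explained_variance_ratio_) # fraction of variance captured by each component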

  8. Big Data and Deep data in scanning and electron microscopies: functionality from multidimensional data sets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Belianinov, Alex; Vasudevan, Rama K; Strelcov, Evgheni

    The development of electron and scanning probe microscopies in the second half of the twentieth century produced spectacular images of the internal structure and composition of matter at nanometer, molecular, and atomic resolution. Largely, this progress was enabled by computer-assisted methods of microscope operation, data acquisition and analysis. Progress in imaging technologies at the beginning of the twenty-first century has opened the proverbial floodgates of high-veracity information on structure and functionality. High-resolution imaging now provides information on atomic positions with picometer precision, allowing quantitative measurements of individual bond lengths and angles. Functional imaging often leads to multidimensional data sets containing partial or full information on properties of interest, acquired as a function of multiple parameters (time, temperature, or other external stimuli). Here, we review several recent applications of big and deep data analysis methods to visualize, compress, and translate this imaging data into physically and chemically relevant information.

  9. Big Bang Day : The Great Big Particle Adventure - 1. Atom

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    2009-10-08

    In this series, comedian and physicist Ben Miller asks the CERN scientists what they hope to find. The notion of atoms dates back to the Greek philosophers, who sought a natural, mechanical explanation of the Universe, as opposed to a divine one. The existence of what we call chemical atoms, the constituents of all we see around us, wasn't proved until a hundred years ago, but almost simultaneously it was realised that these weren't the indivisible constituents the Greeks envisaged. Much of the story of physics since then has been the ever-deeper probing of matter until, at the end of the 20th century, a complete list of fundamental ingredients had been identified, apart from one: the much discussed Higgs particle. In this programme, Ben finds out why this last particle is so pivotal, not just to atomic theory, but to our very existence - and how hopeful the scientists are of proving its existence.

  10. SETI as a part of Big History

    NASA Astrophysics Data System (ADS)

    Maccone, Claudio

    2014-08-01

    Big History is an emerging academic discipline which examines history scientifically from the Big Bang to the present. It uses a multidisciplinary approach based on combining numerous disciplines from science and the humanities, and explores human existence in the context of this bigger picture. It is taught at some universities. In a series of recent papers ([11] through [15] and [17] through [18]) and in a book [16], we developed a new mathematical model embracing Darwinian Evolution (RNA to Humans; see, in particular, [17]) and Human History (Aztecs to USA; see [16]), and then extrapolated even that into the future, up to ten million years (see [18]), the minimum time required for a civilization to expand to the whole Milky Way (Fermi paradox). In this paper, we further extend that model into the past so as to let it start at the Big Bang (13.8 billion years ago), thus merging Big History, Evolution on Earth and SETI (the modern Search for ExtraTerrestrial Intelligence) into a single body of knowledge of a statistical type. Our idea is that the Geometric Brownian Motion (GBM), so far used as the key stochastic process of financial mathematics (Black-Scholes models and the related 1997 Nobel Prize in Economics!), may be successfully applied to the whole of Big History. In particular, in this paper we derive
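
    For reference, the Geometric Brownian Motion invoked in this record is the standard stochastic process defined by (textbook definition, not a result specific to this paper):

      \[
        dN(t) = \mu\,N(t)\,dt + \sigma\,N(t)\,dB(t),
        \qquad
        N(t) = N(t_0)\,\exp\!\Big[\big(\mu - \tfrac{\sigma^{2}}{2}\big)(t - t_0) + \sigma\big(B(t) - B(t_0)\big)\Big],
      \]

    where B(t) is standard Brownian motion, so the mean value grows exponentially as <N(t)> = N(t_0) e^{\mu (t - t_0)}.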

  11. Astrophysical S-factor of the ³He(α,γ)⁷Be reaction in the Big-Bang nucleosynthesis

    NASA Astrophysics Data System (ADS)

    Ghamary, Motahareh; Sadeghi, Hossein; Mohammadi, Saeed

    2018-05-01

    In the present work, we have studied the properties of the ³He(α,γ)⁷Be reaction. Direct radiative capture reactions in Big-Bang nucleosynthesis mainly take place in the external region of the inter-nuclear interaction range and play an essential role in nuclear astrophysics. Among these reactions, the ³He(α,γ)⁷Be reaction, with Q = 1.586 MeV, is a key part of the Big-Bang nucleosynthesis reaction chain. This reaction can be used to understand the physical and chemical properties of the Sun, as well as to account for the lack of observed solar neutrinos in detectors on Earth, since the neutrino fluxes produced in the center of the Sun by the decay of ⁷Be and ⁸B are almost proportional to the astrophysical S-factor of the ³He(α,γ)⁷Be reaction, S34. The ³He(α,γ)⁷Be reaction is therefore considered key to solving the solar neutrino puzzle. Finally, we have obtained the astrophysical S-factor for the ground-state (3/2⁻), first-excited-state (1/2⁻) and total S34 contributions using modern two-body local nucleon-nucleon potential models. We have also compared the obtained S-factor with experimental data and other theoretical works.
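
    For context, the astrophysical S-factor quoted here (S34) follows the standard definition, which factors the Coulomb-barrier penetration out of the capture cross section; this is textbook material rather than a result of this paper:

      \[
        S(E) = E\,\sigma(E)\,e^{2\pi\eta(E)},
        \qquad
        \eta(E) = \frac{Z_{1} Z_{2} e^{2}}{\hbar v},
      \]

    where σ(E) is the capture cross section, η the Sommerfeld parameter, and v the relative velocity; for ³He + ⁴He, Z₁Z₂ = 4.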

  12. Herbicide treatment effects on properties of mountain big sagebrush soils after fourteen years

    NASA Technical Reports Server (NTRS)

    Burke, I. C.; Reiners, W. A.; Sturges, D. L.; Matson, P. A.

    1987-01-01

    The effects of sagebrush conversion on the soil properties of a high-elevation portion of the Western Intermountain Sagebrush Steppe (West, 1983) are described. Changes were found in only a few soil chemical properties after conversion to grassland. Surface concentrations of N were lower under the grass vegetation than under undisturbed vegetation, and net N mineralization rates were higher under shrubs in the sagebrush vegetation than under former shrubs in the grass vegetation.

  13. Neutrino Background from Population III Stars

    NASA Astrophysics Data System (ADS)

    Iocco, Fabio

    2011-12-01

    Population III Stars (PopIII) are the first generation of stars formed from the collapse of the very first structures in the Universe. Their peculiar chemical composition (metal-free, resembling the Primordial Nucleosynthesis yields) affects their formation and evolution and makes them unusually big and hot stars. They are good candidates for the engines of Reionization of the Universe although their direct observation is extremely difficult. Here we summarize a study of their expected diffuse low-energy neutrino background flux at Earth.

  14. Big pharma screening collections: more of the same or unique libraries? The AstraZeneca-Bayer Pharma AG case.

    PubMed

    Kogej, Thierry; Blomberg, Niklas; Greasley, Peter J; Mundt, Stefan; Vainio, Mikko J; Schamberger, Jens; Schmidt, Georg; Hüser, Jörg

    2013-10-01

    In this study, the screening collections of two major pharmaceutical companies (AstraZeneca and Bayer Pharma AG) have been compared using a 2D molecular fingerprint by a nearest neighborhood approach. Results revealed a low overlap between both collections in terms of compound identity and similarity. This emphasizes the value of screening multiple compound collections to expand the chemical space that can be accessed by high-throughput screening (HTS). Copyright © 2012 Elsevier Ltd. All rights reserved.
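
    The record does not state which 2D fingerprint was used; the sketch below uses RDKit Morgan fingerprints and Tanimoto similarity purely as an illustration of a nearest-neighbour comparison between two compound collections. The toy SMILES lists are placeholders, not the AstraZeneca or Bayer collections.

      from rdkit import Chem, DataStructs
      from rdkit.Chem import AllChem

      def fingerprint(smiles: str):
          mol = Chem.MolFromSmiles(smiles)
          return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

      collection_a = [fingerprint(s) for s in ["CCO", "c1ccccc1", "CC(=O)O"]]   # toy "collection A"
      collection_b = [fingerprint(s) for s in ["CCN", "c1ccncc1"]]              # toy "collection B"

      # For each compound in A, record the similarity of its nearest neighbour in B;
      # consistently low values would indicate little overlap between the collections.
      nn_sims = [max(DataStructs.TanimotoSimilarity(fa, fb) for fb in collection_b)
                 for fa in collection_a]
      print(nn_sims)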

  15. Foam model of planetary formation

    NASA Astrophysics Data System (ADS)

    Andreev, Y.; Potashko, O.

    The analysis of 2637 terrestrial minerals shows the presence of a characteristic elemental and isotopic structure for each ore, irrespective of its site. A model of geo-nuclear element synthesis by avalanche-like merging of nuclei is offered, which simply explains these regularities. Main assumption: nuclei, atoms, compounds, ores and minerals were formed within the volume of the modern Earth at an early stage of its evolution from a uniform proto-substance. Substantive provisions of the model: 1) Most of the nuclei of the atoms of all chemical elements in the Earth's crust were formed by a mechanism of avalanche chain merging, practically in one stage (on geological scales), in the course of a process correlated on planetary scales and accompanied by the release of a large amount of heat. 2) Atoms of the chemical elements were generated during the cooling of the planet, with preservation of the relative spatial arrangement of the nuclei. 3) Chemical compounds arose as the surface of the planet cooled and were accompanied by macro- and geo-scale reorganizations (mixing). 4) Mineral formations are a consequence of the correlated behaviour of chemical compounds on microscopic scales during the phase transition from the gaseous or liquid to the solid state. 5) Synthesis of chemical elements in the deep layers of the Earth continues to this day. "Foaming" instead of "Big Bang": physical space is a continuous gas-fluid environment consisting of superfluid foam. The continuity, conservation and uniqueness of the proto-substance are postulated. Scenario: primary singularity -> droplets (proto-galaxies) -> droplets (proto-stars) -> droplets (proto-planets) -> droplets (proto-satellites) -> droplets. Proto-planet substance -> proton + electron, as the first-generation disintegration products of the primary foam. Nuclei, or nucleonic crystals, are the second generation, resulting from the cascade merging of protons into conglomerates. The theory was applied to the analysis of samples of native copper from the Rafalovka ore deposit in Ukraine. The abundance of elements was determined by X-ray fluorescence microanalysis. Changes in the proportions of elements are described by nuclear synthesis reactions: 16O+47Ti, 23Na+40Ca, 24Mg+39K, 31P+32S -> 63Cu; 16O+49Ti, 23Na+42Ca, 26Mg+39K, 31P+34S -> 65Cu. A dramatic change in the isotope ratios of 56Fe and 57Fe occurs between sites separated by 3 millimetres; the content of 57Fe is greater than that of 56Fe in the Cu granule.

  16. Augmented reality enabling intelligence exploitation at the edge

    NASA Astrophysics Data System (ADS)

    Kase, Sue E.; Roy, Heather; Bowman, Elizabeth K.; Patton, Debra

    2015-05-01

    Today's Warfighters need to make quick decisions while interacting in densely populated environments composed of friendly, hostile, and neutral host-nation locals. However, there is a gap in the real-time processing of big data streams for edge intelligence. We introduce a big data processing pipeline called ARTEA that ingests, monitors, and performs a variety of analytics, including noise reduction, pattern identification, and trend and event detection, in the context of an area of operations (AOR). Results of the analytics are presented to the Soldier via an augmented reality (AR) device, Google Glass (Glass). Non-intrusive AR devices such as Glass can visually communicate contextually relevant alerts to the Soldier based on the current mission objectives, time, location, and observed or sensed activities. This real-time processing and AR presentation approach to knowledge discovery flattens the intelligence hierarchy, enabling the edge Soldier to act as a vital and active participant in the analysis process. We report preliminary observations from testing ARTEA and Glass in a document exploitation and person-of-interest scenario simulating edge Soldier participation in the intelligence process under disconnected deployment conditions.
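
    The processing chain described for ARTEA (ingest, noise reduction, event detection, AR alerting) can be caricatured with a plain-Python sketch. The stages, window size, threshold and sample readings below are illustrative assumptions only, since the actual ARTEA components are not specified in this record.

      from statistics import mean, stdev

      def ingest(stream):
          for reading in stream:                 # e.g. sensed activity counts per interval
              yield float(reading)

      def smooth(values, window=5):
          buf = []
          for v in values:
              buf.append(v)
              buf = buf[-window:]
              yield mean(buf)                    # simple moving average as noise reduction

      def detect_events(values, threshold=2.0):
          history = []
          for v in values:
              if len(history) > 10 and v > mean(history) + threshold * stdev(history):
                  yield f"ALERT: anomalous activity level {v:.1f}"   # would be pushed to the AR display
              history.append(v)

      raw = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 9, 1, 2, 15, 2]
      for alert in detect_events(smooth(ingest(raw))):
          print(alert)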

  17. Social customer relationship management: taking advantage of Web 2.0 and Big Data technologies.

    PubMed

    Orenga-Roglá, Sergio; Chalmeta, Ricardo

    2016-01-01

    The emergence of Web 2.0 and Big Data technologies has allowed a new customer relationship strategy based on interactivity and collaboration called Social Customer Relationship Management (Social CRM) to be created. This enhances customer engagement and satisfaction. The implementation of Social CRM is a complex task that involves different organisational, human and technological aspects. However, there is a lack of methodologies to assist companies in these processes. This paper shows a novel methodology that helps companies to implement Social CRM, taking into account different aspects such as social customer strategy, the Social CRM performance measurement system, the Social CRM business processes, or the Social CRM computer system. The methodology was applied to one company in order to validate and refine it.

  18. Distributed and parallel approach for handle and perform huge datasets

    NASA Astrophysics Data System (ADS)

    Konopko, Joanna

    2015-12-01

    Big Data refers to the dynamic, large and disparate volumes of data coming from many different sources (tools, machines, sensors, mobile devices), uncorrelated with each other. It requires new, innovative and scalable technology to collect, host and analytically process the vast amount of data. A proper architecture for systems that process huge data sets is needed. In this paper, distributed and parallel system architectures are compared using the example of the MapReduce (MR) Hadoop platform and a parallel database platform (DBMS). The paper also analyzes the problem of extracting and handling valuable information from petabytes of data. Both paradigms, MapReduce and parallel DBMS, are described and compared. A hybrid architecture approach is also proposed, which could be used to solve the analyzed problem of storing and processing Big Data.
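
    To make the MapReduce paradigm being compared here concrete, a minimal in-memory word-count sketch is shown below. A real Hadoop MR job would express the same map and reduce steps as distributed tasks over HDFS, so this is only a toy illustration of the programming model.

      from collections import defaultdict

      def map_phase(records):
          for line in records:
              for word in line.split():
                  yield word.lower(), 1          # emit (key, value) pairs

      def shuffle(pairs):
          groups = defaultdict(list)
          for key, value in pairs:
              groups[key].append(value)          # group values by key
          return groups.items()

      def reduce_phase(grouped):
          return {key: sum(values) for key, values in grouped}

      docs = ["big data needs scalable systems", "parallel DBMS and MapReduce handle big data"]
      print(reduce_phase(shuffle(map_phase(docs))))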

  19. Technology Evaluation for the Big Spring Water Treatment System at the Y-12 National Security Complex, Oak Ridge, Tennessee

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bechtel Jacobs Company LLC

    2002-11-01

    The Y-12 National Security Complex (Y-12 Complex) is an active manufacturing and developmental engineering facility that is located on the U.S. Department of Energy (DOE) Oak Ridge Reservation. Building 9201-2 was one of the first process buildings constructed at the Y-12 Complex. Construction involved relocating and straightening of the Upper East Fork Poplar Creek (UEFPC) channel, adding large quantities of fill material to level areas along the creek, and pumping of concrete into sinkholes and solution cavities present within the limestone bedrock. Flow from a large natural spring designated as ''Big Spring'' on the original 1943 Stone & Webster Building 9201-2 Field Sketch FS6003 was captured and directed to UEFPC through a drainpipe designated Outfall 51. The building was used from 1953 to 1955 for pilot plant operations for an industrial process that involved the use of large quantities of elemental mercury. Past operations at the Y-12 Complex led to the release of mercury to the environment. Significant environmental media at the site were contaminated by accidental releases of mercury from the building process facilities piping and sumps associated with Y-12 Complex mercury handling facilities. Releases to the soil surrounding the buildings have resulted in significant levels of mercury in these areas of contamination, which is ultimately transported to UEFPC, its streambed, and off-site. Bechtel Jacobs Company LLC (BJC) is the DOE-Oak Ridge Operations prime contractor responsible for conducting environmental restoration activities at the Y-12 Complex. In order to mitigate the mercury being released to UEFPC, the Big Spring Water Treatment System will be designed and constructed as a Comprehensive Environmental Response, Compensation, and Liability Act action. This facility will treat the combined flow from Big Spring feeding Outfall 51 and the inflow now being processed at the East End Mercury Treatment System (EEMTS). Both discharge to UEFPC adjacent to Bldg. 9201-2. The EEMTS treats mercury-contaminated groundwater that collects in sumps in the basement of Bldg. 9201-2. A pre-design study was performed to investigate the applicability of various treatment technologies for reducing mercury discharges at Outfall 51 in support of the design of the Big Spring Water Treatment System. This document evaluates the results of the pre-design study for selection of the mercury removal technology for the treatment system.

  20. Utilizing big data to provide better health at lower cost.

    PubMed

    Jones, Laney K; Pulk, Rebecca; Gionfriddo, Michael R; Evans, Michael A; Parry, Dean

    2018-04-01

    The efficient use of big data in order to provide better health at a lower cost is described. As data become more usable and accessible in healthcare, organizations need to be prepared to use this information to positively impact patient care. In order to be successful, organizations need teams with expertise in informatics and data management that can build new infrastructure and restructure existing infrastructure to support quality and process improvements in real time, such as creating discrete data fields that can be easily retrieved and used to analyze and monitor care delivery. Organizations should use data to monitor performance (e.g., process metrics) as well as the health of their populations (e.g., clinical parameters and health outcomes). Data can be used to prevent hospitalizations, combat opioid abuse and misuse, improve antimicrobial stewardship, and reduce pharmaceutical spending. These examples also highlight lessons learned for using data to improve health: data can inform care and create efficiencies, stakeholders should be engaged and communicated with early and often, and collaboration is necessary to obtain complete data. To truly transform care so that it is delivered in a way that is sustainable, responsible, and patient-centered, health systems need to act on these opportunities, invest in big data, and routinely use big data in the delivery of care. Using data efficiently has the potential to improve the care of our patients and lower cost. Despite early successes, barriers to implementation remain, including data acquisition, integration, and usability. Copyright © 2018 by the American Society of Health-System Pharmacists, Inc. All rights reserved.
