Scaling up to address data science challenges
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wendelberger, Joanne R.
2017-04-27
Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. This article explores various challenges in Data Science and highlights statistical approaches that can facilitate the analysis of large-scale data, including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.
A large-scale perspective on stress-induced alterations in resting-state networks
NASA Astrophysics Data System (ADS)
Maron-Katz, Adi; Vaisvaser, Sharon; Lin, Tamar; Hendler, Talma; Shamir, Ron
2016-02-01
Stress is known to induce large-scale neural modulations. However, its neural effect once the stressor is removed and how it relates to subjective experience are not fully understood. Here we used a statistically sound data-driven approach to investigate alterations in large-scale resting-state functional connectivity (rsFC) induced by acute social stress. We compared rsfMRI profiles of 57 healthy male subjects before and after stress induction. Using a parcellation-based univariate statistical analysis, we identified a large-scale rsFC change, involving 490 parcel-pairs. Aiming to characterize this change, we employed statistical enrichment analysis, identifying anatomic structures that were significantly interconnected by these pairs. This analysis revealed strengthening of thalamo-cortical connectivity and weakening of cross-hemispheral parieto-temporal connectivity. These alterations were further found to be associated with change in subjective stress reports. Integrating report-based information on stress sustainment 20 minutes post induction revealed a single significant rsFC change between the right amygdala and the precuneus, which inversely correlated with the level of subjective recovery. Our study demonstrates the value of enrichment analysis for exploring large-scale network reorganization patterns, and provides new insight into stress-induced neural modulations and their relation to subjective experience.
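A minimal sketch, not from the paper, of the kind of enrichment test the abstract describes: testing whether the parcel pairs connecting two anatomical structures are over-represented among the changed pairs, via a hypergeometric tail probability. All counts below are hypothetical.

```python
from scipy.stats import hypergeom

# Hypothetical counts for one pair of anatomical structures (A, B)
M = 44850   # total parcel pairs tested (e.g., all pairs of a ~300-parcel atlas)
n = 320     # parcel pairs that connect structure A with structure B
N = 490     # parcel pairs whose connectivity changed after stress induction
k = 18      # changed pairs that connect A with B

# P(X >= k) if the N changed pairs were drawn at random from the M pairs
p_enrich = hypergeom.sf(k - 1, M, n, N)
print(f"enrichment p-value for (A, B): {p_enrich:.3g}")
```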
Stanzel, Sven; Weimer, Marc; Kopp-Schneider, Annette
2013-06-01
High-throughput screening approaches are carried out for the toxicity assessment of a large number of chemical compounds. In such large-scale in vitro toxicity studies several hundred or thousand concentration-response experiments are conducted. The automated evaluation of concentration-response data using statistical analysis scripts saves time and yields more consistent results in comparison to data analysis performed by the use of menu-driven statistical software. Automated statistical analysis requires that concentration-response data are available in a standardised data format across all compounds. To obtain consistent data formats, a standardised data management workflow must be established, including guidelines for data storage, data handling and data extraction. In this paper two procedures for data management within large-scale toxicological projects are proposed. Both procedures are based on Microsoft Excel files as the researcher's primary data format and use a computer programme to automate the handling of data files. The first procedure assumes that data collection has not yet started whereas the second procedure can be used when data files already exist. Successful implementation of the two approaches into the European project ACuteTox is illustrated. Copyright © 2012 Elsevier Ltd. All rights reserved.
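A minimal sketch, not tied to the ACuteTox project, of the kind of automated file handling described above: collecting concentration-response data stored in researcher Excel files into one standardised table. The folder path and column names are hypothetical.

```python
import glob
import pandas as pd

frames = []
for path in glob.glob("data/*.xlsx"):            # hypothetical folder of researcher files
    df = pd.read_excel(path, sheet_name=0)
    # enforce one standardised format: compound, concentration, response
    df = df.rename(columns=str.lower)[["compound", "concentration", "response"]]
    df["source_file"] = path                     # keep provenance for quality control
    frames.append(df)

standardised = pd.concat(frames, ignore_index=True)
standardised.to_csv("all_concentration_response.csv", index=False)
```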
Explore the Usefulness of Person-Fit Analysis on Large-Scale Assessment
ERIC Educational Resources Information Center
Cui, Ying; Mousavi, Amin
2015-01-01
The current study applied the person-fit statistic, l[subscript z], to data from a Canadian provincial achievement test to explore the usefulness of conducting person-fit analysis on large-scale assessments. Item parameter estimates were compared before and after the misfitting student responses, as identified by l[subscript z], were removed. The…
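A minimal sketch of the standardized log-likelihood person-fit statistic l_z for one examinee's dichotomous responses, assuming a two-parameter logistic IRT model for illustration (not necessarily the model used in the study); item parameters and the response pattern are hypothetical.

```python
import numpy as np

def lz_statistic(responses, theta, a, b):
    """Standardized log-likelihood person-fit statistic (Drasgow-type l_z)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))       # 2PL response probabilities
    l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    expectation = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - expectation) / np.sqrt(variance)

# Hypothetical item parameters and one examinee's responses
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])
u = np.array([1, 1, 0, 1, 0])
print(lz_statistic(u, theta=0.4, a=a, b=b))          # large negative values flag misfit
```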
Robust Fixed-Structure Control
1994-10-30
Deterministic Foundation for Statistical Energy Analysis," J. Sound Vibr., to appear. 1.96 D. S. Bernstein and S. P. Bhat, "Lyapunov Stability, Semistability...S. Bernstein, "Power Flow, Energy Balance, and Statistical Energy Analysis for Large Scale, Interconnected Systems," Proc. Amer. Contr. Conf., pp
ERIC Educational Resources Information Center
Fagginger Auer, Marije F.; Hickendorff, Marian; Van Putten, Cornelis M.; Béguin, Anton A.; Heiser, Willem J.
2016-01-01
A first application of multilevel latent class analysis (MLCA) to educational large-scale assessment data is demonstrated. This statistical technique addresses several of the challenges that assessment data offers. Importantly, MLCA allows modeling of the often ignored teacher effects and of the joint influence of teacher and student variables.…
Statistical Analysis of Large-Scale Structure of Universe
NASA Astrophysics Data System (ADS)
Tugay, A. V.
While galaxy cluster catalogs were compiled many decades ago, other structural elements of the cosmic web have been detected at a definite level only in the newest works. For example, extragalactic filaments were described by the velocity field and the SDSS galaxy distribution during the last years. The large-scale structure of the Universe could also be mapped in the future using ATHENA observations in X-rays and SKA in the radio band. Until detailed observations become available for most of the volume of the Universe, some integral statistical parameters can be used for its description. Methods such as the galaxy correlation function, power spectrum, statistical moments, and peak statistics are commonly used with this aim. The parameters of the power spectrum and other statistics are important for constraining models of dark matter, dark energy, inflation, and brane cosmology. In the present work we describe the growth of large-scale density fluctuations in the one- and three-dimensional cases with Fourier harmonics of hydrodynamical parameters. As a result, we obtain a power-law relation for the matter power spectrum.
NASA Astrophysics Data System (ADS)
Wang, Lixia; Pei, Jihong; Xie, Weixin; Liu, Jinyuan
2018-03-01
Large-scale oceansat remote sensing images cover a large area of sea surface, whose fluctuations can be considered a non-stationary process. The Short-Time Fourier Transform (STFT) is a suitable analysis tool for time-varying non-stationary signals. In this paper, a novel ship detection method using 2-D STFT sea-background statistical modeling for large-scale oceansat remote sensing images is proposed. First, the large-scale oceansat remote sensing image is divided into small sub-blocks, and the 2-D STFT is applied to each sub-block individually. Second, the 2-D STFT spectra of the sub-blocks are studied, and a clear difference in characteristics between sea background and non-sea background is found. Finally, a statistical model for all valid frequency points in the STFT spectrum of the sea background is given, and a ship detection method based on 2-D STFT spectrum modeling is proposed. Experimental results show that the proposed algorithm can detect ship targets with a high recall rate and a low missing rate.
NASA Technical Reports Server (NTRS)
Over, Thomas, M.; Gupta, Vijay K.
1994-01-01
Under the theory of independent and identically distributed random cascades, the probability distribution of the cascade generator determines the spatial and the ensemble properties of spatial rainfall. Three sets of radar-derived rainfall data in space and time are analyzed to estimate the probability distribution of the generator. A detailed comparison between instantaneous scans of spatial rainfall and simulated cascades using the scaling properties of the marginal moments is carried out. This comparison highlights important similarities and differences between the data and the random cascade theory. Differences are quantified and measured for the three datasets. Evidence is presented to show that the scaling properties of the rainfall can be captured to the first order by a random cascade with a single parameter. The dependence of this parameter on forcing by the large-scale meteorological conditions, as measured by the large-scale spatial average rain rate, is investigated for these three datasets. The data show that this dependence can be captured by a one-to-one function. Since the large-scale average rain rate can be diagnosed from the large-scale dynamics, this relationship demonstrates an important linkage between the large-scale atmospheric dynamics and the statistical cascade theory of mesoscale rainfall. Potential application of this research to parameterization of runoff from the land surface and regional flood frequency analysis is briefly discussed, and open problems for further research are presented.
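A minimal sketch, not the authors' code, of a discrete multiplicative random cascade on a 2-D grid and the scale dependence of its marginal moments; the single-parameter lognormal generator used here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def cascade_2d(levels, sigma=0.4):
    """Branching-number-4 cascade: each cell splits into 2x2 children with i.i.d. weights."""
    field = np.ones((1, 1))
    for _ in range(levels):
        field = np.kron(field, np.ones((2, 2)))                  # split each cell into 2x2 children
        w = rng.lognormal(mean=-0.5 * sigma**2, sigma=sigma, size=field.shape)
        field = field * w                                        # mean-one cascade weights
    return field

field = cascade_2d(levels=7)                                     # 128 x 128 synthetic "rain" field

# scaling of marginal moments under spatial coarse-graining
for box in (1, 2, 4, 8, 16):
    coarse = field.reshape(128 // box, box, 128 // box, box).mean(axis=(1, 3))
    print(box, [round(float(np.mean(coarse ** q)), 3) for q in (1, 2, 3)])
```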
Statistical Analysis of Big Data on Pharmacogenomics
Fan, Jianqing; Liu, Han
2013-01-01
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods: estimating the large covariance matrix for understanding correlation structure, the inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes and proteins and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules and understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big Data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
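A minimal sketch, not from the paper, of the large-scale simultaneous testing step described above: per-gene two-sample tests followed by Benjamini-Hochberg false discovery rate control. The data are simulated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_genes, n_per_group = 5000, 20
x = rng.normal(size=(n_genes, n_per_group))           # control-group expression
y = rng.normal(size=(n_genes, n_per_group))           # treated-group expression
y[:100] += 1.0                                        # 100 truly differential genes

_, pvals = stats.ttest_ind(x, y, axis=1)              # one test per gene

def benjamini_hochberg(p, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha (BH step-up)."""
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

print("genes declared differential:", benjamini_hochberg(pvals).sum())
```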
Planck 2015 results. XVI. Isotropy and statistics of the CMB
NASA Astrophysics Data System (ADS)
Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Akrami, Y.; Aluri, P. K.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartolo, N.; Basak, S.; Battaner, E.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Boulanger, F.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Cardoso, J.-F.; Casaponsa, B.; Catalano, A.; Challinor, A.; Chamballu, A.; Chiang, H. C.; Christensen, P. R.; Church, S.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Contreras, D.; Couchot, F.; Coulais, A.; Crill, B. P.; Cruz, M.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Fantaye, Y.; Fergusson, J.; Fernandez-Cobos, R.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Frolov, A.; Galeotta, S.; Galli, S.; Ganga, K.; Gauthier, C.; Ghosh, T.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huang, Z.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kim, J.; Kisner, T. S.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; Liu, H.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Marinucci, D.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; McGehee, P.; Meinhold, P. R.; Melchiorri, A.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mikkelsen, K.; Mitra, S.; Miville-Deschênes, M.-A.; Molinari, D.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Pant, N.; Paoletti, D.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Rossetti, M.; Rotti, A.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Souradeep, T.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Trombetti, T.; Tucci, M.; Tuovinen, J.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Yvon, D.; Zacchei, A.; Zibin, J. P.; Zonca, A.
2016-09-01
We test the statistical isotropy and Gaussianity of the cosmic microwave background (CMB) anisotropies using observations made by the Planck satellite. Our results are based mainly on the full Planck mission for temperature, but also include some polarization measurements. In particular, we consider the CMB anisotropy maps derived from the multi-frequency Planck data by several component-separation methods. For the temperature anisotropies, we find excellent agreement between results based on these sky maps over both a very large fraction of the sky and a broad range of angular scales, establishing that potential foreground residuals do not affect our studies. Tests of skewness, kurtosis, multi-normality, N-point functions, and Minkowski functionals indicate consistency with Gaussianity, while a power deficit at large angular scales is manifested in several ways, for example low map variance. The results of a peak statistics analysis are consistent with the expectations of a Gaussian random field. The "Cold Spot" is detected with several methods, including map kurtosis, peak statistics, and mean temperature profile. We thoroughly probe the large-scale dipolar power asymmetry, detecting it with several independent tests, and address the subject of a posteriori correction. Tests of directionality suggest the presence of angular clustering from large to small scales, but at a significance that is dependent on the details of the approach. We perform the first examination of polarization data, finding the morphology of stacked peaks to be consistent with the expectations of statistically isotropic simulations. Where they overlap, these results are consistent with the Planck 2013 analysis based on the nominal mission data and provide our most thorough view of the statistics of the CMB fluctuations to date.
Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B
2013-03-23
Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. To date, multiple challenges in protein MS data analysis remain: management of large-scale and complex data sets; MS peak identification and indexing; and high-dimensional peak differential analysis with concurrent statistical tests and false discovery rate (FDR) control. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution that provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
Field-aligned currents' scale analysis performed with the Swarm constellation
NASA Astrophysics Data System (ADS)
Lühr, Hermann; Park, Jaeheung; Gjerloev, Jesper W.; Rauberg, Jan; Michaelis, Ingo; Merayo, Jose M. G.; Brauer, Peter
2015-01-01
We present a statistical study of the temporal- and spatial-scale characteristics of different field-aligned current (FAC) types derived with the Swarm satellite formation. We divide FACs into two classes: small-scale FACs, up to some 10 km, which are carried predominantly by kinetic Alfvén waves, and large-scale FACs with sizes of more than 150 km. For determining temporal variability we consider measurements at the same point, the orbital crossovers near the poles, but at different times. From correlation analysis we obtain a persistent period of small-scale FACs of order 10 s, while large-scale FACs can be regarded as stationary for more than 60 s. For the first time we investigate the longitudinal scales. Large-scale FACs are different on the dayside and nightside. On the nightside the longitudinal extension is on average 4 times the latitudinal width, while on the dayside, particularly in the cusp region, latitudinal and longitudinal scales are comparable.
Avalanche Statistics Identify Intrinsic Stellar Processes near Criticality in KIC 8462852
NASA Astrophysics Data System (ADS)
Sheikh, Mohammed A.; Weaver, Richard L.; Dahmen, Karin A.
2016-12-01
The star KIC8462852 (Tabby's star) has shown anomalous drops in light flux. We perform a statistical analysis of the more numerous smaller dimming events by using methods found useful for avalanches in ferromagnetism and plastic flow. Scaling exponents for avalanche statistics and temporal profiles of the flux during the dimming events are close to mean field predictions. Scaling collapses suggest that this star may be near a nonequilibrium critical point. The large events are interpreted as avalanches marked by modified dynamics, limited by the system size, and not within the scaling regime.
Statistical analysis of kinetic energy entrainment in a model wind turbine array boundary layer
NASA Astrophysics Data System (ADS)
Cal, Raul Bayoan; Hamilton, Nicholas; Kang, Hyung-Suk; Meneveau, Charles
2012-11-01
For large wind farms, kinetic energy must be entrained from the flow above the wind turbines to replenish wakes and enable power extraction in the array. Various statistical features of turbulence causing vertical entrainment of mean-flow kinetic energy are studied using hot-wire velocimetry data taken in a model wind farm in a scaled wind tunnel experiment. Conditional statistics and spectral decompositions are employed to characterize the most relevant turbulent flow structures and determine their length-scales. Sweep and ejection events are shown to be the largest contributors to the vertical kinetic energy flux, although their relative contribution depends upon the location in the wake. Sweeps are shown to be dominant in the region above the wind turbine array. A spectral analysis of the data shows that large scales of the flow, about the size of the rotor diameter in length or larger, dominate the vertical entrainment. The flow is more incoherent below the array, causing decreased vertical fluxes there. The results show that improving the rate of vertical kinetic energy entrainment into wind turbine arrays is a standing challenge and would require modifying the large-scale structures of the flow. This work was funded in part by the National Science Foundation (CBET-0730922, CBET-1133800 and CBET-0953053).
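A minimal sketch, on synthetic data rather than the hot-wire measurements, of the conditional (quadrant-style) analysis described above: splitting the vertical flux u'w' into sweep (u' > 0, w' < 0) and ejection (u' < 0, w' > 0) contributions.

```python
import numpy as np

rng = np.random.default_rng(2)
# synthetic velocity time series at one point above the model array (hypothetical)
u = 8.0 + rng.normal(scale=1.0, size=50_000)                     # streamwise velocity [m/s]
w = -0.1 * (u - u.mean()) + rng.normal(scale=0.4, size=u.size)   # correlated vertical velocity

up, wp = u - u.mean(), w - w.mean()                              # fluctuations
flux = up * wp                                                   # instantaneous flux proxy

sweeps = (up > 0) & (wp < 0)
ejections = (up < 0) & (wp > 0)
print("mean u'w':", flux.mean())
print("sweep fraction of total   :", flux[sweeps].sum() / flux.sum())
print("ejection fraction of total:", flux[ejections].sum() / flux.sum())
```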
Modified Distribution-Free Goodness-of-Fit Test Statistic.
Chun, So Yeon; Browne, Michael W; Shapiro, Alexander
2018-03-01
Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.
NASA Astrophysics Data System (ADS)
Lamb, Derek A.
2016-10-01
While sunspots follow a well-defined pattern of emergence in space and time, small-scale flux emergence is assumed to occur randomly at all times in the quiet Sun. HMI's full-disk coverage, high cadence, spatial resolution, and duty cycle allow us to probe that basic assumption. Some case studies of emergence suggest that temporal clustering on spatial scales of 50-150 Mm may occur. If clustering is present, it could serve as a diagnostic of large-scale subsurface magnetic field structures. We present the results of a manual survey of small-scale flux emergence events over a short time period, and a statistical analysis addressing the question of whether these events show spatio-temporal behavior that is anything other than random.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rosa, B., E-mail: bogdan.rosa@imgw.pl; Parishani, H.; Department of Earth System Science, University of California, Irvine, California 92697-3100
2015-01-15
In this paper, we study systematically the effects of forcing time scale in the large-scale stochastic forcing scheme of Eswaran and Pope [“An examination of forcing in direct numerical simulations of turbulence,” Comput. Fluids 16, 257 (1988)] on the simulated flow structures and statistics of forced turbulence. Using direct numerical simulations, we find that the forcing time scale affects the flow dissipation rate and flow Reynolds number. Other flow statistics can be predicted using the altered flow dissipation rate and flow Reynolds number, except when the forcing time scale is made unrealistically large to yield a Taylor microscale flow Reynolds number of 30 and less. We then study the effects of forcing time scale on the kinematic collision statistics of inertial particles. We show that the radial distribution function and the radial relative velocity may depend on the forcing time scale when it becomes comparable to the eddy turnover time. This dependence, however, can be largely explained in terms of altered flow Reynolds number and the changing range of flow length scales present in the turbulent flow. We argue that removing this dependence is important when studying the Reynolds number dependence of the turbulent collision statistics. The results are also compared to those based on a deterministic forcing scheme to better understand the role of large-scale forcing, relative to that of the small-scale turbulence, on turbulent collision of inertial particles. To further elucidate the correlation between the altered flow structures and dynamics of inertial particles, a conditional analysis has been performed, showing that the regions of higher collision rate of inertial particles are well correlated with the regions of lower vorticity. Regions of higher concentration of pairs at contact are found to be highly correlated with the region of high energy dissipation rate.
Large-angle correlations in the cosmic microwave background
NASA Astrophysics Data System (ADS)
Efstathiou, George; Ma, Yin-Zhe; Hanson, Duncan
2010-10-01
It has been argued recently by Copi et al. 2009 that the lack of large angular correlations of the CMB temperature field provides strong evidence against the standard, statistically isotropic, inflationary Lambda cold dark matter (ΛCDM) cosmology. We compare various estimators of the temperature correlation function showing how they depend on assumptions of statistical isotropy and how they perform on the Wilkinson Microwave Anisotropy Probe (WMAP) 5-yr Internal Linear Combination (ILC) maps with and without a sky cut. We show that the low multipole harmonics that determine the large-scale features of the temperature correlation function can be reconstructed accurately from the data that lie outside the sky cuts. The reconstructions are only weakly dependent on the assumed statistical properties of the temperature field. The temperature correlation functions computed from these reconstructions are in good agreement with those computed from the ILC map over the whole sky. We conclude that the large-scale angular correlation function for our realization of the sky is well determined. A Bayesian analysis of the large-scale correlations is presented, which shows that the data cannot exclude the standard ΛCDM model. We discuss the differences between our results and those of Copi et al. Either there exists a violation of statistical isotropy as claimed by Copi et al., or these authors have overestimated the significance of the discrepancy because of a posteriori choices of estimator, statistic and sky cut.
NASA Astrophysics Data System (ADS)
Massei, Nicolas; Dieppois, Bastien; Fritier, Nicolas; Laignel, Benoit; Debret, Maxime; Lavers, David; Hannah, David
2015-04-01
In the present context of global changes, considerable efforts have been deployed by the hydrological scientific community to improve our understanding of the impacts of climate fluctuations on water resources. Both observational and modeling studies have been extensively employed to characterize hydrological changes and trends, assess the impact of climate variability or provide future scenarios of water resources. With the aim of better understanding hydrological changes, it is of crucial importance to determine how and to what extent trends and long-term oscillations detectable in hydrological variables are linked to global climate oscillations. In this work, we develop an approach associating large-scale/local-scale correlation, empirical statistical downscaling and wavelet multiresolution decomposition of monthly precipitation and streamflow over the Seine river watershed, and the North Atlantic sea level pressure (SLP) in order to gain additional insights into the atmospheric patterns associated with the regional hydrology. We hypothesized that: i) atmospheric patterns may change according to the different temporal wavelengths defining the variability of the signals; and ii) definition of those hydrological/circulation relationships for each temporal wavelength may improve the determination of large-scale predictors of local variations. The results showed that the large-scale/local-scale links were not necessarily constant according to time-scale (i.e. for the different frequencies characterizing the signals), resulting in changing spatial patterns across scales. This was then taken into account by developing an empirical statistical downscaling (ESD) modeling approach which integrated discrete wavelet multiresolution analysis for reconstructing local hydrometeorological processes (predictand: precipitation and streamflow on the Seine river catchment) based on a large-scale predictor (SLP over the Euro-Atlantic sector) on a monthly time-step. This approach basically consisted in (1) decomposing both signals (SLP field and precipitation or streamflow) using discrete wavelet multiresolution analysis and synthesis, (2) generating one statistical downscaling model per time-scale, and (3) summing up all scale-dependent models in order to obtain a final reconstruction of the predictand. The results obtained revealed a significant improvement of the reconstructions for both precipitation and streamflow when using the multiresolution ESD model instead of basic ESD; in addition, the scale-dependent spatial patterns associated with the model matched quite well those obtained from scale-dependent composite analysis. In particular, the multiresolution ESD model handled very well the significant changes in variance through time observed in either precipitation or streamflow. For instance, the post-1980 period, which had been characterized by particularly high amplitudes in interannual-to-interdecadal variability associated with flood and extremely low-flow/drought periods (e.g., winter 2001, summer 2003), could not be reconstructed without integrating wavelet multiresolution analysis into the model. Further investigations would be required to address the issue of the stationarity of the large-scale/local-scale relationships and to test the capability of the multiresolution ESD model for interannual-to-interdecadal forecasting.
In terms of methodological approach, further investigations may concern a fully comprehensive sensitivity analysis of the modeling to the parameters of the multiresolution approach (different families of scaling and wavelet functions used, number of coefficients/degree of smoothness, etc.).
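A minimal sketch of the scale-by-scale downscaling idea described in the abstract above, using synthetic series and the 'db4' wavelet as arbitrary choices (the paper does not specify an implementation here): decompose predictor and predictand with a discrete wavelet transform, fit one regression per scale, and sum the per-scale predictions.

```python
import numpy as np
import pywt

rng = np.random.default_rng(3)
n = 512                                        # months (synthetic)
t = np.arange(n)
predictor = np.sin(2 * np.pi * t / 64) + 0.5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, n)
predictand = 2.0 * np.sin(2 * np.pi * t / 64) - 1.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, n)

def scale_components(x, wavelet="db4", level=5):
    """Additive per-scale components from a discrete wavelet decomposition."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    comps = []
    for i in range(len(coeffs)):
        keep = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        comps.append(pywt.waverec(keep, wavelet)[: len(x)])
    return comps                               # components sum back to x

xc, yc = scale_components(predictor), scale_components(predictand)

reconstruction = np.zeros(n)
for xs, ys in zip(xc, yc):                     # one simple regression per time scale
    slope, intercept = np.polyfit(xs, ys, 1)
    reconstruction += slope * xs + intercept

print("correlation of multiresolution reconstruction:",
      np.corrcoef(reconstruction, predictand)[0, 1])
```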
Bremer, Peer-Timo; Weber, Gunther; Tierny, Julien; Pascucci, Valerio; Day, Marcus S; Bell, John B
2011-09-01
Large-scale simulations are increasingly being used to study complex scientific and engineering phenomena. As a result, advanced visualization and data analysis are also becoming an integral part of the scientific process. Often, a key step in extracting insight from these large simulations involves the definition, extraction, and evaluation of features in the space and time coordinates of the solution. However, in many applications, these features involve a range of parameters and decisions that will affect the quality and direction of the analysis. Examples include particular level sets of a specific scalar field, or local inequalities between derived quantities. A critical step in the analysis is to understand how these arbitrary parameters/decisions impact the statistical properties of the features, since such a characterization will help to evaluate the conclusions of the analysis as a whole. We present a new topological framework that in a single pass extracts and encodes entire families of possible feature definitions as well as their statistical properties. For each time step we construct a hierarchical merge tree, a highly compact yet flexible feature representation. While this data structure is more than two orders of magnitude smaller than the raw simulation data, it allows us to extract a set of features for any given parameter selection in a postprocessing step. Furthermore, we augment the trees with additional attributes, making it possible to gather a large number of useful global, local, and conditional statistics that would otherwise be extremely difficult to compile. We also use this representation to create tracking graphs that describe the temporal evolution of the features over time. Our system provides a linked-view interface to explore the time-evolution of the graph interactively alongside the segmentation, thus making it possible to perform extensive data analysis in a very efficient manner. We demonstrate our framework by extracting and analyzing burning cells from a large-scale turbulent combustion simulation. In particular, we show how the statistical analysis enabled by our techniques provides new insight into the combustion process.
Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D
2017-01-01
If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
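A minimal sketch, with hypothetical per-cluster estimates rather than the QRISK2 data, of the recommended approach: random-effects meta-analysis of the C-statistic on the logit scale, here with a DerSimonian-Laird estimate of between-study variance.

```python
import numpy as np

# hypothetical C-statistics and standard errors from k validation clusters
c = np.array([0.78, 0.81, 0.74, 0.80, 0.77, 0.83])
se_c = np.array([0.015, 0.020, 0.018, 0.012, 0.025, 0.017])

# transform to the logit scale (delta-method standard errors)
y = np.log(c / (1 - c))
se = se_c / (c * (1 - c))

# DerSimonian-Laird random-effects meta-analysis
w = 1 / se**2
mu_fixed = np.sum(w * y) / np.sum(w)
q = np.sum(w * (y - mu_fixed) ** 2)
tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_star = 1 / (se**2 + tau2)
mu = np.sum(w_star * y) / np.sum(w_star)

# back-transform the pooled logit-C to the C-statistic scale
pooled_c = 1 / (1 + np.exp(-mu))
print(f"pooled C = {pooled_c:.3f}, tau^2 (logit scale) = {tau2:.4f}")
```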
Robust Detection of Examinees with Aberrant Answer Changes
ERIC Educational Resources Information Center
Belov, Dmitry I.
2015-01-01
The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at testing organizations. However, AC data has an uncertainty caused by technological or human factors. Therefore, existing statistics (e.g., number of wrong-to-right ACs) used to detect examinees…
Statistical Models for the Analysis of Zero-Inflated Pain Intensity Numeric Rating Scale Data.
Goulet, Joseph L; Buta, Eugenia; Bathulapalli, Harini; Gueorguieva, Ralitza; Brandt, Cynthia A
2017-03-01
Pain intensity is often measured in clinical and research settings using the 0 to 10 numeric rating scale (NRS). NRS scores are recorded as discrete values, and in some samples they may display a high proportion of zeroes and a right-skewed distribution. Despite this, statistical methods for normally distributed data are frequently used in the analysis of NRS data. We present results from an observational cross-sectional study examining the association of NRS scores with patient characteristics using data collected from a large cohort of 18,935 veterans in Department of Veterans Affairs care diagnosed with a potentially painful musculoskeletal disorder. The mean (variance) NRS pain was 3.0 (7.5), and 34% of patients reported no pain (NRS = 0). We compared the following statistical models for analyzing NRS scores: linear regression, generalized linear models (Poisson and negative binomial), zero-inflated and hurdle models for data with an excess of zeroes, and a cumulative logit model for ordinal data. We examined model fit, interpretability of results, and whether conclusions about the predictor effects changed across models. In this study, models that accommodate zero inflation provided a better fit than the other models. These models should be considered for the analysis of NRS data with a large proportion of zeroes. We examined and analyzed pain data from a large cohort of veterans with musculoskeletal disorders. We found that many reported no current pain on the NRS on the diagnosis date. We present several alternative statistical methods for the analysis of pain intensity data with a large proportion of zeroes. Published by Elsevier Inc.
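A minimal sketch on simulated data, not the VA cohort, of fitting one of the models compared above: a zero-inflated Poisson regression via statsmodels. The single predictor is hypothetical.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=n)                                # hypothetical patient characteristic
X = sm.add_constant(x)

# simulate NRS-like scores: structural zeros plus a Poisson count component
p_zero = 1 / (1 + np.exp(-(-0.5 + 0.8 * x)))          # probability of a structural zero
lam = np.exp(1.0 + 0.3 * x)
y = np.where(rng.uniform(size=n) < p_zero, 0, rng.poisson(lam))
y = np.clip(y, 0, 10)                                 # keep scores on the 0-10 NRS range

model = ZeroInflatedPoisson(y, X, exog_infl=X, inflation="logit")
result = model.fit(maxiter=500, disp=False)
print(result.summary())
```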
Advantages of Social Network Analysis in Educational Research
ERIC Educational Resources Information Center
Ushakov, K. M.; Kukso, K. N.
2015-01-01
Currently, one of the main tools for large-scale studies of schools is statistical analysis. Although it is the most common method and offers the greatest opportunities for analysis, there are other quantitative methods for studying schools, such as network analysis. We discuss the potential advantages that network analysis has for educational…
Large-scale gene function analysis with the PANTHER classification system.
Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D
2013-08-01
The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.
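A minimal sketch, not the PANTHER API, of the kind of over-representation test such tools run for each annotation term: a 2x2 Fisher's exact test comparing a user's gene list against a reference set. All counts are hypothetical.

```python
from scipy.stats import fisher_exact

# hypothetical counts for one annotation term (e.g., a GO term or protein class)
list_with_term, list_without_term = 40, 460      # user's 500-gene list
ref_with_term, ref_without_term = 400, 19600     # 20,000-gene reference set

table = [[list_with_term, list_without_term],
         [ref_with_term, ref_without_term]]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.2e}")
# In practice, p-values over all terms would then be corrected for multiple testing.
```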
NASA Astrophysics Data System (ADS)
Rowlands, G.; Kiyani, K. H.; Chapman, S. C.; Watkins, N. W.
2009-12-01
Quantitative analyses of solar wind fluctuations are often performed in the context of intermittent turbulence and center around methods to quantify statistical scaling, such as power spectra and structure functions, which assume a stationary process. The solar wind exhibits large-scale secular changes, and so the question arises as to whether the time series of the fluctuations is non-stationary. One approach is to seek a local stationarity by parsing the time interval over which statistical analysis is performed. Hence, natural systems such as the solar wind unavoidably provide observations over restricted intervals. Consequently, due to a reduction of sample size leading to poorer estimates, a stationary stochastic process (time series) can yield anomalous time variation in the scaling exponents, suggestive of nonstationarity. The variance in the estimates of scaling exponents computed from an interval of N observations is known for finite variance processes to vary as ~1/N as N becomes large for certain statistical estimators; however, the convergence to this behavior will depend on the details of the process, and may be slow. We study the variation in the scaling of second-order moments of the time-series increments with N for a variety of synthetic and “real world” time series, and we find that, in particular for heavy-tailed processes, for realizable N one is far from this ~1/N limiting behavior. We propose a semiempirical estimate for the minimum N needed to make a meaningful estimate of the scaling exponents for model stochastic processes and compare these with some “real world” time series from the solar wind. With fewer data points, the stationary time series becomes indistinguishable from a nonstationary process, and we illustrate this with nonstationary synthetic datasets. Reference article: K. H. Kiyani, S. C. Chapman and N. W. Watkins, Phys. Rev. E 79, 036109 (2009).
Wang, Lu-Yong; Fasulo, D
2006-01-01
Genome-wide association studies for complex diseases generate massive amounts of single nucleotide polymorphism (SNP) data. Univariate statistical tests (i.e., the Fisher exact test) have been used to single out non-associated SNPs. However, disease-susceptible SNPs may have little marginal effect in the population and are unlikely to be retained after univariate tests. Also, model-based methods are impractical for large-scale datasets. Moreover, genetic heterogeneity makes it harder for traditional methods to identify the genetic causes of disease. The more recent random forest method provides a more robust tool for screening SNPs at the scale of thousands. However, for larger-scale data, i.e., Affymetrix Human Mapping 100K GeneChip data, a faster screening method is required to screen SNPs in whole-genome, large-scale association analysis with genetic heterogeneity. We propose a boosting-based method for rapid screening in large-scale analysis of complex traits in the presence of genetic heterogeneity. It provides a relatively fast and fairly good tool for screening and limiting the candidate SNPs for further, more complex computational modeling tasks.
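A minimal sketch of boosting-based screening on simulated genotypes, not the authors' implementation: fit a gradient-boosted classifier to case/control labels and rank SNPs by importance to limit the candidate set.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(5)
n_subjects, n_snps = 1000, 2000
X = rng.integers(0, 3, size=(n_subjects, n_snps)).astype(float)   # 0/1/2 genotype coding

# simulate genetic heterogeneity: two small subgroups driven by different SNP pairs
y = np.zeros(n_subjects, dtype=int)
y[(X[:, 10] > 1) & (X[:, 20] > 1)] = 1
y[(X[:, 30] > 1) & (X[:, 40] > 1)] = 1

clf = GradientBoostingClassifier(n_estimators=100, max_depth=2, learning_rate=0.1)
clf.fit(X, y)

top = np.argsort(clf.feature_importances_)[::-1][:20]
print("top-ranked candidate SNPs:", top)        # screened set for downstream modeling
```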
Robust regression for large-scale neuroimaging studies.
Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand
2015-05-01
Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.
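A minimal sketch, on simulated data rather than an imaging cohort, of the kind of robust regression advocated above: an M-estimator fit with Huber weights compared against ordinary least squares in the presence of outlying subjects.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)                               # e.g., a behavioral or genetic covariate
y = 0.5 * x + rng.normal(scale=1.0, size=n)
y[:20] += rng.normal(loc=8.0, scale=2.0, size=20)    # artifact-like outlying subjects

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
rlm_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()

print("OLS slope:           ", ols_fit.params[1])
print("Robust (Huber) slope:", rlm_fit.params[1])    # less distorted by the outliers
```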
Low energy peripheral scaling in nucleon-nucleon scattering and uncertainty quantification
NASA Astrophysics Data System (ADS)
Ruiz Simo, I.; Amaro, J. E.; Ruiz Arriola, E.; Navarro Pérez, R.
2018-03-01
We analyze the peripheral structure of the nucleon-nucleon interaction for LAB energies below 350 MeV. To this end we transform the scattering matrix into the impact parameter representation by analyzing the scaled phase shifts (L + 1/2)δ_JLS(p) and the scaled mixing parameters (L + 1/2)ε_JLS(p) in terms of the impact parameter b = (L + 1/2)/p. According to the eikonal approximation, at large angular momentum L these functions should become a universal function of b, independent of L. This allows us to discuss in a rather transparent way the role of statistical and systematic uncertainties in the different long-range components of the two-body potential. Implications for peripheral waves obtained from chiral perturbation theory interactions to fifth order (N5LO) or from the large body of NN data considered in the SAID partial wave analysis are also drawn by comparing them with other phenomenological high-quality interactions, constructed to fit scattering data as well. We find that both N5LO and SAID peripheral waves disagree by more than 5σ with the Granada-2013 statistical analysis, by more than 2σ with the 6 statistically equivalent potentials fitting the Granada-2013 database, and by about 1σ with the historical set of 13 high-quality potentials developed since the 1993 Nijmegen analysis.
A Large-Scale Analysis of Variance in Written Language
ERIC Educational Resources Information Center
Johns, Brendan T.; Jamieson, Randall K.
2018-01-01
The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers,…
Introduction to bioinformatics.
Can, Tolga
2014-01-01
Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data-intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching for sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data, and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data are usually represented as matrices, and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs, and graph-theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
HiQuant: Rapid Postquantification Analysis of Large-Scale MS-Generated Proteomics Data.
Bryan, Kenneth; Jarboui, Mohamed-Ali; Raso, Cinzia; Bernal-Llinares, Manuel; McCann, Brendan; Rauch, Jens; Boldt, Karsten; Lynn, David J
2016-06-03
Recent advances in mass-spectrometry-based proteomics are now facilitating ambitious large-scale investigations of the spatial and temporal dynamics of the proteome; however, the increasing size and complexity of these data sets is overwhelming current downstream computational methods, specifically those that support the postquantification analysis pipeline. Here we present HiQuant, a novel application that enables the design and execution of a postquantification workflow, including common data-processing steps, such as assay normalization and grouping, and experimental replicate quality control and statistical analysis. HiQuant also enables the interpretation of results generated from large-scale data sets by supporting interactive heatmap analysis and also the direct export to Cytoscape and Gephi, two leading network analysis platforms. HiQuant may be run via a user-friendly graphical interface and also supports complete one-touch automation via a command-line mode. We evaluate HiQuant's performance by analyzing a large-scale, complex interactome mapping data set and demonstrate a 200-fold improvement in the execution time over current methods. We also demonstrate HiQuant's general utility by analyzing proteome-wide quantification data generated from both a large-scale public tyrosine kinase siRNA knock-down study and an in-house investigation into the temporal dynamics of the KSR1 and KSR2 interactomes. Download HiQuant, sample data sets, and supporting documentation at http://hiquant.primesdb.eu .
NASA Astrophysics Data System (ADS)
Jiang, H.; Lin, T.
2017-12-01
Rain-fed corn production systems are subject to sub-seasonal variations of precipitation and temperature during the growing season. Because each growth phase has different inherent physiological processes, plants require different optimal environmental conditions during each phase. However, this temporal heterogeneity in the response to climate variability over the crop lifecycle is often simplified to fixed, constant responses in large-scale statistical modeling analyses. To capture the time-variant growing requirements in large-scale statistical analysis, we develop and compare statistical models at various spatial and temporal resolutions to quantify the relationship between corn yield and weather factors for 12 corn belt states from 1981 to 2016. The study compares three spatial resolutions (county, agricultural district, and state scale) and three temporal resolutions (crop growth phase, monthly, and growing season) to characterize the effects of spatial and temporal variability. Our results show that the agricultural-district model with growth-phase temporal resolution can explain 52% of the variation in corn yield caused by temperature and precipitation variability. It provides a practical model structure that balances the overfitting problem of county-specific models against the weak explanatory power of state-specific models. In the US corn belt, precipitation has a positive impact on corn yield throughout the growing season except for the vegetative stage, while extreme heat attains its highest sensitivity from the silking to dough phases. The results show that the northern counties of the corn belt are less affected by extreme heat but are more vulnerable to water deficiency.
Houts, Carrie R; Edwards, Michael C; Wirth, R J; Deal, Linda S
2016-11-01
There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating the developing COAs using a small sample. We review the benefits such methods are purported to offer from both a practical and statistical standpoint and detail several problematic areas, including both practical and statistical theory concerns, with respect to the use of quantitative methods, including Rasch-consistent methods, with small samples. The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.
Mackey, Aaron J; Pearson, William R
2004-10-01
Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
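A minimal sketch, using generic SQLite rather than the seqdb_demo schema described in the unit, of how a relational database can generate a focused sequence library subset for similarity searching; the table, column, and file names are hypothetical.

```python
import sqlite3

con = sqlite3.connect("proteins.db")                   # hypothetical database file
cur = con.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS protein (
                   acc TEXT PRIMARY KEY,
                   taxon TEXT,
                   length INTEGER,
                   seq TEXT)""")
# (in practice, rows would be loaded from the full sequence library beforehand)

# build a taxon- and length-restricted subset, then export it as FASTA for the search program
cur.execute("SELECT acc, seq FROM protein WHERE taxon = ? AND length BETWEEN ? AND ?",
            ("Homo sapiens", 50, 2000))
with open("human_subset.fasta", "w") as out:
    for acc, seq in cur.fetchall():
        out.write(f">{acc}\n{seq}\n")
con.close()
```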
Scaling laws and fluctuations in the statistics of word frequencies
NASA Astrophysics Data System (ADS)
Gerlach, Martin; Altmann, Eduardo G.
2014-11-01
In this paper, we combine statistical analysis of written texts and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. The average vocabulary of an ensemble of fixed-length texts is known to scale sublinearly with the total number of words (Heaps’ law). Analyzing the fluctuations around this average in three large databases (Google-ngram, English Wikipedia, and a collection of scientific articles), we find that the standard deviation scales linearly with the average (Taylor's law), in contrast to the prediction of decaying fluctuations obtained using simple sampling arguments. We explain both scaling laws (Heaps’ and Taylor's) by modeling the usage of words as a Poisson process with a fat-tailed distribution of word frequencies (Zipf's law) and topic-dependent frequencies of individual words (as in topic models). Considering topical variations leads to quenched averages, turns the vocabulary size into a non-self-averaging quantity, and explains the empirical observations. For the numerous practical applications relying on estimates of vocabulary size, our results show that uncertainties remain large even for long texts. We show how to account for these uncertainties in measurements of the lexical richness of texts with different lengths.
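A minimal simulation of the sampling argument discussed above can be written in a few lines: draw word counts from a Poisson process with Zipf-distributed frequencies and track the vocabulary of fixed-length texts. The exponent, pool size, and text lengths below are arbitrary illustrative choices; reproducing the Taylor's-law behavior reported in the paper would additionally require topic-dependent (quenched) frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 100_000                          # size of the word-frequency pool
freqs = 1.0 / np.arange(1, R + 1)    # Zipf's law with exponent 1 (illustrative)
freqs /= freqs.sum()

for N in (10_000, 100_000, 1_000_000):       # text lengths
    # Poisson usage of each word; vocabulary = number of distinct words seen.
    vocab = [(rng.poisson(N * freqs) > 0).sum() for _ in range(50)]
    print(N, np.mean(vocab), np.std(vocab))
# The mean vocabulary grows sublinearly with N (Heaps' law); under this plain
# Poisson sampling the relative fluctuations decay, which is exactly the
# prediction the paper shows is violated in real corpora (Taylor's law).
```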
Statistical Surrogate Modeling of Atmospheric Dispersion Events Using Bayesian Adaptive Splines
NASA Astrophysics Data System (ADS)
Francom, D.; Sansó, B.; Bulaevskaya, V.; Lucas, D. D.
2016-12-01
Uncertainty in the inputs of complex computer models, including atmospheric dispersion and transport codes, is often assessed via statistical surrogate models. Surrogate models are computationally efficient statistical approximations of expensive computer models that enable uncertainty analysis. We introduce Bayesian adaptive spline methods for producing surrogate models that capture the major spatiotemporal patterns of the parent model, while satisfying all the necessities of flexibility, accuracy and computational feasibility. We present novel methodological and computational approaches motivated by a controlled atmospheric tracer release experiment conducted at the Diablo Canyon nuclear power plant in California. Traditional methods for building statistical surrogate models often do not scale well to experiments with large amounts of data. Our approach is well suited to experiments involving large numbers of model inputs, large numbers of simulations, and functional output for each simulation. Our approach allows us to perform global sensitivity analysis with ease. We also present an approach to calibration of simulators using field data.
He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z
2013-12-04
Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis in different modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space and facilitates faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers abundant practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and abundant sub-functions provide a solid foundation for special demands in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.
Effects of Large-Scale Solar Installations on Dust Mobilization and Air Quality
NASA Astrophysics Data System (ADS)
Pratt, J. T.; Singh, D.; Diffenbaugh, N. S.
2012-12-01
Large-scale solar projects are increasingly being developed worldwide and many of these installations are located in arid, desert regions. To examine the effects of these projects on regional dust mobilization and air quality, we analyze aerosol product data from NASA's Multi-angle Imaging Spectroradiometer (MISR) at annual and seasonal time intervals near fifteen photovoltaic and solar thermal stations ranging from 5-200 MW (12-4,942 acres) in size. The stations are distributed over eight different countries and were chosen based on size, location and installation date; most of the installations are large-scale, took place in desert climates and were installed between 2006 and 2010. We also consider air quality measurements of particulate matter between 2.5 and 10 micrometers (PM10) from Environmental Protection Agency (EPA) monitoring sites near and downwind from the project installations in the U.S. We use monthly wind data from NOAA's National Centers for Environmental Prediction (NCEP) Global Reanalysis to select the stations downwind from the installations, and then perform statistical analysis on the data to identify any significant changes in these quantities. We find that fourteen of the fifteen regions show lower aerosol product after the start of the installations, and all six PM10 monitoring stations show lower particulate matter measurements after construction commenced. Results fail to show any statistically significant differences in aerosol optical index or PM10 measurements before and after the large-scale solar installations. However, many of the large installations are very recent, and there is insufficient data to fully understand the long-term effects on air quality. More data and higher-resolution analysis are necessary to better understand the relationship between large-scale solar, dust, and air quality.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre
Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics (which we discussed in [1]) where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
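The map-reduce character of the approach comes from the fact that merging per-processor contingency counts is associative. A rough single-machine sketch (not the authors' open-source implementation) using Python Counters and SciPy's χ² test:

```python
from collections import Counter
import numpy as np
from scipy.stats import chi2_contingency

def local_table(chunk):
    """Map step: per-process contingency counts over (x, y) category pairs."""
    return Counter(zip(chunk[:, 0], chunk[:, 1]))

# Pretend each chunk lives on a different processor.
rng = np.random.default_rng(1)
chunks = [rng.integers(0, 3, size=(10_000, 2)) for _ in range(4)]

# Reduce step: merging counts is associative, so it parallelizes trivially;
# the communication cost grows with the number of occupied cells, not with data size.
total = Counter()
for c in map(local_table, chunks):
    total.update(c)

cats = sorted({k[0] for k in total} | {k[1] for k in total})
table = np.array([[total[(i, j)] for j in cats] for i in cats])
chi2, p, dof, _ = chi2_contingency(table)
print(chi2, p)
```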
Statistical population based estimates of water ingestion play a vital role in many types of exposure and risk analysis. A significant large scale analysis of water ingestion by the population of the United States was recently completed and is documented in the report titled ...
Application of statistical mechanical methods to the modeling of social networks
NASA Astrophysics Data System (ADS)
Strathman, Anthony Robert
With the recent availability of large-scale social data sets, social networks have become open to quantitative analysis via the methods of statistical physics. We examine the statistical properties of a real large-scale social network, generated from cellular phone call-trace logs. We find this network, like many other social networks, to be assortative (r = 0.31) and clustered (i.e., strongly transitive, C = 0.21). We measure fluctuation scaling to identify the presence of internal structure in the network and find that structural inhomogeneity effectively disappears at the scale of a few hundred nodes, though there is no sharp cutoff. We introduce an agent-based model of social behavior, designed to model the formation and dissolution of social ties. The model is a modified Metropolis algorithm containing agents operating under the basic sociological constraints of reciprocity, communication need and transitivity. The model introduces the concept of a social temperature. We go on to show that this simple model reproduces the global statistical network features (incl. assortativity, connected fraction, mean degree, clustering, and mean shortest path length) of the real network data and undergoes two phase transitions, one being from a "gas" to a "liquid" state and the second from a liquid to a glassy state as a function of this social temperature.
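A heavily simplified, illustrative version of such a Metropolis-style tie model is sketched below; the energy function (triangle closure plus a degree target standing in for communication need) and all parameter values are invented for demonstration and are not the model specification used in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, steps = 100, 0.5, 50_000          # agents, "social temperature", MC steps
A = np.zeros((N, N), dtype=int)          # symmetric friendship matrix

def local_energy(A, i, j):
    # Lower energy for ties that close triangles (transitivity) and for
    # keeping degree near a target "communication need"; purely illustrative.
    triangles = (A[i] * A[j]).sum()
    need = abs(A[i].sum() - 5) + abs(A[j].sum() - 5)
    return -triangles + 0.5 * need

for _ in range(steps):
    i, j = rng.choice(N, size=2, replace=False)
    dE = (1 - 2 * A[i, j]) * local_energy(A, i, j)   # energy change of toggling tie i-j
    if dE <= 0 or rng.random() < np.exp(-dE / T):    # Metropolis acceptance at temperature T
        A[i, j] = A[j, i] = 1 - A[i, j]

degrees = A.sum(axis=1)
print("mean degree:", degrees.mean())
```

Sweeping T in such a toy model is one way to look for the qualitative "gas"/"liquid"/glassy regimes mentioned above, though the real phase structure depends on the full constraint set.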
NASA Astrophysics Data System (ADS)
Massei, N.; Dieppois, B.; Hannah, D. M.; Lavers, D. A.; Fossa, M.; Laignel, B.; Debret, M.
2017-03-01
In the present context of global changes, considerable efforts have been deployed by the hydrological scientific community to improve our understanding of the impacts of climate fluctuations on water resources. Both observational and modeling studies have been extensively employed to characterize hydrological changes and trends, assess the impact of climate variability or provide future scenarios of water resources. With the aim of better understanding hydrological changes, it is of crucial importance to determine how and to what extent trends and long-term oscillations detectable in hydrological variables are linked to global climate oscillations. In this work, we develop an approach associating correlation between large and local scales, empirical statistical downscaling and wavelet multiresolution decomposition of monthly precipitation and streamflow over the Seine river watershed, and the North Atlantic sea level pressure (SLP) in order to gain additional insight into the atmospheric patterns associated with the regional hydrology. We hypothesized that: (i) atmospheric patterns may change according to the different temporal wavelengths defining the variability of the signals; and (ii) definition of those hydrological/circulation relationships for each temporal wavelength may improve the determination of large-scale predictors of local variations. The results showed that the links between large and local scales were not necessarily constant according to time-scale (i.e. for the different frequencies characterizing the signals), resulting in changing spatial patterns across scales. This was then taken into account by developing an empirical statistical downscaling (ESD) modeling approach, which integrated discrete wavelet multiresolution analysis for reconstructing monthly regional hydrometeorological processes (predictand: precipitation and streamflow on the Seine river catchment) based on a large-scale predictor (SLP over the Euro-Atlantic sector). The approach consists of three steps: (1) decomposing large-scale climate and hydrological signals (SLP field, precipitation or streamflow) using discrete wavelet multiresolution analysis, (2) generating a statistical downscaling model per time-scale, and (3) summing all scale-dependent models to obtain the final reconstruction of the predictand. The results obtained revealed a significant improvement of the reconstructions for both precipitation and streamflow when using the multiresolution ESD model instead of basic ESD. In particular, the multiresolution ESD model handled the significant changes in variance through time observed in either precipitation or streamflow very well. For instance, the post-1980 period, which had been characterized by particularly high amplitudes in interannual-to-interdecadal variability associated with alternating flood and extremely low-flow/drought periods (e.g., winter/spring 2001, summer 2003), could not be reconstructed without integrating wavelet multiresolution analysis into the model. In accordance with previous studies, the wavelet components detected in SLP, precipitation and streamflow on interannual to interdecadal time-scales could be interpreted in terms of the influence of the Gulf-Stream oceanic front on atmospheric circulation.
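The three-step multiresolution ESD idea can be sketched with PyWavelets and a per-scale linear regression. The code below uses synthetic series in place of the SLP index and Seine streamflow, a db4 wavelet, and four decomposition levels, all of which are arbitrary stand-ins rather than the authors' actual configuration.

```python
import numpy as np
import pywt
from sklearn.linear_model import LinearRegression

# Hypothetical monthly series: slp_idx is a large-scale SLP-based predictor,
# q is streamflow (or precipitation) over the catchment; same length, same dates.
rng = np.random.default_rng(3)
n = 512
slp_idx = rng.standard_normal(n).cumsum()
q = 0.6 * slp_idx + rng.standard_normal(n)

level = 4
cp = pywt.wavedec(slp_idx, "db4", level=level)   # step 1: decompose predictor
cq = pywt.wavedec(q, "db4", level=level)         # step 1: decompose predictand

# Step 2: one downscaling model per time-scale; step 3: sum the scale-wise reconstructions.
recon = np.zeros(n)
for k in range(len(cp)):
    model = LinearRegression().fit(cp[k].reshape(-1, 1), cq[k])
    fitted = [np.zeros_like(c) for c in cq]
    fitted[k] = model.predict(cp[k].reshape(-1, 1))
    recon += pywt.waverec(fitted, "db4")[:n]

print(np.corrcoef(recon, q)[0, 1])
```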
Data-Mining Techniques in Detecting Factors Linked to Academic Achievement
ERIC Educational Resources Information Center
Martínez Abad, Fernando; Chaparro Caso López, Alicia A.
2017-01-01
In light of the emergence of statistical analysis techniques based on data mining in education sciences, and the potential they offer to detect non-trivial information in large databases, this paper presents a procedure used to detect factors linked to academic achievement in large-scale assessments. The study is based on a non-experimental,…
Quality Assessments of Long-Term Quantitative Proteomic Analysis of Breast Cancer Xenograft Tissues
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Jian-Ying; Chen, Lijun; Zhang, Bai
The identification of protein biomarkers requires large-scale analysis of human specimens to achieve statistical significance. In this study, we evaluated the long-term reproducibility of an iTRAQ (isobaric tags for relative and absolute quantification) based quantitative proteomics strategy using one channel for universal normalization across all samples. A total of 307 liquid chromatography tandem mass spectrometric (LC-MS/MS) analyses were completed, generating 107 one-dimensional (1D) LC-MS/MS datasets and 8 offline two-dimensional (2D) LC-MS/MS datasets (25 fractions for each set) for human-in-mouse breast cancer xenograft tissues representative of basal and luminal subtypes. Such large-scale studies require the implementation of robust metrics to assess the contributions of technical and biological variability in the qualitative and quantitative data. Accordingly, we developed a quantification confidence score based on the quality of each peptide-spectrum match (PSM) to remove quantification outliers from each analysis. After combining confidence score filtering and statistical analysis, reproducible protein identification and quantitative results were achieved from LC-MS/MS datasets collected over a 16 month period.
Environmental Studies: Mathematical, Computational and Statistical Analyses
1993-03-03
mathematical analysis addresses the seasonally and longitudinally averaged circulation which is under the influence of a steady forcing located asymmetrically...employed, as has been suggested for some situations. A general discussion of how interfacial phenomena influence both the original contamination process...describing the large-scale advective and dispersive behaviour of contaminants transported by groundwater and the uncertainty associated with field-scale
Volatility return intervals analysis of the Japanese market
NASA Astrophysics Data System (ADS)
Jung, W.-S.; Wang, F. Z.; Havlin, S.; Kaizoji, T.; Moon, H.-T.; Stanley, H. E.
2008-03-01
We investigate scaling and memory effects in return intervals between price volatilities above a certain threshold q for the Japanese stock market using daily and intraday data sets. We find that the distribution of return intervals can be approximated by a scaling function that depends only on the ratio between the return interval τ and its mean <τ>. We also find memory effects such that a large (or small) return interval follows a large (or small) interval by investigating the conditional distribution and mean return interval. The results are similar to previous studies of other markets and indicate that similar statistical features appear in different financial markets. We also compare our results for the periods before and after the big crash at the end of 1989. We find that the scaling and memory effects of the return intervals show similar features, although the statistical properties of the returns are different.
ERIC Educational Resources Information Center
Kaufman, Phillip; Bradbury, Denise
1992-01-01
The National Education Longitudinal Study of 1988 (NELS:88) is a large-scale, national longitudinal study designed and sponsored by the National Center for Education Statistics (NCES), with support from other government agencies. Beginning in the spring of 1988 with a cohort of eighth graders (25,000) attending public and private schools across…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steed, Chad Allen
EDENx is a multivariate data visualization tool that allows interactive user driven analysis of large-scale data sets with high dimensionality. EDENx builds on our earlier system, called EDEN, to enable analysis of more dimensions and larger scale data sets. EDENx provides an initial overview of summary statistics for each variable in the data set under investigation. EDENx allows the user to interact with graphical summary plots of the data to investigate subsets and their statistical associations. These plots include histograms, binned scatterplots, binned parallel coordinate plots, timeline plots, and graphical correlation indicators. From the EDENx interface, a user can select a subsample of interest and launch a more detailed data visualization via the EDEN system. EDENx is best suited for high-level, aggregate analysis tasks while EDEN is more appropriate for detailed data investigations.
A PRACTICAL ONTOLOGY FOR THE LARGE-SCALE MODELING OF SCHOLARLY ARTIFACTS AND THEIR USAGE
DOE Office of Scientific and Technical Information (OSTI.GOV)
RODRIGUEZ, MARKO A.; BOLLEN, JOHAN; VAN DE SOMPEL, HERBERT
2007-01-30
The large-scale analysis of scholarly artifact usage is constrained primarily by current practices in usage data archiving, privacy issues concerned with the dissemination of usage data, and the lack of a practical ontology for modeling the usage domain. As a remedy to the third constraint, this article presents a scholarly ontology that was engineered to represent those classes for which large-scale bibliographic and usage data exists, supports usage research, and whose instantiation is scalable to the order of 50 million articles along with their associated artifacts (e.g. authors and journals) and an accompanying 1 billion usage events. The real world instantiation of the presented abstract ontology is a semantic network model of the scholarly community which lends the scholarly process to statistical analysis and computational support. The article presents the ontology, discusses its instantiation, and provides some example inference rules for calculating various scholarly artifact metrics.
A Virtual Study of Grid Resolution on Experiments of a Highly-Resolved Turbulent Plume
NASA Astrophysics Data System (ADS)
Maisto, Pietro M. F.; Marshall, Andre W.; Gollner, Michael J.; Fire Protection Engineering Department Collaboration
2017-11-01
An accurate representation of sub-grid scale turbulent mixing is critical for modeling fire plumes and smoke transport. In this study, PLIF and PIV diagnostics are used with the saltwater modeling technique to provide highly-resolved instantaneous field measurements in unconfined turbulent plumes useful for statistical analysis, physical insight, and model validation. The effect of resolution was investigated employing a virtual interrogation window (of varying size) applied to the high-resolution field measurements. Motivated by LES low-pass filtering concepts, the high-resolution experimental data in this study can be analyzed within the interrogation windows (i.e. statistics at the sub-grid scale) and on interrogation windows (i.e. statistics at the resolved scale). A dimensionless resolution threshold (L/D*) criterion was determined to achieve converged statistics on the filtered measurements. Such a criterion was then used to establish the relative importance between large and small-scale turbulence phenomena while investigating specific scales for the turbulent flow. First order data sets start to collapse at a resolution of 0.3D*, while for second and higher order statistical moments the interrogation window size drops down to 0.2D*.
Statistical Ensemble of Large Eddy Simulations
NASA Technical Reports Server (NTRS)
Carati, Daniele; Rogers, Michael M.; Wray, Alan A.; Mansour, Nagi N. (Technical Monitor)
2001-01-01
A statistical ensemble of large eddy simulations (LES) is run simultaneously for the same flow. The information provided by the different large scale velocity fields is used to propose an ensemble averaged version of the dynamic model. This produces local model parameters that only depend on the statistical properties of the flow. An important property of the ensemble averaged dynamic procedure is that it does not require any spatial averaging and can thus be used in fully inhomogeneous flows. Also, the ensemble of LES's provides statistics of the large scale velocity that can be used for building new models for the subgrid-scale stress tensor. The ensemble averaged dynamic procedure has been implemented with various models for three flows: decaying isotropic turbulence, forced isotropic turbulence, and the time developing plane wake. It is found that the results are almost independent of the number of LES's in the statistical ensemble provided that the ensemble contains at least 16 realizations.
Recurrence interval analysis of trading volumes
NASA Astrophysics Data System (ADS)
Ren, Fei; Zhou, Wei-Xing
2010-06-01
We study the statistical properties of the recurrence intervals τ between successive trading volumes exceeding a certain threshold q . The recurrence interval analysis is carried out for the 20 liquid Chinese stocks covering a period from January 2000 to May 2009, and two Chinese indices from January 2003 to April 2009. Similar to the recurrence interval distribution of the price returns, the tail of the recurrence interval distribution of the trading volumes follows a power-law scaling, and the results are verified by the goodness-of-fit tests using the Kolmogorov-Smirnov (KS) statistic, the weighted KS statistic and the Cramér-von Mises criterion. The measurements of the conditional probability distribution and the detrended fluctuation function show that both short-term and long-term memory effects exist in the recurrence intervals between trading volumes. We further study the relationship between trading volumes and price returns based on the recurrence interval analysis method. It is found that large trading volumes are more likely to occur following large price returns, and the comovement between trading volumes and price returns is more pronounced for large trading volumes.
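The basic recurrence-interval machinery, thresholding a volume series and checking the short-term memory effect through conditional means, can be reproduced in a few lines on synthetic data; the series, threshold quantile, and median split below are illustrative choices only.

```python
import numpy as np

# Hypothetical daily trading-volume series; replace with real data.
rng = np.random.default_rng(4)
volume = np.exp(rng.standard_normal(5000))

q = np.quantile(volume, 0.95)                 # threshold defining "large volume"
events = np.flatnonzero(volume > q)
tau = np.diff(events)                          # recurrence intervals between exceedances

# Memory effect: mean interval conditioned on whether the previous interval
# was below or above the median (real data show clustering; this noise does not).
med = np.median(tau)
after_small = tau[1:][tau[:-1] <= med].mean()
after_large = tau[1:][tau[:-1] > med].mean()
print(tau.mean(), after_small, after_large)
```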
Jorge, Inmaculada; Navarro, Pedro; Martínez-Acedo, Pablo; Núñez, Estefanía; Serrano, Horacio; Alfranca, Arántzazu; Redondo, Juan Miguel; Vázquez, Jesús
2009-01-01
Statistical models for the analysis of protein expression changes by stable isotope labeling are still poorly developed, particularly for data obtained by 16O/18O labeling. Besides, large-scale test experiments to validate the null hypothesis are lacking. Although the study of mechanisms underlying biological actions promoted by vascular endothelial growth factor (VEGF) on endothelial cells is of considerable interest, quantitative proteomics studies on this subject are scarce and have been performed after exposing cells to the factor for long periods of time. In this work we present the largest quantitative proteomics study to date on the short term effects of VEGF on human umbilical vein endothelial cells by 18O/16O labeling. Current statistical models based on normality and variance homogeneity were found unsuitable to describe the null hypothesis in a large scale test experiment performed on these cells, producing false expression changes. A random effects model was developed including four different sources of variance at the spectrum-fitting, scan, peptide, and protein levels. With the new model the number of outliers at scan and peptide levels was negligible in three large scale experiments, and only one false protein expression change was observed in the test experiment among more than 1000 proteins. The new model allowed the detection of significant protein expression changes upon VEGF stimulation for 4 and 8 h. The consistency of the changes observed at 4 h was confirmed by a replica at a smaller scale and further validated by Western blot analysis of some proteins. Most of the observed changes have not been described previously and are consistent with a pattern of protein expression that dynamically changes over time following the evolution of the angiogenic response. With this statistical model the 18O labeling approach emerges as a very promising and robust alternative to perform quantitative proteomics studies at a depth of several thousand proteins. PMID:19181660
Ahuja, Sanjeev; Jain, Shilpa; Ram, Kripa
2015-01-01
Characterization of manufacturing processes is key to understanding the effects of process parameters on process performance and product quality. These studies are generally conducted using small-scale model systems. Because of the importance of the results derived from these studies, the small-scale model should be predictive of large scale. Typically, small-scale bioreactors, which are considered superior to shake flasks in simulating large-scale bioreactors, are used as the scale-down models for characterizing mammalian cell culture processes. In this article, we describe a case study where a cell culture unit operation in bioreactors using one-sided pH control and their satellites (small-scale runs conducted using the same post-inoculation cultures and nutrient feeds) in 3-L bioreactors and shake flasks indicated that shake flasks mimicked the large-scale performance better than 3-L bioreactors. We detail here how multivariate analysis was used to make the pertinent assessment and to generate the hypothesis for refining the existing 3-L scale-down model. Relevant statistical techniques such as principal component analysis, partial least square, orthogonal partial least square, and discriminant analysis were used to identify the outliers and to determine the discriminatory variables responsible for performance differences at different scales. The resulting analysis, in combination with mass transfer principles, led to the hypothesis that observed similarities between 15,000-L and shake flask runs, and differences between 15,000-L and 3-L runs, were due to pCO2 and pH values. This hypothesis was confirmed by changing the aeration strategy at 3-L scale. By reducing the initial sparge rate in 3-L bioreactor, process performance and product quality data moved closer to that of large scale. © 2015 American Institute of Chemical Engineers.
Statistical properties of edge plasma turbulence in the Large Helical Device
NASA Astrophysics Data System (ADS)
Dewhurst, J. M.; Hnat, B.; Ohno, N.; Dendy, R. O.; Masuzaki, S.; Morisaki, T.; Komori, A.
2008-09-01
Ion saturation current (Isat) measurements made by three tips of a Langmuir probe array in the Large Helical Device are analysed for two plasma discharges. Absolute moment analysis is used to quantify properties on different temporal scales of the measured signals, which are bursty and intermittent. Strong coherent modes in some datasets are found to distort this analysis and are consequently removed from the time series by applying bandstop filters. Absolute moment analysis of the filtered data reveals two regions of power-law scaling, with the temporal scale τ ≈ 40 µs separating the two regimes. A comparison is made with similar results from the Mega-Amp Spherical Tokamak. The probability density function is studied and a monotonic relationship between connection length and skewness is found. Conditional averaging is used to characterize the average temporal shape of the largest intermittent bursts.
Statistical Analysis of Large Scale Structure by the Discrete Wavelet Transform
NASA Astrophysics Data System (ADS)
Pando, Jesus
1997-10-01
The discrete wavelet transform (DWT) is developed as a general statistical tool for the study of large scale structures (LSS) in astrophysics. The DWT is used in all aspects of structure identification including cluster analysis, spectrum and two-point correlation studies, scale-scale correlation analysis and measurement of deviations from Gaussian behavior. The techniques developed are demonstrated on 'academic' signals, on simulated models of the Lyman-α (Lyα) forests, and on observational data of the Lyα forests. This technique can detect clustering in the Lyα clouds where traditional techniques such as the two-point correlation function have failed. The position and strength of these clusters in both real and simulated data are determined and it is shown that clusters exist on scales as large as at least 20 h⁻¹ Mpc at significance levels of 2-4σ. Furthermore, it is found that the strength distribution of the clusters can be used to distinguish between real data and simulated samples even where other traditional methods have failed to detect differences. Second, a method for measuring the power spectrum of a density field using the DWT is developed. All common features determined by the usual Fourier power spectrum can be calculated by the DWT. These features, such as the index of a power law or typical scales, can be detected even when the samples are geometrically complex, the samples are incomplete, or the mean density on larger scales is not known (the infrared uncertainty). Using this method the spectra of Lyα forests in both simulated and real samples are calculated. Third, a method for measuring hierarchical clustering is introduced. Because hierarchical evolution is characterized by a set of rules of how larger dark matter halos are formed by the merging of smaller halos, scale-scale correlations of the density field should be one of the most sensitive quantities in determining the merging history. We show that these correlations can be completely determined by the correlations between discrete wavelet coefficients on adjacent scales and at nearly the same spatial position, C_{j,j+1}^{2·2}. Scale-scale correlations on two samples of the QSO Lyα forest absorption spectra are computed. Lastly, higher order statistics are developed to detect deviations from Gaussian behavior. These higher order statistics are necessary to fully characterize the Lyα forests because the usual 2nd order statistics, such as the two-point correlation function or power spectrum, give inconclusive results. It is shown how this technique takes advantage of the locality of the DWT to circumvent the central limit theorem. A non-Gaussian spectrum is defined and this spectrum reveals not only the magnitude, but the scales of non-Gaussianity. When applied to simulated and observational samples of the Lyα clouds, it is found that different popular models of structure formation have different spectra while two independent observational data sets have the same spectra. Moreover, the non-Gaussian spectra of real data sets are significantly different from the spectra of various possible random samples. (Abstract shortened by UMI.)
Ensemble Kalman filters for dynamical systems with unresolved turbulence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grooms, Ian, E-mail: grooms@cims.nyu.edu; Lee, Yoonsang; Majda, Andrew J.
Ensemble Kalman filters are developed for turbulent dynamical systems where the forecast model does not resolve all the active scales of motion. Coarse-resolution models are intended to predict the large-scale part of the true dynamics, but observations invariably include contributions from both the resolved large scales and the unresolved small scales. The error due to the contribution of unresolved scales to the observations, called ‘representation’ or ‘representativeness’ error, is often included as part of the observation error, in addition to the raw measurement error, when estimating the large-scale part of the system. It is here shown how stochastic superparameterization (a multiscale method for subgridscale parameterization) can be used to provide estimates of the statistics of the unresolved scales. In addition, a new framework is developed wherein small-scale statistics can be used to estimate both the resolved and unresolved components of the solution. The one-dimensional test problem from dispersive wave turbulence used here is computationally tractable yet is particularly difficult for filtering because of the non-Gaussian extreme event statistics and substantial small scale turbulence: a shallow energy spectrum proportional to k^(−5/6) (where k is the wavenumber) results in two-thirds of the climatological variance being carried by the unresolved small scales. Because the unresolved scales contain so much energy, filters that ignore the representation error fail utterly to provide meaningful estimates of the system state. Inclusion of a time-independent climatological estimate of the representation error in a standard framework leads to inaccurate estimates of the large-scale part of the signal; accurate estimates of the large scales are only achieved by using stochastic superparameterization to provide evolving, large-scale dependent predictions of the small-scale statistics. Again, because the unresolved scales contain so much energy, even an accurate estimate of the large-scale part of the system does not provide an accurate estimate of the true state. By providing simultaneous estimates of both the large- and small-scale parts of the solution, the new framework is able to provide accurate estimates of the true system state.
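For orientation, a bare-bones stochastic EnKF analysis step in which a constant "representation" variance is simply added to the measurement-error variance is sketched below. This corresponds to the static treatment the paper argues is insufficient, not to the stochastic superparameterization framework itself; the dimensions, observation operator, and variances are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, Ne = 40, 10, 100                # state size, observation count, ensemble size
H = np.zeros((m, n))
H[np.arange(m), np.arange(m) * 4] = 1.0   # observe every fourth grid point

ens = rng.standard_normal((n, Ne))         # prior ensemble of large-scale states
truth = rng.standard_normal(n)
r_meas, r_repr = 0.1, 0.5                   # measurement vs. representation error variances
R = (r_meas + r_repr) * np.eye(m)           # unresolved scales inflate the observation error
y = H @ truth + rng.normal(0, np.sqrt(r_meas + r_repr), m)

# Stochastic EnKF analysis step with perturbed observations.
X = ens - ens.mean(axis=1, keepdims=True)
P = X @ X.T / (Ne - 1)
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
perturbed_obs = y[:, None] + rng.normal(0, np.sqrt(r_meas + r_repr), (m, Ne))
analysis = ens + K @ (perturbed_obs - H @ ens)
print(np.abs(analysis.mean(axis=1) - truth).mean())
```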
Rogala, James T.; Gray, Brian R.
2006-01-01
The Long Term Resource Monitoring Program (LTRMP) uses a stratified random sampling design to obtain water quality statistics within selected study reaches of the Upper Mississippi River System (UMRS). LTRMP sampling strata are based on aquatic area types generally found in large rivers (e.g., main channel, side channel, backwater, and impounded areas). For hydrologically well-mixed strata (i.e., main channel), variance associated with spatial scales smaller than the strata scale is a relatively minor issue for many water quality parameters. However, analysis of LTRMP water quality data has shown that within-strata variability at the strata scale is high in off-channel areas (i.e., backwaters). A portion of that variability may be associated with differences among individual backwater lakes (i.e., small and large backwater regions separated by channels) that cumulatively make up the backwater stratum. The objective of the statistical modeling presented here is to determine if differences among backwater lakes account for a large portion of the variance observed in the backwater stratum for selected parameters. If variance associated with backwater lakes is high, then inclusion of backwater lake effects within statistical models is warranted. Further, lakes themselves may represent natural experimental units where associations of interest to management may be estimated.
Measures of large-scale structure in the CfA redshift survey slices
NASA Technical Reports Server (NTRS)
De Lapparent, Valerie; Geller, Margaret J.; Huchra, John P.
1991-01-01
Variations of the counts-in-cells with cell size are used here to define two statistical measures of large-scale clustering in three 6 deg slices of the CfA redshift survey. A percolation criterion is used to estimate the filling factor, which measures the fraction of the total volume in the survey occupied by the large-scale structures. For the full 18 deg slice of the CfA redshift survey, f ≈ 0.25 ± 0.05. After removing groups with more than five members from two of the slices, variations of the counts in occupied cells with cell size have a power-law behavior with a slope β ≈ 2.2 on scales from 1-10 h⁻¹ Mpc. Application of both this statistic and the percolation analysis to simulations suggests that a network of two-dimensional structures is a better description of the geometry of the clustering in the CfA slices than a network of one-dimensional structures. Counts-in-cells are also used to estimate the average galaxy surface density in sheets like the Great Wall, at about 0.3 galaxy h²/Mpc.
New probes of Cosmic Microwave Background large-scale anomalies
NASA Astrophysics Data System (ADS)
Aiola, Simone
Fifty years of Cosmic Microwave Background (CMB) data played a crucial role in constraining the parameters of the LambdaCDM model, where Dark Energy, Dark Matter, and Inflation are the three most important pillars not yet understood. Inflation prescribes an isotropic universe on large scales, and it generates spatially-correlated density fluctuations over the whole Hubble volume. CMB temperature fluctuations on scales bigger than a degree in the sky, affected by modes on super-horizon scale at the time of recombination, are a clean snapshot of the universe after inflation. In addition, the accelerated expansion of the universe, driven by Dark Energy, leaves a hardly detectable imprint in the large-scale temperature sky at late times. Such fundamental predictions have been tested with current CMB data and found to be in tension with what we expect from our simple LambdaCDM model. Is this tension just a random fluke or a fundamental issue with the present model? In this thesis, we present a new framework to probe the lack of large-scale correlations in the temperature sky using CMB polarization data. Our analysis shows that if a suppression in the CMB polarization correlations is detected, it will provide compelling evidence for new physics on super-horizon scale. To further analyze the statistical properties of the CMB temperature sky, we constrain the degree of statistical anisotropy of the CMB in the context of the observed large-scale dipole power asymmetry. We find evidence for a scale-dependent dipolar modulation at 2.5sigma. To isolate late-time signals from the primordial ones, we test the anomalously high Integrated Sachs-Wolfe effect signal generated by superstructures in the universe. We find that the detected signal is in tension with the expectations from LambdaCDM at the 2.5sigma level, which is somewhat smaller than what has been previously argued. To conclude, we describe the current status of CMB observations on small scales, highlighting the tensions between Planck, WMAP, and SPT temperature data and how the upcoming data release of the ACTpol experiment will contribute to this matter. We provide a description of the current status of the data-analysis pipeline and discuss its ability to recover large-scale modes.
Colizza, Vittoria; Barrat, Alain; Barthélemy, Marc; Vespignani, Alessandro
2006-02-14
The systematic study of large-scale networks has unveiled the ubiquitous presence of connectivity patterns characterized by large-scale heterogeneities and unbounded statistical fluctuations. These features affect dramatically the behavior of the diffusion processes occurring on networks, determining the ensuing statistical properties of their evolution pattern and dynamics. In this article, we present a stochastic computational framework for the forecast of global epidemics that considers the complete worldwide air travel infrastructure complemented with census population data. We address two basic issues in global epidemic modeling: (i) we study the role of the large scale properties of the airline transportation network in determining the global diffusion pattern of emerging diseases; and (ii) we evaluate the reliability of forecasts and outbreak scenarios with respect to the intrinsic stochasticity of disease transmission and traffic flows. To address these issues we define a set of quantitative measures able to characterize the level of heterogeneity and predictability of the epidemic pattern. These measures may be used for the analysis of containment policies and epidemic risk assessment.
Macroecological patterns of phytoplankton in the northwestern North Atlantic Ocean.
Li, W K W
2002-09-12
Many issues in biological oceanography are regional or global in scope; however, there are not many data sets of extensive areal coverage for marine plankton. In microbial ecology, a fruitful approach to large-scale questions is comparative analysis wherein statistical data patterns are sought from different ecosystems, frequently assembled from unrelated studies. A more recent approach termed macroecology characterizes phenomena emerging from large numbers of biological units by emphasizing the shapes and boundaries of statistical distributions, because these reflect the constraints on variation. Here, I use a set of flow cytometric measurements to provide macroecological perspectives on North Atlantic phytoplankton communities. Distinct trends of abundance in picophytoplankton and both small and large nanophytoplankton underlaid two patterns. First, total abundance of the three groups was related to assemblage mean-cell size according to the 3/4 power law of allometric scaling in biology. Second, cytometric diversity (an ataxonomic measure of assemblage entropy) was maximal at intermediate levels of water column stratification. Here, intermediate disturbance shapes diversity through an equitable distribution of cells in size classes, from which arises a high overall biomass. By subsuming local fluctuations, macroecology reveals meaningful patterns of phytoplankton at large scales.
Stability of knotted vortices in wave chaos
NASA Astrophysics Data System (ADS)
Taylor, Alexander; Dennis, Mark
Large scale tangles of disordered filaments occur in many diverse physical systems, from turbulent superfluids to optical volume speckle to liquid crystal phases. They can exhibit particular large scale random statistics despite very different local physics. We have previously used the topological statistics of knotting and linking to characterise the large scale tangling, using the vortices of three-dimensional wave chaos as a universal model system whose physical lengthscales are set only by the wavelength. Unlike geometrical quantities, the statistics of knotting depend strongly on the physical system and boundary conditions. Although knotting patterns characterise different systems, the topology of vortices is highly unstable to perturbation, under which they may reconnect with one another. In systems of constructed knots, these reconnections generally rapidly destroy the knot, but for vortex tangles the topological statistics must be stable. Using large scale simulations of chaotic eigenfunctions, we numerically investigate the prevalence and impact of reconnection events, and their effect on the topology of the tangle.
The statistical power to detect cross-scale interactions at macroscales
Wagner, Tyler; Fergus, C. Emi; Stow, Craig A.; Cheruvelil, Kendra S.; Soranno, Patricia A.
2016-01-01
Macroscale studies of ecological phenomena are increasingly common because stressors such as climate and land-use change operate at large spatial and temporal scales. Cross-scale interactions (CSIs), where ecological processes operating at one spatial or temporal scale interact with processes operating at another scale, have been documented in a variety of ecosystems and contribute to complex system dynamics. However, studies investigating CSIs are often dependent on compiling multiple data sets from different sources to create multithematic, multiscaled data sets, which results in structurally complex and sometimes incomplete data sets. The statistical power to detect CSIs needs to be evaluated because of their importance and the challenge of quantifying CSIs using data sets with complex structures and missing observations. We studied this problem using a spatially hierarchical model that measures CSIs between regional agriculture and its effects on the relationship between lake nutrients and lake productivity. We used an existing large multithematic, multiscaled database, the LAke multi-scaled GeOSpatial and temporal database (LAGOS), to parameterize the power analysis simulations. We found that the power to detect CSIs was more strongly related to the number of regions in the study than to the number of lakes nested within each region. CSI power analyses will not only help ecologists design large-scale studies aimed at detecting CSIs, but will also focus attention on CSI effect sizes and the degree to which they are ecologically relevant and detectable with large data sets.
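A simulation-based power calculation of this kind can be sketched with a mixed model containing the cross-level interaction (regional agriculture × lake nutrients). The effect sizes, variance components, and sample-size scenarios below are invented for illustration and are not calibrated to LAGOS.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def simulate_power(n_regions, n_lakes, beta_csi=0.2, nsim=100, alpha=0.05):
    """Fraction of simulations in which the cross-scale interaction is detected."""
    rng = np.random.default_rng(6)
    hits = 0
    for _ in range(nsim):
        region = np.repeat(np.arange(n_regions), n_lakes)
        agri = rng.standard_normal(n_regions)[region]          # regional predictor
        nutrients = rng.standard_normal(n_regions * n_lakes)    # lake-level predictor
        u = rng.normal(0, 0.5, n_regions)[region]               # regional random effect
        chl = (1 + 0.5 * nutrients + beta_csi * agri * nutrients + u
               + rng.normal(0, 1, n_regions * n_lakes))
        df = pd.DataFrame(dict(chl=chl, nutrients=nutrients, agri=agri, region=region))
        fit = smf.mixedlm("chl ~ nutrients * agri", df, groups="region").fit()
        hits += fit.pvalues["nutrients:agri"] < alpha
    return hits / nsim

print(simulate_power(n_regions=30, n_lakes=20))
print(simulate_power(n_regions=60, n_lakes=10))   # same total N, more regions
```

Comparing the two calls mirrors the paper's finding that, for a fixed total sample size, adding regions tends to buy more power than adding lakes within regions.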
Solving large scale structure in ten easy steps with COLA
NASA Astrophysics Data System (ADS)
Tassev, Svetlin; Zaldarriaga, Matias; Eisenstein, Daniel J.
2013-06-01
We present the COmoving Lagrangian Acceleration (COLA) method: an N-body method for solving for Large Scale Structure (LSS) in a frame that is comoving with observers following trajectories calculated in Lagrangian Perturbation Theory (LPT). Unlike standard N-body methods, the COLA method can straightforwardly trade accuracy at small scales in order to gain computational speed without sacrificing accuracy at large scales. This is especially useful for cheaply generating large ensembles of accurate mock halo catalogs required to study galaxy clustering and weak lensing, as those catalogs are essential for performing detailed error analysis for ongoing and future surveys of LSS. As an illustration, we ran a COLA-based N-body code on a box of size 100 Mpc/h with particles of mass ≈ 5 × 10⁹ Msolar/h. Running the code with only 10 timesteps was sufficient to obtain an accurate description of halo statistics down to halo masses of at least 10¹¹ Msolar/h. This is only at a modest speed penalty when compared to mocks obtained with LPT. A standard detailed N-body run is orders of magnitude slower than our COLA-based code. The speed-up we obtain with COLA is due to the fact that we calculate the large-scale dynamics exactly using LPT, while letting the N-body code solve for the small scales, without requiring it to capture exactly the internal dynamics of halos. Achieving a similar level of accuracy in halo statistics without the COLA method requires at least 3 times more timesteps than when COLA is employed.
NASA Astrophysics Data System (ADS)
Leng, Bi-Bin; Gong, Jian; Zhang, Wen-bo; Ji, Xue-Qiang
2017-11-01
In response to the environmental pollution caused by large-scale pig breeding, SPSS statistical software and the factor analysis method were used to calculate an environmental pollution bearing index for China's scaled pig breeding from 2006 to 2015. The results showed that as the scale of breeding increased, the density of live pig farming and the amount of fertilizer applied in agricultural production increased. However, due to improved national environmental awareness, industrial wastewater discharge decreased greatly. Overall, the environmental pollution load index of China's hog farming is rising.
Thinking big: linking rivers to landscapes
Joan O’Callaghan; Ashley E. Steel; Kelly M. Burnett
2012-01-01
Exploring relationships between landscape characteristics and rivers is an emerging field, enabled by the proliferation of satellite data, advances in statistical analysis, and increased emphasis on large-scale monitoring. Landscape features such as road networks, underlying geology, and human developments determine the characteristics of the rivers flowing through...
NASA Astrophysics Data System (ADS)
Zhang, M.; Liu, S.
2017-12-01
Despite extensive studies on hydrological responses to forest cover change in small watersheds, the hydrological responses to forest change and associated mechanisms across multiple spatial scales have not been fully understood. This review thus examined about 312 watersheds worldwide to provide a generalized framework to evaluate hydrological responses to forest cover change and to identify the contribution of spatial scale, climate, forest type and hydrological regime in determining the intensity of forest change related hydrological responses in small (<1000 km2) and large watersheds (≥1000 km2). Key findings include: 1) the increase in annual runoff associated with forest cover loss is statistically significant at multiple spatial scales whereas the effect of forest cover gain is statistically inconsistent; 2) the sensitivity of annual runoff to forest cover change tends to attenuate as watershed size increases only in large watersheds; 3) annual runoff is more sensitive to forest cover change in water-limited watersheds than in energy-limited watersheds across all spatial scales; and 4) small mixed forest-dominated watersheds or large snow-dominated watersheds are more hydrologically resilient to forest cover change. These findings improve the understanding of hydrological response to forest cover change at different spatial scales and provide a scientific underpinning to future watershed management in the context of climate change and increasing anthropogenic disturbances.
Reynolds number dependence of relative dispersion statistics in isotropic turbulence
NASA Astrophysics Data System (ADS)
Sawford, Brian L.; Yeung, P. K.; Hackl, Jason F.
2008-06-01
Direct numerical simulation results for a range of relative dispersion statistics over Taylor-scale Reynolds numbers up to 650 are presented in an attempt to observe and quantify inertial subrange scaling and, in particular, Richardson's t³ law. The analysis includes the mean-square separation and a range of important but less-studied differential statistics for which the motion is defined relative to that at time t = 0. It seeks to unambiguously identify and quantify the Richardson scaling by demonstrating convergence with both the Reynolds number and initial separation. According to these criteria, the standard compensated plots for these statistics in inertial subrange scaling show clear evidence of a Richardson range but with an imprecise estimate for the Richardson constant. A modified version of the cube-root plots introduced by Ott and Mann [J. Fluid Mech. 422, 207 (2000)] confirms such convergence. It has been used to yield more precise estimates for Richardson's constant g, which decrease with Taylor-scale Reynolds numbers over the range of 140-650. Extrapolation to the large Reynolds number limit gives an asymptotic value for Richardson's constant in the range g = 0.55-0.57, depending on the functional form used to make the extrapolation.
Ho, Andrew D; Yu, Carol C
2015-06-01
Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micerri similarly showed that the normality assumption is met rarely in educational and psychological practice. In this article, the authors extend these previous analyses to state-level educational test score distributions that are an increasingly common target of high-stakes analysis and interpretation. Among 504 scale-score and raw-score distributions from state testing programs from recent years, nonnormal distributions are common and are often associated with particular state programs. The authors explain how scaling procedures from item response theory lead to nonnormal distributions as well as unusual patterns of discreteness. The authors recommend that distributional descriptive statistics be calculated routinely to inform model selection for large-scale test score data, and they illustrate consequences of nonnormality using sensitivity studies that compare baseline results to those from normalized score scales.
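The recommendation to compute distributional descriptives routinely is easy to operationalize; a possible helper, with illustrative normality and discreteness checks rather than any official procedure, might look like this:

```python
import numpy as np
from scipy import stats

def describe_scores(scores):
    """Routine distributional descriptives for a scale-score distribution."""
    scores = np.asarray(scores, dtype=float)
    return {
        "n": scores.size,
        "mean": scores.mean(),
        "sd": scores.std(ddof=1),
        "skewness": stats.skew(scores),
        "excess_kurtosis": stats.kurtosis(scores),     # 0 under normality
        "distinct_values": np.unique(scores).size,      # crude discreteness check
        "shapiro_p": stats.shapiro(scores[:5000])[1],    # normality test on a subsample
    }

rng = np.random.default_rng(7)
print(describe_scores(rng.normal(500, 100, 20_000).round()))   # illustrative scale scores
```

Flagging large skewness, heavy kurtosis, or a small number of distinct score points before fitting normal-theory models is the kind of routine check the authors advocate.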
Statistical analysis of Hasegawa-Wakatani turbulence
NASA Astrophysics Data System (ADS)
Anderson, Johan; Hnat, Bogdan
2017-06-01
Resistive drift wave turbulence is a multipurpose paradigm that can be used to understand transport at the edge of fusion devices. The Hasegawa-Wakatani model captures the essential physics of drift turbulence while retaining the simplicity needed to gain a qualitative understanding of this process. We provide a theoretical interpretation of numerically generated probability density functions (PDFs) of intermittent events in Hasegawa-Wakatani turbulence with enforced equipartition of energy in large scale zonal flows, and small scale drift turbulence. We find that for a wide range of adiabatic index values, the stochastic component representing the small scale turbulent eddies of the flow, obtained from the autoregressive integrated moving average model, exhibits super-diffusive statistics, consistent with intermittent transport. The PDFs of large events (above one standard deviation) are well approximated by the Laplace distribution, while small events often exhibit a Gaussian character. Furthermore, there exists a strong influence of zonal flows, for example, via shearing and then viscous dissipation maintaining a sub-diffusive character of the fluxes.
NASA Astrophysics Data System (ADS)
Borelli, M. E. S.; Kleinert, H.; Schakel, Adriaan M. J.
2000-03-01
The effect of quantum fluctuations on a nearly flat, nonrelativistic two-dimensional membrane with extrinsic curvature stiffness and tension is investigated. The renormalization group analysis is carried out to first order in perturbation theory. In contrast to thermal fluctuations, which soften the membrane at large scales and turn it into a crumpled surface, quantum fluctuations are found to stiffen the membrane, so that it exhibits a Hausdorff dimension equal to two. The large-scale behavior of the membrane is further studied at finite temperature, where a nontrivial fixed point is found, signaling a crumpling transition.
Li, Zhijin; Vogelmann, Andrew M.; Feng, Sha; ...
2015-01-20
We produce fine-resolution, three-dimensional fields of meteorological and other variables for the U.S. Department of Energy’s Atmospheric Radiation Measurement (ARM) Southern Great Plains site. The Community Gridpoint Statistical Interpolation system is implemented in a multiscale data assimilation (MS-DA) framework that is used within the Weather Research and Forecasting model at a cloud-resolving resolution of 2 km. The MS-DA algorithm uses existing reanalysis products and constrains fine-scale atmospheric properties by assimilating high-resolution observations. A set of experiments show that the data assimilation analysis realistically reproduces the intensity, structure, and time evolution of clouds and precipitation associated with a mesoscale convective system. Evaluations also show that the large-scale forcing derived from the fine-resolution analysis has an overall accuracy comparable to the existing ARM operational product. For enhanced applications, the fine-resolution fields are used to characterize the contribution of subgrid variability to the large-scale forcing and to derive hydrometeor forcing, which are presented in companion papers.
Integral criteria for large-scale multiple fingerprint solutions
NASA Astrophysics Data System (ADS)
Ushmaev, Oleg S.; Novikov, Sergey O.
2004-08-01
We propose the definition and analysis of the optimal integral similarity score criterion for large-scale multimodal civil ID systems. Firstly, the general properties of score distributions for genuine and impostor matches for different systems and input devices are investigated. The empirical statistics were taken from real biometric tests. Then we carry out the analysis of simultaneous score distributions for a number of combined biometric tests, primarily for multiple fingerprint solutions. The explicit and approximate relations for the optimal integral score, which provides the least FRR value while the FAR is predefined, have been obtained. The results of real multiple fingerprint tests show good correspondence with the theoretical results over a wide range of False Acceptance and False Rejection Rates.
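The FAR/FRR trade-off discussed above can be illustrated numerically: given samples of genuine and impostor integral scores, pick the threshold that meets a predefined FAR and report the resulting FRR. The sketch below uses synthetic score distributions and an arbitrary target FAR; it is not the authors' procedure.

# Hedged sketch: threshold selection at a predefined FAR from synthetic
# genuine/impostor integral-score samples.
import numpy as np

rng = np.random.default_rng(2)
genuine = rng.normal(0.8, 0.1, 5000)     # matched-pair integral scores (assumed)
impostor = rng.normal(0.4, 0.1, 5000)    # non-matched-pair scores (assumed)

target_far = 1e-3
threshold = np.quantile(impostor, 1.0 - target_far)   # accept if score >= threshold
frr = np.mean(genuine < threshold)
print(f"threshold={threshold:.3f}, FRR at FAR={target_far}: {frr:.4f}")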
Large-scale quantitative analysis of painting arts.
Kim, Daniel; Son, Seung-Woo; Jeong, Hawoong
2014-12-11
Scientists have made efforts to understand the beauty of painting art in their own languages. As digital image acquisition of painting arts has made rapid progress, researchers have come to a point where it is possible to perform statistical analysis of a large-scale database of artistic paints to make a bridge between art and science. Using digital image processing techniques, we investigate three quantitative measures of images - the usage of individual colors, the variety of colors, and the roughness of the brightness. We found a difference in color usage between classical paintings and photographs, and a significantly low color variety of the medieval period. Interestingly, moreover, the increment of roughness exponent as painting techniques such as chiaroscuro and sfumato have advanced is consistent with historical circumstances.
Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard
2013-09-06
Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample-scarcity or due to duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
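A minimal version of the feature-wise testing problem described above can be sketched as follows; the data matrix is simulated, the missing-value fraction mirrors the abstract, and only an ordinary t test with FDR control is shown (the moderated t test and the improved rank product are not reimplemented here).

# Minimal sketch (simulated data, not the published implementation): feature-wise
# t tests that tolerate missing values, followed by Benjamini-Hochberg correction.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
group_a = rng.normal(0.0, 1.0, (2000, 3))           # 3 replicates per group
group_b = rng.normal(0.3, 1.0, (2000, 3))
group_a[rng.random(group_a.shape) < 0.5] = np.nan    # ~50% missing values

t, p = stats.ttest_ind(group_a, group_b, axis=1, nan_policy='omit')
reject, p_adj, _, _ = multipletests(np.nan_to_num(p, nan=1.0), alpha=0.05, method='fdr_bh')
print("significant features after FDR control:", int(reject.sum()))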
Understanding the k^-5/3 to k^-2.4 spectral break in aircraft wind data
NASA Astrophysics Data System (ADS)
Pinel, J.; Lovejoy, S.; Schertzer, D. J.; Tuck, A.
2010-12-01
A fundamental issue in atmospheric dynamics is to understand how the statistics of fluctuations of various fields vary with their space-time scale. The classical - and still “standard” - model dates back to Kraichnan and Charney’s work on 2-D and geostrophic (quasi 2-D) turbulence at the end of the 1960’s and early 1970’s. It postulates an isotropic 2-D turbulent regime at large scales and an isotropic 3-D regime at small scales separated by a “dimensional transition” (once called a “mesoscale gap”) near the pressure scale height of ≈10 km. By the early 1980’s a quite different model emerged, the 23/9-D scaling model, in which the dynamics were postulated to be dominated (over wide scale ranges) by a strongly anisotropic scale invariant cascade mechanism with structures becoming flatter and flatter at larger and larger scales in a scaling manner: the isotropy assumptions were discarded but the scaling and cascade assumptions retained. Today, thanks to the revolution in geodata and atmospheric models - both in quality and quantity - the 23/9-D model can explain the observed horizontal cascade structures in remotely sensed radiances, in meteorological “reanalyses”, in meteorological models, in high-resolution dropsonde vertical analyses, in lidar vertical sections, etc. All of these analyses directly contradict the standard model, which predicts drastic “dimensional transitions” for scalar quantities. Indeed, until recently the only unexplained feature was a scale break in aircraft spectra of the (vector) horizontal wind somewhere between about 40 and 200 km. However - contrary to repeated claims - and thanks to a reanalysis of the historical papers - the transition that had been observed since the 1980’s was not between k^-5/3 and k^-3 but rather between k^-5/3 and k^-2.4. By 2009, the standard model was thus hanging by a thread. This was cut when careful analysis of scientific aircraft data allowed the 23/9-D model to explain the large-scale k^-2.4 regime as an artefact of the aircraft following a sloping trajectory: at large enough scales, the spectrum is simply dominated by vertical rather than horizontal fluctuations, which have the required k^-2.4 form. Since aircraft frequently follow gently sloping isobars, this neatly removes the last obstacle to wide-range anisotropic scaling models, finally opening the door to an urgently needed consensus on the statistical structure of the atmosphere. However, objections remain: at large enough scales do isobaric and isoheight spectra really have different exponents? In this presentation we study this issue in more detail than before by analyzing data measured by commercial aircraft through the Tropospheric Airborne Meteorological Data Reporting (TAMDAR) system over CONUS during 2009. The TAMDAR system allows us to calculate the statistical properties of the wind field on constant pressure and altitude levels. Various statistical exponents were calculated (velocity increments in terms of horizontal and vertical displacement, pressure, and time), and we show what we learned and how this analysis can help resolve the question.
SLIDE - a web-based tool for interactive visualization of large-scale -omics data.
Ghosh, Soumita; Datta, Abhik; Tan, Kaisen; Choi, Hyungwon
2018-06-28
Data visualization is often regarded as a post hoc step for verifying statistically significant results in the analysis of high-throughput data sets. This common practice leaves a large amount of raw data behind, from which more information can be extracted. However, existing solutions do not provide capabilities to explore large-scale raw datasets using biologically sensible queries, nor do they allow user interaction based real-time customization of graphics. To address these drawbacks, we have designed an open-source, web-based tool called Systems-Level Interactive Data Exploration, or SLIDE to visualize large-scale -omics data interactively. SLIDE's interface makes it easier for scientists to explore quantitative expression data in multiple resolutions in a single screen. SLIDE is publicly available under BSD license both as an online version as well as a stand-alone version at https://github.com/soumitag/SLIDE. Supplementary Information are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Yan, Hui; Wang, K. G.; Jones, Jim E.
2016-06-01
A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, new phase-coarsening kinetics in the region of ultrahigh volume fraction are found. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables an increase in three-dimensional simulation system size up to a 512^3 grid cube. Through the parallelized code, practical runtime can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis on speed-up and scalability is presented, showing good scalability that improves with increasing problem size. In addition, a model for prediction of runtime is developed, which shows good agreement with actual runtimes from numerical tests.
Assessment of statistical methods used in library-based approaches to microbial source tracking.
Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D
2003-12-01
Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
Universal statistics of vortex tangles in three-dimensional random waves
NASA Astrophysics Data System (ADS)
Taylor, Alexander J.
2018-02-01
The tangled nodal lines (wave vortices) in random, three-dimensional wavefields are studied as an exemplar of a fractal loop soup. Their statistics are a three-dimensional counterpart to the characteristic random behaviour of nodal domains in quantum chaos, but in three dimensions the filaments can wind around one another to give distinctly different large scale behaviours. By tracing numerically the structure of the vortices, their conformations are shown to follow recent analytical predictions for random vortex tangles with periodic boundaries, where the local disorder of the model ‘averages out’ to produce large scale power law scaling relations whose universality classes do not depend on the local physics. These results explain previous numerical measurements in terms of an explicit effect of the periodic boundaries, where the statistics of the vortices are strongly affected by the large scale connectedness of the system even at arbitrarily high energies. The statistics are investigated primarily for static (monochromatic) wavefields, but the analytical results are further shown to directly describe the reconnection statistics of vortices evolving in certain dynamic systems, or occurring during random perturbations of the static configuration.
Solving large scale structure in ten easy steps with COLA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tassev, Svetlin; Zaldarriaga, Matias; Eisenstein, Daniel J., E-mail: stassev@cfa.harvard.edu, E-mail: matiasz@ias.edu, E-mail: deisenstein@cfa.harvard.edu
2013-06-01
We present the COmoving Lagrangian Acceleration (COLA) method: an N-body method for solving for Large Scale Structure (LSS) in a frame that is comoving with observers following trajectories calculated in Lagrangian Perturbation Theory (LPT). Unlike standard N-body methods, the COLA method can straightforwardly trade accuracy at small scales in order to gain computational speed without sacrificing accuracy at large scales. This is especially useful for cheaply generating large ensembles of accurate mock halo catalogs required to study galaxy clustering and weak lensing, as those catalogs are essential for performing detailed error analysis for ongoing and future surveys of LSS. As an illustration, we ran a COLA-based N-body code on a box of size 100 Mpc/h with particles of mass ≈ 5 × 10^9 M_sun/h. Running the code with only 10 timesteps was sufficient to obtain an accurate description of halo statistics down to halo masses of at least 10^11 M_sun/h. This is only at a modest speed penalty when compared to mocks obtained with LPT. A standard detailed N-body run is orders of magnitude slower than our COLA-based code. The speed-up we obtain with COLA is due to the fact that we calculate the large-scale dynamics exactly using LPT, while letting the N-body code solve for the small scales, without requiring it to capture exactly the internal dynamics of halos. Achieving a similar level of accuracy in halo statistics without the COLA method requires at least 3 times more timesteps than when COLA is employed.
NASA Astrophysics Data System (ADS)
Grabsch, Aurélien; Majumdar, Satya N.; Texier, Christophe
2017-06-01
Invariant ensembles of random matrices are characterized by the distribution of their eigenvalues {λ_1, …, λ_N}. We study the distribution of truncated linear statistics of the form \tilde{L} = \sum_{i=1}^{p} f(λ_i) with p < N.
Computational methods to extract meaning from text and advance theories of human cognition.
McNamara, Danielle S
2011-01-01
Over the past two decades, researchers have made great advances in the area of computational methods for extracting meaning from text. This research has to a large extent been spurred by the development of latent semantic analysis (LSA), a method for extracting and representing the meaning of words using statistical computations applied to large corpora of text. Since the advent of LSA, researchers have developed and tested alternative statistical methods designed to detect and analyze meaning in text corpora. This research exemplifies how statistical models of semantics play an important role in our understanding of cognition and contribute to the field of cognitive science. Importantly, these models afford large-scale representations of human knowledge and allow researchers to explore various questions regarding knowledge, discourse processing, text comprehension, and language. This topic includes the latest progress by the leading researchers in the endeavor to go beyond LSA. Copyright © 2010 Cognitive Science Society, Inc.
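As a rough illustration of the LSA idea summarized above, the sketch below builds a term-document matrix from a toy corpus and reduces it with a truncated SVD so that words acquire low-dimensional meaning vectors; the corpus, dimensionality, and similarity pair are invented for illustration and this is not the original LSA software.

# Hedged sketch of LSA on a toy corpus: TF-IDF term-document matrix + truncated SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the doctor examined the patient in the clinic",
    "the nurse helped the doctor treat the patient",
    "the pilot flew the plane across the ocean",
    "the plane landed and the pilot rested",
]

vectorizer = TfidfVectorizer().fit(corpus)
X = vectorizer.transform(corpus)               # documents x terms
svd = TruncatedSVD(n_components=2, random_state=0).fit(X)
term_vectors = svd.components_.T               # terms x latent dimensions

vocab = vectorizer.vocabulary_
sim = cosine_similarity(term_vectors[[vocab["doctor"]]], term_vectors[[vocab["nurse"]]])
print("doctor~nurse latent similarity:", sim[0, 0])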
NASA Astrophysics Data System (ADS)
Whittaker, Kara A.; McShane, Dan
2013-02-01
A large storm event in southwest Washington State triggered over 2500 landslides and provided an opportunity to assess two slope stability screening tools. The statistical analysis conducted demonstrated that both screening tools are effective at predicting where landslides were likely to take place (Whittaker and McShane, 2012). Here we reply to two discussions of this article related to the development of the slope stability screening tools and the accuracy and scale of the spatial data used. Neither of the discussions address our statistical analysis or results. We provide greater detail on our sampling criteria and also elaborate on the policy and management implications of our findings and how they complement those of a separate investigation of landslides resulting from the same storm. The conclusions made in Whittaker and McShane (2012) stand as originally published unless future analysis indicates otherwise.
Simulating statistics of lightning-induced and man-made fires
NASA Astrophysics Data System (ADS)
Krenn, R.; Hergarten, S.
2009-04-01
The frequency-area distributions of forest fires show power-law behavior with scaling exponents α in a quite narrow range, relating wildfire research to the theoretical framework of self-organized criticality. Examples of self-organized critical behavior can be found in computer simulations of simple cellular automata. The established self-organized critical Drossel-Schwabl forest fire model (DS-FFM) is one of the most widespread models in this context. Despite its qualitative agreement with event-size statistics from nature, its applicability is still questioned. Apart from general concerns that the DS-FFM apparently oversimplifies the complex nature of forest dynamics, it significantly overestimates the frequency of large fires. We present a straightforward modification of the model rules that increases the scaling exponent α by approximately 1/3 and brings the simulated event-size statistics close to those observed in nature. In addition, combined simulations of both the original and the modified model predict a dependence of the overall distribution on the ratio of lightning-induced and man-made fires as well as a difference between their respective event-size statistics. The increase of the scaling exponent with decreasing lightning probability as well as the splitting of the partial distributions are confirmed by the analysis of the Canadian Large Fire Database. As a consequence, lightning-induced and man-made forest fires cannot be treated separately in wildfire modeling, hazard assessment and forest management.
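Estimating the scaling exponent that this comparison hinges on is straightforward with a maximum-likelihood estimator over event sizes above a cutoff. The sketch below draws synthetic Pareto-distributed fire sizes; it is illustrative only and is not the authors' model or database analysis.

# Illustrative sketch: maximum-likelihood estimate of a power-law exponent from
# synthetic fire sizes above a cutoff x_min (Hill/Clauset-style estimator).
import numpy as np

rng = np.random.default_rng(4)
x_min = 1.0
alpha_true = 1.4
sizes = x_min * (1.0 - rng.random(100_000)) ** (-1.0 / (alpha_true - 1.0))  # Pareto draws

tail = sizes[sizes >= x_min]
alpha_hat = 1.0 + tail.size / np.sum(np.log(tail / x_min))
print(f"estimated scaling exponent: {alpha_hat:.3f}")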
Spline analysis of the mandible in human subjects with class III malocclusion.
Singh, G D; McNamara, J A; Lozanoff, S
1997-05-01
This study determines deformations that contribute to a Class III mandibular morphology, employing thin-plate spline (TPS) analysis. A total of 133 lateral cephalographs of prepubertal children of European-American descent with either a Class I molar occlusion or a Class III malocclusion were compared. The cephalographs were traced and checked, and eight homologous landmarks on the mandible were identified and digitized. The datasets were scaled to an equivalent size and subjected to statistical analyses. These tests indicated significant differences between average Class I and Class III mandibular morphologies. When the sample was subdivided into seven age and sex-matched groups statistical differences were maintained for each group. TPS analysis indicated that both affine (uniform) and non-affine transformations contribute towards the total spline, and towards the average mandibular morphology at each age group. For non-affine transformations, partial warp 5 had the highest magnitude, indicating large-scale deformations of the mandibular configuration between articulare and pogonion. In contrast, partial warp 1 indicated localized shape changes in the mandibular symphyseal region. It is concluded that large spatial-scale deformations affect the body of the mandible, in combination with localized distortions further anteriorly. These deformations may represent a developmental elongation of the mandibular corpus antero-posteriorly that, allied with symphyseal changes, leads to the appearance of a Class III prognathic mandibular profile.
NASA Astrophysics Data System (ADS)
Feldmann, Daniel; Bauer, Christian; Wagner, Claus
2018-03-01
We present results from direct numerical simulations (DNS) of turbulent pipe flow at shear Reynolds numbers up to Reτ = 1500 using different computational domains with lengths up to ?. The objectives are to analyse the effect of the finite size of the periodic pipe domain on large flow structures in dependency of Reτ and to assess a minimum ? required for relevant turbulent scales to be captured and a minimum Reτ for very large-scale motions (VLSM) to be analysed. Analysing one-point statistics revealed that the mean velocity profile is invariant for ?. The wall-normal location at which deviations occur in shorter domains changes strongly with increasing Reτ from the near-wall region to the outer layer, where VLSM are believed to live. The root mean square velocity profiles exhibit domain length dependencies for pipes shorter than 14R and 7R depending on Reτ. For all Reτ, the higher-order statistical moments show only weak dependencies and only for the shortest domain considered here. However, the analysis of one- and two-dimensional pre-multiplied energy spectra revealed that even for larger ?, not all physically relevant scales are fully captured, even though the aforementioned statistics are in good agreement with the literature. We found ? to be sufficiently large to capture VLSM-relevant turbulent scales in the considered range of Reτ based on our definition of an integral energy threshold of 10%. The requirement to capture at least 1/10 of the global maximum energy level is justified by a 14% increase of the streamwise turbulence intensity in the outer region between Reτ = 720 and 1500, which can be related to VLSM-relevant length scales. Based on this scaling anomaly, we found Reτ⪆1500 to be a necessary minimum requirement to investigate VLSM-related effects in pipe flow, even though the streamwise energy spectra does not yet indicate sufficient scale separation between the most energetic and the very long motions.
Identifying currents in the gene pool for bacterial populations using an integrative approach.
Tang, Jing; Hanage, William P; Fraser, Christophe; Corander, Jukka
2009-08-01
The evolution of bacterial populations has recently become considerably better understood due to large-scale sequencing of population samples. It has become clear that DNA sequences from a multitude of genes, as well as a broad sample coverage of a target population, are needed to obtain a relatively unbiased view of its genetic structure and the patterns of ancestry connected to the strains. However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. To meet the needs of large-scale analyses of population structure for bacteria, we introduce here several statistical tools for the detection and representation of recombination between populations. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with a phylogenetic analysis. The statistical tools introduced here are freely available in BAPS 5.2 software, which can be downloaded from http://web.abo.fi/fak/mnf/mate/jc/software/baps.html.
Pao, Sheng-Ying; Lin, Win-Li; Hwang, Ming-Jing
2006-01-01
Background: Screening for differentially expressed genes on the genomic scale and comparative analysis of the expression profiles of orthologous genes between species to study gene function and regulation are becoming increasingly feasible. Expressed sequence tags (ESTs) are an excellent source of data for such studies using bioinformatic approaches because of the rich libraries and tremendous amount of data now available in the public domain. However, any large-scale EST-based bioinformatics analysis must deal with the heterogeneous, and often ambiguous, tissue and organ terms used to describe EST libraries. Results: To deal with the issue of tissue source, in this work, we carefully screened and organized more than 8 million human and mouse ESTs into 157 human and 108 mouse tissue/organ categories, to which we applied an established statistical test using different thresholds of the p value to identify genes differentially expressed in different tissues. Further analysis of the tissue distribution and level of expression of human and mouse orthologous genes showed that tissue-specific orthologs tended to have more similar expression patterns than those lacking significant tissue specificity. On the other hand, a number of orthologs were found to have significant disparity in their expression profiles, hinting at novel functions, divergent regulation, or new ortholog relationships. Conclusion: Comprehensive statistics on the tissue-specific expression of human and mouse genes were obtained in this very large-scale, EST-based analysis. These statistical results have been organized into a database, freely accessible at our website, for easy searching of human and mouse tissue-specific genes and for investigating gene expression profiles in the context of comparative genomics. Comparative analysis showed that, although highly tissue-specific genes tend to exhibit similar expression profiles in human and mouse, there are significant exceptions, indicating that orthologous genes, while sharing basic genomic properties, could result in distinct phenotypes. PMID:16626500
Lu, Qiongshi; Li, Boyang; Ou, Derek; Erlendsdottir, Margret; Powles, Ryan L; Jiang, Tony; Hu, Yiming; Chang, David; Jin, Chentian; Dai, Wei; He, Qidu; Liu, Zefeng; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu
2017-12-07
Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits' genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (N total ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD's correlation with cognitive traits and hints at an autoimmune component for ALS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Huang, Liang; Ni, Xuan; Ditto, William L.; Spano, Mark; Carney, Paul R.; Lai, Ying-Cheng
2017-01-01
We develop a framework to uncover and analyse dynamical anomalies from massive, nonlinear and non-stationary time series data. The framework consists of three steps: preprocessing of massive datasets to eliminate erroneous data segments, application of the empirical mode decomposition and Hilbert transform paradigm to obtain the fundamental components embedded in the time series at distinct time scales, and statistical/scaling analysis of the components. As a case study, we apply our framework to detecting and characterizing high-frequency oscillations (HFOs) from a big database of rat electroencephalogram recordings. We find a striking phenomenon: HFOs exhibit on-off intermittency that can be quantified by algebraic scaling laws. Our framework can be generalized to big data-related problems in other fields such as large-scale sensor data and seismic data analysis.
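One step of the pipeline described above, extracting an amplitude envelope via the Hilbert transform, can be sketched compactly; the synthetic signal, the 100-300 Hz band, the sampling rate, and the detection threshold below are assumptions for illustration, not values from the study.

# Hedged sketch: band-pass a trace around an assumed HFO band and take the
# Hilbert-transform amplitude envelope as a crude on-off activity detector.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 2000.0                                    # assumed sampling rate, Hz
t = np.arange(0, 5, 1 / fs)
eeg = np.sin(2 * np.pi * 8 * t) + 0.3 * np.sin(2 * np.pi * 180 * t) * (t > 2.5)

b, a = butter(4, [100, 300], btype="bandpass", fs=fs)   # assumed HFO band
hfo = filtfilt(b, a, eeg)
envelope = np.abs(hilbert(hfo))                # instantaneous amplitude
events = envelope > 5 * np.median(envelope)    # crude intermittency threshold
print("fraction of samples flagged as HFO activity:", events.mean())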
NASA Astrophysics Data System (ADS)
Gerlitz, Lars; Gafurov, Abror; Apel, Heiko; Unger-Sayesteh, Katy; Vorogushyn, Sergiy; Merz, Bruno
2016-04-01
Statistical climate forecast applications typically utilize a small set of large-scale SST or climate indices, such as ENSO, PDO or AMO, as predictor variables. If the predictive skill of these large-scale modes is insufficient, specific predictor variables such as customized SST patterns are frequently included. Hence, statistically based climate forecast models are either based on a fixed number of climate indices (and thus might not consider important predictor variables) or are highly site specific and barely transferable to other regions. With the aim of developing an operational seasonal forecast model, which is easily transferable to any region in the world, we present a generic data mining approach which automatically selects potential predictors from gridded SST observations and reanalysis-derived large-scale atmospheric circulation patterns and generates robust statistical relationships with posterior precipitation anomalies for user-selected target regions. Potential predictor variables are derived by means of a cellwise correlation analysis of precipitation anomalies with gridded global climate variables under consideration of varying lead times. Significantly correlated grid cells are subsequently aggregated to predictor regions by means of a variability-based cluster analysis. Finally, for every month and lead time, an individual random-forest-based forecast model is automatically calibrated and evaluated by means of the previously generated predictor variables. The model is exemplarily applied and evaluated for selected headwater catchments in Central and South Asia. Particularly for the winter and spring precipitation (which is associated with westerly disturbances in the entire target domain), the model shows solid results with correlation coefficients up to 0.7, although the variability of precipitation rates is highly underestimated. Likewise, a certain skill of the model could be detected for the monsoonal precipitation amounts in the South Asian target areas. The skill of the model for the dry summer season in Central Asia and the transition seasons over South Asia is found to be low. A sensitivity analysis by means of well-known climate indices reveals the major large-scale controlling mechanisms for the seasonal precipitation climate of each target area. For the Central Asian target areas, both the El Nino Southern Oscillation and the North Atlantic Oscillation are identified as important controlling factors for precipitation totals during the moist spring season. Drought conditions are found to be triggered by a warm ENSO phase in combination with a positive phase of the NAO. For the monsoonal summer precipitation amounts over Southern Asia, the model suggests a distinct negative response to El Nino events.
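The statistical chain described above (cell-wise correlation screening followed by a random forest fit) can be sketched in a few lines; the gridded fields and precipitation series below are synthetic placeholders and the correlation threshold is arbitrary, so this is only an illustration of the approach, not the operational model.

# Hedged sketch: correlation screening of gridded predictors + random forest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n_years, n_cells = 40, 500
sst = rng.normal(size=(n_years, n_cells))          # flattened gridded SST anomalies
precip = sst[:, 10] - 0.5 * sst[:, 200] + rng.normal(scale=0.5, size=n_years)

# Cell-wise correlation with the target; keep strongly correlated cells as predictors.
r = np.array([np.corrcoef(sst[:, j], precip)[0, 1] for j in range(n_cells)])
predictors = sst[:, np.abs(r) > 0.4]

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(predictors[:-5], precip[:-5])             # hold out the last 5 "years"
print("hold-out predictions:", model.predict(predictors[-5:]).round(2))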
Structure of small-scale magnetic fields in the kinematic dynamo theory.
Schekochihin, Alexander; Cowley, Steven; Maron, Jason; Malyshkin, Leonid
2002-01-01
A weak fluctuating magnetic field embedded into a turbulent conducting medium grows exponentially while its characteristic scale decays. In the interstellar medium and protogalactic plasmas, the magnetic Prandtl number is very large, so a broad spectrum of growing magnetic fluctuations is excited at small (subviscous) scales. The condition for the onset of nonlinear back reaction depends on the structure of the field lines. We study the statistical correlations that are set up in the field pattern and show that the magnetic-field lines possess a folding structure, where most of the scale decrease is due to the field variation across itself (rapid transverse direction reversals), while the scale of the field variation along itself stays approximately constant. Specifically, we find that, though both the magnetic energy and the mean-square curvature of the field lines grow exponentially, the field strength and the field-line curvature are anticorrelated, i.e., the curved field is relatively weak, while the growing field is relatively flat. The detailed analysis of the statistics of the curvature shows that it possesses a stationary limiting distribution with the bulk located at the values of curvature comparable to the characteristic wave number of the velocity field and a power tail extending to large values of curvature where it is eventually cut off by the resistive regularization. The regions of large curvature, therefore, occupy only a small fraction of the total volume of the system. Our theoretical results are corroborated by direct numerical simulations. The implication of the folding effect is that the advent of the Lorentz back reaction occurs when the magnetic energy approaches that of the smallest turbulent eddies. Our results also directly apply to the problem of statistical geometry of the material lines in a random flow.
NASA Astrophysics Data System (ADS)
Sengupta, A.; Kletzing, C.; Howk, R.; Kurth, W. S.
2017-12-01
An important goal of the Van Allen Probes mission is to understand wave-particle interactions that can energize relativistic electrons in the Earth's Van Allen radiation belts. The EMFISIS instrumentation suite provides measurements of wave electric and magnetic fields of wave features such as chorus that participate in these interactions. Geometric signal processing discovers structural relationships, e.g., connectivity across ridge-like features in chorus elements, to reveal properties such as the dominant angle of an element (frequency sweep rate) and the integrated power along a given chorus element. These techniques disambiguate these wave features against background hiss-like chorus. This enables autonomous discovery of chorus elements across the large volumes of EMFISIS data. At the scale of individual or overlapping chorus elements, topological pattern recognition techniques enable interpretation of chorus microstructure by discovering connectivity and other geometric features within the wave signature of a single chorus element or between overlapping chorus elements. Thus chorus wave features can be quantified and studied at multiple scales of spectral geometry using geometric signal processing techniques. We present recently developed computational techniques that exploit spectral geometry of chorus elements and whistlers to enable large-scale automated discovery, detection and statistical analysis of these events over EMFISIS data. Specifically, we present different case studies across a diverse portfolio of chorus elements and discuss the performance of our algorithms regarding precision of detection as well as interpretation of chorus microstructure. We also provide large-scale statistical analysis on the distribution of dominant sweep rates and other properties of the detected chorus elements.
Large-Scale Circulation and Climate Variability. Chapter 5
NASA Technical Reports Server (NTRS)
Perlwitz, J.; Knutson, T.; Kossin, J. P.; LeGrande, A. N.
2017-01-01
The causes of regional climate trends cannot be understood without considering the impact of variations in large-scale atmospheric circulation and an assessment of the role of internally generated climate variability. There are contributions to regional climate trends from changes in large-scale latitudinal circulation, which is generally organized into three cells in each hemisphere (Hadley cell, Ferrel cell, and Polar cell) and which determines the location of subtropical dry zones and midlatitude jet streams. These circulation cells are expected to shift poleward during warmer periods, which could result in poleward shifts in precipitation patterns, affecting natural ecosystems, agriculture, and water resources. In addition, regional climate can be strongly affected by non-local responses to recurring patterns (or modes) of variability of the atmospheric circulation or the coupled atmosphere-ocean system. These modes of variability represent preferred spatial patterns and their temporal variation. They account for gross features in variance and for teleconnections which describe climate links between geographically separated regions. Modes of variability are often described as a product of a spatial climate pattern and an associated climate index time series that are identified based on statistical methods like Principal Component Analysis (PC analysis), which is also called Empirical Orthogonal Function Analysis (EOF analysis), and cluster analysis.
NASA Astrophysics Data System (ADS)
Kendall, E. A.; Bhatt, A.
2017-12-01
The Midlatitude Allsky-imaging Network for GeoSpace Observations (MANGO) is a network of imagers filtered at 630 nm spread across the continental United States. MANGO is used to image large-scale airglow and aurora features and observes the generation, propagation, and dissipation of medium and large-scale wave activity in the subauroral, mid and low-latitude thermosphere. This network consists of seven all-sky imagers providing continuous coverage over the United States and extending south into Mexico. This network sees high levels of medium and large scale wave activity due to both neutral and geomagnetic storm forcing. The geomagnetic storm observations largely fall into two categories: Stable Auroral Red (SAR) arcs and Large-scale traveling ionospheric disturbances (LSTIDs). In addition, less-often observed effects include anomalous airglow brightening, bright swirls, and frozen-in traveling structures. We will present an analysis of multiple events observed over four years of MANGO network operation. We will provide both statistics on the cumulative observations and a case study of the "Memorial Day Storm" on May 27, 2017.
DEIVA: a web application for interactive visual analysis of differential gene expression profiles.
Harshbarger, Jayson; Kratz, Anton; Carninci, Piero
2017-01-07
Differential gene expression (DGE) analysis is a technique to identify statistically significant differences in RNA abundance for genes or arbitrary features between different biological states. The result of a DGE test is typically further analyzed using statistical software, spreadsheets or custom ad hoc algorithms. We identified a need for a web-based system to share DGE statistical test results, and locate and identify genes in DGE statistical test results with a very low barrier of entry. We have developed DEIVA, a free and open-source, browser-based single-page application (SPA) with a strong emphasis on being user friendly, which enables locating and identifying single or multiple genes in an immediate, interactive, and intuitive manner. By design, DEIVA scales with very large numbers of users and datasets. Compared to existing software, DEIVA offers a unique combination of design decisions that enable inspection and analysis of DGE statistical test results with an emphasis on ease of use.
Impact of large-scale tides on cosmological distortions via redshift-space power spectrum
NASA Astrophysics Data System (ADS)
Akitsu, Kazuyuki; Takada, Masahiro
2018-03-01
Although large-scale perturbations beyond a finite-volume survey region are not direct observables, these affect measurements of clustering statistics of small-scale (subsurvey) perturbations in large-scale structure, compared with the ensemble average, via the mode-coupling effect. In this paper we show that a large-scale tide induced by scalar perturbations causes apparent anisotropic distortions in the redshift-space power spectrum of galaxies in a way depending on an alignment between the tide, wave vector of small-scale modes and line-of-sight direction. Using the perturbation theory of structure formation, we derive a response function of the redshift-space power spectrum to large-scale tide. We then investigate the impact of large-scale tide on estimation of cosmological distances and the redshift-space distortion parameter via the measured redshift-space power spectrum for a hypothetical large-volume survey, based on the Fisher matrix formalism. To do this, we treat the large-scale tide as a signal, rather than an additional source of the statistical errors, and show that a degradation in the parameter is restored if we can employ the prior on the rms amplitude expected for the standard cold dark matter (CDM) model. We also discuss whether the large-scale tide can be constrained at an accuracy better than the CDM prediction, if the effects up to a larger wave number in the nonlinear regime can be included.
Is There Any Real Observational Contradiction to the LCDM Model?
NASA Astrophysics Data System (ADS)
Ma, Yin-Zhe
2011-01-01
In this talk, I am going to question the two apparent observational contradictions to LCDM cosmology: the lack of large-angle correlations in the cosmic microwave background, and the very large bulk flow of galaxy peculiar velocities. On the super-horizon scale, Copi et al. (2009) have been arguing that the lack of large angular correlations of the CMB temperature field provides strong evidence against the standard, statistically isotropic, LCDM cosmology. I am going to argue that the "ad-hoc" discrepancy is due to the sub-optimal estimator of the low-l multipoles, and a posteriori statistics, which exaggerate the statistical significance. On Galactic scales, Watkins et al. (2008) show that the very large bulk flow prefers a very large density fluctuation, which seems to contradict the LCDM model. I am going to show that these results are due to their underestimation of the small-scale velocity dispersion, and an arbitrary way of combining catalogues. With the appropriate way of combining catalogue data, as well as treating the small-scale velocity dispersion as a free parameter, the peculiar velocity field provides unconvincing evidence against LCDM cosmology.
Tree-space statistics and approximations for large-scale analysis of anatomical trees.
Feragen, Aasa; Owen, Megan; Petersen, Jens; Wille, Mathilde M W; Thomsen, Laura H; Dirksen, Asger; de Bruijne, Marleen
2013-01-01
Statistical analysis of anatomical trees is hard to perform due to differences in the topological structure of the trees. In this paper we define statistical properties of leaf-labeled anatomical trees with geometric edge attributes by considering the anatomical trees as points in the geometric space of leaf-labeled trees. This tree-space is a geodesic metric space where any two trees are connected by a unique shortest path, which corresponds to a tree deformation. However, tree-space is not a manifold, and the usual strategy of performing statistical analysis in a tangent space and projecting onto tree-space is not available. Using tree-space and its shortest paths, a variety of statistical properties, such as mean, principal component, hypothesis testing and linear discriminant analysis can be defined. For some of these properties it is still an open problem how to compute them; others (like the mean) can be computed, but efficient alternatives are helpful in speeding up algorithms that use means iteratively, like hypothesis testing. In this paper, we take advantage of a very large dataset (N = 8016) to obtain computable approximations, under the assumption that the data trees parametrize the relevant parts of tree-space well. Using the developed approximate statistics, we illustrate how the structure and geometry of airway trees vary across a population and show that airway trees with Chronic Obstructive Pulmonary Disease come from a different distribution in tree-space than healthy ones. Software is available from http://image.diku.dk/aasa/software.php.
Wang, WeiBo; Sun, Wei; Wang, Wei; Szatkiewicz, Jin
2018-03-01
The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows and windowed read-count data is generated for the entire genome from which genomic signals are detected (e.g., copy number changes in DNA-seq, enrichment peaks in ChIP-seq). For accurate analysis of read-count data, many state-of-the-art statistical methods use generalized linear models (GLM) coupled with the negative-binomial (NB) distribution by leveraging its ability for simultaneous bias correction and signal detection. However, although statistically powerful, the GLM+NB method has a quadratic computational complexity and therefore suffers from slow running time when applied to large-scale windowed read-count data. In this study, we aimed to substantially speed up the GLM+NB method by using a randomized algorithm, and we demonstrate here the utility of our approach in the application of detecting copy number variants (CNVs) using a real example. We propose an efficient estimator, the randomized GLM+NB coefficients estimator (RGE), for speeding up the GLM+NB method. RGE samples the read-count data and solves the estimation problem on a smaller scale. We first theoretically validated the consistency and the variance properties of RGE. We then applied RGE to GENSENG, a GLM+NB based method for detecting CNVs. We named the resulting method "R-GENSENG". Based on extensive evaluation using both simulated and empirical data, we concluded that R-GENSENG is ten times faster than the original GENSENG while maintaining GENSENG's accuracy in CNV detection. Our results suggest that the RGE strategy developed here could be applied to other GLM+NB based read-count analyses, e.g., ChIP-seq data analysis, to substantially improve their computational efficiency while preserving the analytic power.
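The core idea of the randomized estimator, fitting the GLM+NB model on a random subsample of windows rather than the whole genome, can be sketched as follows; the simulated counts, the GC-content covariate, and the fixed dispersion value are assumptions for illustration and this is not the R-GENSENG implementation.

# Hedged sketch: negative-binomial GLM fitted to a random subsample of windowed counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_windows = 200_000
gc = rng.uniform(0.3, 0.7, n_windows)                     # per-window GC content (assumed bias term)
mu = np.exp(1.0 + 2.0 * gc)                               # true mean depends on GC
counts = rng.negative_binomial(n=5, p=5 / (5 + mu))       # simulated windowed read counts

idx = rng.choice(n_windows, size=20_000, replace=False)   # randomized subsample
X = sm.add_constant(gc[idx])
fit = sm.GLM(counts[idx], X, family=sm.families.NegativeBinomial(alpha=0.2)).fit()
print(fit.params)                                         # bias coefficients estimated from the subsample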
The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth
ERIC Educational Resources Information Center
Steyvers, Mark; Tenenbaum, Joshua B.
2005-01-01
We present statistical analyses of the large-scale structure of 3 types of semantic networks: word associations, WordNet, and Roget's Thesaurus. We show that they have a small-world structure, characterized by sparse connectivity, short average path lengths between words, and strong local clustering. In addition, the distributions of the number of…
Utilizing Wavelet Analysis to assess hydrograph change in northwestern North America
NASA Astrophysics Data System (ADS)
Tang, W.; Carey, S. K.
2017-12-01
Historical streamflow data in the mountainous regions of northwestern North America suggest that changes in flows are driven by warming temperature, declining snowpack and glacier extent, and large-scale teleconnections. However, few sites exist that have robust long-term records for statistical analysis, and previous research has focussed on high- and low-flow indices along with trend analysis using the Mann-Kendall test and other similar approaches. Furthermore, there has been less emphasis on ascertaining the drivers of changes in the shape of the streamflow hydrograph compared with traditional flow metrics. In this work, we utilize wavelet analysis to evaluate changes in hydrograph characteristics for snowmelt-driven rivers in northwestern North America across a range of scales. Results suggest that wavelets can be used to detect a lengthening and advancement of freshet with a corresponding decline in peak flows. Furthermore, the gradual transition of flows from nival to pluvial regimes in more southerly catchments is evident in the wavelet spectral power through time. This method of change detection is challenged by evaluating the statistical significance of changes in wavelet spectra as related to hydrograph form, yet ongoing work seeks to link these patterns to driving weather and climate along with larger-scale teleconnections.
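The wavelet step described above amounts to computing a continuous wavelet transform of the daily flow series and comparing spectral power across scales and periods. The sketch below uses a synthetic hydrograph and an arbitrary Morlet-wavelet scale range; it is an illustration of the technique, not the authors' workflow.

# Hedged sketch: continuous wavelet transform of a synthetic daily streamflow series.
import numpy as np
import pywt

days = np.arange(0, 365 * 10)
freshet = 1.0 + 0.8 * np.maximum(0, np.sin(2 * np.pi * days / 365.25)) ** 4
flow = freshet + 0.1 * np.random.default_rng(7).normal(size=days.size)

scales = np.arange(8, 512, 8)
coeffs, freqs = pywt.cwt(flow, scales, "morl", sampling_period=1.0)   # daily sampling
power = np.abs(coeffs) ** 2
print("period (days) with maximum mean power:", 1.0 / freqs[power.mean(axis=1).argmax()])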
Unmasking the masked Universe: the 2M++ catalogue through Bayesian eyes
NASA Astrophysics Data System (ADS)
Lavaux, Guilhem; Jasche, Jens
2016-01-01
This work describes a full Bayesian analysis of the Nearby Universe as traced by galaxies of the 2M++ survey. The analysis is run in two sequential steps. The first step self-consistently derives the luminosity-dependent galaxy biases, the power spectrum of matter fluctuations and matter density fields within a Gaussian statistic approximation. The second step makes a detailed analysis of the three-dimensional large-scale structures, assuming a fixed bias model and a fixed cosmology. This second step allows for the reconstruction of both the final density field and the initial conditions at z = 1000 assuming a fixed bias model. From these, we derive fields that self-consistently extrapolate the observed large-scale structures. We give two examples of these extrapolation and their utility for the detection of structures: the visibility of the Sloan Great Wall, and the detection and characterization of the Local Void using DIVA, a Lagrangian based technique to classify structures.
Probing the statistics of primordial fluctuations and their evolution
NASA Technical Reports Server (NTRS)
Gaztanaga, Enrique; Yokoyama, Jun'ichi
1993-01-01
The statistical distribution of fluctuations on various scales is analyzed in terms of the counts in cells of smoothed density fields, using volume-limited samples of galaxy redshift catalogs. It is shown that the distribution on large scales, with volume average of the two-point correlation function of the smoothed field less than about 0.05, is consistent with Gaussian. Statistics are shown to agree remarkably well with the negative binomial distribution, which has hierarchical correlations and a Gaussian behavior at large scales. If these observed properties correspond to the matter distribution, they suggest that our universe started with Gaussian fluctuations and evolved keeping hierarchical form.
MWASTools: an R/bioconductor package for metabolome-wide association studies.
Rodriguez-Martinez, Andrea; Posma, Joram M; Ayala, Rafael; Neves, Ana L; Anwar, Maryam; Petretto, Enrico; Emanueli, Costanza; Gauguier, Dominique; Nicholson, Jeremy K; Dumas, Marc-Emmanuel
2018-03-01
MWASTools is an R package designed to provide an integrated pipeline to analyse metabonomic data in large-scale epidemiological studies. Key functionalities of our package include: quality control analysis; metabolome-wide association analysis using various models (partial correlations, generalized linear models); visualization of statistical outcomes; metabolite assignment using statistical total correlation spectroscopy (STOCSY); and biological interpretation of metabolome-wide association studies results. The MWASTools R package is implemented in R (version >= 3.4) and is available from Bioconductor: https://bioconductor.org/packages/MWASTools/. m.dumas@imperial.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
A Study on Mutil-Scale Background Error Covariances in 3D-Var Data Assimilation
NASA Astrophysics Data System (ADS)
Zhang, Xubin; Tan, Zhe-Min
2017-04-01
The construction of background error covariances is a key component of three-dimensional variational data assimilation. There are background errors at different scales, and interactions among them, in numerical weather prediction. However, the influence of these errors and their interactions cannot be represented in the background error covariance statistics when estimated by the leading methods. So, it is necessary to construct background error covariances influenced by multi-scale interactions among errors. With the NMC method, this article firstly estimates the background error covariances at given model-resolution scales. Then the information of errors whose scales are larger and smaller than the given ones is introduced respectively, using different nesting techniques, to estimate the corresponding covariances. The comparisons of the three background error covariance statistics influenced by information of errors at different scales reveal that the background error variances enhance particularly at large scales and higher levels when introducing the information of larger-scale errors by the lateral boundary condition provided by a lower-resolution model. On the other hand, the variances reduce at medium scales at the higher levels, while they show slight improvement at lower levels in the nested domain, especially at medium and small scales, when introducing the information of smaller-scale errors by nesting a higher-resolution model. In addition, the introduction of information of larger- (smaller-) scale errors leads to larger (smaller) horizontal and vertical correlation scales of background errors. Considering the multivariate correlations, the Ekman coupling increases (decreases) with the information of larger- (smaller-) scale errors included, whereas the geostrophic coupling in the free atmosphere weakens in both situations. The three covariances obtained in the above work are used in a data assimilation and model forecast system respectively, and then analysis-forecast cycles for a period of 1 month are conducted. Through the comparison of both analyses and forecasts from this system, it is found that the trends for variation in analysis increments with information of different scale errors introduced are consistent with those for variation in variances and correlations of background errors. In particular, introduction of smaller-scale errors leads to larger amplitude of analysis increments for winds at medium scales at the heights of both the high- and low-level jets. Analysis increments for both temperature and humidity are greater at the corresponding scales at middle and upper levels under this circumstance. These analysis increments improve the intensity of the jet-convection system, which includes jets at different levels and coupling between them associated with latent heat release, and these changes in analyses contribute to better forecasts for winds and temperature in the corresponding areas. When smaller-scale errors are included, analysis increments for humidity enhance significantly at large scales at lower levels to moisten southern analyses. This humidification contributes to correcting the dry bias there and eventually improves the forecast skill for humidity. Moreover, inclusion of larger- (smaller-) scale errors is beneficial for the forecast quality of heavy (light) precipitation at large (small) scales due to the amplification (diminution) of intensity and area in precipitation forecasts, but tends to overestimate (underestimate) light (heavy) precipitation.
NASA Astrophysics Data System (ADS)
Chatterjee, Tanmoy; Peet, Yulia T.
2018-03-01
Length scales of eddies involved in the power generation of infinite wind farms are studied by analyzing the spectra of the turbulent flux of mean kinetic energy (MKE) from large eddy simulations (LES). Large-scale structures an order of magnitude bigger than the turbine rotor diameter (D) are shown to make a substantial contribution to wind power. Varying dynamics in the intermediate scales (D-10D) are also observed in a parametric study involving interturbine distances and hub heights of the turbines. Further insight into the eddies responsible for power generation is provided by a scaling analysis of the two-dimensional premultiplied spectra of MKE flux. The LES code is developed in a high Reynolds number near-wall modeling framework, using the open-source spectral element code Nek5000, and the wind turbines have been modelled using a state-of-the-art actuator line model. The LES of infinite wind farms have been validated against statistical results from the previous literature. The study is expected to improve our understanding of the complex multiscale dynamics in the domain of large wind farms and identify the length scales that contribute to the power. This information can be useful for the design of wind farm layouts and turbine placement that take advantage of the large-scale structures contributing to wind turbine power.
Predicting protein functions from redundancies in large-scale protein interaction networks
NASA Technical Reports Server (NTRS)
Samanta, Manoj Pratim; Liang, Shoudan
2003-01-01
Interpreting data from large-scale protein interaction experiments has been a challenging task because of the widespread presence of random false positives. Here, we present a network-based statistical algorithm that overcomes this difficulty and allows us to derive functions of unannotated proteins from large-scale interaction data. Our algorithm uses the insight that if two proteins share a significantly larger number of common interaction partners than expected by chance, they have close functional associations. Analysis of publicly available data from Saccharomyces cerevisiae reveals >2,800 reliable functional associations, 29% of which involve at least one unannotated protein. By further analyzing these associations, we derive tentative functions for 81 unannotated proteins with high certainty. Our method is not overly sensitive to the false positives present in the data. Even after adding 50% randomly generated interactions to the measured data set, we are able to recover almost all (approximately 89%) of the original associations.
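A standard way to make the shared-partner insight concrete is a hypergeometric test on the number of common interaction partners of two proteins. The sketch below is an illustration under that assumption, not the authors' exact statistic; the function name, the toy network size, and the use of SciPy are hypothetical.

```python
# Hedged sketch: hypergeometric p-value for the overlap of interaction partners.
from scipy.stats import hypergeom

def shared_partner_pvalue(n_total, partners_a, partners_b):
    """P(X >= observed overlap) if partners were assigned at random.

    n_total    -- total number of proteins in the network
    partners_a -- set of interaction partners of protein A
    partners_b -- set of interaction partners of protein B
    """
    overlap = len(partners_a & partners_b)
    # Survival function at overlap - 1 gives the upper-tail probability.
    return hypergeom.sf(overlap - 1, n_total, len(partners_a), len(partners_b))

# Toy example: ~6000 yeast proteins; A has 30 partners, B has 40, they share 8.
a = set(range(30))
b = set(range(22, 62))
print(shared_partner_pvalue(6000, a, b))
```

A small p-value indicates that the two proteins share more partners than expected by chance, which is the criterion used above to infer a functional association.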
Applications of artificial intelligence systems in the analysis of epidemiological data.
Flouris, Andreas D; Duffy, Jack
2006-01-01
A brief review of the germane literature suggests that the use of artificial intelligence (AI) statistical algorithms in epidemiology has been limited. We discuss the advantages and disadvantages of using AI systems in large-scale sets of epidemiological data to extract inherent, formerly unidentified, and potentially valuable patterns that human-driven deductive models may miss.
Using SQL Databases for Sequence Similarity Searching and Analysis.
Pearson, William R; Mackey, Aaron J
2017-09-13
Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
Gravitational lenses and large scale structure
NASA Technical Reports Server (NTRS)
Turner, Edwin L.
1987-01-01
Four possible statistical tests of the large scale distribution of cosmic material are described. Each is based on gravitational lensing effects. The current observational status of these tests is also summarized.
Scaling of the velocity fluctuations in turbulent channels up to Reτ=2003
NASA Astrophysics Data System (ADS)
Hoyas, Sergio; Jiménez, Javier
2006-01-01
A new numerical simulation of a turbulent channel in a large box at Reτ=2003 is described and briefly compared with simulations at lower Reynolds numbers and with experiments. Some of the fluctuation intensities, especially the streamwise velocity, do not scale well in wall units, both near and away from the wall. Spectral analysis traces the near-wall scaling failure to the interaction of the logarithmic layer with the wall. The present statistics can be downloaded from http://torroja.dmt.upm.es/ftp/channels. Further ones will be added to the site as they become available.
NASA Astrophysics Data System (ADS)
Duroure, Christophe; Sy, Abdoulaye; Baray, Jean luc; Van baelen, Joel; Diop, Bouya
2017-04-01
Precipitation plays a key role in the management of sustainable water resources and flood risk analyses. Changes in rainfall will be a critical factor determining the overall impact of climate change. We propose to analyse long series (10 years) of daily precipitation in different regions. We present the Fourier energy density spectra and morphological spectra (i.e., probability distribution functions of the duration and the horizontal scale) of large precipitating systems. Satellite data from the Global Precipitation Climatology Project (GPCP) and long time series from local rain gauges in Senegal and France are used and compared in this work. For mid-latitude and Sahelian regions (north of 12°N), the morphological spectra are close to an exponentially decreasing distribution. This allows two characteristic scales (duration and spatial extent) to be defined for the precipitating regions embedded in large mesoscale convective systems (MCS). For tropical and equatorial regions (south of 12°N), the morphological spectra are close to a Lévy-stable distribution (power-law decay), for which no characteristic scale can be defined (scaling range). When the characteristic time and space scales are defined, a "statistical velocity" of precipitating MCS can be defined and compared to the observed zonal advection. Maps of the characteristic scales and the Lévy-stable exponent over West Africa and southern Europe are presented. The transition at 12° latitude between exponential and Lévy-stable behaviors of precipitating MCS is compared with results from the ECMWF ERA-Interim reanalysis for the same period. This sharp morphological transition could be used to test the different parameterizations of deep convection in forecast models.
The changing landscape of astrostatistics and astroinformatics
NASA Astrophysics Data System (ADS)
Feigelson, Eric D.
2017-06-01
The history and current status of the cross-disciplinary fields of astrostatistics and astroinformatics are reviewed. Astronomers need a wide range of statistical methods for both data reduction and science analysis. With the proliferation of high-throughput telescopes, efficient large scale computational methods are also becoming essential. However, astronomers receive only weak training in these fields during their formal education. Interest in the fields is rapidly growing with conferences organized by scholarly societies, textbooks and tutorial workshops, and research studies pushing the frontiers of methodology. R, the premier language of statistical computing, can provide an important software environment for the incorporation of advanced statistical and computational methodology into the astronomical community.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poidevin, Frédérick; Ade, Peter A. R.; Hargrave, Peter C.
2014-08-10
Turbulence and magnetic fields are expected to be important for regulating molecular cloud formation and evolution. However, their effects on sub-parsec to 100 parsec scales, leading to the formation of starless cores, are not well understood. We investigate the prestellar core structure morphologies obtained from analysis of the Herschel-SPIRE 350 μm maps of the Lupus I cloud. This distribution is first compared on a statistical basis to the large-scale shape of the main filament. We find the distribution of the elongation position angle of the cores to be consistent with a random distribution, which means no specific orientation of the morphology of the cores is observed with respect to the mean orientation of the large-scale filament in Lupus I, nor relative to a large-scale bent filament model. This distribution is also compared to the mean orientation of the large-scale magnetic fields probed at 350 μm with the Balloon-borne Large Aperture Telescope for Polarimetry during its 2010 campaign. Here again we do not find any correlation between the core morphology distribution and the average orientation of the magnetic fields on parsec scales. Our main conclusion is that the local filament dynamics—including secondary filaments that often run orthogonally to the primary filament—and possibly small-scale variations in the local magnetic field direction, could be the dominant factors for explaining the final orientation of each core.
Park, Junghyun A; Kim, Minki; Yoon, Seokjoon
2016-05-17
Sophisticated anti-fraud systems for the healthcare sector have been built based on several statistical methods. Although existing methods have been developed to detect fraud in the healthcare sector, these algorithms consume considerable time and cost, and lack a theoretical basis for handling large-scale data. Based on mathematical theory, this study proposes a new approach to using Benford's Law, in which individual-level data are closely examined to identify specific fees for in-depth analysis. We extended the mathematical theory to demonstrate the manner in which large-scale data conform to Benford's Law. We then empirically tested its applicability using actual large-scale healthcare data from Korea's Health Insurance Review and Assessment (HIRA) National Patient Sample (NPS). We used the mean absolute deviation (MAD) statistic to test conformity of the large-scale data with Benford's Law. We conducted our study on 32 diseases, comprising 25 representative diseases and 7 DRG-regulated diseases. We performed an empirical test on the 25 diseases, showing the applicability of Benford's Law to large-scale data in the healthcare industry. For the seven DRG-regulated diseases, we examined the individual-level data to identify specific fees for in-depth analysis. Among the eight categories of medical costs, we assessed the strength of certain irregularities based on the details of each DRG-regulated disease. Using the degree of abnormality, we propose priority actions to be taken by government health departments and private insurance institutions to bring unnecessary medical expenses under control. However, detecting deviations from Benford's Law at conventional significance levels requires relatively high contamination ratios.
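As an illustration of the kind of conformity check described above, the sketch below computes first-digit frequencies and the mean absolute deviation (MAD) from Benford's expected proportions. The function name and the synthetic log-normal claim amounts are assumptions for demonstration; the paper's theoretical extension and contamination analysis are not reproduced here.

```python
import numpy as np

def benford_mad(amounts):
    """Mean absolute deviation of first-digit frequencies from Benford's Law."""
    digits = np.array([int(str(abs(x)).lstrip("0.")[0]) for x in amounts if x != 0])
    observed = np.array([(digits == d).mean() for d in range(1, 10)])
    expected = np.log10(1 + 1 / np.arange(1, 10))  # Benford first-digit probabilities
    return np.abs(observed - expected).mean()

# Synthetic claim amounts: wide log-normal data roughly follow Benford's Law.
rng = np.random.default_rng(0)
claims = np.exp(rng.normal(8.0, 1.5, 10000))
print(benford_mad(claims))
```

Larger MAD values signal stronger departure from Benford's Law; which threshold flags a fee category for in-depth review is a policy choice rather than a fixed statistical rule.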
From random microstructures to representative volume elements
NASA Astrophysics Data System (ADS)
Zeman, J.; Šejnoha, M.
2007-06-01
A unified treatment of random microstructures proposed in this contribution opens the way to efficient solutions of large-scale real-world problems. The paper introduces the notion of a statistically equivalent periodic unit cell (SEPUC) that replaces, in a computational step, the actual complex geometries on an arbitrary scale. A SEPUC is constructed such that its morphology conforms with images of real microstructures. Here, the well-established two-point probability function and the lineal path function are employed to classify, from the statistical point of view, the geometrical arrangement of various material systems. Examples of statistically equivalent unit cells constructed for a unidirectional fibre tow, a plain weave textile composite and an irregular-coursed masonry wall are given. A specific result promoting the applicability of the SEPUC as a tool for the derivation of homogenized effective properties, which are subsequently used in an independent macroscopic analysis, is also presented.
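The two-point probability function mentioned above has a convenient FFT implementation for a digitized two-phase image. The sketch below assumes periodic boundaries and a binary (0/1) microstructure; the function name and toy image are illustrative, and the lineal path function would need a separate computation.

```python
import numpy as np

def two_point_probability(image):
    """S2(r): probability that two points separated by lag r both fall in phase 1,
    computed as an FFT autocorrelation under periodic boundary conditions."""
    img = np.asarray(image, dtype=float)
    f = np.fft.fftn(img)
    s2 = np.fft.ifftn(f * np.conj(f)).real / img.size
    return np.fft.fftshift(s2)  # put the zero-lag value at the array center

# Toy two-phase medium on a 64 x 64 periodic grid with ~50% volume fraction.
rng = np.random.default_rng(0)
img = (rng.random((64, 64)) < 0.5).astype(int)
S2 = two_point_probability(img)
print(S2[32, 32], img.mean())  # at zero lag, S2 equals the phase volume fraction
```

In a SEPUC construction, a candidate periodic cell would be optimized until its S2 (and lineal path statistics) match those measured on images of the real material.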
Experimental design and quantitative analysis of microbial community multiomics.
Mallick, Himel; Ma, Siyuan; Franzosa, Eric A; Vatanen, Tommi; Morgan, Xochitl C; Huttenhower, Curtis
2017-11-30
Studies of the microbiome have become increasingly sophisticated, and multiple sequence-based, molecular methods as well as culture-based methods exist for population-scale microbiome profiles. To link the resulting host and microbial data types to human health, several experimental design considerations, data analysis challenges, and statistical epidemiological approaches must be addressed. Here, we survey current best practices for experimental design in microbiome molecular epidemiology, including technologies for generating, analyzing, and integrating microbiome multiomics data. We highlight studies that have identified molecular bioactives that influence human health, and we suggest steps for scaling translational microbiome research to high-throughput target discovery across large populations.
Luo, Li; Zhu, Yun; Xiong, Momiao
2012-06-01
The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing the association of genomic variants, including common, low-frequency, and rare variants. The current strategies for association studies are well developed for identifying associations of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests, which analyze the collective frequency differences of multiple variants between cases and controls, have shifted the variant-by-variant analysis paradigm of GWAS for common variants toward the collective testing of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing the association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole-genome low-coverage pilot data from the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: the genome-information content-based statistic, the generalized T(2), the collapsing method, the combined multivariate and collapsing (CMC) method, the individual χ(2) test, the weighted-sum statistic, and the variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly better type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
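For contrast with the group tests mentioned above, the sketch below implements a generic collapsing (burden-style) test in the spirit of the CMC comparison method: carriers of any rare variant are compared between cases and controls. This is not the genome-information content-based statistic proposed in the paper; the function name, MAF threshold, and simulated genotypes are assumptions.

```python
import numpy as np
from scipy.stats import fisher_exact

def collapsing_test(genotypes, is_case, maf_threshold=0.01):
    """Burden-style collapsing test on rare-variant carrier status.

    genotypes -- (n_subjects, n_variants) array of minor-allele counts {0, 1, 2}
    is_case   -- boolean array of length n_subjects
    """
    maf = genotypes.mean(axis=0) / 2.0
    rare = maf < maf_threshold
    carrier = (genotypes[:, rare] > 0).any(axis=1)
    table = [[np.sum(carrier & is_case), np.sum(~carrier & is_case)],
             [np.sum(carrier & ~is_case), np.sum(~carrier & ~is_case)]]
    return fisher_exact(table)  # odds ratio and p-value

# Simulated cohort: 1000 subjects, 50 mostly rare variants, random case status.
rng = np.random.default_rng(0)
geno = rng.binomial(2, 0.005, size=(1000, 50))
case = rng.random(1000) < 0.5
print(collapsing_test(geno, case))
```

Because every rare variant is collapsed into a single indicator, such a test ignores differences in effect among SNPs at different genomic locations, which is exactly the limitation the paper's statistic is designed to address.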
Lohmann, Gabriele; Stelzer, Johannes; Zuber, Verena; Buschmann, Tilo; Margulies, Daniel; Bartels, Andreas; Scheffler, Klaus
2016-01-01
The formation of transient networks in response to external stimuli or as a reflection of internal cognitive processes is a hallmark of human brain function. However, its identification in fMRI data of the human brain is notoriously difficult. Here we propose a new method of fMRI data analysis that tackles this problem by considering large-scale, task-related synchronisation networks. Networks consist of nodes and edges connecting them, where nodes correspond to voxels in fMRI data, and the weight of an edge is determined via task-related changes in dynamic synchronisation between their respective time series. Based on these definitions, we developed a new data analysis algorithm that identifies edges that show differing levels of synchrony between two distinct task conditions and that occur in dense packs with similar characteristics. Hence, we call this approach “Task-related Edge Density” (TED). TED proved to be a very strong marker for dynamic network formation that easily lends itself to statistical analysis using large scale statistical inference. A major advantage of TED compared to other methods is that it does not depend on any specific hemodynamic response model, and it also does not require a presegmentation of the data for dimensionality reduction as it can handle large networks consisting of tens of thousands of voxels. We applied TED to fMRI data of a fingertapping and an emotion processing task provided by the Human Connectome Project. TED revealed network-based involvement of a large number of brain areas that evaded detection using traditional GLM-based analysis. We show that our proposed method provides an entirely new window into the immense complexity of human brain function. PMID:27341204
Statistics of Magnetic Reconnection X-Lines in Kinetic Turbulence
NASA Astrophysics Data System (ADS)
Haggerty, C. C.; Parashar, T.; Matthaeus, W. H.; Shay, M. A.; Wan, M.; Servidio, S.; Wu, P.
2016-12-01
In this work we examine the statistics of magnetic reconnection (x-lines) and their associated reconnection rates in intermittent current sheets generated in turbulent plasmas. Although such statistics have been studied previously for fluid simulations (e.g. [1]), they have not yet been generalized to fully kinetic particle-in-cell (PIC) simulations. A significant problem with PIC simulations, however, is electrostatic fluctuations generated due to numerical particle counting statistics. We find that analyzing gradients of the magnetic vector potential from the raw PIC field data identifies numerous artificial (or non-physical) x-points. Using small Orszag-Tang vortex PIC simulations, we analyze x-line identification and show that these artificial x-lines can be removed using sub-Debye length filtering of the data. We examine how turbulent properties such as the magnetic spectrum and scale-dependent kurtosis are affected by particle noise and sub-Debye length filtering. We subsequently apply these analysis methods to a large-scale kinetic PIC turbulence simulation. Consistent with previous fluid models, we find a range of normalized reconnection rates as large as ½, but with the bulk of the rates being less than approximately 0.1. [1] Servidio, S., W. H. Matthaeus, M. A. Shay, P. A. Cassak, and P. Dmitruk (2009), Magnetic reconnection and two-dimensional magnetohydrodynamic turbulence, Phys. Rev. Lett., 102, 115003.
Lara-Ramírez, Edgar E.; Salazar, Ma Isabel; López-López, María de Jesús; Salas-Benito, Juan Santiago; Sánchez-Varela, Alejandro
2014-01-01
The increasing number of dengue virus (DENV) genome sequences available allows the factors contributing to DENV evolution to be identified. In the present study, codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistical methods. Correlation analysis of total GC content (GC) with GC content at the three codon positions (GC1, GC2, and GC3), as well as plots of the effective number of codons (ENC, ENCp) versus GC3, revealed mutational bias and purifying selection pressure as the major forces influencing codon usage, but with distinct pressures at specific nucleotide positions within the codon. Correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed clustering patterns similar to those of the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the geographic origin of the viruses. The phylogenetic dependence analysis also suggests that stabilizing selection acts on codon usage bias. Our large-scale analysis reveals new features of DENV genome evolution. PMID:25136631
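Relative synonymous codon usage (RSCU), the quantity underlying the correspondence and clustering analyses above, can be computed directly from an in-frame coding sequence. The sketch below assumes an uppercase DNA sequence and the standard genetic code; the function name is illustrative, and Biopython is assumed to be available for the codon table.

```python
from collections import Counter, defaultdict
from Bio.Data import CodonTable  # standard genetic-code tables from Biopython

def rscu(coding_sequence):
    """RSCU: observed codon count divided by the count expected if all synonymous
    codons for that amino acid were used equally often."""
    codon_to_aa = CodonTable.unambiguous_dna_by_id[1].forward_table  # stops excluded
    counts = Counter(coding_sequence[i:i + 3]
                     for i in range(0, len(coding_sequence) - 2, 3))
    synonymous = defaultdict(list)
    for codon, aa in codon_to_aa.items():
        synonymous[aa].append(codon)
    values = {}
    for aa, codons in synonymous.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values

print(rscu("ATGTTTTTCCTGCTGCTGAAATAA")["CTG"])  # toy sequence; leucine codon bias
```

RSCU values above 1 indicate codons used more often than expected under uniform synonymous usage, and a matrix of such values per genome is the input to the correspondence analysis described above.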
A space-time multifractal analysis on radar rainfall sequences from central Poland
NASA Astrophysics Data System (ADS)
Licznar, Paweł; Deidda, Roberto
2014-05-01
Rainfall downscaling is one of the most important tasks of modern hydrology. From the perspective of urban hydrology in particular, there is a real need for practical tools for generating possible rainfall scenarios. Rainfall scenarios at fine temporal scales, down to single minutes, are indispensable as inputs for hydrological models. The probabilistic approach to drainage system design and operation has led to widespread use of hydrodynamic models in engineering practice. However, such models covering large areas cannot be driven by uncorrelated point-rainfall time series alone; they should instead be supplied with space-time rainfall scenarios that reproduce the statistical properties of local natural rainfall fields. The long-term aim of our research is the implementation of a Space-Time Rainfall (STRAIN) model for hydrometeorological applications under Polish conditions, such as rainfall downscaling from the large scales of meteorological models to the scales of interest for rainfall-runoff processes. As an introductory part of our study, we verify the following STRAIN model assumptions: rainfall fields are isotropic and statistically homogeneous in space; self-similarity holds (so that, after rescaling the time by the advection velocity, rainfall is a fully homogeneous and isotropic process in the space-time domain); and the statistical properties of rainfall are characterized by an "a priori" known multifractal behavior. We conduct a space-time multifractal analysis on radar rainfall sequences selected from the Polish national radar system POLRAD. Radar rainfall sequences covering an area of 256 km x 256 km, with an original spatial resolution of 2 km x 2 km and a temporal resolution of 15 minutes, are used as study material. Attention is mainly focused on the most severe summer convective rainfall events. It is shown that space-time rainfall can be considered, to a good approximation, a self-similar multifractal process. The multifractal analysis is carried out assuming Taylor's hypothesis to hold, and the advection velocity needed to rescale the time dimension is assumed to be about 16 km/h. This assumption is verified by analyzing the autocorrelation functions along the x and y directions of "rainfall cubes" and along the time axis rescaled with the assumed advection velocity. In general, scaling is observed for the analyzed rainfall sequences at spatial scales from 4 to 256 km and at timescales from 15 min to 16 hours. However, in most cases a scaling break is identified at spatial scales between 4 and 8, corresponding to spatial dimensions of 16 km to 32 km. We hypothesize that the occurrence of the scaling break at these particular scales under central Poland conditions could be at least partly explained by the rainfall mesoscale gap (on the edge between the meso-gamma, or storm, scale and the meso-beta scale).
The statistical overlap theory of chromatography using power law (fractal) statistics.
Schure, Mark R; Davis, Joe M
2011-12-30
The chromatographic dimensionality was recently proposed as a measure of retention time spacing based on a power law (fractal) distribution. Using this model, a statistical overlap theory (SOT) for chromatographic peaks is developed that estimates the number of peak maxima as a function of the chromatographic dimension, saturation and scale. Power law models exhibit a threshold region whereby, below a critical saturation value, no loss of peak maxima due to peak fusion occurs as saturation increases. At moderate saturation, the behavior is similar to the random (Poisson) peak model. At still higher saturation, the power law model shows a loss of peaks nearly independent of the scale and dimension of the model. The physicochemical meaning of the power law scale parameter is discussed and shown to be equal to the Boltzmann-weighted free energy of transfer over the scale limits. The role of the scale is also examined: a small scale range (small β) is shown to generate more uniform chromatograms, whereas a large scale range (large β) gives occasional large excursions of retention times; this is a property of power laws, where "wild" behavior is noted to occasionally occur. Both cases are shown to be useful depending on the chromatographic saturation. A scale-invariant form of the SOT shows very simple relationships between the fraction of peak maxima and the saturation, peak width and number of theoretical plates. These equations provide much insight into separations that follow power law statistics. Copyright © 2011 Elsevier B.V. All rights reserved.
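A crude Monte Carlo illustration of the overlap idea, under the common simplification that components closer together than a resolution width x0 fuse into a single observed peak: the sketch compares uniformly (Poisson-like) spaced retention times with heavy-tailed spacings as a stand-in for power-law behavior. This is not the paper's SOT derivation; the spacing distribution, parameters, and function name are assumptions.

```python
import numpy as np

def observed_maxima(retention_times, x0):
    """Count resolved peaks: components closer than x0 merge into one maximum."""
    t = np.sort(retention_times)
    return 1 + int(np.sum(np.diff(t) > x0))

rng = np.random.default_rng(1)
span, n_components, x0 = 100.0, 200, 0.2      # saturation ~ n_components * x0 / span
poisson_times = rng.uniform(0.0, span, n_components)
# Heavy-tailed spacings produce occasional large gaps and dense clumps of peaks.
gaps = rng.pareto(1.5, n_components)
heavy_tail_times = np.cumsum(gaps) / gaps.sum() * span
print(observed_maxima(poisson_times, x0), observed_maxima(heavy_tail_times, x0))
```

Whether the clumping visible in the heavy-tailed case increases or decreases the number of resolved maxima relative to the Poisson case depends on the saturation regime, which is precisely what the power-law SOT quantifies.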
NASA Astrophysics Data System (ADS)
Heus, Thijs; Jonker, Harm J. J.; van den Akker, Harry E. A.; Griffith, Eric J.; Koutek, Michal; Post, Frits H.
2009-03-01
In this study, a new method is developed to investigate the entire life cycle of shallow cumuli in large eddy simulations. Although trained observers have no problem in distinguishing the different life stages of a cloud, this process proves difficult to automate, because cloud-splitting and cloud-merging events complicate the distinction between a single system divided into several cloudy parts and two independent systems that collided. Because human perception is well equipped to capture and make sense of these time-dependent three-dimensional features, a combination of automated constraints and human inspection in a three-dimensional virtual reality environment is used to select clouds that are exemplary in their behavior throughout their entire life span. Three specific cases (ARM, BOMEX, and BOMEX without large-scale forcings) are analyzed in this way, and the considerable number of selected clouds warrants reliable statistics of cloud properties conditioned on the phase in their life cycle. The most dominant feature in this statistical life cycle analysis is the pulsating growth that is present throughout the entire lifetime of the cloud, independent of the case and of the large-scale forcings. The pulses are a self-sustained phenomenon, driven by a balance between buoyancy and horizontal convergence of dry air. The convective inhibition just above the cloud base plays a crucial role as a barrier for the cloud to overcome in its infancy stage, and as a buffer region later on, ensuring a steady supply of buoyancy into the cloud.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tessore, Nicolas; Metcalf, R. Benton; Winther, Hans A.
A number of alternatives to general relativity exhibit gravitational screening in the non-linear regime of structure formation. We describe a set of algorithms that can produce weak lensing maps of large scale structure in such theories and can be used to generate mock surveys for cosmological analysis. By analysing a few basic statistics we indicate how these alternatives can be distinguished from general relativity with future weak lensing surveys.
ERIC Educational Resources Information Center
Ojerinde, Dibu; Popoola, Omokunmi; Onyeneho, Patrick; Egberongbe, Aminat
2016-01-01
The statistical procedure used to adjust for differences in difficulty across test forms is known as "equating". Equating makes it possible for various test forms to be used interchangeably. In terms of where the equating method fits in the assessment cycle, there are pre-equating and post-equating methods. The major benefits of pre-equating, when…
ERIC Educational Resources Information Center
Rushton, Gregory T.; Rosengrant, David; Dewar, Andrew; Shah, Lisa; Ray, Herman E.; Sheppard, Keith; Watanabe, Lynn
2017-01-01
Efforts to improve the number and quality of the high school physics teaching workforce have taken several forms, including those sponsored by professional organizations. Using a series of large-scale teacher demographic data sets from the National Center for Education Statistics (NCES), this study sought to investigate trends in teacher quality…
Supporting Regularized Logistic Regression Privately and Efficiently.
Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei
2016-01-01
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has nonetheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc.
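For reference, fitting the underlying model (without the privacy machinery) takes only a few lines with a standard library; the secure multi-institution protocol described above is the paper's contribution and is not sketched here. The dataset and parameter choices below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for pooled multi-site data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# L2-regularized logistic regression; C is the inverse regularization strength.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```

In the collaborative setting the paper targets, the same objective would be optimized without ever pooling X and y in one place, using distributed computation and cryptographic protection of the intermediate quantities.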
Validation of a short qualitative food frequency list used in several German large scale surveys.
Winkler, G; Döring, A
1998-09-01
Our study aimed to test the validity of a short, qualitative food frequency list (FFL) used in several German large-scale surveys. In the surveys of the MONICA Augsburg project, the FFL was used in randomly selected adults. In 1984/85, a dietary survey with 7-day records (DR) was conducted within the subsample of men aged 45 to 64 (response 70%). The 899 DR were used to validate the FFL. Mean weekly food intake frequency and mean daily food intake were compared, and Spearman rank-order correlation coefficients and the agreement of classification into tertiles (kappa statistic) were calculated. Spearman correlations range between 0.15 for the item "Other sweets (candies, compote)" and 0.60 for the items "Curds, yoghurt, sour milk", "Milk including butter milk" and "Mineral water"; values of the kappa statistic vary between 0.04 ("White bread, brown bread, crispbread") and 0.41 ("Flaked oats, muesli, cornflakes" and "Milk including butter milk"). With the exception of two items, FFL data can be used for analysis at the group level. Analysis at the individual level should be done with caution. Some food groups appear to be generally easier to ask about in an FFL than others.
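The two agreement measures used in this validation are straightforward to reproduce. The sketch below assumes paired intake estimates from the FFL and the 7-day records; the variable names and synthetic data are illustrative, and SciPy and scikit-learn are assumed for the statistics.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

def tertiles(x):
    """Assign each value to a tertile (0, 1, 2) using the empirical terciles."""
    return np.digitize(x, np.quantile(x, [1 / 3, 2 / 3]))

rng = np.random.default_rng(0)
record_intake = rng.gamma(2.0, 50.0, 300)                      # 7-day record, g/day
ffl_frequency = record_intake / 60 + rng.normal(0, 1.5, 300)   # weekly frequency

rho, _ = spearmanr(ffl_frequency, record_intake)               # rank correlation
kappa = cohen_kappa_score(tertiles(ffl_frequency), tertiles(record_intake))
print(rho, kappa)
```

Spearman's rho captures monotonic agreement of the two instruments, while kappa measures how often they place a subject in the same tertile beyond chance, which is how the item-level values quoted above should be read.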
Numerical study of axial turbulent flow over long cylinders
NASA Technical Reports Server (NTRS)
Neves, J. C.; Moin, P.; Moser, R. D.
1991-01-01
The effects of transverse curvature are investigated by means of direct numerical simulations of turbulent axial flow over cylinders. Two cases of Reynolds number of about 3400 and layer-thickness-to-cylinder-radius ratios of 5 and 11 were simulated. All essential turbulence scales were resolved in both calculations, and a large number of turbulence statistics were computed. The results are compared with the plane channel results of Kim et al. (1987) and with experiments. With transverse curvature the skin friction coefficient increases and the turbulence statistics, when scaled with wall units, are lower than in the plane channel. The momentum equation provides a scaling that collapses the cylinder statistics, and allows the results to be interpreted in light of the plane channel flow. The azimuthal and radial length scales of the structures in the flow are of the order of the cylinder diameter. Boomerang-shaped structures with large spanwise length scales were observed in the flow.
Statistical characterization of short wind waves from stereo images of the sea surface
NASA Astrophysics Data System (ADS)
Mironov, Alexey; Yurovskaya, Maria; Dulov, Vladimir; Hauser, Danièle; Guérin, Charles-Antoine
2013-04-01
We propose a methodology to extract short-scale statistical characteristics of the sea surface topography by means of stereo image reconstruction. The possibilities and limitations of the technique are discussed and tested on a data set acquired from an oceanographic platform in the Black Sea. The analysis shows that reconstruction of the topography by the stereo method is an efficient way to derive non-trivial statistical properties of short and intermediate surface waves (from about 1 centimeter to 1 meter). Most technical issues pertaining to this type of dataset (limited range of scales, lacunarity of data or irregular sampling) can be partially overcome by appropriate processing of the available points. The proposed technique also allows one to avoid linear interpolation, which dramatically corrupts the properties of the retrieved surfaces. The processing technique requires that the field of elevation be polynomially detrended, which has the effect of filtering out the large scales. Hence the statistical analysis can only address the small-scale components of the sea surface. The precise cut-off wavelength, which is approximately half the patch size, can be obtained by applying a high-pass frequency filter to the reference gauge time records. The results obtained for the one- and two-point statistics of small-scale elevations are shown to be consistent, at least in order of magnitude, with the corresponding gauge measurements as well as other experimental measurements available in the literature. The calculation of the structure functions provides a powerful tool to investigate the spectral and statistical properties of the field of elevations. Experimental parametrization of the third-order structure function, the so-called skewness function, is one of the most important and original outcomes of this study. This function is of primary importance in analytical models of scattering from the sea surface and was until now unavailable in field conditions. Due to the lack of precise reference measurements for the small-scale wave field, we could not quantify exactly the accuracy of the retrieval technique. However, it appears clearly that the obtained accuracy is good enough for the estimation of second-order statistical quantities (such as the correlation function), acceptable for third-order quantities (such as the skewness function), and insufficient for fourth-order quantities (such as kurtosis). Therefore, the stereo technique at the present stage should not be thought of as a self-contained universal tool to characterize the surface statistics. Instead, it should be used in conjunction with other well-calibrated but sparse reference measurements (such as wave gauges) for cross-validation and calibration. It then completes the statistical analysis inasmuch as it provides a snapshot of the three-dimensional field and allows for the evaluation of higher-order spatial statistics.
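The structure functions at the heart of this analysis are simple moments of elevation increments. The sketch below works on a one-dimensional transect of the detrended surface; the synthetic profile, detrending order, and function name are assumptions, and the two-dimensional field case follows the same pattern with vector lags.

```python
import numpy as np

def structure_function(eta, order, max_lag):
    """S_n(r) = <(eta(x + r) - eta(x))**n> along a regularly sampled transect."""
    lags = np.arange(1, max_lag + 1)
    values = [np.mean((eta[r:] - eta[:-r]) ** order) for r in lags]
    return lags, np.array(values)

rng = np.random.default_rng(2)
x = np.arange(4096)
eta = np.cumsum(rng.normal(size=x.size))          # synthetic rough profile
eta -= np.polyval(np.polyfit(x, eta, 2), x)       # polynomial detrending, as above

r, s2 = structure_function(eta, 2, 200)   # second order: increment variance
_, s3 = structure_function(eta, 3, 200)   # third order: the "skewness function"
```

With the stereo data, the second-order function is what the paper reports as well constrained, the third-order (skewness) function is recoverable with acceptable accuracy, and fourth-order quantities fall below the noise level of the technique.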
Genomic analysis of regulatory network dynamics reveals large topological changes
NASA Astrophysics Data System (ADS)
Luscombe, Nicholas M.; Madan Babu, M.; Yu, Haiyuan; Snyder, Michael; Teichmann, Sarah A.; Gerstein, Mark
2004-09-01
Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here, particularly the large-scale topological changes and hub transience, will apply to other biological networks, including complex sub-systems in higher eukaryotes.
Temporal scaling and spatial statistical analyses of groundwater level fluctuations
NASA Astrophysics Data System (ADS)
Sun, H.; Yuan, L., Sr.; Zhang, Y.
2017-12-01
Natural dynamics such as groundwater level fluctuations can exhibit multifractionality and/or multifractality, likely due to multi-scale aquifer heterogeneity and controlling factors, whose statistics require efficient quantification methods. This study explores multifractionality and non-Gaussian properties in groundwater dynamics, expressed by time series of daily level fluctuations at three wells located in the lower Mississippi valley, after removing the seasonal cycle, through temporal scaling and spatial statistical analyses. First, using time-scale multifractional analysis, a systematic statistical method is developed to analyze groundwater level fluctuations quantified by the time-scale local Hurst exponent (TS-LHE). Results show that the TS-LHE does not remain constant, implying that the fractal scaling behavior changes with time and location. Hence, we can distinguish a potentially location-dependent scaling feature, which may characterize the hydrologic dynamical system. Second, spatial statistical analysis shows that the increments of groundwater level fluctuations exhibit a heavy-tailed, non-Gaussian distribution, which can be better quantified by a Lévy stable distribution. Monte Carlo simulations of the fluctuation process also show that the linear fractional stable motion model can depict well the transient dynamics (i.e., the fractal non-Gaussian property) of groundwater level, while fractional Brownian motion is inadequate to describe natural processes with anomalous dynamics. Analysis of temporal scaling and spatial statistics may therefore provide useful information and quantification for further understanding the nature of complex dynamics in hydrology.
Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kleijnen, J.P.C.; Helton, J.C.
1999-04-01
The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
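The first three pattern tests listed above map directly onto standard statistics. The sketch below uses a plain uniform sample in place of a Latin hypercube and a toy model output; the variable names and binning are assumptions, not the reported analysis.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kruskal

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 500)                   # one sampled input variable
y = np.sin(3 * x) + rng.normal(0, 0.3, 500)      # model output: monotone trend + noise

r, _ = pearsonr(x, y)        # (1) linear relationship
rho, _ = spearmanr(x, y)     # (2) monotonic relationship
bins = np.digitize(x, np.quantile(x, [0.2, 0.4, 0.6, 0.8]))
groups = [y[bins == b] for b in range(5)]
H, p = kruskal(*groups)      # (3) trend in central tendency across input bins
print(r, rho, H, p)
```

Trends in variability (variances or interquartile ranges per bin) and chi-square tests for deviations from randomness extend the same binning idea, and repeating the calculation on independent samples is how the stability question raised above would be checked.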
Aćimović, Jugoslava; Mäki-Marttunen, Tuomo; Linne, Marja-Leena
2015-01-01
We developed a two-level statistical model that addresses the question of how properties of neurite morphology shape large-scale network connectivity. We adopted a low-dimensional statistical description of neurites. From the neurite model description we derived the expected number of synapses, the node degree, and the effective radius, the maximal distance between two neurons expected to form at least one synapse. We related these quantities to the network connectivity described using standard measures from graph theory, such as motif counts, clustering coefficient, minimal path length, and small-world coefficient. These measures are used in a neuroscience context to study phenomena ranging from synaptic connectivity in small neuronal networks to large-scale functional connectivity in the cortex. For these measures we provide analytical solutions that clearly relate different model properties. Neurites that sparsely cover space lead to a small effective radius. If the effective radius is small compared to the overall neuron size, the obtained networks share similarities with uniform random networks, as each neuron connects to a small number of distant neurons. Large neurites with densely packed branches lead to a large effective radius. If this effective radius is large compared to the neuron size, the obtained networks have many local connections. In between these extremes, the networks maximize the variability of connection repertoires. The presented approach connects the properties of neuron morphology with large-scale network properties without requiring heavy simulations with many model parameters. The two-step procedure provides an easier interpretation of the role of each modeled parameter. The model is flexible, and each of its components can be further expanded. We identified a range of model parameters that maximizes variability in network connectivity, a property that might affect the network's capacity to exhibit different dynamical regimes.
Lagrangian statistics of mesoscale turbulence in a natural environment: The Agulhas return current.
Carbone, Francesco; Gencarelli, Christian N; Hedgecock, Ian M
2016-12-01
The properties of mesoscale geophysical turbulence in an oceanic environment have been investigated through the Lagrangian statistics of sea surface temperature measured by a drifting buoy within the Agulhas return current, where strong temperature mixing produces locally sharp temperature gradients. By disentangling the large-scale forcing which affects the small-scale statistics, we found that the statistical properties of intermittency are identical to those obtained from the multifractal prediction in the Lagrangian frame for the velocity trajectory. The results suggest a possible universality of turbulence scaling.
NASA Astrophysics Data System (ADS)
Straus, D. M.
2006-12-01
The transitions between portions of the state space of the large-scale flow are studied from daily wintertime data over the Pacific North America region using the NCEP reanalysis data set (54 winters) and very large suites of hindcasts made with the COLA atmospheric GCM with observed SST (55 members for each of 18 winters). The partition of the large-scale state space is guided by cluster analysis, whose statistical significance and relationship to SST are reviewed (Straus and Molteni, 2004; Straus, Corti and Molteni, 2006). The global nature of the flow through state space is studied using Markov chains (Crommelin, 2004). In particular, the non-diffusive part of the flow is contrasted between nature (small data sample) and the AGCM (large data sample). The intrinsic error growth associated with different portions of the state space is studied through sets of identical-twin AGCM simulations. The goal is to obtain realistic estimates of predictability times for large-scale transitions that should be useful in long-range forecasting.
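A minimal sketch of the Markov-chain step, assuming the daily flow has already been assigned to a small number of clusters: estimate the transition matrix from the label sequence and examine the probability currents, whose antisymmetry signals non-diffusive (preferred-direction) flow through state space. The synthetic label sequence and function names are assumptions, not the NCEP or COLA analysis.

```python
import numpy as np

def transition_matrix(labels, k):
    """Row-stochastic transition matrix estimated from a sequence of cluster labels."""
    counts = np.zeros((k, k))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Synthetic daily sequence with a weak preferred cycle 0 -> 1 -> 2 -> 3 -> 0.
rng = np.random.default_rng(0)
labels = [0]
for _ in range(4999):
    labels.append((labels[-1] + 1) % 4 if rng.random() < 0.4 else int(rng.integers(0, 4)))
labels = np.array(labels)

P = transition_matrix(labels, 4)
w, v = np.linalg.eig(P.T)                   # stationary distribution pi solves pi P = pi
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
J = pi[:, None] * P - (pi[:, None] * P).T   # probability currents between clusters
print(np.round(J, 4))                       # nonzero J reveals cyclic, non-diffusive flow
```

With reanalysis data the sample is short, so the estimated currents are noisy; the large hindcast ensemble is what makes the AGCM estimate of the non-diffusive component statistically stable.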
NASA Astrophysics Data System (ADS)
Sogaro, Francesca; Poole, Robert; Dennis, David
2014-11-01
High-speed stereoscopic particle image velocimetry has been performed in fully developed turbulent pipe flow at moderate Reynolds numbers with and without a drag-reducing additive (an aqueous solution of high molecular weight polyacrylamide). Three-dimensional large and very large-scale motions (LSM and VLSM) are extracted from the flow fields by a detection algorithm and the characteristics for each case are statistically compared. The results show that the three-dimensional extent of VLSMs in drag reduced (DR) flow appears to increase significantly compared to their Newtonian counterparts. A statistical increase in azimuthal extent of DR VLSM is observed by means of two-point spatial autocorrelation of the streamwise velocity fluctuation in the radial-azimuthal plane. Furthermore, a remarkable increase in length of these structures is observed by three-dimensional two-point spatial autocorrelation. These results are accompanied by an analysis of the swirling strength in the flow field that shows a significant reduction in strength and number of the vortices for the DR flow. The findings suggest that the damping of the small scales due to polymer addition results in the undisturbed development of longer flow structures.
Statistics of velocity fluctuations of Geldart A particles in a circulating fluidized bed riser
Vaidheeswaran, Avinash; Shaffer, Franklin; Gopalan, Balaji
2017-11-21
Here, the statistics of fluctuating velocity components are studied in the riser of a closed-loop circulating fluidized bed with fluid catalytic cracking catalyst particles. Our analysis shows distinct similarities as well as deviations compared to existing theories and bench-scale experiments. The study confirms the anisotropic and non-Maxwellian distribution of fluctuating velocity components. The velocity distribution functions (VDFs) corresponding to transverse fluctuations exhibit symmetry and follow a stretched-exponential behavior up to three standard deviations. The form of the transverse VDF is largely determined by interparticle interactions. The tails become more overpopulated with an increase in particle loading. The observed deviations from the Gaussian distribution are represented using the leading-order term in the Sonine expansion, which is commonly used to approximate the VDFs in kinetic theory for granular flows. The vertical fluctuating VDFs are asymmetric, and the skewness shifts as the wall is approached. In comparison to transverse fluctuations, the vertical VDF is determined by the local hydrodynamics. This constitutes an observation of particle velocity fluctuations in a large-scale system and their quantitative comparison with Maxwell-Boltzmann statistics.
Proteomics wants cRacker: automated standardized data analysis of LC-MS derived proteomic data.
Zauber, Henrik; Schulze, Waltraud X
2012-11-02
The large-scale analysis of thousands of proteins under various experimental conditions or in mutant lines has gained more and more importance in hypothesis-driven scientific research and systems biology in the past years. Quantitative analysis by large-scale proteomics using modern mass spectrometry usually results in long lists of peptide ion intensities. The main interest for most researchers, however, is to draw conclusions at the protein level. Postprocessing and combining the peptide intensities of a proteomic data set requires expert knowledge, and the often repetitive and standardized manual calculations can be time-consuming. The analysis of complex samples can result in very large data sets (lists with several thousand to 100,000 entries for different peptides) that cannot easily be analyzed using standard spreadsheet programs. To improve the speed and consistency of the data analysis of LC-MS derived proteomic data, we developed cRacker. cRacker is an R-based program for automated downstream proteomic data analysis, including data normalization strategies for metabolic labeling and label-free quantitation. In addition, cRacker includes basic statistical analysis, such as clustering of data, or ANOVA and t tests for comparisons between treatments. Results are presented in editable graphic formats and in list files.
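A minimal sketch of the downstream steps described here (per-run median normalization followed by peptide-to-protein aggregation); it is not cRacker itself, and the toy table, normalization choice, and aggregation by median are assumptions.

```python
import pandas as pd

# Toy peptide-ion intensity table: rows are peptide ions, columns are LC-MS runs.
peptides = pd.DataFrame(
    {"protein": ["P1", "P1", "P2", "P2", "P3"],
     "run_A": [1.2e6, 8.0e5, 3.0e5, None, 5.0e5],
     "run_B": [2.4e6, 1.5e6, 6.5e5, 2.0e5, 9.0e5]})

intensity = peptides[["run_A", "run_B"]]
# Normalize each run by its median intensity to correct for loading differences.
normalized = intensity / intensity.median()
# Aggregate peptide ions to the protein level (median of normalized intensities).
proteins = normalized.join(peptides["protein"]).groupby("protein").median()
print(proteins)
```

Label-based quantitation, ratio calculation, and the statistical tests mentioned above would then operate on the protein-level table; the point of a scripted pipeline is that these steps run identically for every experiment.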
NASA Technical Reports Server (NTRS)
Geller, Margaret J.; Huchra, J. P.
1991-01-01
Present-day understanding of the large-scale galaxy distribution is reviewed. The statistics of the CfA redshift survey are briefly discussed. The need for deeper surveys to clarify the issues raised by recent studies of large-scale galactic distribution is addressed.
NASA Astrophysics Data System (ADS)
Galich, N. E.
A novel nonlinear statistical method for immunofluorescence data analysis is presented. DNA fluorescence arising from oxidative activity in the nuclei of peripheral blood neutrophils is analyzed. Histograms of photon count statistics are generated by flow cytometry; they represent the distribution of fluorescence flash frequency as a function of intensity for large populations (~10^4-10^5) of fluorescing cells. We show that these experiments capture three-dimensional correlations of DNA oxidative activity for the full chromosome set in each cell, with a spatial resolution of a few nanometers in the flow direction of the blood jet. Detailed analysis shows that large-scale correlations in DNA oxidative activity across cells are described by small-world networks (complex systems with logarithmic scaling), with a distinct small-world network for a given donor at a given time for all states of health. We observe changes in the fractal networks of DNA oxidative activity in neutrophils in vivo and during medical treatment, which can be used for classification and diagnostics of pathologies across a wide spectrum of diseases. Our approach is based on analyzing changes in network topology (fractal dimension) as the scale of the networks is varied. We produce a general yes/no (healthy/sick) estimate of the health status of a given donor, depending on the sign (plus/minus) of the trend in fractal dimension as the network scale decreases. We note the greater biodiversity of neutrophils and the stochastic (Brownian) character of intercellular correlations among different neutrophils in the blood of healthy donors, whereas in the blood of sick people we observe deterministic cell-cell correlations of neutrophils and decreased biodiversity.
Topology Analysis of the Sloan Digital Sky Survey. I. Scale and Luminosity Dependence
NASA Astrophysics Data System (ADS)
Park, Changbom; Choi, Yun-Young; Vogeley, Michael S.; Gott, J. Richard, III; Kim, Juhan; Hikage, Chiaki; Matsubara, Takahiko; Park, Myeong-Gu; Suto, Yasushi; Weinberg, David H.; SDSS Collaboration
2005-11-01
We measure the topology of volume-limited galaxy samples selected from a parent sample of 314,050 galaxies in the Sloan Digital Sky Survey (SDSS), which is now complete enough to describe the fully three-dimensional topology and its dependence on galaxy properties. We compare the observed genus statistic G(νf) to predictions for a Gaussian random field and to the genus measured for mock surveys constructed from new large-volume simulations of the ΛCDM cosmology. In this analysis we carefully examine the dependence of the observed genus statistic on the Gaussian smoothing scale RG from 3.5 to 11 h-1 Mpc and on the luminosity of galaxies over the range -22.50
Detection of crossover time scales in multifractal detrended fluctuation analysis
NASA Astrophysics Data System (ADS)
Ge, Erjia; Leung, Yee
2013-04-01
Fractal analysis is employed in this paper as a scale-based method for identifying the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is the crossover time scale(s) separating distinct regimes with different fractal scaling behaviors, commonly estimated by multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective, since it has been made without rigorous statistical procedures and has generally been determined by eyeballing or subjective observation. Crossover time scales so determined may be spurious and problematic and may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe the multi-scaling behaviors of fractals. Through regression analysis and statistical inference, we can (1) identify crossover time scales that cannot be detected by eyeballing, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish a statistically significant regression model depicting the underlying scaling behavior of a time series. To substantiate our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall in Hong Kong. Through the proposed model, we gain a deeper understanding of fractals in general and a statistical approach to identifying multi-scaling behavior under MF-DFA in particular.
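The core idea of replacing eyeballed crossovers with a regression criterion can be sketched as follows: fit two straight lines to the log-log fluctuation function on either side of each candidate breakpoint and keep the split with the smallest residual error. This is a simplified illustration on synthetic data, not the authors' full scaling-identification regression model with confidence intervals.

```python
# Sketch of crossover detection on a log-log fluctuation function F(s) from (MF-)DFA:
# scan candidate breakpoints, fit a line on each side, and keep the split with the
# smallest total squared error.
import numpy as np

def find_crossover(scales, F):
    x, y = np.log(scales), np.log(F)
    best = None
    for k in range(3, len(x) - 3):          # require at least 3 points per regime
        sse = 0.0
        for xs, ys in ((x[:k], y[:k]), (x[k:], y[k:])):
            slope, icpt = np.polyfit(xs, ys, 1)
            sse += np.sum((ys - (slope * xs + icpt)) ** 2)
        if best is None or sse < best[1]:
            best = (k, sse)
    k = best[0]
    return scales[k], np.polyfit(x[:k], y[:k], 1)[0], np.polyfit(x[k:], y[k:], 1)[0]

# Synthetic example: scaling exponent 0.8 below the crossover near s = 300, 0.4 above it
s = np.logspace(1, 4, 40)
F = np.where(s < 300, s ** 0.8, 300 ** 0.4 * s ** 0.4)
print(find_crossover(s, F))   # (crossover scale, slope below, slope above)
```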
Wang, Yikai; Kang, Jian; Kemmer, Phebe B.; Guo, Ying
2016-01-01
Currently, network-oriented analysis of fMRI data has become an important tool for understanding brain organization and brain networks. Among the range of network modeling methods, partial correlation has shown great promise in accurately detecting true brain network connections. However, the application of partial correlation in investigating brain connectivity, especially in large-scale brain networks, has been limited so far due to the technical challenges in its estimation. In this paper, we propose an efficient and reliable statistical method for estimating partial correlation in large-scale brain network modeling. Our method derives partial correlation based on the precision matrix estimated via Constrained L1-minimization Approach (CLIME), which is a recently developed statistical method that is more efficient and demonstrates better performance than the existing methods. To help select an appropriate tuning parameter for sparsity control in the network estimation, we propose a new Dens-based selection method that provides a more informative and flexible tool that allows users to select the tuning parameter based on the desired sparsity level. Another appealing feature of the Dens-based method is that it is much faster than the existing methods, which provides an important advantage in neuroimaging applications. Simulation studies show that the Dens-based method demonstrates comparable or better performance with respect to the existing methods in network estimation. We applied the proposed partial correlation method to investigate resting state functional connectivity using rs-fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) study. Our results show that partial correlation analysis removed considerable between-module marginal connections identified by full correlation analysis, suggesting these connections were likely caused by global effects or common connections to other nodes. Based on partial correlation, we find that the most significant direct connections are between homologous brain locations in the left and right hemispheres. When comparing partial correlation derived under different sparse tuning parameters, an important finding is that the sparse regularization has more shrinkage effects on negative functional connections than on positive connections, which supports previous findings that many of the negative brain connections are due to non-neurophysiological effects. An R package “DensParcorr” can be downloaded from CRAN for implementing the proposed statistical methods. PMID:27242395
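A minimal sketch of the partial-correlation step is shown below. CLIME itself is not implemented here; scikit-learn's GraphicalLassoCV stands in as the sparse precision-matrix estimator, and the Dens-based tuning rule is not reproduced. The conversion from a precision matrix P to partial correlation, -P_ij / sqrt(P_ii * P_jj), is the standard relation the method relies on.

```python
# Sketch: derive partial correlations from a sparse precision-matrix estimate.
# GraphicalLassoCV is used purely as a stand-in for CLIME, for illustration only.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))          # 200 time points x 20 nodes (toy data)

precision = GraphicalLassoCV().fit(X).precision_

# partial_corr[i, j] = -P[i, j] / sqrt(P[i, i] * P[j, j])
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(partial_corr.round(2))
```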
Remote visual analysis of large turbulence databases at multiple scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pulido, Jesus; Livescu, Daniel; Kanov, Kalin
The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methods supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.
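The wavelet-compression idea behind such a framework can be sketched with PyWavelets: decompose a field, discard small detail coefficients, and reconstruct. The 2-D random field and the 90% threshold below are toy assumptions for illustration; the framework's actual storage format and remote-access machinery are not reproduced.

```python
# Sketch of wavelet-based compression for a turbulence-like field: decompose,
# zero out small coefficients, and reconstruct at reduced fidelity.
import numpy as np
import pywt

field = np.random.rand(256, 256)                      # stand-in for one 2-D slice of DNS data

coeffs = pywt.wavedec2(field, "db4", level=4)         # multi-resolution decomposition
arr, slices = pywt.coeffs_to_array(coeffs)

threshold = np.percentile(np.abs(arr), 90)            # keep only the largest 10% of coefficients
arr[np.abs(arr) < threshold] = 0.0

compressed = pywt.waverec2(pywt.array_to_coeffs(arr, slices, output_format="wavedec2"), "db4")
print("max reconstruction error:", np.abs(compressed[:256, :256] - field).max())
```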
Single-trabecula building block for large-scale finite element models of cancellous bone.
Dagan, D; Be'ery, M; Gefen, A
2004-07-01
Recent development of high-resolution imaging of cancellous bone allows finite element (FE) analysis of bone tissue stresses and strains in individual trabeculae. However, specimen-specific stress/strain analyses can include effects of anatomical variations and local damage that can bias the interpretation of the results from individual specimens with respect to large populations. This study developed a standard (generic) 'building-block' of a trabecula for large-scale FE models. Being parametric and based on statistics of dimensions of ovine trabeculae, this building block can be scaled for trabecular thickness and length and be used in commercial or custom-made FE codes to construct generic, large-scale FE models of bone, using less computer power than that currently required to reproduce the accurate micro-architecture of trabecular bone. Orthogonal lattices constructed with this building block, after it was scaled to trabeculae of the human proximal femur, provided apparent elastic moduli of approximately 150 MPa, in good agreement with experimental data for the stiffness of cancellous bone from this site. Likewise, lattices with thinner, osteoporotic-like trabeculae could predict a reduction of approximately 30% in the apparent elastic modulus, as reported in experimental studies of osteoporotic femora. Based on these comparisons, it is concluded that the single-trabecula element developed in the present study is well-suited for representing cancellous bone in large-scale generic FE simulations.
A new ionospheric storm scale based on TEC and foF2 statistics
NASA Astrophysics Data System (ADS)
Nishioka, Michi; Tsugawa, Takuya; Jin, Hidekatsu; Ishii, Mamoru
2017-01-01
In this paper, we propose the I-scale, a new ionospheric storm scale for general users in various regions of the world. With the I-scale, ionospheric storms can be classified at any season, local time, and location. Since the ionospheric condition largely depends on many factors such as solar irradiance, energy input from the magnetosphere, and lower atmospheric activity, it has been difficult to scale ionospheric storms, which are mainly caused by solar and geomagnetic activities. In this study, statistical analysis was carried out for total electron content (TEC) and F2 layer critical frequency (foF2) in Japan over 18 years, from 1997 to 2014. Seasonal, local time, and latitudinal dependences of TEC and foF2 variabilities are excluded by normalizing each percentage variation using its statistical standard deviation. The I-scale is defined by applying thresholds to the normalized values, yielding seven categories: I0, IP1, IP2, IP3, IN1, IN2, and IN3. I0 represents a quiet state, and IP1 (IN1), IP2 (IN2), and IP3 (IN3) represent moderate, strong, and severe positive (negative) storms, respectively. The proposed I-scale can be used for other locations, such as polar and equatorial regions. It is considered that the proposed I-scale can be a standardized scale to help users assess the impact of space weather on their systems.
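A minimal sketch of the classification step is given below: a percentage deviation of TEC is normalized by its climatological standard deviation and binned into the seven categories. The numeric sigma thresholds are assumptions for illustration only; the paper defines its own boundaries.

```python
# Sketch of the I-scale idea: normalize a TEC percentage deviation by the statistical
# standard deviation for that season/local time, then bin it into the seven categories.
# The boundaries (1, 2, and 3 sigma) are hypothetical, not the paper's definitions.
def i_scale(tec_percent_deviation, climatological_sigma):
    z = tec_percent_deviation / climatological_sigma
    for level, positive, negative in [(3.0, "IP3", "IN3"), (2.0, "IP2", "IN2"), (1.0, "IP1", "IN1")]:
        if z >= level:
            return positive
        if z <= -level:
            return negative
    return "I0"

print(i_scale(35.0, 15.0))   # -> IP2 with these assumed numbers
print(i_scale(-8.0, 15.0))   # -> I0
```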
Lee, Ellen E; Della Selva, Megan P; Liu, Anson; Himelhoch, Seth
2015-01-01
Given the significant disability, morbidity and mortality associated with depression, the promising recent trials of ketamine highlight a novel intervention. A meta-analysis was conducted to assess the efficacy of ketamine in comparison with placebo for the reduction of depressive symptoms in patients who meet criteria for a major depressive episode. Two electronic databases were searched in September 2013 for English-language studies that were randomized placebo-controlled trials of ketamine treatment for patients with major depressive disorder or bipolar depression and utilized a standardized rating scale. Studies including participants receiving electroconvulsive therapy and adolescent/child participants were excluded. Five studies were included in the quantitative meta-analysis. The quantitative meta-analysis showed that ketamine significantly reduced depressive symptoms. The overall effect size at day 1 was large and statistically significant with an overall standardized mean difference of 1.01 (95% confidence interval 0.69-1.34) (P<.001), with the effects sustained at 7 days postinfusion. The heterogeneity of the studies was low and not statistically significant, and the funnel plot showed no publication bias. The large and statistically significant effect of ketamine on depressive symptoms supports a promising, new and effective pharmacotherapy with rapid onset, high efficacy and good tolerability. Copyright © 2015. Published by Elsevier Inc.
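For readers unfamiliar with the pooling behind such results, the sketch below shows a fixed-effect, inverse-variance combination of standardized mean differences with a 95% confidence interval. The per-study effect sizes and standard errors are made-up placeholders, not the trial data analysed in the meta-analysis.

```python
# Sketch of inverse-variance (fixed-effect) pooling of standardized mean differences.
# The per-study numbers are illustrative placeholders only.
import numpy as np
from scipy import stats

smd = np.array([0.9, 1.1, 0.8, 1.3, 1.0])    # hypothetical per-study standardized mean differences
se  = np.array([0.30, 0.25, 0.35, 0.40, 0.28])

w = 1.0 / se**2
pooled = np.sum(w * smd) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
ci = pooled + np.array([-1.96, 1.96]) * pooled_se
z = pooled / pooled_se
p = 2 * stats.norm.sf(abs(z))

print(f"pooled SMD = {pooled:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}], p = {p:.2g}")
```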
Surface shape analysis with an application to brain surface asymmetry in schizophrenia.
Brignell, Christopher J; Dryden, Ian L; Gattone, S Antonio; Park, Bert; Leask, Stuart; Browne, William J; Flynn, Sean
2010-10-01
Some methods for the statistical analysis of surface shapes and asymmetry are introduced. We focus on a case study where magnetic resonance images of the brain are available from groups of 30 schizophrenia patients and 38 controls, and we investigate large-scale brain surface shape differences. Key aspects of shape analysis are to remove nuisance transformations by registration and to identify which parts of one object correspond with parts of another object. We introduce maximum likelihood and Bayesian methods for registering brain images and providing large-scale correspondences of the brain surfaces. Brain surface size-and-shape analysis is considered using random field theory, and dimension reduction is carried out using principal and independent components analysis. Some small but significant differences are observed between the patient and control groups. We then investigate a particular type of asymmetry called torque. Differences in asymmetry are observed between the control and patient groups, which add strength to other observations in the literature. Further investigations of the midline plane location in the two groups and the fitting of nonplanar curved midlines are also considered.
Complex Analysis of Combat in Afghanistan
2014-12-01
analysis we have E(f) ~ f^(-β), where β = 2H - 1 = 1 - γ, with H being the Hurst exponent, related to the correlation exponent γ. Usually, real-world data are ... statistical nature. In every instance we found strong power law correlations in the data, and were able to extract accurate scaling exponents. On the ... exponents, α. The case α < 0.5 corresponds to long-term anti-correlations, meaning that large values are most likely to be followed by small values and
NASA Astrophysics Data System (ADS)
Tiselj, Iztok
2014-12-01
Channel flow DNS (Direct Numerical Simulation) at friction Reynolds number 180 and with passive scalars of Prandtl numbers 1 and 0.01 was performed in various computational domains. The "normal" size domain was ˜2300 wall units long and ˜750 wall units wide, with the size taken from the similar DNS of Moser et al. The "large" computational domain, which is supposed to be sufficient to describe the largest structures of the turbulent flow, was 3 times longer and 3 times wider than the "normal" domain. The "very large" domain was 6 times longer and 6 times wider than the "normal" domain. All simulations were performed with the same spatial and temporal resolution. Comparison of the standard and large computational domains shows that the velocity field statistics (mean velocity, root-mean-square (RMS) fluctuations, and turbulent Reynolds stresses) agree within 1%-2%. Similar agreement is observed for Pr = 1 temperature fields and can also be observed for the mean temperature profiles at Pr = 0.01. These differences can be attributed to the statistical uncertainties of the DNS. However, second-order moments, i.e., RMS temperature fluctuations, of the standard and large computational domains at Pr = 0.01 show significant differences of up to 20%. Stronger temperature fluctuations in the "large" and "very large" domains confirm the existence of the large-scale structures. Their influence is more or less invisible in the main velocity field statistics or in the statistics of the temperature fields at Prandtl numbers around 1. However, these structures play a visible role in the temperature fluctuations at low Prandtl number, where high temperature diffusivity effectively smears the small-scale structures in the thermal field and enhances the relative contribution of large scales. These large thermal structures represent a kind of echo of the large-scale velocity structures: the highest temperature-velocity correlations are not observed between the instantaneous temperatures and instantaneous streamwise velocities, but between the instantaneous temperatures and velocities averaged over a certain time interval.
Cascades of energy and helicity in axisymmetric turbulence
NASA Astrophysics Data System (ADS)
Qu, Bo; Naso, Aurore; Bos, Wouter J. T.
2018-01-01
A spectral analysis of strictly axisymmetric turbulence is performed. Both freely decaying and statistically steady flows are considered. In helical flows we identify a dual cascade, where energy is transferred towards the large scales and helicity to the smallest ones. It is shown that even in the absence of net helicity, a dual cascade persists, transferring energy backward and positively and negatively polarized helicity fluctuations forward.
ERIC Educational Resources Information Center
Rossion, Bruno; Hanseeuw, Bernard; Dricot, Laurence
2012-01-01
A number of human brain areas showing a larger response to faces than to objects from different categories, or to scrambled faces, have been identified in neuroimaging studies. Depending on the statistical criteria used, the set of areas can be overextended or minimized, both at the local (size of areas) and global (number of areas) levels. Here…
Tsugawa, Hiroshi; Arita, Masanori; Kanazawa, Mitsuhiro; Ogiwara, Atsushi; Bamba, Takeshi; Fukusaki, Eiichiro
2013-05-21
We developed a new software program, MRMPROBS, for widely targeted metabolomics using the large-scale multiple reaction monitoring (MRM) mode. This strategy has become increasingly popular for the simultaneous analysis of up to several hundred metabolites at high sensitivity, selectivity, and quantitative capability. However, the traditional method of assessing measured metabolomics data without probabilistic criteria is not only time-consuming but also often subjective and makeshift. Our program overcomes these problems by detecting and identifying metabolites automatically, by separating isomeric metabolites, and by removing background noise using a probabilistic score defined as the odds ratio from an optimized multivariate logistic regression model. Our software program also provides a user-friendly graphical interface to curate and organize data matrices and to apply principal component analyses and statistical tests. As a demonstration, we conducted a widely targeted metabolome analysis (152 metabolites) of propagating Saccharomyces cerevisiae measured at 15 time points by gas and liquid chromatography coupled to triple quadrupole mass spectrometry. MRMPROBS is a useful and practical tool for the assessment of large-scale MRM data, applicable to any instrument or experimental condition.
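The probabilistic scoring idea can be sketched generically: fit a logistic regression on peak features and use the predicted odds as the criterion for keeping or discarding candidate peaks. The features, labels, and model below are hypothetical stand-ins, not the optimized multivariate model implemented in MRMPROBS.

```python
# Sketch of odds-based peak scoring with a logistic regression on hypothetical
# peak features (retention-time error, peak shape score, transition-intensity error).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.random((500, 3))                                   # toy feature matrix
y = (X[:, 1] - X[:, 0] - X[:, 2] + rng.normal(0, 0.2, 500) > 0).astype(int)  # toy labels

model = LogisticRegression().fit(X, y)

candidate = np.array([[0.05, 0.9, 0.1]])                   # a well-behaved candidate peak
p = model.predict_proba(candidate)[0, 1]
odds = p / (1 - p)
print(f"posterior probability = {p:.2f}, odds score = {odds:.1f}")
```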
Linking crop yield anomalies to large-scale atmospheric circulation in Europe.
Ceglar, Andrej; Turco, Marco; Toreti, Andrea; Doblas-Reyes, Francisco J
2017-06-15
Understanding the effects of climate variability and extremes on crop growth and development represents a necessary step to assess the resilience of agricultural systems to changing climate conditions. This study investigates the links between the large-scale atmospheric circulation and crop yields in Europe, providing the basis to develop seasonal crop yield forecasting and thus enabling a more effective and dynamic adaptation to climate variability and change. Four dominant modes of large-scale atmospheric variability have been used: North Atlantic Oscillation, Eastern Atlantic, Scandinavian and Eastern Atlantic-Western Russia patterns. Large-scale atmospheric circulation explains on average 43% of inter-annual winter wheat yield variability, ranging between 20% and 70% across countries. As for grain maize, the average explained variability is 38%, ranging between 20% and 58%. Spatially, the skill of the developed statistical models strongly depends on the large-scale atmospheric variability impact on weather at the regional level, especially during the most sensitive growth stages of flowering and grain filling. Our results also suggest that preceding atmospheric conditions might provide an important source of predictability especially for maize yields in south-eastern Europe. Since the seasonal predictability of large-scale atmospheric patterns is generally higher than the one of surface weather variables (e.g. precipitation) in Europe, seasonal crop yield prediction could benefit from the integration of derived statistical models exploiting the dynamical seasonal forecast of large-scale atmospheric circulation.
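A statistical model of the kind described above can be sketched as an ordinary regression of detrended yield anomalies on seasonal indices of the four circulation patterns, with the explained variance read off as R^2. The index and yield series below are synthetic placeholders.

```python
# Sketch: regress yield anomalies on circulation indices and report explained variance.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n_years = 30
indices = rng.standard_normal((n_years, 4))          # columns: NAO, EA, SCAND, EAWR (toy values)
yield_anomaly = 0.6 * indices[:, 0] - 0.4 * indices[:, 2] + rng.normal(0, 0.8, n_years)

model = LinearRegression().fit(indices, yield_anomaly)
r2 = model.score(indices, yield_anomaly)
print("explained inter-annual yield variability:", round(100 * r2), "%")
```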
Analysis of the fluctuations of the tumour/host interface
NASA Astrophysics Data System (ADS)
Milotti, Edoardo; Vyshemirsky, Vladislav; Stella, Sabrina; Dogo, Federico; Chignola, Roberto
2017-11-01
In a recent analysis of metabolic scaling in solid tumours we found a scaling law that interpolates between the power laws μ ∝ V and μ ∝ V^(2/3), where μ is the metabolic rate expressed as the glucose absorption rate and V is the tumour volume. The scaling law fits both in vitro and in vivo data quite well; however, we also observed marked fluctuations that are associated with the specific biological properties of individual tumours. Here we analyse these fluctuations, in an attempt to find the population-wide distribution of an important parameter (A) which expresses the total extent of the interface between the solid tumour and the non-cancerous environment. Heuristic considerations suggest that the values of the A parameter follow a lognormal distribution, and, allowing for the large uncertainties of the experimental data, our statistical analysis confirms this.
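Checking a lognormal hypothesis for a positive parameter such as A can be sketched by fitting a normal distribution to the log-values and applying a normality test, as below. The sample is synthetic; this only illustrates the type of test, not the paper's analysis of the tumour data.

```python
# Sketch: lognormality of A is equivalent to normality of log(A).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
A = rng.lognormal(mean=2.0, sigma=0.5, size=200)     # synthetic stand-in for the interface parameter

logA = np.log(A)
mu, sigma = logA.mean(), logA.std(ddof=1)
stat, p = stats.shapiro(logA)                        # normality test on log(A)

print(f"log-mean = {mu:.2f}, log-sd = {sigma:.2f}, Shapiro-Wilk p = {p:.2f}")
```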
Scale growth of structures in the turbulent boundary layer with a rod-roughened wall
NASA Astrophysics Data System (ADS)
Lee, Jin; Kim, Jung Hoon; Lee, Jae Hwa
2016-01-01
Direct numerical simulation of a turbulent boundary layer over a rod-roughened wall is performed with a long streamwise domain to examine the streamwise-scale growth mechanism of streamwise velocity fluctuating structures in the presence of two-dimensional (2-D) surface roughness. An instantaneous analysis shows that there is a slightly larger population of long structures with a small helix angle (spanwise inclinations relative to streamwise) and a large spanwise width over the rough-wall compared to that over a smooth-wall. Further inspection of time-evolving instantaneous fields clearly exhibits that adjacent long structures combine to form a longer structure through a spanwise merging process over the rough-wall; moreover, spanwise merging for streamwise scale growth is expected to occur frequently over the rough-wall due to the large spanwise scales generated by the 2-D roughness. Finally, we examine the influence of a large width and a small helix angle of the structures over the rough-wall with regard to spatial two-point correlation. The results show that these factors can increase the streamwise coherence of the structures in a statistical sense.
Quantifying Climate Change Hydrologic Risk at NASA Ames Research Center
NASA Astrophysics Data System (ADS)
Mills, W. B.; Bromirski, P. D.; Coats, R. N.; Costa-Cabral, M.; Fong, J.; Loewenstein, M.; Milesi, C.; Miller, N.; Murphy, N.; Roy, S.
2013-12-01
In response to the 2009 Executive Order 13514, which mandates U.S. federal agencies to evaluate infrastructure vulnerabilities due to climate variability and change, we provide an analysis of future climate flood risk at NASA Ames Research Center (Ames) along South S.F. Bay. This includes likelihood analysis of large-scale water vapor transport, statistical analysis of intense precipitation, high winds, sea level rise, storm surge, estuary dynamics, saturated overland flooding, and likely impacts to wetlands and habitat loss near Ames. We use the IPCC CMIP5 data from three Atmosphere-Ocean General Circulation Models with Representative Concentration Pathways of 8.5 Wm-2 and 4.5 Wm-2 and provide an analysis of climate variability and change associated with flooding and impacts at Ames. Intense storms impacting Ames are due to two large-scale processes, sub-tropical atmospheric rivers (AR) and north Pacific Aleutian low-pressure (AL) storm systems, both of which are analyzed here in terms of the Integrated Water Vapor (IWV) exceeding a critical threshold within a search domain and the wind vector transporting the IWV from southerly to westerly to northwesterly for ARs, and northwesterly to northerly for ALs, within the Ames impact area during 1970-1999, 2040-2069, and 2070-2099. We also include a statistical model of extreme precipitation at Ames based on large-scale climatic predictors, and characterize changes using CMIP5 projections. Requirements for levee height to protect Ames are projected to increase and continually accelerate throughout this century as sea level rises. We use empirical statistical and analytical methods to determine the likelihood, in each year from present through 2099, of water level surpassing different threshold values in SF Bay near NASA Ames. We study the sensitivity of the water level corresponding to a 1-in-10 and 1-in-100 likelihood of exceedance to changes in the statistical distribution of storm surge height and ENSO height, in addition to increasing mean sea level. We examine the implications in the face of the CMIP5 projections. Storm intensification may result in increased flooding hazards at Ames. We analyze how the changes in precipitation intensity will impact the storm drainage system at Ames through continuous stormwater modeling of runoff with the EPA model SWMM 5 and projected downscaled daily precipitation data. Although extreme events will not adversely affect wetland habitats, adaptation projects, especially levee construction and improvement, will require filling of wetlands. Federal law mandates mitigation for fill placed in wetlands. We are currently calculating the potential mitigation burden by habitat type.
Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events.
Li, Qiuju; Pan, Jianxin; Belcher, John
2016-12-01
In medical studies, repeated measurements of continuous, binary and ordinal outcomes are routinely collected from the same patient. Instead of modelling each outcome separately, in this study we propose to jointly model the trivariate longitudinal responses, so as to take account of the inherent association between the different outcomes and thus improve statistical inferences. This work is motivated by a large cohort study in the North West of England, involving trivariate responses from each patient: Body Mass Index, Depression (Yes/No) ascertained with cut-off score not less than 8 at the Hospital Anxiety and Depression Scale, and Pain Interference generated from the Medical Outcomes Study 36-item short-form health survey with values returned on an ordinal scale 1-5. There are some well-established methods for combined continuous and binary, or even continuous and ordinal responses, but little work was done on the joint analysis of continuous, binary and ordinal responses. We propose conditional joint random-effects models, which take into account the inherent association between the continuous, binary and ordinal outcomes. Bayesian analysis methods are used to make statistical inferences. Simulation studies show that, by jointly modelling the trivariate outcomes, standard deviations of the estimates of parameters in the models are smaller and much more stable, leading to more efficient parameter estimates and reliable statistical inferences. In the real data analysis, the proposed joint analysis yields a much smaller deviance information criterion value than the separate analysis, and shows other good statistical properties too. © The Author(s) 2014.
Quantifying predictability in a model with statistical features of the atmosphere
Kleeman, Richard; Majda, Andrew J.; Timofeyev, Ilya
2002-01-01
The Galerkin truncated inviscid Burgers equation has recently been shown by the authors to be a simple model with many degrees of freedom, with many statistical properties similar to those occurring in dynamical systems relevant to the atmosphere. These properties include long time-correlated, large-scale modes of low frequency variability and short time-correlated “weather modes” at smaller scales. The correlation scaling in the model extends over several decades and may be explained by a simple theory. Here a thorough analysis of the nature of predictability in the idealized system is developed by using a theoretical framework developed by R.K. This analysis is based on a relative entropy functional that has been shown elsewhere by one of the authors to measure the utility of statistical predictions precisely. The analysis is facilitated by the fact that most relevant probability distributions are approximately Gaussian if the initial conditions are assumed to be so. Rather surprisingly this holds for both the equilibrium (climatological) and nonequilibrium (prediction) distributions. We find that in most cases the absolute difference in the first moments of these two distributions (the “signal” component) is the main determinant of predictive utility variations. Contrary to conventional belief in the ensemble prediction area, the dispersion of prediction ensembles is generally of secondary importance in accounting for variations in utility associated with different initial conditions. This conclusion has potentially important implications for practical weather prediction, where traditionally most attention has focused on dispersion and its variability. PMID:12429863
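For Gaussian prediction and climatological distributions, the relative entropy used here splits into a "dispersion" part and a "signal" part driven by the shift of the means; the univariate version of that decomposition is sketched below. The numerical values are illustrative only.

```python
# Sketch of the Gaussian relative-entropy decomposition: for a prediction N(mu_p, var_p)
# and climatology N(mu_c, var_c), the relative entropy is the sum of a dispersion term
# and a signal term proportional to the squared mean shift.
import numpy as np

def relative_entropy_gaussian(mu_p, var_p, mu_c, var_c):
    dispersion = 0.5 * (np.log(var_c / var_p) + var_p / var_c - 1.0)
    signal = 0.5 * (mu_p - mu_c) ** 2 / var_c
    return signal, dispersion, signal + dispersion

# Toy example: the forecast mean is shifted by one climatological standard deviation
signal, dispersion, total = relative_entropy_gaussian(mu_p=1.0, var_p=0.5, mu_c=0.0, var_c=1.0)
print(f"signal = {signal:.3f}, dispersion = {dispersion:.3f}, total = {total:.3f}")
```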
The Galics Project: Virtual Galaxy: from Cosmological N-body Simulations
NASA Astrophysics Data System (ADS)
Guiderdoni, B.
The GalICS project develops extensive semi-analytic post-processing of large cosmological simulations to describe hierarchical galaxy formation. The multiwavelength statistical properties of high-redshift and local galaxies are predicted within the large-scale structures. The fake catalogs and mock images that are generated from the outputs are used for the analysis and preparation of deep surveys. The whole set of results is now available in an on-line database that can be easily queried. The GalICS project represents a first step towards a 'Virtual Observatory of virtual galaxies'.
Cold dark matter and degree-scale cosmic microwave background anisotropy statistics after COBE
NASA Technical Reports Server (NTRS)
Gorski, Krzysztof M.; Stompor, Radoslaw; Juszkiewicz, Roman
1993-01-01
We conduct a Monte Carlo simulation of the cosmic microwave background (CMB) anisotropy in the UCSB South Pole 1991 degree-scale experiment. We examine cold dark matter cosmology with large-scale structure seeded by the Harrison-Zel'dovich hierarchy of Gaussian-distributed primordial inhomogeneities normalized to the COBE-DMR measurement of large-angle CMB anisotropy. We find it statistically implausible (in the sense of low cumulative probability F lower than 5 percent, of not measuring a cosmological delta-T/T signal) that the degree-scale cosmological CMB anisotropy predicted in such models could have escaped a detection at the level of sensitivity achieved in the South Pole 1991 experiment.
Lightweight and Statistical Techniques for Petascale Debugging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Barton
2014-06-30
This project investigated novel techniques for debugging scientific applications on petascale architectures. In particular, we developed lightweight tools that narrow the problem space when bugs are encountered. We also developed techniques that either limit the number of tasks and the code regions to which a developer must apply a traditional debugger or that apply statistical techniques to provide direct suggestions of the location and type of error. We extended previous work on the Stack Trace Analysis Tool (STAT), which has already demonstrated scalability to over one hundred thousand MPI tasks. We also extended statistical techniques, developed to isolate programming errors in widely used sequential or threaded applications in the Cooperative Bug Isolation (CBI) project, to large-scale parallel applications. Overall, our research substantially improved productivity on petascale platforms through a tool set for debugging that complements existing commercial tools. Previously, Office of Science application developers relied either on primitive manual debugging techniques based on printf or on tools, such as TotalView, that do not scale beyond a few thousand processors. However, bugs often arise at scale, and substantial effort and computation cycles are wasted in either reproducing the problem in a smaller run that can be analyzed with the traditional tools or in repeated runs at scale that use the primitive techniques. New techniques that work at scale and automate the process of identifying the root cause of errors were needed. These techniques significantly reduced the time spent debugging petascale applications, thus leading to a greater overall amount of time for application scientists to pursue the scientific objectives for which the systems are purchased. We developed a new paradigm for debugging at scale: techniques that reduce the debugging scenario to a scale suitable for traditional debuggers, e.g., by narrowing the search for the root-cause analysis to a small set of nodes or by identifying equivalence classes of nodes and sampling our debug targets from them. We implemented these techniques as lightweight tools that efficiently work on the full scale of the target machine. We explored four lightweight debugging refinements: generic classification parameters, such as stack traces; application-specific classification parameters, such as global variables; statistical data acquisition techniques; and machine-learning-based approaches to root-cause analysis. Work done under this project can be divided into two categories: new algorithms and techniques for scalable debugging, and foundational infrastructure work on our MRNet multicast-reduction framework for scalability and the Dyninst binary analysis and instrumentation toolkits.
Kaushal, Mayank; Oni-Orisan, Akinwunmi; Chen, Gang; Li, Wenjun; Leschke, Jack; Ward, Doug; Kalinosky, Benjamin; Budde, Matthew; Schmit, Brian; Li, Shi-Jiang; Muqeet, Vaishnavi; Kurpad, Shekar
2017-09-01
Network analysis based on graph theory depicts the brain as a complex network that allows inspection of overall brain connectivity pattern and calculation of quantifiable network metrics. To date, large-scale network analysis has not been applied to resting-state functional networks in complete spinal cord injury (SCI) patients. To characterize modular reorganization of whole brain into constituent nodes and compare network metrics between SCI and control subjects, fifteen subjects with chronic complete cervical SCI and 15 neurologically intact controls were scanned. The data were preprocessed followed by parcellation of the brain into 116 regions of interest (ROI). Correlation analysis was performed between every ROI pair to construct connectivity matrices and ROIs were categorized into distinct modules. Subsequently, local efficiency (LE) and global efficiency (GE) network metrics were calculated at incremental cost thresholds. The application of a modularity algorithm organized the whole-brain resting-state functional network of the SCI and the control subjects into nine and seven modules, respectively. The individual modules differed across groups in terms of the number and the composition of constituent nodes. LE demonstrated statistically significant decrease at multiple cost levels in SCI subjects. GE did not differ significantly between the two groups. The demonstration of modular architecture in both groups highlights the applicability of large-scale network analysis in studying complex brain networks. Comparing modules across groups revealed differences in number and membership of constituent nodes, indicating modular reorganization due to neural plasticity.
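The graph-theoretical pipeline described above can be sketched with NetworkX: keep the strongest connections at a chosen cost, build a graph, and compute global efficiency, local efficiency, and a modular decomposition. The correlation matrix below is random toy data standing in for the ROI-to-ROI connectivity matrices; the ROI count and cost value are arbitrary.

```python
# Sketch: threshold a connectivity matrix at a given cost, then compute network metrics.
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(3)
n_roi = 30
corr = np.corrcoef(rng.standard_normal((n_roi, 100)))     # toy 30-ROI connectivity matrix

cost = 0.15                                               # keep the strongest 15% of edges
iu = np.triu_indices(n_roi, k=1)
threshold = np.quantile(corr[iu], 1 - cost)

G = nx.Graph()
G.add_nodes_from(range(n_roi))
G.add_edges_from((i, j) for i, j in zip(*iu) if corr[i, j] >= threshold)

print("global efficiency:", round(nx.global_efficiency(G), 3))
print("local efficiency:", round(nx.local_efficiency(G), 3))
print("number of modules:", len(greedy_modularity_communities(G)))
```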
Frazier, Thomas W; Ratliff, Kristin R; Gruber, Chris; Zhang, Yi; Law, Paul A; Constantino, John N
2014-01-01
Understanding the factor structure of autistic symptomatology is critical to the discovery and interpretation of causal mechanisms in autism spectrum disorder. We applied confirmatory factor analysis and assessment of measurement invariance to a large (N = 9635) accumulated collection of reports on quantitative autistic traits using the Social Responsiveness Scale, representing a broad diversity of age, severity, and reporter type. A two-factor structure (corresponding to social communication impairment and restricted, repetitive behavior) as elaborated in the updated Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5) criteria for autism spectrum disorder exhibited acceptable model fit in confirmatory factor analysis. Measurement invariance was appreciable across age, sex, and reporter (self vs other), but somewhat less apparent between clinical and nonclinical populations in this sample composed of both familial and sporadic autism spectrum disorders. The statistical power afforded by this large sample allowed relative differentiation of three factors among items encompassing social communication impairment (emotion recognition, social avoidance, and interpersonal relatedness) and two factors among items encompassing restricted, repetitive behavior (insistence on sameness and repetitive mannerisms). Cross-trait correlations remained extremely high, that is, on the order of 0.66-0.92. These data clarify domains of statistically significant factorial separation that may relate to partially, but not completely, overlapping biological mechanisms, contributing to variation in human social competency. Given such robust intercorrelations among symptom domains, understanding their co-emergence remains a high priority in conceptualizing common neural mechanisms underlying autistic syndromes.
Statistical Characterization of School Bus Drive Cycles Collected via Onboard Logging Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duran, A.; Walkowicz, K.
In an effort to characterize the dynamics typical of school bus operation, National Renewable Energy Laboratory (NREL) researchers set out to gather in-use duty cycle data from school bus fleets operating across the country. Employing a combination of Isaac Instruments GPS/CAN data loggers in conjunction with existing onboard telemetric systems resulted in the capture of operating information for more than 200 individual vehicles in three geographically unique domestic locations. In total, over 1,500 individual operational route shifts from Washington, New York, and Colorado were collected. Upon completing the collection of in-use field data using either NREL-installed data acquisition devices or existing onboard telemetry systems, large-scale duty-cycle statistical analyses were performed to examine underlying vehicle dynamics trends within the data and to explore vehicle operation variations between fleet locations. Based on the results of these analyses, high, low, and average vehicle dynamics requirements were determined, resulting in the selection of representative standard chassis dynamometer test cycles for each condition. In this paper, the methodology and accompanying results of the large-scale duty-cycle statistical analysis are presented, including graphical and tabular representations of a number of relationships between key duty-cycle metrics observed within the larger data set. In addition to presenting the results of this analysis, conclusions are drawn and presented regarding potential applications of advanced vehicle technology as it relates specifically to school buses.
Percolation Analysis as a Tool to Describe the Topology of the Large Scale Structure of the Universe
NASA Astrophysics Data System (ADS)
Yess, Capp D.
1997-09-01
Percolation analysis is the study of the properties of clusters. In cosmology, it is the statistics of the size and number of clusters. This thesis presents a refinement of percolation analysis and its application to astronomical data. An overview of the standard model of the universe and the development of large scale structure is presented in order to place the study in historical and scientific context. Then using percolation statistics we, for the first time, demonstrate the universal character of a network pattern in the real space, mass distributions resulting from nonlinear gravitational instability of initial Gaussian fluctuations. We also find that the maximum of the number of clusters statistic in the evolved, nonlinear distributions is determined by the effective slope of the power spectrum. Next, we present percolation analyses of Wiener Reconstructions of the IRAS 1.2 Jy Redshift Survey. There are ten reconstructions of galaxy density fields in real space spanning the range β = 0.1 to 1.0, where β = Ω0.6/b, Ω is the present dimensionless density and b is the linear bias factor. Our method uses the growth of the largest cluster statistic to characterize the topology of a density field, where Gaussian randomized versions of the reconstructions are used as standards for analysis. For the reconstruction volume of radius, R ≈ 100 h-1 Mpc, percolation analysis reveals a slight 'meatball' topology for the real space, galaxy distribution of the IRAS survey. Finally, we employ a percolation technique developed for pointwise distributions to analyze two-dimensional projections of the three northern and three southern slices in the Las Campanas Redshift Survey and then give consideration to further study of the methodology, errors and application of percolation. We track the growth of the largest cluster as a topological indicator to a depth of 400 h-1 Mpc, and report an unambiguous signal, with high signal-to-noise ratio, indicating a network topology which in two dimensions is indicative of a filamentary distribution. It is hoped that one day percolation analysis can characterize the structure of the universe to a degree that will aid theorists in confidently describing the nature of our world.
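The largest-cluster statistic at the heart of this analysis can be sketched as follows: threshold a smoothed three-dimensional random field at decreasing density levels, label connected regions, and track the fraction of over-threshold cells in the largest cluster. The Gaussian random field below is a toy stand-in for a reconstructed galaxy density field.

```python
# Sketch of the growth-of-the-largest-cluster statistic on a toy density field.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
field = ndimage.gaussian_filter(rng.standard_normal((64, 64, 64)), sigma=3)

for quantile in (0.99, 0.95, 0.90, 0.80, 0.70):
    threshold = np.quantile(field, quantile)
    labels, n_clusters = ndimage.label(field > threshold)
    sizes = np.bincount(labels.ravel())[1:]              # drop the background label 0
    frac_largest = sizes.max() / sizes.sum()
    print(f"quantile {quantile:.2f}: {n_clusters} clusters, largest holds {frac_largest:.2f} of cells")
```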
NASA Astrophysics Data System (ADS)
Wanchuliak, O. Ya.; Peresunko, A. P.; Bakko, Bouzan Adel; Kushnerick, L. Ya.
2011-09-01
This paper presents the foundations of a large-scale, localized wavelet-polarization analysis of inhomogeneous laser images of histological sections of myocardial tissue. Relations between the structure of the wavelet coefficients and the cause of death were identified. An optical model of the polycrystalline networks of myocardium protein fibrils is presented. A technique for determining the coordinate distribution of the polarization azimuth at the points of laser images of myocardium histological sections is suggested. Results are presented on the interrelation between statistical parameters (statistical moments of the 1st-4th orders) characterizing the distributions of wavelet coefficients of polarization maps of myocardium layers and the causes of death.
ERIC Educational Resources Information Center
O'Brien, Mark
2011-01-01
The appropriateness of using statistical data to inform the design of any given service development or initiative often depends upon judgements regarding scale. Large-scale data sets, perhaps national in scope, whilst potentially important in informing the design, implementation and roll-out of experimental initiatives, will often remain unused…
The statistics of primordial density fluctuations
NASA Astrophysics Data System (ADS)
Barrow, John D.; Coles, Peter
1990-05-01
The statistical properties of the density fluctuations produced by power-law inflation are investigated. It is found that, even if the fluctuations present in the scalar field driving the inflation are Gaussian, the resulting density perturbations need not be, due to stochastic variations in the Hubble parameter. All the moments of the density fluctuations are calculated, and it is argued that, for realistic parameter choices, the departures from Gaussian statistics are small and would have a negligible effect on the large-scale structure produced in the model. On the other hand, the model predicts a power spectrum with n not equal to 1, and this could be good news for large-scale structure.
FGWAS: Functional genome wide association analysis.
Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu
2017-10-01
Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.
Multi-scale Modeling of Radiation Damage: Large Scale Data Analysis
NASA Astrophysics Data System (ADS)
Warrier, M.; Bhardwaj, U.; Bukkuru, S.
2016-10-01
Modification of materials in nuclear reactors due to neutron irradiation is a multiscale problem. These neutrons pass through materials creating several energetic primary knock-on atoms (PKA) which cause localized collision cascades creating damage tracks, defects (interstitials and vacancies) and defect clusters depending on the energy of the PKA. These defects diffuse and recombine throughout the whole duration of operation of the reactor, thereby changing the micro-structure of the material and its properties. It is therefore desirable to develop predictive computational tools to simulate the micro-structural changes of irradiated materials. In this paper we describe how statistical averages of the collision cascades from thousands of MD simulations are used to provide inputs to Kinetic Monte Carlo (KMC) simulations which can handle larger sizes, more defects and longer time durations. Use of unsupervised learning and graph optimization in handling and analyzing large scale MD data will be highlighted.
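One simple form of the unsupervised learning mentioned above is spatial clustering of the point defects produced by a cascade; the sketch below groups synthetic defect coordinates with DBSCAN and reports cluster-size statistics of the kind that could feed a KMC model. The coordinates, eps, and min_samples values are assumptions for illustration.

```python
# Sketch of unsupervised defect clustering in a collision cascade with DBSCAN.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(5)
# Two dense damage pockets plus scattered isolated defects (coordinates in angstroms)
defects = np.vstack([
    rng.normal(loc=[20, 20, 20], scale=3, size=(40, 3)),
    rng.normal(loc=[60, 55, 40], scale=3, size=(25, 3)),
    rng.uniform(0, 80, size=(15, 3)),
])

labels = DBSCAN(eps=5.0, min_samples=4).fit_predict(defects)
cluster_sizes = np.bincount(labels[labels >= 0])          # label -1 marks isolated defects
print("cluster sizes:", sorted(cluster_sizes, reverse=True))
print("isolated defects:", int(np.sum(labels == -1)))
```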
Anima: Modular Workflow System for Comprehensive Image Data Analysis
Rantanen, Ville; Valori, Miko; Hautaniemi, Sampsa
2014-01-01
Modern microscopes produce vast amounts of image data, and computational methods are needed to analyze and interpret these data. Furthermore, a single image analysis project may require tens or hundreds of analysis steps starting from data import and pre-processing to segmentation and statistical analysis; and ending with visualization and reporting. To manage such large-scale image data analysis projects, we present here a modular workflow system called Anima. Anima is designed for comprehensive and efficient image data analysis development, and it contains several features that are crucial in high-throughput image data analysis: programing language independence, batch processing, easily customized data processing, interoperability with other software via application programing interfaces, and advanced multivariate statistical analysis. The utility of Anima is shown with two case studies focusing on testing different algorithms developed in different imaging platforms and an automated prediction of alive/dead C. elegans worms by integrating several analysis environments. Anima is a fully open source and available with documentation at www.anduril.org/anima. PMID:25126541
What initial condition of inflation would suppress the large-scale CMB spectrum?
Chen, Pisin; Lin, Yu -Hsiang
2016-01-08
There is an apparent power deficit relative to the ΛCDM prediction of the cosmic microwave background spectrum at large scales, which, though not yet statistically significant, persists from WMAP to Planck data. Proposals that invoke some form of initial condition for the inflation have been made to address this apparent power suppression, albeit with conflicting conclusions. By studying the curvature perturbations of a scalar field in the Friedmann-Lemaître-Robertson-Walker universe parameterized by the equation of state parameter w, we find that the large-scale spectrum at the end of inflation reflects the superhorizon spectrum of the initial state. The large-scale spectrum is suppressed if the universe begins with the adiabatic vacuum in a superinflation (w < –1) or positive-pressure (w > 0) era. In the latter case, there is however no causal mechanism to establish the initial adiabatic vacuum. On the other hand, as long as the universe begins with the adiabatic vacuum in an era with –1 < w < 0, even if there exists an intermediate positive-pressure era, the large-scale spectrum would be enhanced rather than suppressed. In conclusion, we further calculate the spectrum of a two-stage inflation model with a two-field potential and show that the result agrees with that obtained from the ad hoc single-field analysis.
Pavlacky, David C; Lukacs, Paul M; Blakesley, Jennifer A; Skorkowsky, Robert C; Klute, David S; Hahn, Beth A; Dreitz, Victoria J; George, T Luke; Hanni, David J
2017-01-01
Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous statistical design and analyses ensures reliable knowledge about bird populations that is relevant and integral to bird conservation at multiple scales.
Huang, Zhenzhen; Duan, Huilong; Li, Haomin
2015-01-01
Large-scale human cancer genomics projects, such as TCGA, have generated large volumes of genomic data for further study. Exploring and mining these data to obtain meaningful analysis results can help researchers find potential genomic alterations that intervene in the development and metastasis of tumors. We developed a web-based gene analysis platform, named TCGA4U, which uses statistical methods and models to help translational investigators explore, mine and visualize human cancer genomic characteristic information from the TCGA datasets. Furthermore, through Gene Ontology (GO) annotation and clinical data integration, the genomic data were transformed into biological process, molecular function and cellular component annotations and survival curves to help researchers identify potential driver genes. Clinical researchers without expertise in data analysis will benefit from such a user-friendly genomic analysis platform.
Zhu, Yun; Fan, Ruzong; Xiong, Momiao
2017-01-01
Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotype and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new statistical method, referred to as quadratically regularized functional CCA (QRFCCA), for association analysis that combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the ten competing statistics while retaining appropriate type 1 error rates. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the ten other statistics. PMID:29040274
BRepertoire: a user-friendly web server for analysing antibody repertoire data.
Margreitter, Christian; Lu, Hui-Chun; Townsend, Catherine; Stewart, Alexander; Dunn-Walters, Deborah K; Fraternali, Franca
2018-04-14
Antibody repertoire analysis by high throughput sequencing is now widely used, but a persisting challenge is enabling immunologists to explore their data to discover discriminating repertoire features for their own particular investigations. Computational methods are necessary for large-scale evaluation of antibody properties. We have developed BRepertoire, a suite of user-friendly web-based software tools for large-scale statistical analyses of repertoire data. The software is able to use data preprocessed by IMGT, and performs statistical and comparative analyses with versatile plotting options. BRepertoire has been designed to operate in various modes, for example analysing sequence-specific V(D)J gene usage, discerning physico-chemical properties of the CDR regions and clustering clonotypes. Those analyses are performed on the fly by a number of R packages and are deployed via a Shiny web platform. The user can download the analysed data in different table formats and save the generated plots as image files ready for publication. We believe BRepertoire to be a versatile analytical tool that complements experimental studies of immune repertoires. To illustrate the server's functionality, we show use cases including differential gene usage in a vaccination dataset and analysis of CDR3H properties in old and young individuals. The server is accessible at http://mabra.biomed.kcl.ac.uk/BRepertoire.
Mapping regional patterns of large forest fires in Wildland-Urban Interface areas in Europe.
Modugno, Sirio; Balzter, Heiko; Cole, Beth; Borrelli, Pasquale
2016-05-01
Over recent decades, Land Use and Cover Change (LUCC) trends in many regions of Europe have reconfigured the landscape structures around many urban areas. In these areas, the proximity to landscape elements with high forest fuels has increased the fire risk to people and property. These Wildland-Urban Interface areas (WUI) can be defined as landscapes where anthropogenic urban land use and forest fuel mass come into contact. Mapping their extent is needed to prioritize fire risk control and inform local forest fire risk management strategies. This study proposes a method to map the extent and spatial patterns of the European WUI areas at continental scale. Using the European map of WUI areas, the hypothesis is tested that the distance from the nearest WUI area is related to the forest fire probability. Statistical relationships between the distance from the nearest WUI area and large forest fire incidents from satellite remote sensing were subsequently modelled by logistic regression analysis. The first European scale map of the WUI extent and locations is presented. Country-specific positive and negative relationships of large fires and the proximity to the nearest WUI area are found. A regional-scale analysis shows a strong influence of the WUI zones on large fires in parts of the Mediterranean regions. Results indicate that the probability of large burned surfaces increases with diminishing WUI distance in touristic regions like Sardinia and Provence-Alpes-Côte d'Azur, or in regions with a strong peri-urban component such as Catalunya, Comunidad de Madrid and Comunidad Valenciana. For the above regions, probability curves of large burned surfaces show statistical relationships (ROC value > 0.5) inside a 5000 m buffer of the nearest WUI. Wise land management can provide a valuable ecosystem service of fire risk reduction that is currently not explicitly included in ecosystem service valuations. The results re-emphasise the importance of including this ecosystem service in landscape valuations to account for the significant landscape function of reducing the risk of catastrophic large fires. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
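As a rough illustration of the kind of distance-to-WUI logistic regression described above, the following minimal sketch fits fire occurrence against distance on synthetic data; the variable names, coefficients and sample size are assumptions, not the authors' dataset or code.

```python
# Hypothetical sketch of a distance-to-WUI logistic regression (synthetic data only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
dist_to_wui_m = rng.uniform(0, 20000, n)               # distance to nearest WUI area (m)
# Assume (for illustration) that the log-odds of a large burned area fall with distance.
p = 1.0 / (1.0 + np.exp(-(1.0 - 4e-4 * dist_to_wui_m)))
large_fire = rng.binomial(1, p)

X = dist_to_wui_m.reshape(-1, 1)
model = LogisticRegression().fit(X, large_fire)
auc = roc_auc_score(large_fire, model.predict_proba(X)[:, 1])
print(f"slope per metre: {model.coef_[0][0]:.2e}, ROC AUC: {auc:.2f}")
```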
PRANAS: A New Platform for Retinal Analysis and Simulation.
Cessac, Bruno; Kornprobst, Pierre; Kraria, Selim; Nasser, Hassan; Pamplona, Daniela; Portelli, Geoffrey; Viéville, Thierry
2017-01-01
The retina encodes visual scenes by trains of action potentials that are sent to the brain via the optic nerve. In this paper, we describe new, freely accessible user-end software that helps to better understand this coding. It is called PRANAS (https://pranas.inria.fr), standing for Platform for Retinal ANalysis And Simulation. PRANAS targets neuroscientists and modelers by providing a unique set of retina-related tools. PRANAS integrates a retina simulator that allows large-scale simulations while maintaining strong biological plausibility, and a toolbox for the analysis of spike train population statistics. The statistical method (entropy maximization under constraints) takes into account both spatial and temporal correlations as constraints, making it possible to analyze the effects of memory on the statistics. PRANAS also integrates a tool for computing and representing receptive fields in 3D (time-space). All these tools are accessible through a user-friendly graphical user interface. The most CPU-costly of them have been implemented to run in parallel.
Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min
2015-06-01
The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.
Bryan, Rebecca; Nair, Prasanth B; Taylor, Mark
2009-09-18
Interpatient variability is often overlooked in orthopaedic computational studies due to the substantial challenges involved in sourcing and generating large numbers of bone models. A statistical model of the whole femur incorporating both geometric and material property variation was developed as a potential solution to this problem. The statistical model was constructed using principal component analysis, applied to 21 individual computed tomography scans. To test the ability of the statistical model to generate realistic, unique, finite element (FE) femur models it was used as a source of 1000 femurs to drive a study on femoral neck fracture risk. The study simulated the impact of an oblique fall to the side, a scenario known to account for a large proportion of hip fractures in the elderly and to have a lower fracture load than alternative loading approaches. FE model generation, application of subject specific loading and boundary conditions, FE processing and post processing of the solutions were completed automatically. The generated models were within the bounds of the training data used to create the statistical model with a high mesh quality, able to be used directly by the FE solver without remeshing. The results indicated that 28 of the 1000 femurs were at highest risk of fracture. Closer analysis revealed the percentage of cortical bone in the proximal femur to be a crucial differentiator between the failed and non-failed groups. The likely fracture location was indicated to be intertrochanteric. Comparison to previous computational, clinical and experimental work revealed support for these findings.
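A minimal sketch of the PCA-based statistical-model idea described above is given below, assuming each femur is flattened into a fixed-length vector of nodal coordinates (and, optionally, material properties); the training matrix here is random stand-in data, not the 21 CT-derived models.

```python
# PCA-based statistical shape/property model: fit modes from training vectors, sample new instances.
import numpy as np

rng = np.random.default_rng(1)
n_train, n_dof = 21, 300                 # 21 training femurs, illustrative number of degrees of freedom
X = rng.normal(size=(n_train, n_dof))    # stand-in for stacked coordinate/property vectors

mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt are the principal modes
std_modes = S / np.sqrt(n_train - 1)                # standard deviation captured by each mode

def sample_femur(n_modes=5):
    """Generate a new instance by perturbing the mean along the leading modes."""
    b = rng.normal(scale=std_modes[:n_modes])
    return mean + b @ Vt[:n_modes]

new_instance = sample_femur()
print(new_instance.shape)
```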
How Do Microphysical Processes Influence Large-Scale Precipitation Variability and Extremes?
Hagos, Samson; Ruby Leung, L.; Zhao, Chun; ...
2018-02-10
Convection permitting simulations using the Model for Prediction Across Scales-Atmosphere (MPAS-A) are used to examine how microphysical processes affect large-scale precipitation variability and extremes. An episode of the Madden-Julian Oscillation is simulated using MPAS-A with a refined region at 4-km grid spacing over the Indian Ocean. It is shown that cloud microphysical processes regulate the precipitable water (PW) statistics. Because of the non-linear relationship between precipitation and PW, PW exceeding a certain critical value (PWcr) contributes disproportionately to precipitation variability. However, the frequency of PW exceeding PWcr decreases rapidly with PW, so changes in microphysical processes that shift the column PW statistics relative to PWcr even slightly have large impacts on precipitation variability. Furthermore, precipitation variance and extreme precipitation frequency are approximately linearly related to the difference between the mean and critical PW values. Thus, observed precipitation statistics could be used to directly constrain model microphysical parameters, as this study demonstrates using radar observations from the DYNAMO field campaign.
Gene coexpression measures in large heterogeneous samples using count statistics.
Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan
2014-11-18
With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.
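To make the idea of counting local rank patterns concrete, here is a toy analogue on synthetic expression profiles; it counts windows in which two genes share the same within-window rank ordering. This is only an illustration of the counting idea, not the exact statistics proposed in the paper, and all names and window sizes are assumptions.

```python
# Toy count-based coexpression measure built from local rank patterns (not the paper's statistic).
import numpy as np
from scipy.stats import rankdata

def local_concordance_count(x, y, w=3):
    """Count length-w windows in which x and y have identical within-window rank patterns."""
    count = 0
    for i in range(len(x) - w + 1):
        if np.array_equal(rankdata(x[i:i + w]), rankdata(y[i:i + w])):
            count += 1
    return count

rng = np.random.default_rng(2)
t = np.linspace(0, 4 * np.pi, 60)
gene_a = np.sin(t) + 0.3 * rng.normal(size=t.size)
gene_b = np.sin(t) + 0.3 * rng.normal(size=t.size)   # co-expressed with gene_a
gene_c = rng.normal(size=t.size)                     # unrelated gene

print(local_concordance_count(gene_a, gene_b), local_concordance_count(gene_a, gene_c))
```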
Using Microsoft Excel[R] to Calculate Descriptive Statistics and Create Graphs
ERIC Educational Resources Information Center
Carr, Nathan T.
2008-01-01
Descriptive statistics and appropriate visual representations of scores are important for all test developers, whether they are experienced testers working on large-scale projects, or novices working on small-scale local tests. Many teachers put in charge of testing projects do not know "why" they are important, however, and are utterly convinced…
Feature-Based Statistical Analysis of Combustion Simulation Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennett, J; Krishnamoorthy, V; Liu, S
2011-11-18
We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence, fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Density Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion science; however, it is applicable to many other science domains.
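A much simplified sketch of the per-feature statistics idea is shown below: threshold a scalar field, label connected regions as features, and attach statistics of a second field to each feature. The published framework precomputes merge trees so that thresholds can be varied interactively; this toy version fixes a single threshold, and the fields and threshold value are assumptions.

```python
# Toy feature-based statistics: threshold, label connected regions, collect per-feature moments.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(3)
field = ndimage.gaussian_filter(rng.normal(size=(128, 128)), sigma=4)        # stand-in scalar field
temperature = ndimage.gaussian_filter(rng.normal(size=(128, 128)), sigma=4)  # stand-in attribute field

labels, n_features = ndimage.label(field > 0.05)         # features = connected super-level-set regions
idx = range(1, n_features + 1)
sizes = ndimage.sum(np.ones_like(field), labels, index=idx)
mean_temp = ndimage.mean(temperature, labels, index=idx)

for k in range(min(n_features, 5)):                      # show a few features only
    print(f"feature {k + 1}: area={sizes[k]:.0f} cells, mean temperature={mean_temp[k]:.3f}")
```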
Coherent nonhelical shear dynamos driven by magnetic fluctuations at low Reynolds numbers
Squire, J.; Bhattacharjee, A.
2015-10-28
Nonhelical shear dynamos are studied with a particular focus on the possibility of coherent dynamo action. The primary results—serving as a follow up to the results of Squire & Bhattacharjee—pertain to the "magnetic shear-current effect" as a viable mechanism to drive large-scale magnetic field generation. This effect raises the interesting possibility that the saturated state of the small-scale dynamo could drive large-scale dynamo action, and is likely to be important in the unstratified regions of accretion disk turbulence. In this paper, the effect is studied at low Reynolds numbers, removing the complications of small-scale dynamo excitation and aiding analysis by enabling the use of quasi-linear statistical simulation methods. In addition to the magnetically driven dynamo, new results on the kinematic nonhelical shear dynamo are presented. Furthermore, these results illustrate the relationship between coherent and incoherent driving in such dynamos, demonstrating the importance of rotation in determining the relative dominance of each mechanism.
Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario
2014-01-01
Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are examined using several medical examples. We present two ordinal-variable clustering examples, as ordinal variables are more challenging to analyze, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: by using an appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance is guaranteed. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565
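The sketch below illustrates, on synthetic data, the general workflow of clustering 10-level ordinal features and comparing the result to known benign/malignant labels; it uses plain k-means for brevity, whereas the paper compares dedicated ordinal-scale clustering methods, and all data are invented.

```python
# Toy version of the WBCD-style exercise: cluster ordinal features, compare to a gold standard.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(13)
benign = rng.integers(1, 5, size=(400, 9))      # 9 ordinal variables, levels 1..10 (synthetic)
malignant = rng.integers(5, 11, size=(250, 9))
X = np.vstack([benign, malignant])
truth = np.array([0] * len(benign) + [1] * len(malignant))

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
accuracy = max((labels == truth).mean(), (labels != truth).mean())   # cluster labels are arbitrary
print(f"agreement with clinical gold standard: {accuracy:.2%}")
```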
Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye
2016-01-13
A framework for establishing a standard reference scale (texture) is proposed using multivariate statistical analysis of instrumental measurements and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework is verified by establishing a standard reference scale for the texture attribute hardness with well-known Chinese foods. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the results were analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were selected to construct the hardness standard reference scale. The results indicate that the regression coefficient between the estimated sensory value and the instrumentally measured value is significant (R(2) = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guide for establishing quantitative standard reference scales for food texture characteristics.
Skyscape Archaeology: an emerging interdiscipline for archaeoastronomers and archaeologists
NASA Astrophysics Data System (ADS)
Henty, Liz
2016-02-01
For historical reasons, archaeoastronomy and archaeology differ in their approach to prehistoric monuments, and this has created a divide between the disciplines, which adopt seemingly incompatible methodologies. The reasons behind the impasse will be explored to show how these different approaches gave rise to their respective methods. Archaeological investigations tend to concentrate on single-site analysis, whereas archaeoastronomical surveys tend to be data-driven, based on the examination of a large number of similar sites. A comparison will be made between traditional archaeoastronomical data gathering and an emerging methodology which looks at sites on a small scale and combines archaeology and astronomy. Silva's recent research in Portugal and this author's survey in Scotland have explored this methodology and termed it skyscape archaeology. This paper argues that this type of phenomenological skyscape archaeology offers an alternative to large-scale statistical studies which analyse astronomical data obtained from a large number of superficially similar archaeological sites.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vaidheeswaran, Avinash; Shaffer, Franklin; Gopalan, Balaji
Here, the statistics of fluctuating velocity components are studied in the riser of a closed-loop circulating fluidized bed with fluid catalytic cracking catalyst particles. Our analysis shows distinct similarities as well as deviations compared to existing theories and bench-scale experiments. The study confirms anisotropic and non-Maxwellian distribution of fluctuating velocity components. The velocity distribution functions (VDFs) corresponding to transverse fluctuations exhibit symmetry, and follow a stretched-exponential behavior up to three standard deviations. The form of the transverse VDF is largely determined by interparticle interactions. The tails become more overpopulated with an increase in particle loading. The observed deviations from the Gaussian distribution are represented using the leading order term in the Sonine expansion, which is commonly used to approximate the VDFs in kinetic theory for granular flows. The vertical fluctuating VDFs are asymmetric, and the skewness shifts as the wall is approached. In comparison to transverse fluctuations, the vertical VDF is determined by the local hydrodynamics. This constitutes an observation of particle velocity fluctuations in a large-scale system and their quantitative comparison with Maxwell-Boltzmann statistics.
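One way to quantify the departure from Gaussian statistics mentioned above is to fit a stretched-exponential form to the tails of the measured velocity-fluctuation histogram; the sketch below does this on synthetic heavy-tailed data (a Gaussian would give a stretching exponent of 2). The data and starting values are assumptions, not the riser measurements.

```python
# Fit a stretched exponential, f(v) ~ a*exp(-(|v|/v0)**beta), to the tails of a synthetic VDF.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)
v = rng.laplace(scale=1.0, size=200_000)                 # heavy-tailed stand-in fluctuations
hist, edges = np.histogram(v, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = (np.abs(centers) > 0.5) & (hist > 0)              # fit the tails only

def stretched_exp(vabs, a, v0, beta):
    return a * np.exp(-(vabs / v0) ** beta)

popt, _ = curve_fit(stretched_exp, np.abs(centers[mask]), hist[mask], p0=(0.5, 1.0, 1.5))
print(f"fitted stretching exponent beta ≈ {popt[2]:.2f} (Gaussian would give 2)")
```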
The Population Tracking Model: A Simple, Scalable Statistical Model for Neural Population Data
O'Donnell, Cian; Gonçalves, J. Tiago; Whiteley, Nick; Portera-Cailliau, Carlos; Sejnowski, Terrence J.
2017-01-01
Our understanding of neural population coding has been limited by a lack of analysis methods to characterize spiking data from large populations. The biggest challenge comes from the fact that the number of possible network activity patterns scales exponentially with the number of neurons recorded (∼2^Neurons). Here we introduce a new statistical method for characterizing neural population activity that requires semi-independent fitting of only as many parameters as the square of the number of neurons, requiring drastically smaller data sets and minimal computation time. The model works by matching the population rate (the number of neurons synchronously active) and the probability that each individual neuron fires given the population rate. We found that this model can accurately fit synthetic data from up to 1000 neurons. We also found that the model could rapidly decode visual stimuli from neural population data from macaque primary visual cortex about 65 ms after stimulus onset. Finally, we used the model to estimate the entropy of neural population activity in developing mouse somatosensory cortex and, surprisingly, found that it first increases, and then decreases during development. This statistical model opens new options for interrogating neural population data and can bolster the use of modern large-scale in vivo Ca2+ and voltage imaging tools. PMID:27870612
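The two ingredients the model matches can be estimated directly from a binary spike matrix, as in the hedged sketch below: the population-rate distribution P(K) and, for each neuron, the probability of firing given the population rate. The synthetic spike matrix and bin counts are assumptions used only for illustration.

```python
# Estimate P(K) and P(neuron i fires | K) from a binary (time bins x neurons) spike matrix.
import numpy as np

rng = np.random.default_rng(5)
n_neurons, n_bins = 50, 10_000
spikes = (rng.random((n_bins, n_neurons)) < 0.05).astype(int)   # stand-in binary activity

K = spikes.sum(axis=1)                                   # population rate per time bin
p_K = np.bincount(K, minlength=n_neurons + 1) / n_bins   # population-rate distribution P(K)

p_fire_given_K = np.zeros((n_neurons + 1, n_neurons))
for k in range(n_neurons + 1):
    bins_k = spikes[K == k]
    if len(bins_k):
        p_fire_given_K[k] = bins_k.mean(axis=0)          # P(x_i = 1 | K = k)

print(p_K[:6], p_fire_given_K[2, :5])
```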
NASA Astrophysics Data System (ADS)
Massei, Nicolas; Dieppois, Bastien; Hannah, David; Lavers, David; Fossa, Manuel; Laignel, Benoit; Debret, Maxime
2017-04-01
Geophysical signals oscillate over several time-scales that explain different amounts of their overall variability and may be related to different physical processes. Characterizing and understanding such variabilities in hydrological variations and investigating their determinism is one important issue in the context of climate change, as these variabilities can occasionally be superimposed on long-term trends possibly due to climate change. It is also important to refine our understanding of time-scale dependent linkages between large-scale climatic variations and hydrological responses at the regional or local scale. Here we investigate such links by conducting a wavelet multiresolution statistical downscaling analysis of precipitation in northwestern France (Seine river catchment) over 1950-2016, using sea level pressure (SLP) and sea surface temperature (SST) as indicators of atmospheric and oceanic circulations, respectively. Previous results demonstrated that including multiresolution decomposition in a statistical downscaling model (within a so-called multiresolution ESD model) using SLP as a large-scale predictor greatly improved simulation of the low-frequency, i.e. interannual to interdecadal, fluctuations observed in precipitation. Building on these results, a continuous wavelet transform of precipitation simulated with multiresolution ESD confirmed the improved ability of the model to explain variability at all time-scales. A sensitivity analysis of the model to the choice of scale and wavelet function was also conducted; whatever the wavelet used, the model performed similarly. The spatial patterns of SLP found to be the best predictors for each time-scale, which resulted from the wavelet decomposition, revealed different structures according to time-scale, suggesting possibly different determinisms. In particular, some low-frequency components (3.2-yr and 19.3-yr) showed a much more widespread spatial extension across the Atlantic. Moreover, in accordance with previous studies, the wavelet components detected in SLP and precipitation on interannual to interdecadal time-scales could be interpreted in terms of the influence of the Gulf Stream oceanic front on atmospheric circulation. Work is currently under way that includes SST over the Atlantic in order to gain further insight into this mechanism.
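The multiresolution idea can be illustrated with a short sketch: decompose a predictand series into wavelet components and relate each component separately to a large-scale predictor, rather than fitting a single model to the raw series. The series, the 'db4' wavelet, the number of levels and the simple per-component regression below are illustrative assumptions, not the authors' ESD model.

```python
# Wavelet multiresolution decomposition of a series, with a per-component regression on a predictor.
import numpy as np
import pywt

rng = np.random.default_rng(12)
n = 512
slp_index = np.sin(2 * np.pi * np.arange(n) / 64) + 0.5 * rng.normal(size=n)   # stand-in SLP predictor
precip = 0.8 * slp_index + rng.normal(size=n)                                   # stand-in precipitation

coeffs = pywt.wavedec(precip, 'db4', level=4)            # [approximation, detail_4, ..., detail_1]
for level, c in enumerate(coeffs):
    # Reconstruct the contribution of this component alone, then regress it on the predictor.
    kept = [np.zeros_like(x) for x in coeffs]
    kept[level] = c
    component = pywt.waverec(kept, 'db4')[:n]
    slope = np.polyfit(slp_index, component, 1)[0]
    print(f"component {level}: regression slope on SLP index = {slope:+.2f}")
```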
Statistics of galaxy orientations - Morphology and large-scale structure
NASA Technical Reports Server (NTRS)
Lambas, Diego G.; Groth, Edward J.; Peebles, P. J. E.
1988-01-01
Using the Uppsala General Catalog of bright galaxies and the northern and southern maps of the Lick counts of galaxies, statistical evidence of a morphology-orientation effect is found. Major axes of elliptical galaxies are preferentially oriented along the large-scale features of the Lick maps. However, the orientations of the major axes of spiral and lenticular galaxies show no clear signs of significant nonrandom behavior at a level of less than about one-fifth of the effect seen for ellipticals. The angular scale of the detected alignment effect for Uppsala ellipticals extends to at least theta of about 2 deg, which at a redshift of z of about 0.02 corresponds to a linear scale of about 2/h Mpc.
Viewpoint: observations on scaled average bioequivalence.
Patterson, Scott D; Jones, Byron
2012-01-01
The two one-sided test procedure (TOST) has been used for average bioequivalence testing since 1992 and is required when marketing new formulations of an approved drug. TOST is known to require comparatively large numbers of subjects to demonstrate bioequivalence for highly variable drugs, defined as those drugs having intra-subject coefficients of variation greater than 30%. However, TOST has been shown to protect public health when multiple generic formulations enter the marketplace following patent expiration. Recently, scaled average bioequivalence (SABE) has been proposed as an alternative statistical analysis procedure for such products by multiple regulatory agencies. SABE testing requires that a three-period partial replicate cross-over or full replicate cross-over design be used. Following a brief summary of SABE analysis methods applied to existing data, we will consider three statistical ramifications of the proposed additional decision rules and the potential impact of implementation of scaled average bioequivalence in the marketplace using simulation. It is found that a constraint being applied is biased, that bias may also result from the common problem of missing data and that the SABE methods allow for much greater changes in exposure when generic-generic switching occurs in the marketplace. Copyright © 2011 John Wiley & Sons, Ltd.
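The classical average-bioequivalence decision mentioned above can be sketched as follows on log-transformed exposure data, here simplified to paired test/reference measurements with the conventional ±log(1.25) limits; real SABE analyses use replicate crossover designs and scale the limits by the reference variability, and the data below are synthetic.

```python
# Simplified TOST-style average bioequivalence decision on log-transformed exposure data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 24
log_ref = rng.normal(np.log(100.0), 0.25, n)             # log AUC, reference product (synthetic)
log_test = log_ref + rng.normal(0.02, 0.10, n)           # log AUC, test product (synthetic)

diff = log_test - log_ref
se = diff.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.95, df=n - 1)
ci_low, ci_high = diff.mean() - t_crit * se, diff.mean() + t_crit * se   # 90% confidence interval

theta = np.log(1.25)                                     # conventional ABE limits ±log(1.25)
bioequivalent = (ci_low > -theta) and (ci_high < theta)  # equivalent to passing both one-sided tests
print(f"90% CI for log-ratio: ({ci_low:.3f}, {ci_high:.3f}); bioequivalent: {bioequivalent}")
```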
Combined heat and power systems: economic and policy barriers to growth.
Kalam, Adil; King, Abigail; Moret, Ellen; Weerasinghe, Upekha
2012-04-23
Combined Heat and Power (CHP) systems can provide a range of benefits to users with regard to efficiency, reliability, costs and environmental impact. Furthermore, increasing the amount of electricity generated by CHP systems in the United States has been identified as having significant potential for impressive economic and environmental outcomes on a national scale. Given the benefits from increasing the adoption of CHP technologies, there is value in improving our understanding of how desired increases in CHP adoption can be best achieved. The obstacles to wider adoption are currently understood to stem from regulatory as well as economic and technological barriers. In our research, we answer the following question: Given the current policy and economic environment facing the CHP industry, what changes need to take place in this space in order for CHP systems to be competitive in the energy market? We focus our analysis primarily on Combined Heat and Power systems that use natural gas turbines. Our analysis takes a two-pronged approach. We first conduct a statistical analysis of the impact of state policies on increases in electricity generated from CHP systems. Second, we conduct a cost-benefit analysis to determine in which circumstances funding incentives are necessary to make CHP technologies cost-competitive. Our policy analysis shows that regulatory improvements do not explain the growth in adoption of CHP technologies but hold the potential to encourage increases in electricity generated from CHP systems in small-scale applications. Our cost-benefit analysis shows that CHP systems are only cost-competitive in large-scale applications and that funding incentives would be necessary to make CHP technology cost-competitive in small-scale applications. From the synthesis of these analyses we conclude that because large-scale applications of natural gas turbines are already cost-competitive, policy initiatives aimed at a CHP market dominated primarily by large-scale (and therefore already cost-competitive) systems have not been effectively directed. Our recommendation is that for CHP technologies using natural gas turbines, the policy focus should be on increasing CHP growth in small-scale systems. This result can be best achieved through redirection of state and federal incentives, research and development, adoption of smart grid technology, and outreach and education.
Large-scale runoff generation - parsimonious parameterisation using high-resolution topography
NASA Astrophysics Data System (ADS)
Gong, L.; Halldin, S.; Xu, C.-Y.
2011-08-01
World water resources have primarily been analysed by global-scale hydrological models in the last decades. Runoff generation in many of these models is based on process formulations developed at catchment scales. The division between slow runoff (baseflow) and fast runoff is primarily governed by slope and spatial distribution of effective water storage capacity, both acting at very small scales. Many hydrological models, e.g. VIC, account for the spatial storage variability in terms of statistical distributions; such models are generally proven to perform well. The statistical approaches, however, use the same runoff-generation parameters everywhere in a basin. The TOPMODEL concept, on the other hand, links the effective maximum storage capacity with real-world topography. Recent availability of global high-quality, high-resolution topographic data makes TOPMODEL attractive as a basis for a physically-based runoff-generation algorithm at large scales, even if its assumptions are not valid in flat terrain or for deep groundwater systems. We present a new runoff-generation algorithm for large-scale hydrology based on TOPMODEL concepts intended to overcome these problems. The TRG (topography-derived runoff generation) algorithm relaxes the TOPMODEL equilibrium assumption so baseflow generation is not tied to topography. TRG only uses the topographic index to distribute average storage to each topographic index class. The maximum storage capacity is proportional to the range of topographic index and is scaled by one parameter. The distribution of storage capacity within large-scale grid cells is obtained numerically through topographic analysis. The new topography-derived distribution function is then inserted into a runoff-generation framework similar to VIC's. Different basin parts are parameterised by different storage capacities, and different shapes of the storage-distribution curves depend on their topographic characteristics. The TRG algorithm is driven by the HydroSHEDS dataset with a resolution of 3" (around 90 m at the equator). The TRG algorithm was validated against the VIC algorithm in a common model framework in 3 river basins in different climates. The TRG algorithm performed equally well or marginally better than the VIC algorithm with one fewer parameter to be calibrated. The TRG algorithm also lacked equifinality problems and offered a realistic spatial pattern for runoff generation and evaporation.
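A loose paraphrase of the storage-distribution step described above is sketched below: bin a grid cell's high-resolution topographic index values into classes, give each class a storage capacity that scales with its position within the index range, and control the whole curve with a single parameter. This is an illustration of the published concept, not the authors' code; the index distribution, class count and scaling parameter are assumptions.

```python
# Illustrative TRG-style storage distribution within one large-scale grid cell.
import numpy as np

rng = np.random.default_rng(7)
topo_index = rng.gamma(shape=4.0, scale=2.0, size=50_000)   # stand-in 90 m topographic index values

n_classes, s_max_param = 20, 150.0                          # s_max_param: the single scaling parameter (mm)
edges = np.quantile(topo_index, np.linspace(0, 1, n_classes + 1))
class_mid = 0.5 * (edges[:-1] + edges[1:])

# Wetter classes (higher topographic index) saturate first, i.e. have smaller storage capacity.
rel_dryness = (class_mid.max() - class_mid) / (class_mid.max() - class_mid.min())
storage_capacity = s_max_param * rel_dryness                # storage capacity per class (mm)

area_fraction = np.histogram(topo_index, bins=edges)[0] / topo_index.size
print(np.round(storage_capacity[:5], 1), np.round(area_fraction[:5], 3))
```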
Large-scale runoff generation - parsimonious parameterisation using high-resolution topography
NASA Astrophysics Data System (ADS)
Gong, L.; Halldin, S.; Xu, C.-Y.
2010-09-01
World water resources have primarily been analysed by global-scale hydrological models in the last decades. Runoff generation in many of these models is based on process formulations developed at catchment scales. The division between slow runoff (baseflow) and fast runoff is primarily governed by slope and spatial distribution of effective water storage capacity, both acting at very small scales. Many hydrological models, e.g. VIC, account for the spatial storage variability in terms of statistical distributions; such models are generally proven to perform well. The statistical approaches, however, use the same runoff-generation parameters everywhere in a basin. The TOPMODEL concept, on the other hand, links the effective maximum storage capacity with real-world topography. Recent availability of global high-quality, high-resolution topographic data makes TOPMODEL attractive as a basis for a physically-based runoff-generation algorithm at large scales, even if its assumptions are not valid in flat terrain or for deep groundwater systems. We present a new runoff-generation algorithm for large-scale hydrology based on TOPMODEL concepts intended to overcome these problems. The TRG (topography-derived runoff generation) algorithm relaxes the TOPMODEL equilibrium assumption so baseflow generation is not tied to topography. TRG only uses the topographic index to distribute average storage to each topographic index class. The maximum storage capacity is proportional to the range of topographic index and is scaled by one parameter. The distribution of storage capacity within large-scale grid cells is obtained numerically through topographic analysis. The new topography-derived distribution function is then inserted into a runoff-generation framework similar to VIC's. Different basin parts are parameterised by different storage capacities, and different shapes of the storage-distribution curves depend on their topographic characteristics. The TRG algorithm is driven by the HydroSHEDS dataset with a resolution of 3'' (around 90 m at the equator). The TRG algorithm was validated against the VIC algorithm in a common model framework in 3 river basins in different climates. The TRG algorithm performed equally well or marginally better than the VIC algorithm with one fewer parameter to be calibrated. The TRG algorithm also lacked equifinality problems and offered a realistic spatial pattern for runoff generation and evaporation.
Heterogeneity and scale of sustainable development in cities.
Brelsford, Christa; Lobo, José; Hand, Joe; Bettencourt, Luís M A
2017-08-22
Rapid worldwide urbanization is at once the main cause and, potentially, the main solution to global sustainable development challenges. The growth of cities is typically associated with increases in socioeconomic productivity, but it also creates strong inequalities. Despite a growing body of evidence characterizing these heterogeneities in developed urban areas, not much is known systematically about their most extreme forms in developing cities and their consequences for sustainability. Here, we characterize the general patterns of income and access to services in a large number of developing cities, with an emphasis on an extensive, high-resolution analysis of the urban areas of Brazil and South Africa. We use detailed census data to construct sustainable development indices in hundreds of thousands of neighborhoods and show that their statistics are scale-dependent and point to the critical role of large cities in creating higher average incomes and greater access to services within their national context. We then quantify the general statistical trajectory toward universal basic service provision at different scales to show that it is characterized by varying levels of inequality, with initial increases in access being typically accompanied by growing disparities over characteristic spatial scales. These results demonstrate how extensions of these methods to other goals and data can be used over time and space to produce a simple but general quantitative assessment of progress toward internationally agreed sustainable development goals.
The Spatial Scaling of Global Rainfall Extremes
NASA Astrophysics Data System (ADS)
Devineni, N.; Xi, C.; Lall, U.; Rahill-Marier, B.
2013-12-01
Floods associated with severe storms are a significant source of risk for property, life and supply chains. These property losses tend to be determined as much by the duration of flooding as by the depth and velocity of inundation. High duration floods are typically induced by persistent rainfall (up to 30-day duration) as seen recently in Thailand, Pakistan, the Ohio and Mississippi Rivers, France, and Germany. Events related to persistent and recurrent rainfall appear to correspond to the persistence of specific global climate patterns that may be identifiable from global, historical data fields, and also from climate models that project future conditions. A clear understanding of the space-time rainfall patterns for events or for a season will enable assessment of the spatial distribution of areas likely to have high/low inundation potential for each type of rainfall forcing. In this paper, we investigate the statistical properties of the spatial manifestation of the rainfall exceedances. We also investigate the connection of persistent rainfall events in different latitudinal bands to large-scale climate phenomena such as ENSO. Finally, we present the scaling phenomena of contiguous flooded areas as a result of the large-scale organization of long-duration rainfall events. This can be used for spatially distributed flood risk assessment conditional on a particular rainfall scenario. Statistical models for spatio-temporal loss simulation including model uncertainty to support regional and portfolio analysis can be developed.
Drought Persistence Errors in Global Climate Models
NASA Astrophysics Data System (ADS)
Moon, H.; Gudmundsson, L.; Seneviratne, S. I.
2018-04-01
The persistence of drought events largely determines the severity of socioeconomic and ecological impacts, but the capability of current global climate models (GCMs) to simulate such events is subject to large uncertainties. In this study, the representation of drought persistence in GCMs is assessed by comparing state-of-the-art GCM simulations to observation-based data sets. To do so, we consider dry-to-dry transition probabilities at monthly and annual scales as estimates of drought persistence, where a dry status is defined as a negative precipitation anomaly. Although there is a substantial spread in the drought persistence bias, most of the simulations show a systematic underestimation of drought persistence at the global scale. Subsequently, we analyzed the degree to which (i) inaccurate observations, (ii) differences among models, (iii) internal climate variability, and (iv) uncertainty of the employed statistical methods contribute to the spread in drought persistence errors, using an analysis of variance approach. The results show that at the monthly scale, model uncertainty and observational uncertainty dominate, while the contribution from internal variability is small in most cases. At the annual scale, the spread of the drought persistence error is dominated by the statistical estimation error of drought persistence, indicating that the partitioning of the error is impaired by the limited number of considered time steps. These findings reveal systematic errors in the representation of drought persistence in current GCMs and suggest directions for further model improvement.
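The drought-persistence metric used above reduces to a simple conditional probability, sketched below on a synthetic monthly precipitation series: the probability that a dry month (negative anomaly) is followed by another dry month. The series length and distribution are assumptions for illustration only.

```python
# Dry-to-dry transition probability as a drought-persistence estimate (synthetic monthly series).
import numpy as np

rng = np.random.default_rng(8)
precip = rng.gamma(shape=2.0, scale=50.0, size=1200)     # 100 years of monthly precipitation (stand-in)
anomaly = precip - precip.mean()
dry = anomaly < 0                                        # dry status = negative precipitation anomaly

dry_now, dry_next = dry[:-1], dry[1:]
p_dry_to_dry = (dry_now & dry_next).sum() / dry_now.sum()
print(f"monthly dry-to-dry transition probability: {p_dry_to_dry:.2f}")
```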
Online estimation of the wavefront outer scale profile from adaptive optics telemetry
NASA Astrophysics Data System (ADS)
Guesalaga, A.; Neichel, B.; Correia, C. M.; Butterley, T.; Osborn, J.; Masciadri, E.; Fusco, T.; Sauvage, J.-F.
2017-02-01
We describe an online method to estimate the wavefront outer scale profile, L0(h), for very large and future extremely large telescopes. The stratified information on this parameter impacts the estimation of the main turbulence parameters [turbulence strength, Cn2(h); Fried's parameter, r0; isoplanatic angle, θ0; and coherence time, τ0] and determines the performance of wide-field adaptive optics (AO) systems. This technique estimates L0(h) using data from the AO loop available at the facility instruments by constructing the cross-correlation functions of the slopes between two or more wavefront sensors, which are later fitted to a linear combination of the simulated theoretical layers having different altitudes and outer scale values. We analyse some limitations found in the estimation process: (I) its insensitivity to large values of L0(h) as the telescope becomes blind to outer scales larger than its diameter; (II) the maximum number of observable layers given the limited number of independent inputs that the cross-correlation functions provide; and (III) the minimum length of data required for a satisfactory convergence of the turbulence parameters without breaking the assumption of statistical stationarity of the turbulence. The method is applied to the Gemini South multiconjugate AO system that comprises five wavefront sensors and two deformable mirrors. Statistics of L0(h) at Cerro Pachón from data acquired during 3 yr of campaigns show an interesting resemblance to other independent results in the literature. A final analysis suggests that the impact of error sources will be substantially reduced in instruments of the next generation of giant telescopes.
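The fitting step described above amounts to expressing a measured cross-correlation map as a combination of precomputed theoretical maps, one per candidate (altitude, outer-scale) pair. The hedged sketch below solves that step with non-negative least squares on a random stand-in library; the library size, noise level and true weights are assumptions, not the Gemini South data.

```python
# Fit a measured slope cross-correlation map as a non-negative combination of theoretical layer maps.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(9)
n_pix, n_layers = 400, 30                                 # flattened correlation map, candidate layers
library = rng.random((n_pix, n_layers))                   # stand-in theoretical correlation maps
true_weights = np.zeros(n_layers)
true_weights[[3, 17]] = [0.7, 0.3]                        # two "true" layers for this toy example
measured = library @ true_weights + 0.01 * rng.normal(size=n_pix)

weights, residual = nnls(library, measured)               # non-negative least squares fit
print("recovered layer weights:", np.round(weights[[3, 17]], 2), "residual:", round(residual, 3))
```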
Quantum probability, choice in large worlds, and the statistical structure of reality.
Ross, Don; Ladyman, James
2013-06-01
Classical probability models of incentive response are inadequate in "large worlds," where the dimensions of relative risk and the dimensions of similarity in outcome comparisons typically differ. Quantum probability models for choice in large worlds may be motivated pragmatically - there is no third theory - or metaphysically: statistical processing in the brain adapts to the true scale-relative structure of the universe.
Wavelet analysis of myocardium polarization images in problems of diagnostic of necrotic changes
NASA Astrophysics Data System (ADS)
Ushenko, Yu. O.; Vanchuliak, O.; Bodnar, G. B.; Ushenko, V. O.; Pavlyukovich, N.; Pavlyukovich, O. V.; Antonyuk, O.
2017-08-01
The paper presents results on the polarization manifestations of small- and large-scale phase anisotropy in the myocardial tissue structures of people who died as a consequence of ischemic heart disease (IHD) and acute coronary insufficiency (ACI); to differentiate this information, the wavelet analysis method is used. The resulting maps of the polarization-correlation parameter distributions (the phase of the two-point first and second parameters of the Stokes vector) are analyzed in the framework of a statistical approach. On this basis, criteria for the differential diagnosis of IHD and ACI cases have been determined.
The application of satellite data in monitoring strip mines
NASA Technical Reports Server (NTRS)
Sharber, L. A.; Shahrokhi, F.
1977-01-01
Strip mines in the New River Drainage Basin of Tennessee were studied through use of Landsat-1 imagery and aircraft photography. A multilevel analysis, involving conventional photo interpretation techniques, densitometric methods, multispectral analysis and statistical testing, was applied to the data. The Landsat imagery proved adequate for monitoring large-scale change resulting from active mining and land-reclamation projects. However, the spatial resolution of the satellite imagery rendered it inadequate for assessment of many of the smaller strip mines in the region, which may be as small as a few hectares.
Kinoshita, Manabu; Sakai, Mio; Arita, Hideyuki; Shofuda, Tomoko; Chiba, Yasuyoshi; Kagawa, Naoki; Watanabe, Yoshiyuki; Hashimoto, Naoya; Fujimoto, Yasunori; Yoshimine, Toshiki; Nakanishi, Katsuyuki; Kanemura, Yonehiro
2016-01-01
Reports have suggested that tumor textures presented on T2-weighted images correlate with the genetic status of glioma. Therefore, development of an image analysis framework that is capable of objective and high-throughput image texture analysis for large-scale image data collections is needed. The current study aimed to address the development of such a framework by introducing two novel parameters for image textures on T2-weighted images, i.e., Shannon entropy and Prewitt filtering. Data were collected for twenty-two WHO grade 2 and 28 grade 3 glioma patients whose pre-surgical MRI and IDH1 mutation status were available. Heterogeneous lesions showed statistically higher Shannon entropy than homogeneous lesions (p = 0.006), and ROC curve analysis proved that Shannon entropy on T2WI was a reliable indicator for discrimination of homogeneous and heterogeneous lesions (p = 0.015, AUC = 0.73). Lesions with well-defined borders exhibited statistically higher Edge mean and Edge median values using Prewitt filtering than those with vague lesion borders (p = 0.0003 and p = 0.0005, respectively). ROC curve analysis also proved that both Edge mean and median values were promising indicators for discrimination of lesions with vague and well-defined borders, and both Edge mean and median values performed in a comparable manner (p = 0.0002, AUC = 0.81 and p < 0.0001, AUC = 0.83, respectively). Finally, IDH1 wild type gliomas showed statistically lower Shannon entropy on T2WI than IDH1 mutated gliomas (p = 0.007), but no difference was observed between IDH1 wild type and mutated gliomas in Edge median values using Prewitt filtering. The current study introduced two image metrics that reflect lesion texture described on T2WI. These two metrics were validated by readings of a neuro-radiologist who was blinded to the results. This observation will facilitate further use of this technique in future large-scale image analysis of glioma.
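The two texture metrics described above can be sketched in a few lines: the Shannon entropy of the intensity histogram and the mean/median of a Prewitt-filtered gradient magnitude image. The synthetic "lesion" and the histogram bin count below are assumptions; the exact preprocessing used in the study is not reproduced here.

```python
# Shannon entropy of an intensity histogram and Prewitt edge statistics for a synthetic 2-D patch.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(10)
lesion = ndimage.gaussian_filter(rng.random((64, 64)), sigma=2)   # stand-in T2 intensities

hist, _ = np.histogram(lesion, bins=32)
p = hist / hist.sum()
shannon_entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))           # entropy in bits

gx = ndimage.prewitt(lesion, axis=0)
gy = ndimage.prewitt(lesion, axis=1)
edge = np.hypot(gx, gy)                                           # Prewitt gradient magnitude
print(f"entropy={shannon_entropy:.2f} bits, edge mean={edge.mean():.3f}, edge median={np.median(edge):.3f}")
```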
Application of multivariate statistical techniques in microbial ecology
Paliy, O.; Shankar, V.
2016-01-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological datasets. An especially noticeable effect has been attained in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amounts of data, powerful statistical techniques of multivariate analysis are well suited to analyze and interpret these datasets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular dataset. In this review we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and dataset structure. PMID:26786791
Villagómez-Ornelas, Paloma; Hernández-López, Pedro; Carrasco-Enríquez, Brenda; Barrios-Sánchez, Karina; Pérez-Escamilla, Rafael; Melgar-Quiñónez, Hugo
2014-01-01
This article validates the statistical consistency of two food security scales: the Mexican Food Security Scale (EMSA) and the Latin American and Caribbean Food Security Scale (ELCSA). Validity tests were conducted in order to verify that both scales were consistent instruments, composed of independent, properly calibrated and adequately sorted items, arranged in a continuum of severity. The following tests were developed: sorting of items; Cronbach's alpha analysis; parallelism of prevalence curves; Rasch models; and sensitivity analysis through hypothesis tests of mean differences. The tests showed that both scales meet the required attributes and are robust statistical instruments for food security measurement. This is relevant given that the lack-of-access-to-food indicator, included in the multidimensional poverty measurement in Mexico, is calculated with the EMSA.
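One of the consistency checks listed above, Cronbach's alpha, is easy to illustrate; the sketch below computes it from a synthetic respondent-by-item matrix of yes/no items driven by one shared latent factor. The item count, sample size and data-generating model are assumptions, not the EMSA or ELCSA data.

```python
# Cronbach's alpha for a respondent-by-item matrix (synthetic dichotomous scale items).
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(11)
latent = rng.normal(size=(500, 1))                                   # shared severity factor
responses = (latent + rng.normal(scale=1.0, size=(500, 8)) > 0).astype(int)   # 8 yes/no items
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```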
A Large-Scale Super-Structure at z=0.65 in the UKIDSS Ultra-Deep Survey Field
NASA Astrophysics Data System (ADS)
Galametz, Audrey; Candels Clustering Working Group
2017-07-01
In hierarchical structure formation scenarios, galaxies accrete along high density filaments. Superclusters represent the largest density enhancements in the cosmic web, with scales of 100 to 200 Mpc. As they represent the largest components of LSS, they are very powerful tools to constrain cosmological models. Since they also offer a wide range of density, from infalling groups to high-density cluster cores, they are also the perfect laboratory to study the influence of environment on galaxy evolution. I will present a newly discovered large-scale structure at z=0.65 in the UKIDSS UDS field. Although statistically predicted, the presence of such a structure in UKIDSS, one of the most extensively covered and studied extragalactic fields, remains serendipitous. Our follow-up confirmed more than 15 group members, including at least three galaxy clusters with M200 ∼ 10^14 Msol. Deep spectroscopy of the quiescent core galaxies reveals that the most massive structure knots are at very different formation stages, with a range of red sequence properties. Statistics allow us to map formation age across the structure's denser knots and identify where quenching is most probably occurring across the LSS. Spectral diagnostics analysis also reveals an interesting population of transition galaxies that we suspect are transforming from star-forming to quiescent galaxies.
Zorick, Todd; Mandelkern, Mark A
2015-01-01
Electroencephalography (EEG) is typically viewed through the lens of spectral analysis. Recently, multiple lines of evidence have demonstrated that the underlying neuronal dynamics are characterized by scale-free avalanches. These results suggest that techniques from statistical physics may be used to analyze EEG signals. We utilized a publicly available database of fourteen subjects with waking and sleep stage 2 EEG tracings per subject, and observe that power-law dynamics of critical-state neuronal avalanches are not sufficient to fully describe essential features of EEG signals. We hypothesized that this could reflect the phenomenon of discrete scale invariance (DSI) in EEG large voltage deflections (LVDs) as being more prominent in waking consciousness. We isolated LVDs, and analyzed logarithmically transformed LVD size probability density functions (PDF) to assess for DSI. We find evidence of increased DSI in waking, as opposed to sleep stage 2 consciousness. We also show that the signatures of DSI are specific for EEG LVDs, and not a general feature of fractal simulations with similar statistical properties to EEG. Removing only LVDs from waking EEG produces a reduction in power in the alpha and beta frequency bands. These findings may represent a new insight into the understanding of the cortical dynamics underlying consciousness.
Danis, Ildiko; Scheuring, Noemi; Papp, Eszter; Czinner, Antal
2012-06-01
A new instrument for assessing depressive mood, the first version of the Depression Scale Questionnaire (DS1K), was published in 2008 by Halmai et al. This scale was used in our large-sample study, in the framework of the For Healthy Offspring project, involving parents of young children. The original questionnaire was developed on small samples, so our aim was to assist further development of the instrument through a psychometric analysis of the data in our large sample (n=1164). The DS1K scale was chosen to measure the parents' mood and mental state in the For Healthy Offspring project. The questionnaire was completed by 1063 mothers and 328 fathers, yielding a heterogeneous sample with respect to age and socio-demographic status. Analyses included main descriptive statistics, assessment of the scale's internal consistency, and several comparisons. Results were checked in both the original and the multiply imputed datasets. According to our results, the reliability of the scale was much worse than in the original study (Cronbach's alpha: 0.61 versus 0.88). A detailed item analysis made clear that two items were responsible for the observed drop in consistency. We suspected a misreading problem for one of these items, and checked this assumption by cross-analysis stratified by the assumed reading level. The reliability of the scale increased in both the lower and higher education-level groups when one or both of these problematic items were excluded. However, as the number of items decreased, the relative sensitivity of the scale was also reduced, with fewer persons categorized in the risk group compared to the original scale. As an alternative solution, we suggest that the authors redefine the problematic items and retest the reliability of the instrument in a sample with diverse socio-demographic characteristics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Craig
It is argued by extrapolation of general relativity and quantum mechanics that a classical inertial frame corresponds to a statistically defined observable that rotationally fluctuates due to Planck scale indeterminacy. Physical effects of exotic nonlocal rotational correlations on large scale field states are estimated. Their entanglement with the strong interaction vacuum is estimated to produce a universal, statistical centrifugal acceleration that resembles the observed cosmological constant.
DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts.
Lee, Donghyung; Bigdeli, T Bernard; Williamson, Vernell S; Vladimirov, Vladimir I; Riley, Brien P; Fanous, Ayman H; Bacanu, Silviu-Alin
2015-10-01
To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever-increasing size and ethnic diversity of both reference panels and cohorts makes genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which, unlike the summary statistics provided by virtually all studies, are not publicly available. While there are much less demanding methods that avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts. To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed-ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of the computational resources. DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix. dlee4@vcu.edu Supplementary Data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
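The core of direct summary-statistic imputation in this family of methods is a conditional-mean calculation under a multivariate-normal model for association z-scores, with SNP correlations (LD) taken from a reference panel; DISTMIX additionally builds that correlation matrix as an ethnicity-weighted mixture of population-specific panels. The sketch below illustrates only the conditional-mean step under those assumptions; the matrices, ridge term and toy numbers are invented for the example and are not the published implementation.

```python
import numpy as np

def impute_zscores(z_measured, R_mm, R_um, lam=0.1):
    """Conditional-mean imputation of unmeasured-SNP z-scores from measured ones.

    Under a multivariate-normal model for z-scores, E[z_u | z_m] = R_um R_mm^{-1} z_m,
    where R_mm is the correlation (LD) matrix among measured SNPs and R_um holds the
    correlations between unmeasured and measured SNPs, both from a reference panel.
    A small ridge term stabilizes the matrix inverse.
    """
    k = R_mm.shape[0]
    R_reg = R_mm + lam * np.eye(k)
    return R_um @ np.linalg.solve(R_reg, z_measured)

# Toy example: 3 measured SNPs and 2 unmeasured SNPs (hypothetical correlations).
z_m = np.array([2.5, 1.8, -0.4])
R_mm = np.array([[1.0, 0.6, 0.1],
                 [0.6, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
R_um = np.array([[0.5, 0.7, 0.1],
                 [0.1, 0.2, 0.8]])
print(np.round(impute_zscores(z_m, R_mm, R_um), 2))
```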
NASA Astrophysics Data System (ADS)
Li, Jun-Wei; Cao, Jun-Wei
2010-04-01
One challenge in large-scale scientific data analysis is to monitor data in real time in a distributed environment. For the LIGO (Laser Interferometer Gravitational-wave Observatory) project, a dedicated suite of data monitoring tools (DMT) has been developed, yielding good extensibility to new data types and high flexibility in a distributed environment. Several services are provided, including visualization of data information in various forms and file output of monitoring results. In this work, a DMT monitor, OmegaMon, is developed for tracking statistics of gravitational-wave (GW) burst triggers that are generated from a specific GW burst data analysis pipeline, the Omega Pipeline. Such results can provide diagnostic information as a reference for trigger post-processing and interferometer maintenance.
Friedman, David B
2012-01-01
All quantitative proteomics experiments measure variation between samples. When performing large-scale experiments that involve multiple conditions or treatments, the experimental design should include the appropriate number of individual biological replicates from each condition so that a relevant biological signal can be distinguished from technical noise. Multivariate statistical analyses, such as principal component analysis (PCA), provide a global perspective on experimental variation, thereby enabling assessment of whether the variation describes the expected biological signal or the unanticipated technical/biological noise inherent in the system. Examples will be shown from high-resolution multivariable DIGE experiments where PCA was instrumental in demonstrating biologically significant variation as well as sample outliers, fouled samples, and overriding technical variation that would not be readily observed using standard univariate tests.
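As a concrete illustration of the kind of global view PCA offers, the minimal sketch below projects a samples-by-features matrix (for example, spot or peptide abundances with one row per biological replicate) onto its leading components; the synthetic data and effect size are assumptions for the example, not results from a DIGE experiment.

```python
import numpy as np

def pca_scores(X: np.ndarray, n_components: int = 2):
    """Project samples onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)                            # centre each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates
    explained = s**2 / (s**2).sum()                    # variance ratios
    return scores, explained[:n_components]

# Hypothetical experiment: 12 samples (two conditions, six replicates each), 500 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 500))
X[6:, :10] += 2.0                    # a modest condition effect on 10 features
scores, var_ratio = pca_scores(X)
print(scores.shape, np.round(var_ratio, 3))
```

Plotting the first two columns of `scores` and colouring points by condition is the usual way to spot outliers, fouled samples, or batch effects at a glance.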
Exploring the statistics of magnetic reconnection X-points in kinetic particle-in-cell turbulence
NASA Astrophysics Data System (ADS)
Haggerty, C. C.; Parashar, T. N.; Matthaeus, W. H.; Shay, M. A.; Yang, Y.; Wan, M.; Wu, P.; Servidio, S.
2017-10-01
Magnetic reconnection is a ubiquitous phenomenon in turbulent plasmas. It is an important part of the turbulent dynamics and heating of space and astrophysical plasmas. We examine the statistics of magnetic reconnection using a quantitative local analysis of the magnetic vector potential, previously used in magnetohydrodynamics simulations and now applied to fully kinetic particle-in-cell (PIC) simulations. Different ways of reducing the particle noise for analysis purposes, including multiple smoothing techniques, are explored. We find that a Fourier filter applied at the Debye scale is an optimal choice for analyzing PIC data. Finally, we find a broader distribution of normalized reconnection rates compared to the MHD limit, with rates as large as 0.5 but with an average of approximately 0.1.
Terai, Asuka; Nakagawa, Masanori
2007-08-01
The purpose of this paper is to construct a model that represents the human process of understanding metaphors, focusing specifically on similes of the form "an A like B". Generally speaking, human beings are able to generate and understand many sorts of metaphors. This study constructs the model based on a probabilistic knowledge structure for concepts, computed from a statistical analysis of a large-scale corpus. Consequently, this model is able to cover the many kinds of metaphors that human beings can generate. Moreover, the model implements the dynamic process of metaphor understanding by using a neural network with dynamic interactions. Finally, the validity of the model is confirmed by comparing model simulations with the results from a psychological experiment.
Signatures of criticality arise from random subsampling in simple population models.
Nonnenmacher, Marcel; Behrens, Christian; Berens, Philipp; Bethge, Matthias; Macke, Jakob H
2017-10-01
The rise of large-scale recordings of neuronal activity has fueled the hope to gain new insights into the collective activity of neural ensembles. How can one link the statistics of neural population activity to underlying principles and theories? One attempt to interpret such data builds upon analogies to the behaviour of collective systems in statistical physics. Divergence of the specific heat, a measure of population statistics derived from thermodynamics, has been used to suggest that neural populations are optimized to operate at a "critical point". However, these findings have been challenged by theoretical studies which have shown that common inputs can lead to diverging specific heat. Here, we connect "signatures of criticality", and in particular the divergence of specific heat, back to statistics of neural population activity commonly studied in neural coding: firing rates and pairwise correlations. We show that the specific heat diverges whenever the average correlation strength does not depend on population size. This is necessarily true when data with correlations are randomly subsampled during the analysis process, irrespective of the detailed structure or origin of correlations. We also show how the characteristic shape of specific heat capacity curves depends on firing rates and correlations, using both analytically tractable models and numerical simulations of a canonical feed-forward population model. To analyze these simulations, we develop efficient methods for characterizing large-scale neural population activity with maximum entropy models. We find that, consistent with experimental findings, increases in firing rates and correlations directly lead to more pronounced signatures. Thus, previous reports of thermodynamical criticality in neural populations based on the analysis of specific heat can be explained by average firing rates and correlations, and are not indicative of an optimized coding strategy. We conclude that a reliable interpretation of statistical tests for theories of neural coding is possible only in reference to relevant ground-truth models.
Past and present cosmic structure in the SDSS DR7 main sample
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jasche, J.; Leclercq, F.; Wandelt, B.D., E-mail: jasche@iap.fr, E-mail: florent.leclercq@polytechnique.org, E-mail: wandelt@iap.fr
2015-01-01
We present a chrono-cosmography project, aiming at the inference of the four-dimensional formation history of the observed large scale structure from its origin to the present epoch. To do so, we perform a full-scale Bayesian analysis of the northern galactic cap of the Sloan Digital Sky Survey (SDSS) Data Release 7 main galaxy sample, relying on a fully probabilistic, physical model of the non-linearly evolved density field. Besides inferring initial conditions from observations, our methodology naturally and accurately reconstructs non-linear features at the present epoch, such as walls and filaments, corresponding to high-order correlation functions generated by late-time structure formation. Our inference framework self-consistently accounts for typical observational systematic and statistical uncertainties such as noise, survey geometry and selection effects. We further account for luminosity dependent galaxy biases and automatic noise calibration within a fully Bayesian approach. As a result, this analysis provides highly-detailed and accurate reconstructions of the present density field on scales larger than ∼ 3 Mpc/h, constrained by SDSS observations. This approach also leads to the first quantitative inference of plausible formation histories of the dynamic large scale structure underlying the observed galaxy distribution. The results described in this work constitute the first full Bayesian non-linear analysis of the cosmic large scale structure with the demonstrated capability of uncertainty quantification. Some of these results will be made publicly available along with this work. The level of detail of inferred results and the high degree of control on observational uncertainties pave the path towards high precision chrono-cosmography, the subject of simultaneously studying the dynamics and the morphology of the inhomogeneous Universe.
NASA Astrophysics Data System (ADS)
Uritskaya, Olga Y.
2005-05-01
Results of fractal stability analysis of daily exchange rate fluctuations of more than 30 floating currencies for a 10-year period are presented. It is shown for the first time that small- and large-scale dynamical instabilities of national monetary systems correlate with deviations of the detrended fluctuation analysis (DFA) exponent from the value 1.5 predicted by the efficient market hypothesis. The observed dependence is used for classification of long-term stability of floating exchange rates as well as for revealing various forms of distortion of stable currency dynamics prior to large-scale crises. A normal range of DFA exponents consistent with crisis-free long-term exchange rate fluctuations is determined, and several typical scenarios of unstable currency dynamics with DFA exponents fluctuating beyond the normal range are identified. It is shown that monetary crashes are usually preceded by prolonged periods of abnormal (decreased or increased) DFA exponent, with the after-crash exponent tending to the value 1.5 indicating a more reliable exchange rate dynamics. Statistically significant regression relations (R=0.99, p<0.01) between duration and magnitude of currency crises and the degree of distortion of monofractal patterns of exchange rate dynamics are found. It is demonstrated that the parameters of these relations characterizing small- and large-scale crises are nearly equal, which implies a common instability mechanism underlying these events. The obtained dependences have been used as a basic ingredient of a forecasting technique which provided correct in-sample predictions of monetary crisis magnitude and duration over various time scales. The developed technique can be recommended for real-time monitoring of dynamical stability of floating exchange rate systems and creating advanced early-warning-system models for currency crisis prevention.
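The DFA exponent at the centre of this analysis can be estimated with a short routine: integrate the demeaned series, detrend it in windows of increasing size, and fit the scaling of the residual fluctuations. The sketch below is a minimal first-order DFA under those standard conventions; the scale range and the synthetic random-walk example are assumptions for illustration, not the study's data.

```python
import numpy as np

def dfa_exponent(x, scales=None):
    """First-order detrended fluctuation analysis exponent of a 1-D series."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                 # integrated profile
    n = len(y)
    if scales is None:
        scales = np.unique(np.logspace(np.log10(8), np.log10(n // 4), 20).astype(int))
    fluct = []
    for s in scales:
        n_seg = n // s
        segs = y[:n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        rms = [np.mean((seg - np.polyval(np.polyfit(t, seg, 1), t)) ** 2) for seg in segs]
        fluct.append(np.sqrt(np.mean(rms)))
    slope, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return slope

# A random-walk "exchange rate" (the efficient-market benchmark) should give an
# exponent near 1.5; persistent or antipersistent dynamics deviate from that value.
rng = np.random.default_rng(1)
price = np.cumsum(rng.normal(size=4000))
print(round(dfa_exponent(price), 2))
```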
Penetration of Large Scale Electric Field to Inner Magnetosphere
NASA Astrophysics Data System (ADS)
Chen, S. H.; Fok, M. C. H.; Sibeck, D. G.; Wygant, J. R.; Spence, H. E.; Larsen, B.; Reeves, G. D.; Funsten, H. O.
2015-12-01
The direct penetration of large-scale global electric fields into the inner magnetosphere is a critical element in controlling how the background thermal plasma populates the radiation belts. These plasma populations provide the source of particles and free energy needed for the generation and growth of various plasma waves that, at critical points of resonance in time and phase space, can scatter or energize radiation belt particles to regulate the flux level of the relativistic electrons in the system. At high geomagnetic activity levels, the distribution of large-scale electric fields serves as an important indicator of how the prevalence of strong wave-particle interactions extends over local times and radial distances. To understand the complex relationship between the global electric fields and thermal plasmas, particularly due to the ionospheric dynamo and magnetospheric convection effects, and their relation to geomagnetic activity, we analyze the electric field and cold plasma measurements from the Van Allen Probes over a period of more than two years and simulate a geomagnetic storm event using the Coupled Inner Magnetosphere-Ionosphere Model (CIMI). Our statistical analysis of the Van Allen Probes measurements and the CIMI simulations of the March 17, 2013 storm event indicate that: (1) The global dawn-dusk electric field can penetrate the inner magnetosphere inside the inner belt below L~2. (2) Stronger convection occurred in the dusk and midnight sectors than in the noon and dawn sectors. (3) Strong convection at multiple locations exists at all activity levels but is more complex at higher activity levels. (4) At high activity levels, the strongest convection occurs in the midnight sector at larger distances from the Earth and in the dusk sector at closer distances. (5) Two plasma populations of distinct ion temperature anisotropies are divided at L-shell ~2, indicating distinct heating mechanisms between the inner and outer radiation belts. (6) CIMI simulations reveal alternating penetration and shielding electric fields during the main phase of the geomagnetic storm, indicating an impulsive nature of the large-scale penetrating electric field in regulating the gain and loss of radiation belt particles. We will present the statistical analysis and simulation results.
NASA Astrophysics Data System (ADS)
Moore, T. S.; Sanderman, J.; Baldock, J.; Plante, A. F.
2016-12-01
National-scale inventories typically include soil organic carbon (SOC) content, but not chemical composition or biogeochemical stability. Australia's Soil Carbon Research Programme (SCaRP) represents a national inventory of SOC content and composition in agricultural systems. The program used physical fractionation followed by 13C nuclear magnetic resonance (NMR) spectroscopy. While these techniques are highly effective, they are typically too expensive and time consuming for use in large-scale SOC monitoring. We seek to understand whether thermal analysis is a viable alternative. Coupled differential scanning calorimetry (DSC) and evolved gas analysis (CO2- and H2O-EGA) yield valuable data on SOC composition and stability via ramped combustion. The technique requires little training to use, and does not require fractionation or other sample pre-treatment. We analyzed 300 agricultural samples collected by SCaRP, divided into four fractions: whole soil, coarse particulates (POM), untreated mineral-associated (HUM), and hydrofluoric acid (HF)-treated HUM. All samples were analyzed by DSC-EGA, but only the POM and HF-HUM fractions were analyzed by NMR. Multivariate statistical analyses were used to explore natural clustering in SOC composition and stability based on the DSC-EGA data. A partial least-squares regression (PLSR) model was used to explore correlations among the NMR and DSC-EGA data. Correlations revealed regions of combustion attributable to specific functional groups, which may relate to SOC stability. We are increasingly challenged with developing an efficient technique to assess SOC composition and stability at large spatial and temporal scales. Correlations between NMR and DSC-EGA may demonstrate the viability of using thermal analysis in lieu of more demanding methods in future large-scale surveys, and may provide data that go beyond chemical composition to better approach quantification of biogeochemical stability.
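Partial least-squares regression of the kind mentioned above is available off the shelf; the sketch below shows the general shape of such a model, assuming thermograms binned into temperature steps as predictors and an NMR-derived quantity as the response. The array shapes, random data and component count are assumptions for illustration only.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Hypothetical shapes: 300 samples, DSC-EGA signals binned into 200 temperature steps (X),
# and a single NMR-derived functional-group fraction as the response (Y).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 200))
Y = X[:, 40:60].mean(axis=1, keepdims=True) + 0.1 * rng.normal(size=(300, 1))

pls = PLSRegression(n_components=5)
pls.fit(X, Y)
print(round(pls.score(X, Y), 2))   # in-sample fraction of Y variance explained
```

In practice the number of components would be chosen by cross-validation, and the loading vectors inspected to identify the combustion regions most strongly associated with each functional group.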
Multifractal analysis of geophysical time series in the urban lake of Créteil (France).
NASA Astrophysics Data System (ADS)
Mezemate, Yacine; Tchiguirinskaia, Ioulia; Bonhomme, Celine; Schertzer, Daniel; Lemaire, Bruno Jacques; Vinçon leite, Brigitte; Lovejoy, Shaun
2013-04-01
Urban water bodies contribute to the environmental quality of cities. They regulate heat, contribute to the beauty of the landscape and provide space for leisure activities (aquatic sports, swimming). As they are often artificial, they are only a few meters deep, which confers some specific properties on them. Indeed, they are particularly sensitive to global environmental changes, including climate change, eutrophication and contamination by micro-pollutants due to the urbanization of the watershed. Monitoring their quality has become a major challenge for urban areas, and the need therefore arises for a tool for predicting short-term proliferation of potentially toxic phytoplankton. In lakes, the behavior of biological and physical (temperature) fields is mainly driven by the turbulence regime in the water. Turbulence is highly nonlinear, nonstationary and intermittent. This is why statistical tools are needed to characterize the evolution of the fields. The knowledge of the probability distribution of all the statistical moments of a given field is necessary to fully characterize it. This possibility is offered by multifractal analysis, based on the assumption of scale invariance. To investigate the effect of the space-time variability of temperature, chlorophyll and dissolved oxygen on cyanobacteria proliferation in the urban lake of Creteil (France), a spectral analysis is first performed on each time series (or on subsamples) to obtain an overall estimate of their scaling behavior. Then a multifractal analysis (Trace Moment, Double Trace Moment) estimates the statistical moments of different orders. This analysis is adapted to the specific properties of the studied time series, i.e. the presence of large-scale gradients. The nonlinear behavior of the scaling functions K(q) confirms that the investigated aquatic time series are indeed multifractal and highly intermittent. The knowledge of the universal multifractal parameters is the key to calculating the different statistical moments and thus making some predictions on the fields. As a conclusion, the relationships between the fields are highlighted, with a discussion on the cross-predictability of the different fields. This draws a prospective for the use of this kind of time series analysis in the field of limnology. The authors acknowledge the financial support from the R2DS-PLUMMME and Climate-KIC BlueGreenDream projects.
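A simplified version of the moment-scaling check behind the trace-moment approach can be written in a few lines: coarse-grain a normalized flux (here approximated by absolute increments) over blocks of increasing size and fit the scaling exponent of each statistical moment. The sketch below is a schematic illustration under those simplifying assumptions, not the Trace Moment / Double Trace Moment implementation used in the study.

```python
import numpy as np

def moment_scaling(series, q_values, scales):
    """Empirical moment-scaling exponents K(q) from <phi_s^q> ~ s^(-K(q))."""
    x = np.asarray(series, dtype=float)
    flux = np.abs(np.diff(x))
    flux /= flux.mean()                                   # normalize so <flux> = 1
    K = []
    for q in q_values:
        moments = []
        for s in scales:
            n_blocks = len(flux) // s
            coarse = flux[:n_blocks * s].reshape(n_blocks, s).mean(axis=1)
            moments.append(np.mean(coarse ** q))
        slope, _ = np.polyfit(np.log(scales), np.log(moments), 1)
        K.append(-slope)          # moments of a multifractal flux grow at finer scales
    return np.array(K)

# Hypothetical usage on a synthetic signal; a nonlinear K(q) indicates multifractality.
rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=2**14))
print(np.round(moment_scaling(signal, q_values=[0.5, 1.0, 1.5, 2.0, 2.5],
                              scales=[2, 4, 8, 16, 32, 64]), 3))
```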
On the linearity of tracer bias around voids
NASA Astrophysics Data System (ADS)
Pollina, Giorgia; Hamaus, Nico; Dolag, Klaus; Weller, Jochen; Baldi, Marco; Moscardini, Lauro
2017-07-01
The large-scale structure of the Universe can be observed only via luminous tracers of the dark matter. However, the clustering statistics of tracers are biased and depend on various properties, such as their host-halo mass and assembly history. On very large scales, this tracer bias results in a constant offset in the clustering amplitude, known as linear bias. Towards smaller non-linear scales, this is no longer the case and tracer bias becomes a complicated function of scale and time. We focus on tracer bias centred on cosmic voids, i.e. depressions of the density field that spatially dominate the Universe. We consider three types of tracers: galaxies, galaxy clusters and active galactic nuclei, extracted from the hydrodynamical simulation Magneticum Pathfinder. In contrast to common clustering statistics that focus on auto-correlations of tracers, we find that void-tracer cross-correlations are successfully described by a linear bias relation. The tracer-density profile of voids can thus be related to their matter-density profile by a single number. We show that it coincides with the linear tracer bias extracted from the large-scale auto-correlation function and with expectations from theory, if sufficiently large voids are considered. For smaller voids we observe a shift towards higher values. This has important consequences for cosmological parameter inference, as the problem of unknown tracer bias is alleviated up to a constant number. The smallest scales in existing data sets become accessible to simpler models, providing numerous modes of the density field that have been disregarded so far, but may help to further reduce statistical errors in constraining cosmology.
Detector Development for the MARE Neutrino Experiment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Galeazzi, M.; Bogorin, D.; Molina, R.
2009-12-16
The MARE experiment is designed to measure the mass of the neutrino with sub-eV sensitivity by measuring the beta decay of {sup 187}Re with cryogenic microcalorimeters. A preliminary analysis shows that, to achieve the necessary statistics, between 10,000 and 50,000 detectors are likely necessary. We have fabricated and characterized Iridium transition edge sensors with high reproducibility and uniformity for such a large scale experiment. We have also started a full scale simulation of the experimental setup for MARE, including thermalization in the absorber, detector response, and optimum filter analysis, to understand the issues related to reaching a sub-eV sensitivity and to optimize the design of the MARE experiment. We present our characterization of the Ir devices, including reproducibility, uniformity, and sensitivity, and we discuss the implementation and capabilities of our full scale simulation.
Relationship of epithermal gold deposits to large-scale fractures in northern Nevada
Ponce, D.A.; Glen, J.M.G.
2002-01-01
Geophysical maps of northern Nevada reveal at least three and possibly six large-scale arcuate features, one of which corresponds to the northern Nevada rift that possibly extends more than 1,000 km from the Oregon- Idaho border to southern Nevada. These features may reflect deep discontinuities within the earth's crust, possibly related to the impact of the Yellowstone hot spot. Because mid-Miocene epithermal gold deposits have been shown to correlate with the northern Nevada rift, we investigate the association of other epithermal gold deposits to other similar arcuate features in northern Nevada. Mid-Miocene and younger epithermal gold- silver deposits also occur along two prominent aeromagnetic anomalies west of the northern Nevada rift. Here, we speculate that mid-Miocene deposits formed along deep fractures in association with mid-Miocene rift- related magmatism and that younger deposits preferentially followed these preexisting features. Statistical analysis of the proximity of epithermal gold deposits to these features suggests that epithermal gold deposits in northern Nevada are spatially associated with large-scale crustal features interpreted from geophysical data.
Techniques for automatic large scale change analysis of temporal multispectral imagery
NASA Astrophysics Data System (ADS)
Mercovich, Ryan A.
Change detection in remotely sensed imagery is a multi-faceted problem with a wide variety of desired solutions. Automatic change detection and analysis to assist in the coverage of large areas at high resolution is a popular area of research in the remote sensing community. Beyond basic change detection, the analysis of change is essential to provide results that positively impact an image analyst's job when examining potentially changed areas. Present change detection algorithms are geared toward low resolution imagery, and require analyst input to provide anything more than a simple pixel level map of the magnitude of change that has occurred. One major problem with this approach is that change occurs in such large volume at small spatial scales that a simple change map is no longer useful. This research strives to create an algorithm based on a set of metrics that performs a large area search for change in high resolution multispectral image sequences and utilizes a variety of methods to identify different types of change. Rather than simply mapping the magnitude of any change in the scene, the goal of this research is to create a useful display of the different types of change in the image. The techniques presented in this dissertation are used to interpret large area images and provide useful information to an analyst about small regions that have undergone specific types of change while retaining image context to make further manual interpretation easier. This analyst cueing to reduce information overload in a large area search environment will have an impact in the areas of disaster recovery, search and rescue situations, and land use surveys among others. By utilizing a feature based approach founded on applying existing statistical methods and new and existing topological methods to high resolution temporal multispectral imagery, a novel change detection methodology is produced that can automatically provide useful information about the change occurring in large area and high resolution image sequences. The change detection and analysis algorithm developed could be adapted to many potential image change scenarios to perform automatic large scale analysis of change.
Land change variability and human-environment dynamics in the United States Great Plains
Drummond, M.A.; Auch, Roger F.; Karstensen, K.A.; Sayler, K. L.; Taylor, Janis L.; Loveland, Thomas R.
2012-01-01
Land use and land cover changes have complex linkages to climate variability and change, biophysical resources, and socioeconomic driving forces. To assess these land change dynamics and their causes in the Great Plains, we compare and contrast contemporary changes across 16 ecoregions using Landsat satellite data and statistical analysis. Large-area change analysis of agricultural regions is often hampered by change detection error and the tendency for land conversions to occur at the local-scale. To facilitate a regional-scale analysis, a statistical sampling design of randomly selected 10 km × 10 km blocks is used to efficiently identify the types and rates of land conversions for four time intervals between 1973 and 2000, stratified by relatively homogenous ecoregions. Nearly 8% of the overall Great Plains region underwent land-use and land-cover change during the study period, with a substantial amount of ecoregion variability that ranged from less than 2% to greater than 13%. Agricultural land cover declined by more than 2% overall, with variability contingent on the differential characteristics of regional human–environment systems. A large part of the Great Plains is in relatively stable land cover. However, other land systems with significant biophysical and climate limitations for agriculture have high rates of land change when pushed by economic, policy, technology, or climate forcing factors. The results indicate the regionally based potential for land cover to persist or fluctuate as land uses are adapted to spatially and temporally variable forcing factors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, Eric J
The ResStock analysis tool is helping states, municipalities, utilities, and manufacturers identify which home upgrades save the most energy and money. Across the country there's a vast diversity in the age, size, construction practices, installed equipment, appliances, and resident behavior of the housing stock, not to mention the range of climates. These variations have hindered the accuracy of predicting savings for existing homes. Researchers at the National Renewable Energy Laboratory (NREL) developed ResStock. It's a versatile tool that takes a new approach to large-scale residential energy analysis by combining: large public and private data sources, statistical sampling, detailed subhourly building simulations, high-performance computing. This combination achieves unprecedented granularity and, most importantly, accuracy in modeling the diversity of the single-family housing stock.
Identifying the scale-dependent motifs in atmospheric surface layer by ordinal pattern analysis
NASA Astrophysics Data System (ADS)
Li, Qinglei; Fu, Zuntao
2018-07-01
Ramp-like structures in various atmospheric surface layer time series have long been studied, but the presence of finer-scale motifs embedded within larger-scale ramp-like structures has largely been overlooked in the reported literature. Here a novel, objective and well-adapted methodology, ordinal pattern analysis, is adopted to study the finer-scale motifs in atmospheric boundary-layer (ABL) time series. The analysis shows that the motifs represented by different ordinal patterns exhibit clustering properties, and 6 dominant motifs out of the 24 possible motifs account for about 45% of the time series at particular scales, which indicates the large contribution of finer-scale motifs to the series. Further analysis indicates that motif statistics are similar under stable and unstable conditions at larger scales, but large discrepancies are found at smaller scales, and the frequencies of motifs "1234" and/or "4321" are somewhat higher under stable conditions than under unstable conditions. Under stable conditions, the occurrence frequencies of motifs "1234" and "4321" change greatly: the occurrence frequency of motif "1234" decreases from nearly 24% to 4.5% as the scale factor increases, and the occurrence frequency of motif "4321" changes nonlinearly with increasing scale. These large scale-dependent changes of the dominant motifs can be taken as an indicator to quantify changes in flow structure under different stability conditions, and a motif entropy can be defined from only the 6 dominant motifs to quantify this time-scale-independent property of the motifs. All these results suggest that the defined scale of the finer-scale motifs should be carefully taken into consideration in the interpretation of turbulence coherent structures.
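Ordinal pattern (motif) frequencies of the kind discussed here are straightforward to extract: each window of four points is reduced to the rank order of its values, giving one of 4! = 24 motifs, and the lag between points plays the role of the scale factor. The sketch below is a minimal illustration under those assumptions; the synthetic signal is not ABL data, and the exact windowing conventions of the study may differ.

```python
import numpy as np
from collections import Counter
from math import log

def ordinal_pattern_frequencies(x, order=4, lag=1):
    """Relative frequencies of ordinal patterns (motifs) of a given order and lag."""
    x = np.asarray(x, dtype=float)
    patterns = []
    for i in range(len(x) - (order - 1) * lag):
        window = x[i:i + order * lag:lag]
        ranks = np.argsort(np.argsort(window)) + 1        # ranks 1..order
        patterns.append("".join(map(str, ranks)))         # e.g. "1234" = increasing
    counts = Counter(patterns)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def motif_entropy(freqs):
    """Shannon entropy of the motif distribution (natural logarithm)."""
    return -sum(p * log(p) for p in freqs.values() if p > 0)

# Hypothetical usage: the lag acts as the scale factor.
rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=5000))
freqs = ordinal_pattern_frequencies(signal, order=4, lag=8)
print(sorted(freqs.items(), key=lambda kv: -kv[1])[:6])    # six most frequent motifs
print(round(motif_entropy(freqs), 3))
```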
Ou, Sai-Hong Ignatius; Tang, Yiyun; Polli, Anna; Wilner, Keith D; Schnell, Patrick
2016-04-01
Decreases in heart rate (HR) have been described in patients receiving crizotinib. We performed a large retrospective analysis of HR changes during crizotinib therapy. HRs from vital-sign data for patients with anaplastic lymphoma kinase (ALK)-positive nonsmall cell lung cancer enrolled in PROFILE 1005 and the crizotinib arm of PROFILE 1007 were analyzed. Sinus bradycardia (SB) was defined as HR <60 beats per minute (bpm). Magnitude and timing of HR changes were assessed. Potential risk factors for SB were investigated by logistic regression analysis. Progression-free survival (PFS) was evaluated according to HR decrease by <20 versus ≥ 20 bpm within the first 50 days of starting treatment. For the 1053 patients analyzed, the mean maximum postbaseline HR decrease was 25 bpm (standard deviation 15.8). Overall, 441 patients (41.9%) had at least one episode of postbaseline SB. The mean precrizotinib treatment HR was significantly lower among patients with versus without postbaseline SB (82.2 bpm vs. 92.6 bpm). The likelihood of experiencing SB was statistically significantly higher among patients with a precrizotinib treatment HR <70 bpm. PFS was comparable among patients with or without HR decrease of ≥ 20 bpm within the first 50 days of starting crizotinib. Decrease in HR is very common among patients on crizotinib. The likelihood of experiencing SB was statistically significantly higher among patients with a precrizotinib treatment HR <70 bpm. This is the first large-scale report investigating the association between treatment with a tyrosine kinase inhibitor and the development of bradycardia. HRs should be closely monitored during crizotinib treatment. © 2016 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.
Statistical analysis of subjective preferences for video enhancement
NASA Astrophysics Data System (ADS)
Woods, Russell L.; Satgunam, PremNandhini; Bronstad, P. Matthew; Peli, Eli
2010-02-01
Measuring preferences for moving video quality is harder than for static images due to the fleeting and variable nature of moving video. Subjective preferences for image quality can be tested by observers indicating their preference for one image over another. Such pairwise comparisons can be analyzed using Thurstone scaling (Farrell, 1999). Thurstone (1927) scaling is widely used in applied psychology, marketing, food tasting and advertising research. Thurstone analysis constructs an arbitrary perceptual scale for the items that are compared (e.g. enhancement levels). However, Thurstone scaling does not determine the statistical significance of the differences between items on that perceptual scale. Recent papers have provided inferential statistical methods that produce an outcome similar to Thurstone scaling (Lipovetsky and Conklin, 2004). Here, we demonstrate that binary logistic regression can analyze preferences for enhanced video.
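As a sketch of how pairwise preference data can be put on an interval-like scale with ordinary inferential machinery, the code below fits a Bradley-Terry-type model, in which the probability that one enhancement level is preferred over another is a logistic function of the difference of their scale values. This is an illustration of the general idea only; the win counts, step size and fitting details are assumptions and not the procedure used in the paper.

```python
import numpy as np
from itertools import combinations

def fit_preference_scale(wins, n_items, n_iter=5000, lr=1.0):
    """Fit item scale values s from pairwise preference counts by gradient ascent.

    wins[i, j] = number of trials in which item i was preferred over item j.
    Model: P(i preferred over j) = sigmoid(s_i - s_j) (Bradley-Terry form).
    """
    s = np.zeros(n_items)
    total = max(1.0, wins.sum())
    for _ in range(n_iter):
        grad = np.zeros(n_items)
        for i, j in combinations(range(n_items), 2):
            n_ij = wins[i, j] + wins[j, i]
            if n_ij == 0:
                continue
            p = 1.0 / (1.0 + np.exp(-(s[i] - s[j])))
            grad[i] += wins[i, j] - n_ij * p
            grad[j] -= wins[i, j] - n_ij * p
        s += lr * grad / total
        s -= s.mean()                      # fix the arbitrary origin of the scale
    return s

# Hypothetical data: 4 enhancement levels; entry [i, j] = times level i beat level j.
wins = np.array([[ 0, 12, 15, 18],
                 [ 8,  0, 13, 16],
                 [ 5,  7,  0, 14],
                 [ 2,  4,  6,  0]])
print(np.round(fit_preference_scale(wins, 4), 2))
```

Fitting the same model as a standard binary logistic regression (one signed indicator column per item) additionally yields standard errors, and hence significance tests for differences between positions on the perceptual scale.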
Spectral enstrophy budget in a shear-less flow with turbulent/non-turbulent interface
NASA Astrophysics Data System (ADS)
Cimarelli, Andrea; Cocconi, Giacomo; Frohnapfel, Bettina; De Angelis, Elisabetta
2015-12-01
A numerical analysis of the interaction between decaying shear-free turbulence and quiescent fluid is performed by means of global statistical budgets of enstrophy, both at the single-point and two-point levels. The single-point enstrophy budget allows us to recognize three physically relevant layers: a bulk turbulent region, an inhomogeneous turbulent layer, and an interfacial layer. Within these layers, enstrophy is produced, transferred, and finally destroyed while leading to a propagation of the turbulent front. These processes do not only depend on the position in the flow field but are also strongly scale dependent. In order to tackle this multi-dimensional behaviour of enstrophy in the space of scales and in physical space, we analyse the spectral enstrophy budget equation. The picture consists of an inviscid spatial cascade of enstrophy from large to small scales parallel to the interface moving towards the interface. At the interface, this phenomenon breaks down, leaving place to an anisotropic cascade where large-scale structures exhibit only a cascade process normal to the interface, thus reducing their thickness while retaining their lengths parallel to the interface. The observed behaviour could be relevant for both the theoretical and the modelling approaches to flows with interacting turbulent/non-turbulent regions. The scale properties of the turbulent propagation mechanisms highlight that the inviscid turbulent transport is a large-scale phenomenon. On the contrary, the viscous diffusion, commonly associated with small-scale mechanisms, reveals a much richer physics involving small lengths normal to the interface, but at the same time large scales parallel to the interface.
Statistical genetics concepts and approaches in schizophrenia and related neuropsychiatric research.
Schork, Nicholas J; Greenwood, Tiffany A; Braff, David L
2007-01-01
Statistical genetics is a research field that focuses on mathematical models and statistical inference methodologies that relate genetic variations (ie, naturally occurring human DNA sequence variations or "polymorphisms") to particular traits or diseases (phenotypes) usually from data collected on large samples of families or individuals. The ultimate goal of such analysis is the identification of genes and genetic variations that influence disease susceptibility. Although of extreme interest and importance, the fact that many genes and environmental factors contribute to neuropsychiatric diseases of public health importance (eg, schizophrenia, bipolar disorder, and depression) complicates relevant studies and suggests that very sophisticated mathematical and statistical modeling may be required. In addition, large-scale contemporary human DNA sequencing and related projects, such as the Human Genome Project and the International HapMap Project, as well as the development of high-throughput DNA sequencing and genotyping technologies have provided statistical geneticists with a great deal of very relevant and appropriate information and resources. Unfortunately, the use of these resources and their interpretation are not straightforward when applied to complex, multifactorial diseases such as schizophrenia. In this brief and largely nonmathematical review of the field of statistical genetics, we describe many of the main concepts, definitions, and issues that motivate contemporary research. We also provide a discussion of the most pressing contemporary problems that demand further research if progress is to be made in the identification of genes and genetic variations that predispose to complex neuropsychiatric diseases.
DEMNUni: massive neutrinos and the bispectrum of large scale structures
NASA Astrophysics Data System (ADS)
Ruggeri, Rossana; Castorina, Emanuele; Carbone, Carmelita; Sefusatti, Emiliano
2018-03-01
The main effect of massive neutrinos on the large-scale structure consists in a few percent suppression of matter perturbations on all scales below their free-streaming scale. Such an effect is of particular importance as it allows one to constrain the value of the sum of the neutrino masses from measurements of the galaxy power spectrum. In this work, we present the first measurements of the next higher-order correlation function, the bispectrum, from N-body simulations that include massive neutrinos as particles. This is the simplest statistic characterising the non-Gaussian properties of the matter and dark matter halo distributions. We investigate, in the first place, the suppression due to massive neutrinos on the matter bispectrum, comparing our measurements with the simplest perturbation theory predictions, and find that the approximation of neutrinos contributing at quadratic order in perturbation theory provides a good fit to the measurements in the simulations. On the other hand, as expected, a linear approximation for neutrino perturbations would lead to O(f_ν) errors on the total matter bispectrum at large scales. We then attempt an extension of previous results on the universality of linear halo bias in neutrino cosmologies to non-linear and non-local corrections, finding results consistent with the power spectrum analysis.
ERIC Educational Resources Information Center
O'Bryant, Monique J.
2017-01-01
The aim of this study was to validate an instrument that can be used by instructors or social scientist who are interested in evaluating statistics anxiety. The psychometric properties of the English version of the Statistical Anxiety Scale (SAS) was examined through a confirmatory factor analysis of scores from a sample of 323 undergraduate…
Impact of Design Effects in Large-Scale District and State Assessments
ERIC Educational Resources Information Center
Phillips, Gary W.
2015-01-01
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
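For reference, the basic cluster-sampling design effect and the effective sample size it implies follow the textbook formula DEFF = 1 + (m - 1) * ICC; the sketch below is a generic illustration with invented numbers, not figures from any particular assessment.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Design effect for cluster sampling: DEFF = 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc

def effective_sample_size(n: int, deff: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n / deff

# Hypothetical example: 2000 students tested in intact classes of 25 with ICC = 0.2.
deff = design_effect(25, 0.2)
print(deff, round(effective_sample_size(2000, deff)))   # 5.8 and roughly 345
```

Ignoring a design effect of this size would understate standard errors by roughly a factor of sqrt(DEFF), which is the kind of underestimation the article warns about.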
Evaluating the Effectiveness of a Large-Scale Professional Development Programme
ERIC Educational Resources Information Center
Main, Katherine; Pendergast, Donna
2017-01-01
An evaluation of the effectiveness of a large-scale professional development (PD) programme delivered to 258 schools in Queensland, Australia is presented. Formal evaluations were conducted at two stages during the programme using a tool developed from Desimone's five core features of effective PD. Descriptive statistics of 38 questions and…
Fire management over large landscapes: a hierarchical approach
Kenneth G. Boykin
2008-01-01
Management planning for fires becomes increasingly difficult as scale increases. Stratification provides land managers with multiple scales in which to prepare plans. Using statistical techniques, Geographic Information Systems (GIS), and meetings with land managers, we divided a large landscape of over 2 million acres (White Sands Missile Range) into parcels useful in...
Effect of Variable Spatial Scales on USLE-GIS Computations
NASA Astrophysics Data System (ADS)
Patil, R. J.; Sharma, S. K.
2017-12-01
Use of an appropriate spatial scale is very important in Universal Soil Loss Equation (USLE) based spatially distributed soil erosion modelling. This study aimed at assessing annual rates of soil erosion at different spatial scales/grid sizes and analysing how changes in spatial scale affect USLE-GIS computations using simulation and statistical variability. Efforts have been made in this study to recommend an optimum spatial scale for further USLE-GIS computations for management and planning in the study area. The present research study was conducted in the Shakkar River watershed, situated in the Narsinghpur and Chhindwara districts of Madhya Pradesh, India. Remote sensing and GIS techniques were integrated with the Universal Soil Loss Equation (USLE) to predict the spatial distribution of soil erosion in the study area at four different spatial scales, viz. 30 m, 50 m, 100 m, and 200 m. Rainfall data, a soil map, a digital elevation model (DEM), an executable C++ program, and a satellite image of the area were used for preparation of the thematic maps of the various USLE factors. Annual rates of soil erosion were estimated for 15 years (1992 to 2006) at the four grid sizes. The statistical analysis of the four estimated datasets showed that the sediment loss dataset at the 30 m spatial scale has the minimum standard deviation (2.16), variance (4.68), and percent deviation from observed values (2.68 - 18.91 %), and the highest coefficient of determination (R2 = 0.874) among the four datasets. Thus, it is recommended to adopt this spatial scale for USLE-GIS computations in the study area due to its minimal statistical variability and better agreement with the observed sediment loss data. This study also indicates large scope for the use of finer spatial scales in spatially distributed soil erosion modelling.
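The per-cell USLE calculation itself is a simple product of factor rasters resampled to a common grid size, A = R * K * LS * C * P; the sketch below shows that step with small invented rasters, since the actual factor values for the Shakkar watershed are not reproduced here.

```python
import numpy as np

def usle_soil_loss(R, K, LS, C, P):
    """Cell-by-cell USLE estimate A = R * K * LS * C * P for co-registered rasters."""
    return R * K * LS * C * P

# Hypothetical 3x3 rasters at one grid size; in practice these come from rainfall
# records (R), soil maps (K), the DEM (LS) and land-cover classification (C, P).
R  = np.full((3, 3), 550.0)          # rainfall erosivity
K  = np.full((3, 3), 0.28)           # soil erodibility
LS = np.array([[0.5, 1.2, 2.0],
               [0.4, 1.0, 1.8],
               [0.3, 0.8, 1.5]])      # slope length-steepness factor
C  = np.full((3, 3), 0.30)           # cover management factor
P  = np.full((3, 3), 1.0)            # support practice factor
print(np.round(usle_soil_loss(R, K, LS, C, P), 1))
```

Changing the grid size changes the LS raster (and, to a lesser degree, the other factors) through resampling, which is precisely why the estimated soil loss statistics vary with spatial scale.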
Randomized central limit theorems: A unified theory.
Eliazar, Iddo; Klafter, Joseph
2010-08-01
The central limit theorems (CLTs) characterize the macroscopic statistical behavior of large ensembles of independent and identically distributed random variables. The CLTs assert that the universal probability laws governing ensembles' aggregate statistics are either Gaussian or Lévy, and that the universal probability laws governing ensembles' extreme statistics are Fréchet, Weibull, or Gumbel. The scaling schemes underlying the CLTs are deterministic: all ensemble components are scaled by a common deterministic scale. However, there are "random environment" settings in which the underlying scaling schemes are stochastic: the ensemble components are scaled by different random scales. Examples of such settings include Holtsmark's law for gravitational fields and the Stretched Exponential law for relaxation times. In this paper we establish a unified theory of randomized central limit theorems (RCLTs), in which the deterministic CLT scaling schemes are replaced with stochastic scaling schemes, and present "randomized counterparts" to the classic CLTs. The RCLT scaling schemes are shown to be governed by Poisson processes with power-law statistics, and the RCLTs are shown to universally yield the Lévy, Fréchet, and Weibull probability laws.
Grid-Enabled Quantitative Analysis of Breast Cancer
2010-10-01
large-scale, multi-modality computerized image analysis. The central hypothesis of this research is that large-scale image analysis for breast cancer ... research, we designed a pilot study utilizing large-scale parallel Grid computing, harnessing nationwide infrastructure for medical image analysis ...
Cloud encounter statistics in the 28.5-43.5 KFT altitude region from four years of GASP observations
NASA Technical Reports Server (NTRS)
Jasperson, W. H.; Nastrom, G. D.; Davis, R. E.; Holdeman, J. D.
1983-01-01
The results of an analysis of cloud encounter measurements taken at aircraft flight altitudes as part of the Global Atmospheric Sampling Program are summarized. The results can be used in estimating the probability of cloud encounter and in assessing the economic feasibility of laminar flow control aircraft along particular routes. The data presented clearly show the tropical circulation and its seasonal migration; characteristics of the mid-latitude regime, such as the large-scale traveling cyclones in the winter and increased convective activity in the summer, can be isolated in the data. The cloud encounter statistics are shown to be consistent with the mid-latitude cyclone model. A model for TIC (time-in-clouds), a cloud encounter statistic, is presented for several common airline routes.
Observed and Simulated Radiative and Microphysical Properties of Tropical Convective Storms
NASA Technical Reports Server (NTRS)
DelGenio, Anthony D.; Hansen, James E. (Technical Monitor)
2001-01-01
Increases in the ice content, albedo and cloud cover of tropical convective storms in a warmer climate produce a large negative contribution to cloud feedback in the GISS GCM. Unfortunately, the physics of convective upward water transport, detrainment, and ice sedimentation, and the relationship of microphysical to radiative properties, are all quite uncertain. We apply a clustering algorithm to TRMM satellite microwave rainfall retrievals to identify contiguous deep precipitating storms throughout the tropics. Each storm is characterized according to its size, albedo, OLR, rain rate, microphysical structure, and presence/absence of lightning. A similar analysis is applied to ISCCP data during the TOGA/COARE experiment to identify optically thick deep cloud systems and relate them to large-scale environmental conditions just before storm onset. We examine the statistics of these storms to understand the relative climatic roles of small and large storms and the factors that regulate convective storm size and albedo. The results are compared to GISS GCM simulated statistics of tropical convective storms to identify areas of agreement and disagreement.
Karain, Wael I
2017-11-28
Proteins undergo conformational transitions over different time scales. These transitions are closely intertwined with the protein's function. Numerous standard techniques, such as principal component analysis, are used to detect these transitions in molecular dynamics simulations. In this work, we add a new method that detects transitions in dynamics based on the recurrences in the dynamical system. It combines bootstrapping and recurrence quantification analysis. We start from the assumption that a protein has a "baseline" recurrence structure over a given period of time. Any statistically significant deviation from this recurrence structure, as inferred from complexity measures provided by recurrence quantification analysis, is considered a transition in the dynamics of the protein. We apply this technique to a 132 ns long molecular dynamics simulation of the β-Lactamase Inhibitory Protein BLIP. We are able to detect conformational transitions in the nanosecond range in the recurrence dynamics of the BLIP protein during the simulation. The results compare favorably to those extracted using the principal component analysis technique. The recurrence quantification analysis based bootstrap technique is able to detect transitions between different dynamical states of a protein over different time scales. It is not limited to linear dynamics regimes, and can be generalized to any time scale. It also has the potential to be used to cluster frames in molecular dynamics trajectories according to the nature of their recurrence dynamics. One shortcoming of this method is the need for large enough time windows to ensure good statistical quality of the recurrence complexity measures needed to detect the transitions.
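One of the simplest recurrence-based complexity measures, the recurrence rate, can serve to illustrate the idea of a "baseline" recurrence structure and deviations from it. The sketch below computes the recurrence rate in consecutive windows of a trajectory; the toy diffusive trajectory, threshold and window length are assumptions for illustration, and the full method would bootstrap these measures to set significance thresholds.

```python
import numpy as np

def recurrence_rate(traj, eps, window):
    """Recurrence rate in consecutive windows of an (n_frames, n_dims) trajectory.

    A recurrence is counted whenever two frames within the window lie closer
    than eps in the chosen coordinate space.
    """
    traj = np.asarray(traj, dtype=float)
    rates = []
    for start in range(0, len(traj) - window + 1, window):
        block = traj[start:start + window]
        d = np.linalg.norm(block[:, None, :] - block[None, :, :], axis=-1)
        rec = (d < eps).astype(float)
        np.fill_diagonal(rec, 0.0)                 # ignore trivial self-recurrences
        rates.append(rec.sum() / (window * (window - 1)))
    return np.array(rates)

# Hypothetical usage on a toy diffusive trajectory; windows whose rate falls outside
# a bootstrap interval built from a reference segment would be flagged as transitions.
rng = np.random.default_rng(0)
traj = rng.normal(size=(2000, 3)).cumsum(axis=0)
print(np.round(recurrence_rate(traj, eps=5.0, window=200), 3))
```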
NASA Technical Reports Server (NTRS)
Strub, P. Ted; James, Corinne; Thomas, Andrew C.; Abbott, Mark R.
1990-01-01
The large-scale patterns of satellite-derived surface pigment concentration off the west coast of North America are presented, averaged into monthly means, together with monthly mean surface wind fields over the California Current system (CCS) for the July 1979 to June 1986 period. The patterns are discussed in terms of both seasonal and nonseasonal variability for the indicated time period. The large-scale seasonal characteristics of the California Current are summarized. The data and methods used are described, and the problems known to affect the satellite-derived pigment concentrations and the wind data used in the study are discussed. The statistical analysis results are then presented and discussed in light of past observations and theory. Details of the CZCS data processing are described, and details of the principal estimator pattern methodology used here are given.
N-point statistics of large-scale structure in the Zel'dovich approximation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tassev, Svetlin, E-mail: tassev@astro.princeton.edu
2014-06-01
Motivated by the results presented in a companion paper, here we give a simple analytical expression for the matter n-point functions in the Zel'dovich approximation (ZA), both in real and in redshift space (including the angular case). We present numerical results for the 2-dimensional redshift-space correlation function, as well as for the equilateral configuration of the real-space 3-point function. We compare those to the tree-level results. Our analysis is easily extendable to include Lagrangian bias, as well as higher-order perturbative corrections to the ZA. The results should be especially useful for modelling probes of large-scale structure in the linear regime, such as the Baryon Acoustic Oscillations. We make the numerical code used in this paper freely available.
NASA Astrophysics Data System (ADS)
Gruzdev, A. N.
2017-07-01
Using the data of the ERA-Interim reanalysis, we have obtained estimates of changes in temperature, the geopotential and its large-scale zonal harmonics, wind velocity, and potential vorticity in the troposphere and stratosphere of the Northern and Southern hemispheres during the 11-year solar cycle. The estimates have been obtained using the method of multiple linear regression. Specific features of response of the indicated atmospheric parameters to the solar cycle have been revealed in particular regions of the atmosphere for a whole year and depending on the season. The results of the analysis indicate the existence of a reliable statistical relationship of large-scale dynamic and thermodynamic processes in the troposphere and stratosphere with the 11-year solar cycle.
Flow topologies and turbulence scales in a jet-in-cross-flow
Oefelein, Joseph C.; Ruiz, Anthony M.; Lacaze, Guilhem
2015-04-03
This study presents a detailed analysis of the flow topologies and turbulence scales in the jet-in-cross-flow experiment of [Su and Mungal JFM 2004]. The analysis is performed using the Large Eddy Simulation (LES) technique with a highly resolved grid and time-step and well controlled boundary conditions. This enables quantitative agreement with the first and second moments of turbulence statistics measured in the experiment. LES is used to perform the analysis since experimental measurements of time-resolved 3D fields are still in their infancy and because sampling periods are generally limited with direct numerical simulation. A major focal point is the comprehensive characterization of the turbulence scales and their evolution. Time-resolved probes are used with long sampling periods to obtain maps of the integral scales, Taylor microscales, and turbulent kinetic energy spectra. Scalar-fluctuation scales are also quantified. In the near-field, coherent structures are clearly identified, both in physical and spectral space. Along the jet centerline, turbulence scales grow according to a classical one-third power law. However, the derived maps of turbulence scales reveal strong inhomogeneities in the flow. From the modeling perspective, these insights are useful to design optimized grids and improve numerical predictions in similar configurations.
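The following sketch shows one conventional way to estimate an integral time scale and a temporal Taylor microscale from a long, time-resolved probe signal, as in the analysis described above. The specific estimators (integration of the autocorrelation to its first zero crossing, and a derivative-based microscale with a factor-of-two convention) are common choices, not necessarily the ones used in the paper.

```python
import numpy as np

def integral_time_scale(u, dt):
    """Integral time scale: integrate the autocorrelation of the velocity
    fluctuation up to its first zero crossing."""
    u = np.asarray(u, float) - np.mean(u)
    acf = np.correlate(u, u, mode="full")[len(u) - 1:]
    acf = acf / acf[0]
    zero = np.argmax(acf <= 0.0) if np.any(acf <= 0.0) else len(acf)
    return np.trapz(acf[:zero], dx=dt)

def taylor_time_microscale(u, dt):
    """Temporal Taylor microscale, lambda^2 = 2 <u'^2> / <(du'/dt)^2>
    (one common convention)."""
    u = np.asarray(u, float) - np.mean(u)
    dudt = np.gradient(u, dt)
    return np.sqrt(2.0 * np.mean(u**2) / np.mean(dudt**2))
```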
Vortex dynamics and Lagrangian statistics in a model for active turbulence.
James, Martin; Wilczek, Michael
2018-02-14
Cellular suspensions such as dense bacterial flows exhibit a turbulence-like phase under certain conditions. We study this phenomenon of "active turbulence" statistically by using numerical tools. Following Wensink et al. (Proc. Natl. Acad. Sci. U.S.A. 109, 14308 (2012)), we model active turbulence by means of a generalized Navier-Stokes equation. Two-point velocity statistics of active turbulence, both in the Eulerian and the Lagrangian frame, are explored. We characterize the scale-dependent features of two-point statistics in this system. Furthermore, we extend this statistical study with measurements of vortex dynamics in this system. Our observations suggest that the large-scale statistics of active turbulence are close to Gaussian with sub-Gaussian tails.
2013-01-01
Background The binding of transcription factors to DNA plays an essential role in the regulation of gene expression. Numerous experiments elucidated binding sequences which subsequently have been used to derive statistical models for predicting potential transcription factor binding sites (TFBS). The rapidly increasing number of genome sequence data requires sophisticated computational approaches to manage and query experimental and predicted TFBS data in the context of other epigenetic factors and across different organisms. Results We have developed D-Light, a novel client-server software package to store and query large amounts of TFBS data for any number of genomes. Users can add small-scale data to the server database and query them in a large scale, genome-wide promoter context. The client is implemented in Java and provides simple graphical user interfaces and data visualization. Here we also performed a statistical analysis showing what a user can expect for certain parameter settings and we illustrate the usage of D-Light with the help of a microarray data set. Conclusions D-Light is an easy to use software tool to integrate, store and query annotation data for promoters. A public D-Light server, the client and server software for local installation and the source code under GNU GPL license are available at http://biwww.che.sbg.ac.at/dlight. PMID:23617301
Statistical simulation of the magnetorotational dynamo.
Squire, J; Bhattacharjee, A
2015-02-27
Turbulence and dynamo induced by the magnetorotational instability (MRI) are analyzed using quasilinear statistical simulation methods. It is found that homogeneous turbulence is unstable to a large-scale dynamo instability, which saturates to an inhomogeneous equilibrium with a strong dependence on the magnetic Prandtl number (Pm). Despite its enormously reduced nonlinearity, the dependence of the angular momentum transport on Pm in the quasilinear model is qualitatively similar to that of nonlinear MRI turbulence. This demonstrates the importance of the large-scale dynamo and suggests how dramatically simplified models may be used to gain insight into the astrophysically relevant regimes of very low or high Pm.
Non-Hookean statistical mechanics of clamped graphene ribbons
NASA Astrophysics Data System (ADS)
Bowick, Mark J.; Košmrlj, Andrej; Nelson, David R.; Sknepnek, Rastko
2017-03-01
Thermally fluctuating sheets and ribbons provide an intriguing forum in which to investigate strong violations of Hooke's Law: Large distance elastic parameters are in fact not constant but instead depend on the macroscopic dimensions. Inspired by recent experiments on free-standing graphene cantilevers, we combine the statistical mechanics of thin elastic plates and large-scale numerical simulations to investigate the thermal renormalization of the bending rigidity of graphene ribbons clamped at one end. For ribbons of dimensions W × L (with L ≥ W), the macroscopic bending rigidity κR determined from cantilever deformations is independent of the width when W < ℓth, where ℓth is a thermal length scale, as expected. When W > ℓth, however, this thermally renormalized bending rigidity begins to systematically increase, in agreement with the scaling theory, although in our simulations we were not quite able to reach the system sizes necessary to determine the fully developed power law dependence on W. When the ribbon length L > ℓp, where ℓp is the W-dependent thermally renormalized ribbon persistence length, we observe a scaling collapse and the beginnings of large scale random walk behavior.
A scaling procedure for the response of an isolated system with high modal overlap factor
NASA Astrophysics Data System (ADS)
De Rosa, S.; Franco, F.
2008-10-01
The paper deals with a numerical approach that reduces some physical sizes of the solution domain to compute the dynamic response of an isolated system: it has been named Asymptotical Scaled Modal Analysis (ASMA). The proposed numerical procedure alters the input data needed to obtain the classic modal responses to increase the frequency band of validity of the discrete or continuous coordinates model through the definition of a proper scaling coefficient. It is demonstrated that the computational cost remains acceptable while the frequency range of analysis increases. Moreover, with reference to the flexural vibrations of a rectangular plate, the paper discusses the ASMA vs. the statistical energy analysis and the energy distribution approach. Some insights are also given about the limits of the scaling coefficient. Finally it is shown that the linear dynamic response, predicted with the scaling procedure, has the same quality and characteristics of the statistical energy analysis, but it can be useful when the system cannot be solved appropriately by the standard Statistical Energy Analysis (SEA).
Unicomb, Rachael; Colyvas, Kim; Harrison, Elisabeth; Hewat, Sally
2015-06-01
Case-study methodology studying change is often used in the field of speech-language pathology, but it can be criticized for not being statistically robust. Yet with the heterogeneous nature of many communication disorders, case studies allow clinicians and researchers to closely observe and report on change. Such information is valuable and can further inform large-scale experimental designs. In this research note, a statistical analysis for case-study data is outlined that employs a modification to the Reliable Change Index (Jacobson & Truax, 1991). The relationship between reliable change and clinical significance is discussed. Example data are used to guide the reader through the use and application of this analysis. A method of analysis is detailed that is suitable for assessing change in measures with binary categorical outcomes. The analysis is illustrated using data from one individual, measured before and after treatment for stuttering. The application of this approach to assess change in categorical, binary data has potential application in speech-language pathology. It enables clinicians and researchers to analyze results from case studies for their statistical and clinical significance. This new method addresses a gap in the research design literature, that is, the lack of analysis methods for noncontinuous data (such as counts, rates, proportions of events) that may be used in case-study designs.
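For orientation, the sketch below shows the classic Jacobson and Truax (1991) Reliable Change Index for a continuous measure, together with a simple two-proportion z-test as one possible analogue for binary outcomes such as stuttered versus fluent syllables. This is not the modified index developed in the research note; it only illustrates the kind of calculation involved, and the function names and inputs are assumptions.

```python
from math import sqrt
from scipy import stats

def reliable_change_index(pre, post, sd_pre, reliability):
    """Classic Jacobson & Truax (1991) RCI for a continuous measure:
    change divided by the standard error of the difference.
    |RCI| > 1.96 indicates reliable change at roughly the 5% level."""
    se_measurement = sd_pre * sqrt(1.0 - reliability)
    se_diff = sqrt(2.0) * se_measurement
    return (post - pre) / se_diff

def binary_change_test(k_pre, n_pre, k_post, n_post):
    """A simple analogue for binary outcomes (e.g. stuttered vs fluent syllables):
    two-proportion z-test on event counts before and after treatment."""
    p1, p2 = k_pre / n_pre, k_post / n_post
    p = (k_pre + k_post) / (n_pre + n_post)
    se = sqrt(p * (1 - p) * (1 / n_pre + 1 / n_post))
    z = (p2 - p1) / se
    return z, 2 * stats.norm.sf(abs(z))
```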
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kilpua, E. K. J.; Olspert, N.; Grigorievskiy, A.
We study the relation between strong and extreme geomagnetic storms and solar cycle characteristics. The analysis uses an extensive geomagnetic index AA data set spanning over 150 yr complemented by the Kakioka magnetometer recordings. We apply Pearson correlation statistics and estimate the significance of the correlation with a bootstrapping technique. We show that the correlation between the storm occurrence and the strength of the solar cycle decreases from a clear positive correlation with increasing storm magnitude toward a negligible relationship. Hence, the quieter Sun can also launch superstorms that may lead to significant societal and economic impact. Our results show that while weaker storms occur most frequently in the declining phase, the stronger storms have the tendency to occur near solar maximum. Our analysis suggests that the most extreme solar eruptions do not have a direct connection with the solar large-scale dynamo-generated magnetic field, but are rather associated with the smaller-scale dynamo and the resulting turbulent magnetic fields. On the other hand, the fact that the phase distributions of sunspots and storms become increasingly aligned with increasing storm strength may indicate that the extreme storms are related to the toroidal component of the solar large-scale field.
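A minimal sketch of the Pearson-correlation-with-bootstrap approach described above: the observed correlation between, for example, storm counts and a cycle-strength measure is accompanied by a percentile bootstrap confidence interval obtained by resampling pairs with replacement. The resampling unit, the number of resamples, and the 95% level are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def bootstrap_correlation(x, y, n_boot=10000, seed=0):
    """Pearson correlation with a percentile bootstrap confidence interval."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    r = stats.pearsonr(x, y)[0]
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))   # resample pairs with replacement
    boot = np.array([stats.pearsonr(x[i], y[i])[0] for i in idx])
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return r, (lo, hi)
```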
NASA Astrophysics Data System (ADS)
Witte, M.; Morrison, H.; Jensen, J. B.; Bansemer, A.; Gettelman, A.
2017-12-01
The spatial covariance of cloud and rain water (or in simpler terms, small and large drops, respectively) is an important quantity for accurate prediction of the accretion rate in bulk microphysical parameterizations that account for subgrid variability using assumed probability density functions (pdfs). Past diagnoses of this covariance from remote sensing, in situ measurements and large eddy simulation output have implicitly assumed that the magnitude of the covariance is insensitive to grain size (i.e. horizontal resolution) and averaging length, but this is not the case because both cloud and rain water exhibit scale invariance across a wide range of scales - from tens of centimeters to tens of kilometers in the case of cloud water, a range that we will show is primarily limited by instrumentation and sampling issues. Since the individual variances systematically vary as a function of spatial scale, it should be expected that the covariance follows a similar relationship. In this study, we quantify the scaling properties of cloud and rain water content and their covariability from high frequency in situ aircraft measurements of marine stratocumulus taken over the southeastern Pacific Ocean aboard the NSF/NCAR C-130 during the VOCALS-REx field experiment of October-November 2008. First we confirm that cloud and rain water scale in distinct manners, indicating that there is a statistically and potentially physically significant difference in the spatial structure of the two fields. Next, we demonstrate that the covariance is a strong function of spatial scale, which implies important caveats regarding the ability of limited-area models with domains smaller than a few tens of kilometers across to accurately reproduce the spatial organization of precipitation. Finally, we present preliminary work on the development of a scale-aware parameterization of cloud-rain water subgrid covariability based on multifractal analysis intended for application in large-scale model microphysics schemes.
75 FR 72611 - Assessments, Large Bank Pricing
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-24
... the worst risk ranking and are included in the statistical analysis. Appendix 1 to the NPR describes the statistical analysis in detail. \\12\\ The percentage approximated by factors is based on the statistical model for that particular year. Actual weights assigned to each scorecard measure are largely based...
Zhu, Wensheng; Yuan, Ying; Zhang, Jingwen; Zhou, Fan; Knickmeyer, Rebecca C; Zhu, Hongtu
2017-02-01
The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore such sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing it with several advanced statistical methods that appropriately adjust for the case-control sampling scheme. Copyright © 2016 Elsevier Inc. All rights reserved.
A flexible, interpretable framework for assessing sensitivity to unmeasured confounding.
Dorie, Vincent; Harada, Masataka; Carnegie, Nicole Bohme; Hill, Jennifer
2016-09-10
When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates Bayesian Additive Regression Trees into a two-parameter sensitivity analysis strategy that assesses sensitivity of posterior distributions of treatment effects to choices of sensitivity parameters. This results in an easily interpretable framework for testing for the impact of an unmeasured confounder that also limits the number of modeling assumptions. We evaluate our approach in a large-scale simulation setting and with high blood pressure data taken from the Third National Health and Nutrition Examination Survey. The model is implemented as open-source software, integrated into the treatSens package for the R statistical programming language. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Geoffrey H. Donovan; David T. Butry; Megan Y. Mao
2016-01-01
Past research has examined the effect of urban trees, and other vegetation, on stormwater runoff using hydrological models or small-scale experiments. However, there has been no statistical analysis of the influence of vegetation on runoff in an intact urban watershed, and it is not clear how results from small-scale studies scale up to the city level. Researchers...
Computation of large-scale statistics in decaying isotropic turbulence
NASA Technical Reports Server (NTRS)
Chasnov, Jeffrey R.
1993-01-01
We have performed large-eddy simulations of decaying isotropic turbulence to test the prediction of self-similar decay of the energy spectrum and to compute the decay exponents of the kinetic energy. In general, good agreement between the simulation results and the assumption of self-similarity was obtained. However, the statistics of the simulations were insufficient to compute the value of gamma which corrects the decay exponent when the spectrum follows a k^4 wave number behavior near k = 0. To obtain good statistics, it was found necessary to average over a large ensemble of turbulent flows.
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul
2016-01-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
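As a concrete, hedged example of minimizing expected prediction error by cross-validation, the sketch below uses an L1-penalized logistic regression with a cross-validated regularization strength (scikit-learn's LogisticRegressionCV) to build a criterion-keyed scale from a large item pool predicting a binary criterion such as mortality. This is only one of several possible SLT algorithms (the article also discusses supervised principal components and boosting), and the data layout and tuning grid are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def keyed_scale(X, y, cv=10):
    """Build a criterion-keyed scale from an item pool.

    X: n_persons x n_items matrix of personality item responses.
    y: 0/1 criterion (e.g. all-cause mortality).
    Cross-validation chooses the regularization strength that minimizes
    expected prediction error rather than the within-sample likelihood."""
    model = LogisticRegressionCV(Cs=20, cv=cv, penalty="l1", solver="saga",
                                 scoring="neg_log_loss", max_iter=5000)
    model.fit(X, y)
    weights = model.coef_.ravel()     # items with nonzero weights form the keyed scale
    scores = X @ weights              # criterion-keyed scale score for each person
    return weights, scores
```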
ERIC Educational Resources Information Center
Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H.
2017-01-01
Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical…
Workflow based framework for life science informatics.
Tiwari, Abhishek; Sekhar, Arvind K T
2007-10-01
Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services) that facilitates knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and systems biology. In this article, we have discussed the existing workflow systems and the trends in applications of workflow-based systems.
Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies
Liu, Zhonghua; Lin, Xihong
2017-01-01
Summary We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391
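The sketch below illustrates the general strategy of working with summary statistics only: the between-phenotype correlation matrix is estimated from the z-scores of (approximately) null variants, and a joint test for one variant is then formed from its vector of per-phenotype z-scores. The chi-square omnibus statistic shown here is a simpler relative of the common-mean plus variance-component tests proposed in the paper; it is included only to show how the pieces fit together, and the inputs are assumptions.

```python
import numpy as np
from scipy import stats

def phenotype_correlation(Z_null):
    """Estimate the between-phenotype correlation from an (n_variants x K) matrix of
    z-scores at approximately null variants."""
    return np.corrcoef(Z_null, rowvar=False)

def omnibus_multiphenotype_test(z, R):
    """Joint test of one variant against K correlated phenotypes using only summary
    statistics: T = z' R^{-1} z ~ chi2_K under the null (a simpler relative of the
    mixed-model tests proposed in the paper)."""
    z = np.asarray(z, float)
    T = z @ np.linalg.solve(R, z)
    return T, stats.chi2.sf(T, df=len(z))
```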
Multiple phenotype association tests using summary statistics in genome-wide association studies.
Liu, Zhonghua; Lin, Xihong
2018-03-01
We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.
Overdamped large-eddy simulations of turbulent pipe flow up to Reτ = 1500
NASA Astrophysics Data System (ADS)
Feldmann, Daniel; Avila, Marc
2018-04-01
We present results from large-eddy simulations (LES) of turbulent pipe flow in a computational domain of 42 radii in length. Wide ranges of the shear Reynolds number and the Smagorinsky model parameter are covered, 180 ≤ Reτ ≤ 1500 and 0.05 ≤ Cs ≤ 1.2, respectively. The aim is to assess the effect of Cs on the resolved flow field and turbulence statistics as well as to test whether very large scale motions (VLSM) in pipe flow can be isolated from the near-wall cycle by enhancing the dissipative character of the static Smagorinsky model with elevated Cs values. We found that the optimal Cs to achieve best agreement with reference data varies with Reτ and further depends on the wall normal location and the quantity of interest. Furthermore, for increasing Reτ, the optimal Cs for pipe flow LES seems to approach the theoretically optimal value for LES of isotropic turbulence. In agreement with previous studies, we found that for increasing Cs small-scale streaks in simple flow field visualisations are gradually quenched and replaced by much larger smooth streaks. Our analysis of low-order turbulence statistics suggests that these structures originate from an effective reduction of the Reynolds number and thus represent modified low-Reynolds number near-wall streaks rather than VLSM. We argue that overdamped LES with the static Smagorinsky model cannot be used to unambiguously determine the origin and the dynamics of VLSM in pipe flow. The approach might be salvaged by e.g. using more sophisticated LES models accounting for energy flux towards large scales or explicit anisotropic filter kernels.
A non-perturbative exploration of the high energy regime in Nf=3 QCD. ALPHA Collaboration
NASA Astrophysics Data System (ADS)
Dalla Brida, Mattia; Fritzsch, Patrick; Korzec, Tomasz; Ramos, Alberto; Sint, Stefan; Sommer, Rainer
2018-05-01
Using continuum extrapolated lattice data we trace a family of running couplings in three-flavour QCD over a large range of scales from about 4 to 128 GeV. The scale is set by the finite space-time volume so that recursive finite size techniques can be applied, and Schrödinger functional (SF) boundary conditions enable direct simulations in the chiral limit. Compared to earlier studies we have improved on both statistical and systematic errors. Using the SF coupling to implicitly define a reference scale 1/L_0 ≈ 4 GeV through \bar{g}^2(L_0) = 2.012, we quote L_0 Λ^{N_f=3}_{\overline{MS}} = 0.0791(21). This error is dominated by statistics; in particular, the remnant perturbative uncertainty is negligible and very well controlled, by connecting to the infinite renormalization scale from different scales 2^n/L_0 for n = 0, 1, ..., 5. An intermediate step in this connection may involve any member of a one-parameter family of SF couplings. This provides an excellent opportunity for tests of perturbation theory, some of which have been published in a letter (ALPHA collaboration, M. Dalla Brida et al. in Phys Rev Lett 117(18):182001, 2016). The results indicate that for our target precision of 3 per cent in L_0 Λ^{N_f=3}_{\overline{MS}}, a reliable estimate of the truncation error requires non-perturbative data for a sufficiently large range of values of α_s = \bar{g}^2/(4π). In the present work we reach this precision by studying scales that vary by a factor 2^5 = 32, reaching down to α_s ≈ 0.1. We here provide the details of our analysis and an extended discussion.
The FRIGG project: From intermediate galactic scales to self-gravitating cores
NASA Astrophysics Data System (ADS)
Hennebelle, Patrick
2018-03-01
Context: Understanding the detailed structure of the interstellar gas is essential for our knowledge of the star formation process. Aims: The small-scale structure of the interstellar medium (ISM) is a direct consequence of the galactic scales, and making the link between the two is essential. Methods: We perform adaptive mesh simulations that aim to bridge the gap between the intermediate galactic scales and the self-gravitating prestellar cores. For this purpose we use stratified supernova-regulated ISM magneto-hydrodynamical simulations at the kpc scale to set up the initial conditions. We then zoom, performing a series of concentric uniform refinements and then refining on the Jeans length for the last levels. This allows us to reach a spatial resolution of a few 10^-3 pc. The cores are identified using a clump finder and various criteria based on virial analysis. Their most relevant properties are computed and, due to the large number of objects formed in the simulations, reliable statistics are obtained. Results: The cores' properties show encouraging agreement with observations. The mass spectrum presents a clear power law at high masses with an exponent close to -1.3 and a peak at about 1-2 M⊙. The velocity dispersion and the angular momentum distributions are respectively a few times the local sound speed and a few 10^-2 pc km s^-1. We also find that the distribution of thermally supercritical cores presents a range of magnetic mass-to-flux over critical mass-to-flux ratios, typically between ≃0.3 and 3, indicating that they are significantly magnetized. Investigating the time and spatial dependence of these statistical properties, we conclude that they are not significantly affected by the zooming procedure and that they do not present very large fluctuations. The most severe issue appears to be the dependence of the core mass function (CMF) on the numerical resolution. While the core definition process may possibly introduce some biases, the peak tends to shift to smaller values when the resolution improves. Conclusions: Our simulations, which use self-consistently generated initial conditions at the kpc scale, produce a large number of prestellar cores from which reliable statistics can be inferred. Preliminary comparisons with observations show encouraging agreement. In particular the inferred CMFs resemble those inferred from recent observations. We stress, however, a possible issue with the peak position shifting with numerical resolution.
Structures in magnetohydrodynamic turbulence: Detection and scaling
NASA Astrophysics Data System (ADS)
Uritsky, V. M.; Pouquet, A.; Rosenberg, D.; Mininni, P. D.; Donovan, E. F.
2010-11-01
We present a systematic analysis of statistical properties of turbulent current and vorticity structures at a given time using cluster analysis. The data stem from numerical simulations of decaying three-dimensional magnetohydrodynamic turbulence in the absence of an imposed uniform magnetic field; the magnetic Prandtl number is taken equal to unity, and we use a periodic box with grids of up to 1536^3 points and with Taylor Reynolds numbers up to 1100. The initial conditions are either an X-point configuration embedded in three dimensions, the so-called Orszag-Tang vortex, or an Arnol'd-Beltrami-Childress configuration with a fully helical velocity and magnetic field. In each case two snapshots are analyzed, separated by one turn-over time, starting just after the peak of dissipation. We show that the algorithm is able to select a large number of structures (in excess of 8000) for each snapshot and that the statistical properties of these clusters are remarkably similar for the two snapshots as well as for the two flows under study in terms of scaling laws for the cluster characteristics, with the structures in the vorticity and in the current behaving in the same way. We also study the effect of Reynolds number on cluster statistics, and we finally analyze the properties of these clusters in terms of their velocity-magnetic-field correlation. Self-organized criticality features have been identified in the dissipative range of scales. A different scaling arises in the inertial range, which cannot be identified for the moment with a known self-organized criticality class consistent with magnetohydrodynamics. We suggest that this range can be governed by turbulence dynamics as opposed to criticality and propose an interpretation of intermittency in terms of propagation of local instabilities.
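A minimal sketch of the kind of cluster (structure) detection described above: grid points where the current or vorticity magnitude exceeds a multiple of its rms value are labeled into connected components with scipy.ndimage, and per-cluster sizes are returned. The threshold, the default neighbourhood connectivity, and the neglect of periodic wrapping across the box boundaries are simplifying assumptions relative to the algorithm actually used in the paper.

```python
import numpy as np
from scipy import ndimage

def find_structures(field, n_sigma=3.0):
    """Label intense structures in a 3-D scalar field (e.g. |j| or |omega|):
    threshold at n_sigma times the rms and find connected components.
    Periodic wrapping across box boundaries is ignored in this sketch."""
    field = np.asarray(field, float)
    mask = np.abs(field) > n_sigma * field.std()
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))  # cluster volumes in cells
    return labels, sizes
```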
Zhang, Xiao-Bo; Qu, Xian-You; Li, Meng; Wang, Hui; Jing, Zhi-Xian; Liu, Xiang; Zhang, Zhi-Wei; Guo, Lan-Ping; Huang, Lu-Qi
2017-11-01
After the completion of the national and local census of traditional Chinese medicine resources, a large amount of data on Chinese medicine resources and their distribution will be summarized. Species richness is a valid indicator for objectively comparing Chinese medicine resources between regions. Because county areas differ greatly in size, assessing the richness of Chinese medicine resources with the county as the statistical unit will bias the regional richness statistics. Regular-grid statistical methods can reduce the differences in richness estimates that are caused by statistical units of different sizes. Taking Chongqing as an example and based on the existing survey data, the differences in the richness of traditional Chinese medicine resources at different grid scales were compared and analyzed. The results showed that a 30 km grid can be selected, at which the richness of Chinese medicine resources in Chongqing better reflects the objective inter-regional distribution of resource richness. Copyright© by the Chinese Pharmaceutical Association.
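A simple sketch of grid-based richness statistics of the kind described above: occurrence records (longitude, latitude, species) are binned onto a regular grid of a chosen cell size (for example 30 km), and the number of distinct species per cell is counted. The crude degree-to-kilometre conversion and the dictionary-based aggregation are illustrative assumptions, not the workflow used in the census analysis.

```python
import numpy as np

def gridded_richness(lon, lat, species, cell_km=30.0):
    """Species richness per regular grid cell: count distinct species whose
    occurrence records fall in each cell, so that richness is compared on
    equal-size units rather than counties of very different area."""
    lon, lat = np.asarray(lon, float), np.asarray(lat, float)
    km_per_deg = 111.0                       # rough conversion, ignores latitude dependence
    step = cell_km / km_per_deg
    ix = np.floor((lon - lon.min()) / step).astype(int)
    iy = np.floor((lat - lat.min()) / step).astype(int)
    cells = {}
    for cx, cy, sp in zip(ix, iy, species):
        cells.setdefault((cx, cy), set()).add(sp)
    return {cell: len(spp) for cell, spp in cells.items()}   # (cell -> richness)
```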
NASA Astrophysics Data System (ADS)
Amin, Asad; Nasim, Wajid; Mubeen, Muhammad; Kazmi, Dildar Hussain; Lin, Zhaohui; Wahid, Abdul; Sultana, Syeda Refat; Gibbs, Jim; Fahad, Shah
2017-09-01
Unpredictable precipitation trends, largely influenced by climate change, have prolonged droughts and floods in South Asia. Statistical analysis of monthly, seasonal, and annual precipitation trends was carried out for different temporal (1996-2015 and 2041-2060) and spatial scales (39 meteorological stations) in Pakistan. A statistical downscaling model (SimCLIM) was used for the future precipitation projection (2041-2060), and the output was analyzed with statistical approaches. An ensemble approach combined with representative concentration pathways (RCPs) at the medium level was used for the future projections. The magnitude and slope of trends were derived by applying the Mann-Kendall and Sen's slope statistical approaches. Geo-statistical applications were used to generate precipitation trend maps. The comparison of base and projected precipitation by statistical analysis is represented by maps and graphical visualization, which facilitates trend detection. The results of this study project that the precipitation trend was increasing at more than 70% of weather stations for February, March, April, August, and September in the base years. The precipitation trend decreased from February to April but increased from July to October in the projected years. The strongest decreasing trend was reported in January for the base years, and it also decreased in the projected years. Greater variation in precipitation trends between projected and base years was reported for February to April. Variations in the projected precipitation trend for Punjab and Baluchistan are most pronounced in March and April. Seasonal analysis shows large variation in winter, with an increasing trend for more than 30% of weather stations in the base years that approaches 40% for the projected precipitation. High risk was reported in the base-year pre-monsoon season, where 90% of weather stations show an increasing trend, but in the projected years this trend decreases to 33% of stations. Finally, the annual precipitation trend increased for more than 90% of meteorological stations in the base years (1996-2015), whereas for the projected years (2041-2060) this decreases to 76%. These results reveal that the overall precipitation trend is decreasing in future years, which may prolong drought at 14% of the weather stations under study.
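For reference, minimal textbook implementations of the two trend statistics named above, the Mann-Kendall test (normal approximation, no tie correction) and Sen's slope estimator, applied to a single monthly or annual precipitation series:

```python
import numpy as np
from scipy import stats

def mann_kendall(x):
    """Mann-Kendall trend test: S statistic with the usual normal approximation
    (continuity-corrected, no correction for tied values)."""
    x = np.asarray(x, float)
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    return z, 2 * stats.norm.sf(abs(z))      # z statistic and two-sided p value

def sens_slope(x):
    """Sen's slope: median of all pairwise slopes (units per time step)."""
    x = np.asarray(x, float)
    n = len(x)
    slopes = [(x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
    return np.median(slopes)
```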
Combined heat and power systems: economic and policy barriers to growth
2012-01-01
Background Combined Heat and Power (CHP) systems can provide a range of benefits to users with regards to efficiency, reliability, costs and environmental impact. Furthermore, increasing the amount of electricity generated by CHP systems in the United States has been identified as having significant potential for impressive economic and environmental outcomes on a national scale. Given the benefits from increasing the adoption of CHP technologies, there is value in improving our understanding of how desired increases in CHP adoption can be best achieved. These obstacles are currently understood to stem from regulatory as well as economic and technological barriers. In our research, we answer the following questions: Given the current policy and economic environment facing the CHP industry, what changes need to take place in this space in order for CHP systems to be competitive in the energy market? Methods We focus our analysis primarily on Combined Heat and Power Systems that use natural gas turbines. Our analysis takes a two-pronged approach. We first conduct a statistical analysis of the impact of state policies on increases in electricity generated from CHP system. Second, we conduct a Cost-Benefit analysis to determine in which circumstances funding incentives are necessary to make CHP technologies cost-competitive. Results Our policy analysis shows that regulatory improvements do not explain the growth in adoption of CHP technologies but hold the potential to encourage increases in electricity generated from CHP system in small-scale applications. Our Cost-Benefit analysis shows that CHP systems are only cost competitive in large-scale applications and that funding incentives would be necessary to make CHP technology cost-competitive in small-scale applications. Conclusion From the synthesis of these analyses we conclude that because large-scale applications of natural gas turbines are already cost-competitive, policy initiatives aimed at a CHP market dominated primarily by large-scale (and therefore already cost-competitive) systems have not been effectively directed. Our recommendation is that for CHP technologies using natural gas turbines, policy focuses should be on increasing CHP growth in small-scale systems. This result can be best achieved through redirection of state and federal incentives, research and development, adoption of smart grid technology, and outreach and education. PMID:22540988
Not a Copernican observer: biased peculiar velocity statistics in the local Universe
NASA Astrophysics Data System (ADS)
Hellwing, Wojciech A.; Nusser, Adi; Feix, Martin; Bilicki, Maciej
2017-05-01
We assess the effect of the local large-scale structure on the estimation of two-point statistics of the observed radial peculiar velocities of galaxies. A large N-body simulation is used to examine these statistics from the perspective of random observers as well as 'Local Group-like' observers conditioned to reside in an environment resembling the observed Universe within 20 Mpc. The local environment systematically distorts the shape and amplitude of velocity statistics with respect to ensemble-averaged measurements made by a Copernican (random) observer. The Virgo cluster has the most significant impact, introducing large systematic deviations in all the statistics. For a simple 'top-hat' selection function, an idealized survey extending to ~160 h^-1 Mpc or deeper is needed to completely mitigate the effects of the local environment. Using shallower catalogues leads to systematic deviations of the order of 50-200 per cent depending on the scale considered. For a flat redshift distribution similar to the one of the CosmicFlows-3 survey, the deviations are even more prominent in both the shape and amplitude at all separations considered (≲100 h^-1 Mpc). Conclusions based on statistics calculated without taking into account the impact of the local environment should be revisited.
Wavelet analysis of polarization maps of polycrystalline biological fluids networks
NASA Astrophysics Data System (ADS)
Ushenko, Y. A.
2011-12-01
The optical model of human joint synovial fluid is proposed. The statistical (statistical moments), correlation (autocorrelation function) and self-similar (log-log dependences of the power spectrum) structure of two-dimensional polarization distributions (polarization maps) of synovial fluid has been analyzed. It has been shown that differentiation of polarization maps of joint synovial fluid samples in different physiological states requires scale-discriminative analysis. To mark out the small-scale domain structure of synovial fluid polarization maps, wavelet analysis has been used. The set of parameters that characterize the statistical, correlation and self-similar structure of the wavelet coefficient distributions at different scales of polarization domains, for diagnostics and differentiation of polycrystalline network transformations connected with pathological processes, has been determined.
Alignments of parity even/odd-only multipoles in CMB
NASA Astrophysics Data System (ADS)
Aluri, Pavan K.; Ralston, John P.; Weltman, Amanda
2017-12-01
We compare the statistics of parity even and odd multipoles of the cosmic microwave background (CMB) sky from Planck full mission temperature measurements. An excess power in odd multipoles compared to even multipoles has previously been found on large angular scales. Motivated by this apparent parity asymmetry, we evaluate directional statistics associated with even compared to odd multipoles, along with their significances. Primary tools are the Power tensor and Alignment tensor statistics. We limit our analysis to the first 60 multipoles i.e. l = [2, 61]. We find no evidence for statistically unusual alignments of even parity multipoles. More than one independent statistic finds evidence for alignments of anisotropy axes of odd multipoles, with a significance equivalent to ∼2σ or more. The robustness of alignment axes is tested by making Galactic cuts and varying the multipole range. Very interestingly, the region spanned by the (a)symmetry axes is found to broadly contain other parity (a)symmetry axes previously observed in the literature.
Monroe, Scott; Cai, Li
2015-01-01
This research is concerned with two topics in assessing model fit for categorical data analysis. The first topic involves the application of a limited-information overall test, introduced in the item response theory literature, to structural equation modeling (SEM) of categorical outcome variables. Most popular SEM test statistics assess how well the model reproduces estimated polychoric correlations. In contrast, limited-information test statistics assess how well the underlying categorical data are reproduced. Here, the recently introduced C2 statistic of Cai and Monroe (2014) is applied. The second topic concerns how the root mean square error of approximation (RMSEA) fit index can be affected by the number of categories in the outcome variable. This relationship creates challenges for interpreting RMSEA. While the two topics initially appear unrelated, they may conveniently be studied in tandem since RMSEA is based on an overall test statistic, such as C2. The results are illustrated with an empirical application to data from a large-scale educational survey.
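Since RMSEA is computed from an overall test statistic such as C2, the dependence discussed above can be made concrete with the usual formula, sketched below; the (N - 1) denominator is one common convention and the example numbers are hypothetical.

```python
from math import sqrt

def rmsea(stat, df, n):
    """RMSEA from an overall fit statistic (e.g. the limited-information C2):
    sqrt(max((X2 - df) / (df * (N - 1)), 0)), one common convention."""
    return sqrt(max((stat - df) / (df * (n - 1)), 0.0))

# Hypothetical example: rmsea(stat=85.3, df=54, n=1200) gives roughly 0.022.
```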
Statistical Compression of Wind Speed Data
NASA Astrophysics Data System (ADS)
Tagle, F.; Castruccio, S.; Crippa, P.; Genton, M.
2017-12-01
In this work we introduce a lossy compression approach that utilizes a stochastic wind generator based on a non-Gaussian distribution to reproduce the internal climate variability of daily wind speed as represented by the CESM Large Ensemble over Saudi Arabia. Stochastic wind generators, and stochastic weather generators more generally, are statistical models that aim to match certain statistical properties of the data on which they are trained. They have been used extensively in applications ranging from agricultural models to climate impact studies. In this novel context, the parameters of the fitted model can be interpreted as encoding the information contained in the original uncompressed data. The statistical model is fit to only 3 of the 30 ensemble members, and it adequately captures the variability of the ensemble in terms of the seasonal and interannual variability of daily wind speed. To deal with such a large spatial domain, it is partitioned into 9 regions, and the model is fit independently to each of these. We further discuss a recent refinement of the model, which relaxes this assumption of regional independence, by introducing a large-scale component that interacts with the fine-scale regional effects.
Meta-analysis of gene-level associations for rare variants based on single-variant statistics.
Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu
2013-08-08
Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
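The sketch below illustrates the insight that gene-level statistics can be recovered from single-variant results: per-variant z-scores (obtainable from p values and effect directions) are combined with a between-variant correlation matrix, estimable from one participating study or a public reference panel, to form a burden-style gene-level test. This is only one commonly used gene-level test, shown to make the recovery step concrete; the weights and inputs are assumptions.

```python
import numpy as np
from scipy import stats

def gene_level_burden(z, R, w=None):
    """Burden-style gene-level test recovered from single-variant statistics.

    z: per-variant z-scores within the gene (from p values and effect directions).
    R: between-variant correlation matrix of the single-variant test statistics
       (e.g. estimated from a reference panel).
    w: optional per-variant weights (defaults to equal weights)."""
    z = np.asarray(z, float)
    w = np.ones_like(z) if w is None else np.asarray(w, float)
    T = (w @ z) / np.sqrt(w @ R @ w)          # ~ N(0, 1) under the null
    return T, 2 * stats.norm.sf(abs(T))
```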
NASA Technical Reports Server (NTRS)
Grotjahn, Richard; Black, Robert; Leung, Ruby; Wehner, Michael F.; Barlow, Mathew; Bosilovich, Michael G.; Gershunov, Alexander; Gutowski, William J., Jr.; Gyakum, John R.; Katz, Richard W.;
2015-01-01
The objective of this paper is to review statistical methods, dynamics, modeling efforts, and trends related to temperature extremes, with a focus upon extreme events of short duration that affect parts of North America. These events are associated with large scale meteorological patterns (LSMPs). The statistics, dynamics, and modeling sections of this paper are written to be autonomous and so can be read separately. Methods to define extreme event statistics and to identify and connect LSMPs to extreme temperature events are presented. Recent advances in statistical techniques connect LSMPs to extreme temperatures through appropriately defined covariates that supplement more straightforward analyses. Various LSMPs, ranging from synoptic to planetary scale structures, are associated with extreme temperature events. Current knowledge about the synoptics and the dynamical mechanisms leading to the associated LSMPs is incomplete. Systematic studies of: the physics of LSMP life cycles, comprehensive model assessment of LSMP-extreme temperature event linkages, and LSMP properties are needed. Generally, climate models capture observed properties of heat waves and cold air outbreaks with some fidelity. However, they overestimate warm wave frequency and underestimate cold air outbreak frequency, and underestimate the collective influence of low-frequency modes on temperature extremes. Modeling studies have identified the impact of large-scale circulation anomalies and land-atmosphere interactions on changes in extreme temperatures. However, few studies have examined changes in LSMPs to more specifically understand the role of LSMPs on past and future extreme temperature changes. Even though LSMPs are resolvable by global and regional climate models, they are not necessarily well simulated. The paper concludes with unresolved issues and research questions.
Hydrodynamic Simulations and Tomographic Reconstructions of the Intergalactic Medium
NASA Astrophysics Data System (ADS)
Stark, Casey William
The Intergalactic Medium (IGM) is the dominant reservoir of matter in the Universe from which the cosmic web and galaxies form. The structure and physical state of the IGM provides insight into the cosmological model of the Universe, the origin and timeline of the reionization of the Universe, as well as being an essential ingredient in our understanding of galaxy formation and evolution. Our primary handle on this information is a signal known as the Lyman-alpha forest (or Ly-alpha forest) -- the collection of absorption features in high-redshift sources due to intervening neutral hydrogen, which scatters HI Ly-alpha photons out of the line of sight. The Ly-alpha forest flux traces density fluctuations at high redshift and at moderate overdensities, making it an excellent tool for mapping large-scale structure and constraining cosmological parameters. Although the computational methodology for simulating the Ly-alpha forest has existed for over a decade, we are just now approaching the scale of computing power required to simultaneously capture large cosmological scales and the scales of the smallest absorption systems. My thesis focuses on using simulations at the edge of modern computing to produce precise predictions of the statistics of the Ly-alpha forest and to better understand the structure of the IGM. In the first part of my thesis, I review the state of hydrodynamic simulations of the IGM, including pitfalls of the existing under-resolved simulations. Our group developed a new cosmological hydrodynamics code to tackle the computational challenge, and I developed a distributed analysis framework to compute flux statistics from our simulations. I present flux statistics derived from a suite of our large hydrodynamic simulations and demonstrate convergence to the per cent level. I also compare flux statistics derived from simulations using different discretizations and hydrodynamic schemes (Eulerian finite volume vs. smoothed particle hydrodynamics) and discuss differences in their convergence behavior, their overall agreement, and the implications for cosmological constraints. In the second part of my thesis, I present a tomographic reconstruction method that allows us to make 3D maps of the IGM with Mpc resolution. In order to make reconstructions of large surveys computationally feasible, I developed a new Wiener Filter application with an algorithm specialized to our problem, which significantly reduces the space and time complexity compared to previous implementations. I explore two scientific applications of the maps: finding protoclusters by searching the maps for large, contiguous regions of low flux and finding cosmic voids by searching the maps for regions of high flux. Using a large N-body simulation, I identify and characterize both protoclusters and voids at z = 2.5, in the middle of the redshift range being mapped by ongoing surveys. I provide simple methods for identifying protocluster and void candidates in the tomographic flux maps, and then test them on mock surveys and reconstructions. I present forecasts for sample purity and completeness and other scientific applications of these large, high-redshift objects.
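For the tomographic reconstruction step, a minimal dense-matrix sketch of the Wiener filter is shown below: for data d = R s + n with signal prior covariance S and noise covariance N, the estimate is s_hat = S R^T (R S R^T + N)^{-1} d. Real survey volumes make these matrices far too large for dense algebra, which is exactly why the specialized solver described in the thesis is needed; this sketch only states the underlying linear-algebra problem and all variable names are assumptions.

```python
import numpy as np

def wiener_filter(d, R, S, N):
    """Wiener-filter estimate of a signal s from noisy data d = R s + n.

    d: data vector (observed flux fluctuations along sightlines).
    R: response matrix mapping the 3-D signal to the observed pixels.
    S: prior signal covariance; N: noise covariance.
    Dense linear algebra for illustration only."""
    A = R @ S @ R.T + N
    return S @ R.T @ np.linalg.solve(A, d)
```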
Multi-scale pixel-based image fusion using multivariate empirical mode decomposition.
Rehman, Naveed ur; Ehsan, Shoaib; Abdullah, Syed Muhammad Umer; Akhtar, Muhammad Jehanzaib; Mandic, Danilo P; McDonald-Maier, Klaus D
2015-05-08
A novel scheme to perform the fusion of multiple images using the multivariate empirical mode decomposition (MEMD) algorithm is proposed. Standard multi-scale fusion techniques make a priori assumptions regarding input data, whereas standard univariate empirical mode decomposition (EMD)-based fusion techniques suffer from inherent mode mixing and mode misalignment issues, characterized respectively by either a single intrinsic mode function (IMF) containing multiple scales or the same indexed IMFs corresponding to multiple input images carrying different frequency information. We show that MEMD overcomes these problems by being fully data adaptive and by aligning common frequency scales from multiple channels, thus enabling their comparison at a pixel level and subsequent fusion at multiple data scales. We then demonstrate the potential of the proposed scheme on a large dataset of real-world multi-exposure and multi-focus images and compare the results against those obtained from standard fusion algorithms, including the principal component analysis (PCA), discrete wavelet transform (DWT) and non-subsampled contourlet transform (NCT). A variety of image fusion quality measures are employed for the objective evaluation of the proposed method. We also report the results of a hypothesis testing approach on our large image dataset to identify statistically-significant performance differences.
Systematic Studies of Cosmic-Ray Anisotropy and Energy Spectrum with IceCube and IceTop
NASA Astrophysics Data System (ADS)
McNally, Frank
Anisotropy in the cosmic-ray arrival direction distribution has been well documented over a large energy range, but its origin remains largely a mystery. In the TeV to PeV energy range, the galactic magnetic field thoroughly scatters cosmic rays, but anisotropy at the part-per-mille level and smaller persists, potentially carrying information about nearby cosmic-ray accelerators and the galactic magnetic field. The IceCube Neutrino Observatory was the first detector to observe anisotropy at these energies in the Southern sky. This work uses 318 billion cosmic-ray induced muon events, collected between May 2009 and May 2015 from both the in-ice component of IceCube as well as the surface component, IceTop. The observed global anisotropy features large regions of relative excess and deficit, with amplitudes on the order of 10^-3. While a decomposition of the arrival direction distribution into spherical harmonics shows that most of the power is contained in the low-multipole (ℓ ≤ 4) moments, higher-multipole components are found to be statistically significant down to an angular scale of less than 10°, approaching the angular resolution of the detector. Above 100 TeV, a change in the topology of the arrival direction distribution is observed, and the anisotropy is characterized by a wide relative deficit whose amplitude increases with primary energy up to at least 5 PeV, the highest energies currently accessible to IceCube with sufficient event statistics. No time dependence of the large- and small-scale structures is observed in the six-year period covered by this analysis within statistical and systematic uncertainties. Analysis of the energy spectrum and composition in the PeV energy range as a function of sky position is performed with IceTop data over a five-year period using a likelihood-based reconstruction. Both the energy spectrum and the composition distribution are found to be consistent with a single source population over declination bands. This work represents an early attempt at understanding the anisotropy through the study of the spectrum and composition. The high-statistics data set reveals more details on the properties of the anisotropy, potentially able to shed light on the various physical processes responsible for the complex angular structure and energy evolution.
Transport Coefficients from Large Deviation Functions
NASA Astrophysics Data System (ADS)
Gao, Chloe; Limmer, David
2017-10-01
We describe a method for computing transport coefficients from the direct evaluation of large deviation functions. This method is general, relying only on equilibrium fluctuations, and is statistically efficient, employing trajectory-based importance sampling. Equilibrium fluctuations of molecular currents are characterized by their large deviation functions, which are scaled cumulant generating functions analogous to the free energy. A diffusion Monte Carlo algorithm is used to evaluate the large deviation functions, from which arbitrary transport coefficients are derivable. We find significant statistical improvement over traditional Green-Kubo-based calculations. The systematic and statistical errors of this method are analyzed in the context of specific transport coefficient calculations, including the shear viscosity, interfacial friction coefficient, and thermal conductivity.
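For reference, the traditional Green-Kubo baseline that the large-deviation approach above is benchmarked against can be sketched as the time integral of the equilibrium autocorrelation of a molecular current; the current series, time step, and prefactor below are hypothetical placeholders.

```python
# Hedged sketch of a Green-Kubo estimate from a 1-D current time series.
import numpy as np

def green_kubo(j, dt, prefactor=1.0, t_max=None):
    """Integrate the equilibrium autocorrelation <j(0) j(t)> of the current."""
    j = np.asarray(j) - np.mean(j)
    n = len(j)
    # unbiased autocorrelation: divide each lag by its number of overlapping samples
    acf = np.correlate(j, j, mode="full")[n - 1:] / np.arange(n, 0, -1)
    if t_max is not None:
        acf = acf[: int(t_max / dt)]
    return prefactor * np.trapz(acf, dx=dt)

# Usage with a hypothetical heat-current series sampled every 2 fs:
# kappa = green_kubo(heat_current, dt=2e-15, prefactor=1.0 / (V * kB * T**2))
```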
Seman, Ali; Sapawi, Azizian Mohd; Salleh, Mohd Zaki
2015-06-01
Y-chromosome short tandem repeats (Y-STRs) are genetic markers with practical applications in human identification. However, where mass identification is required (e.g., in the aftermath of disasters with significant fatalities), the efficiency of the process could be improved with new statistical approaches. Clustering applications are relatively new tools for large-scale comparative genotyping, and the k-Approximate Modal Haplotype (k-AMH), an efficient algorithm for clustering large-scale Y-STR data, represents a promising method for developing these tools. In this study we improved the k-AMH and produced three new algorithms: the Nk-AMH I (including a new initial cluster center selection), the Nk-AMH II (including a new dominant weighting value), and the Nk-AMH III (combining I and II). The Nk-AMH III was the superior algorithm, with mean clustering accuracy that increased in four out of six datasets and remained at 100% in the other two. Additionally, the Nk-AMH III achieved a 2% higher overall mean clustering accuracy score than the k-AMH, as well as optimal accuracy for all datasets (0.84-1.00). With inclusion of the two new methods, the Nk-AMH III produced an optimal solution for clustering Y-STR data; thus, the algorithm has potential for further development towards fully automatic clustering of any large-scale genotypic data.
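As a point of reference for mode-based clustering of categorical genotype profiles, the sketch below implements a plain k-modes-style loop (assign by number of mismatching loci, update modes by column-wise majority); it is not the k-AMH or Nk-AMH algorithm itself, only the general idea those methods refine, and the allele matrix is a hypothetical integer-coded input.

```python
# Hedged sketch: minimal k-modes-style clustering of categorical profiles
# (rows = individuals, columns = Y-STR loci, values = integer-coded alleles).
import numpy as np

def k_modes(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    modes = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assign each profile to the mode with the fewest mismatching loci
        dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # update each mode to the column-wise most frequent category
        for c in range(k):
            members = X[labels == c]
            if len(members):
                modes[c] = [np.bincount(col).argmax() for col in members.T]
    return labels, modes

# Usage with a hypothetical allele matrix:
# labels, modes = k_modes(allele_matrix, k=4)
```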
NASA Astrophysics Data System (ADS)
Kube, R.; Garcia, O. E.; Theodorsen, A.; Brunner, D.; Kuang, A. Q.; LaBombard, B.; Terry, J. L.
2018-06-01
The Alcator C-Mod mirror Langmuir probe system has been used to sample data time series of fluctuating plasma parameters in the outboard mid-plane far scrape-off layer. We present a statistical analysis of one second long time series of electron density, temperature, radial electric drift velocity and the corresponding particle and electron heat fluxes. These are sampled during stationary plasma conditions in an ohmically heated, lower single null diverted discharge. The electron density and temperature are strongly correlated and feature fluctuation statistics similar to the ion saturation current. Both electron density and temperature time series are dominated by intermittent, large-amplitude bursts with an exponential distribution of both burst amplitudes and waiting times between them. The characteristic time scale of the large-amplitude bursts is approximately 15 μs. Large-amplitude velocity fluctuations feature a slightly faster characteristic time scale and appear at a faster rate than electron density and temperature fluctuations. Describing these time series as a superposition of uncorrelated exponential pulses, we find that probability distribution functions, power spectral densities, as well as auto-correlation functions of the data time series agree well with predictions from the stochastic model. The electron particle and heat fluxes present large-amplitude fluctuations. For this low-density plasma, the radial electron heat flux is dominated by convection, that is, correlations of fluctuations in the electron density and radial velocity. Hot and dense blobs contribute only a minute fraction of the total fluctuation-driven heat flux.
Zonal average earth radiation budget measurements from satellites for climate studies
NASA Technical Reports Server (NTRS)
Ellis, J. S.; Haar, T. H. V.
1976-01-01
Data from 29 months of satellite radiation budget measurements, taken intermittently over the period 1964 through 1971, are composited into mean month, season and annual zonally averaged meridional profiles. Individual months, which comprise the 29 month set, were selected as representing the best available total flux data for compositing into large scale statistics for climate studies. A discussion of spatial resolution of the measurements along with an error analysis, including both the uncertainty and standard error of the mean, are presented.
Advanced Artificial Intelligence Technology Testbed
NASA Technical Reports Server (NTRS)
Anken, Craig S.
1993-01-01
The Advanced Artificial Intelligence Technology Testbed (AAITT) is a laboratory testbed for the design, analysis, integration, evaluation, and exercising of large-scale, complex, software systems, composed of both knowledge-based and conventional components. The AAITT assists its users in the following ways: configuring various problem-solving application suites; observing and measuring the behavior of these applications and the interactions between their constituent modules; gathering and analyzing statistics about the occurrence of key events; and flexibly and quickly altering the interaction of modules within the applications for further study.
GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome.
Simovski, Boris; Vodák, Daniel; Gundersen, Sveinung; Domanska, Diana; Azab, Abdulrahman; Holden, Lars; Holden, Marit; Grytten, Ivar; Rand, Knut; Drabløs, Finn; Johansen, Morten; Mora, Antonio; Lund-Andersen, Christin; Fromm, Bastian; Eskeland, Ragnhild; Gabrielsen, Odd Stokke; Ferkingstad, Egil; Nakken, Sigve; Bengtsen, Mads; Nederbragt, Alexander Johan; Thorarensen, Hildur Sif; Akse, Johannes Andreas; Glad, Ingrid; Hovig, Eivind; Sandve, Geir Kjetil
2017-07-01
Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. The software is available at: https://hyperbrowser.uio.no. © The Author 2017. Published by Oxford University Press.
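One of the generic questions delineated above, whether two genomic tracks overlap more than expected by chance, can be sketched with a very simple permutation test; the circular-rotation null model below is a deliberately crude stand-in (the GSuite HyperBrowser offers more refined null models), and the binned 0/1 track names are hypothetical.

```python
# Hedged sketch: permutation-based overlap enrichment between two binary tracks.
import numpy as np

def overlap_enrichment(track_a, track_b, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    track_a = np.asarray(track_a, dtype=bool)
    track_b = np.asarray(track_b, dtype=bool)
    observed = np.sum(track_a & track_b)
    # null model: circular rotations of one track preserve its internal structure
    null = np.array([np.sum(np.roll(track_a, rng.integers(len(track_a))) & track_b)
                     for _ in range(n_perm)])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, null.mean(), p_value

# Usage with two hypothetical 0/1 vectors binned along a chromosome:
# obs, expected, p = overlap_enrichment(tf_binding_bins, open_chromatin_bins)
```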
Shen, Lu; Mickley, Loretta J
2017-03-07
We develop a statistical model to predict June-July-August (JJA) daily maximum 8-h average (MDA8) ozone concentrations in the eastern United States based on large-scale climate patterns during the previous spring. We find that anomalously high JJA ozone in the East is correlated with these springtime patterns: warm tropical Atlantic and cold northeast Pacific sea surface temperatures (SSTs), as well as positive sea level pressure (SLP) anomalies over Hawaii and negative SLP anomalies over the Atlantic and North America. We then develop a linear regression model to predict JJA MDA8 ozone from 1980 to 2013, using the identified SST and SLP patterns from the previous spring. The model explains ∼45% of the variability in JJA MDA8 ozone concentrations and ∼30% variability in the number of JJA ozone episodes (>70 ppbv) when averaged over the eastern United States. This seasonal predictability results from large-scale ocean-atmosphere interactions. Warm tropical Atlantic SSTs can trigger diabatic heating in the atmosphere and influence the extratropical climate through stationary wave propagation, leading to greater subsidence, less precipitation, and higher temperatures in the East, which increases surface ozone concentrations there. Cooler SSTs in the northeast Pacific are also associated with more summertime heatwaves and high ozone in the East. On average, models participating in the Atmospheric Model Intercomparison Project fail to capture the influence of this ocean-atmosphere interaction on temperatures in the eastern United States, implying that such models would have difficulty simulating the interannual variability of surface ozone in this region.
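The seasonal-prediction step described above amounts to a multiple linear regression of summer ozone on springtime pattern indices; a minimal sketch follows, in which the index and ozone arrays are hypothetical placeholders rather than the paper's actual SST/SLP patterns.

```python
# Hedged sketch: regress a JJA ozone series on spring climate-pattern indices.
import numpy as np

def fit_seasonal_model(predictors, ozone):
    """predictors: (n_years, n_indices); ozone: (n_years,).
    Returns coefficients (with intercept) and the fraction of variance explained."""
    X = np.column_stack([np.ones(len(ozone)), predictors])
    beta, *_ = np.linalg.lstsq(X, ozone, rcond=None)
    fitted = X @ beta
    r2 = 1.0 - np.var(ozone - fitted) / np.var(ozone)
    return beta, r2

# spring_indices = np.column_stack([atl_sst_index, pac_sst_index, slp_index])
# beta, r2 = fit_seasonal_model(spring_indices, jja_mda8_ozone)
```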
Automatic location of L/H transition times for physical studies with a large statistical basis
NASA Astrophysics Data System (ADS)
González, S.; Vega, J.; Murari, A.; Pereira, A.; Dormido-Canto, S.; Ramírez, J. M.; JET-EFDA contributors
2012-06-01
Completely automatic techniques to estimate and validate L/H transition times can be essential in L/H transition analyses. The generation of databases with hundreds of transition times and without human intervention is an important step to accomplish (a) L/H transition physics analysis, (b) validation of L/H theoretical models and (c) creation of L/H scaling laws. An entirely unattended methodology is presented in this paper to build large databases of transition times in JET using time series. The proposed technique has been applied to a dataset of 551 JET discharges between campaigns C21 and C26. For discharges that show a clear signature in the time series, the transition time is located using the localization properties of the wavelet transform; the resulting estimate is accurate, with an uncertainty interval of ±3.2 ms. Discharges without a clear pattern in the time series are handled by an L/H mode classifier trained on the discharges with a clear signature. In this case, the estimation error shows a distribution with mean and standard deviation of 27.9 ms and 37.62 ms, respectively. Two different regression methods have been applied to the measurements acquired at the transition times identified by the automatic system. The obtained scaling laws for the threshold power are not significantly different from those obtained using the data at the transition times determined manually by the experts. The automatic methods allow physical studies to be performed with a large number of discharges, showing, for example, that there are statistically different types of transitions characterized by different scaling laws.
NASA Astrophysics Data System (ADS)
Mekanik, F.; Imteaz, M. A.; Gato-Trinidad, S.; Elmahdi, A.
2013-10-01
In this study, the application of Artificial Neural Networks (ANN) and multiple regression analysis (MR) to forecast long-term seasonal spring rainfall in Victoria, Australia was investigated using lagged El Nino Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) as potential predictors. The use of dual (combined lagged ENSO-IOD) input sets for calibrating and validating ANN and MR models is proposed to investigate the simultaneous effect of past values of these two major climate modes on long-term spring rainfall prediction. The MR models that did not violate the limits of statistical significance and multicollinearity were selected for future spring rainfall forecast. The ANN was developed in the form of a multilayer perceptron using the Levenberg-Marquardt algorithm. Both MR and ANN modelling were assessed statistically using mean square error (MSE), mean absolute error (MAE), Pearson correlation (r) and Willmott index of agreement (d). The developed MR and ANN models were tested on out-of-sample test sets; the MR models showed very poor generalisation ability for east Victoria with correlation coefficients of -0.99 to -0.90 compared to ANN with correlation coefficients of 0.42-0.93; ANN models also showed better generalisation ability for central and west Victoria with correlation coefficients of 0.68-0.85 and 0.58-0.97 respectively. The ability of multiple regression models to forecast out-of-sample sets is comparable to that of the ANN for Daylesford in central Victoria and Kaniva in west Victoria (r = 0.92 and 0.67 respectively). The errors of the testing sets for ANN models are generally lower compared to multiple regression models. The statistical analysis suggests the potential of ANN over MR models for rainfall forecasting using large scale climate modes.
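A minimal sketch of this kind of model comparison is given below using scikit-learn; note that scikit-learn's MLPRegressor does not use the Levenberg-Marquardt algorithm mentioned above, so this is only an illustrative stand-in, and the train/test arrays (lagged ENSO/IOD indices versus seasonal rainfall) are hypothetical.

```python
# Hedged sketch: compare multiple regression with a small neural network
# on lagged climate-index predictors of seasonal rainfall.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

def compare_models(X_train, y_train, X_test, y_test, seed=0):
    results = {}
    for name, model in [("MR", LinearRegression()),
                        ("ANN", MLPRegressor(hidden_layer_sizes=(8,),
                                             max_iter=5000, random_state=seed))]:
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {"MSE": mean_squared_error(y_test, pred),
                         "MAE": mean_absolute_error(y_test, pred),
                         "r": np.corrcoef(y_test, pred)[0, 1]}
    return results

# X columns could be lagged ENSO and IOD indices; y the spring rainfall total.
# print(compare_models(X_train, y_train, X_test, y_test))
```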
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Fangyan; Zhang, Song; Chung Wong, Pak
Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, the size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.
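Two of the sampling categories above can be sketched in a few lines with NetworkX: sample a fraction of nodes or of edges from a scale-free test graph and compare one statistic against the full graph; the graph model, rate, and the choice of average clustering as the compared property are illustrative assumptions, and traversal-based sampling is omitted.

```python
# Hedged sketch: node sampling vs. edge sampling on a scale-free test graph.
import random
import networkx as nx

def sample_and_compare(n=2000, m=3, rate=0.2, seed=0):
    random.seed(seed)
    G = nx.barabasi_albert_graph(n, m, seed=seed)        # scale-free model
    nodes = random.sample(list(G.nodes), int(rate * G.number_of_nodes()))
    node_sampled = G.subgraph(nodes)
    edges = random.sample(list(G.edges), int(rate * G.number_of_edges()))
    edge_sampled = G.edge_subgraph(edges)
    return {"full": nx.average_clustering(G),
            "node sampling": nx.average_clustering(node_sampled),
            "edge sampling": nx.average_clustering(edge_sampled)}

# print(sample_and_compare())
```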
NASA Astrophysics Data System (ADS)
Ghosh, Sayantan; Manimaran, P.; Panigrahi, Prasanta K.
2011-11-01
We make use of the wavelet transform to study the multi-scale, self-similar behavior, and deviations thereof, in the stock prices of large companies belonging to different economic sectors. The stock market returns exhibit multi-fractal characteristics, with some of the companies showing deviations at small and large scales. The fact that wavelets in the Daubechies (Db) basis can isolate local polynomial trends of different degrees plays the key role in separating fluctuations at different scales. One of the primary motivations of this work is to study the emergence of the k^{-3} behavior [X. Gabaix, P. Gopikrishnan, V. Plerou, H. Stanley, A theory of power law distributions in financial market fluctuations, Nature 423 (2003) 267-270] of the fluctuations starting with high frequency fluctuations. We make use of Db4 and Db6 basis sets to isolate local linear and quadratic trends, respectively, at different scales in order to study the statistical characteristics of these financial time series. The fluctuations reveal fat-tailed non-Gaussian behavior and unstable periodic modulations at finer scales, from which the characteristic k^{-3} power law behavior emerges at sufficiently large scales. We further identify stable periodic behavior through the continuous Morlet wavelet.
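A minimal sketch of such a scale-by-scale characterization is given below with PyWavelets: decompose a return series with a Daubechies wavelet and inspect simple statistics of each detail band; the wavelet choice, decomposition depth, and the statistics reported are illustrative assumptions, not the authors' full multi-fractal analysis.

```python
# Hedged sketch: scale-wise fluctuation statistics of a return series.
import numpy as np
import pywt
from scipy.stats import kurtosis

def scalewise_stats(returns, wavelet="db4", level=6):
    coeffs = pywt.wavedec(returns, wavelet, level=level)
    stats = {}
    # coeffs[1:] are the detail bands, from coarsest to finest scale
    for j, d in enumerate(coeffs[1:], start=1):
        stats[f"band {j}"] = {"std": float(np.std(d)),
                              "excess kurtosis": float(kurtosis(d))}
    return stats

# returns = np.diff(np.log(price_series))   # hypothetical daily closes
# print(scalewise_stats(returns))
```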
Application of multivariate statistical techniques in microbial ecology.
Paliy, O; Shankar, V
2016-03-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, a noticeable effect has been achieved in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques, including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
SQC: secure quality control for meta-analysis of genome-wide association studies.
Huang, Zhicong; Lin, Huang; Fellay, Jacques; Kutalik, Zoltán; Hubaux, Jean-Pierre
2017-08-01
Due to the limited power of small-scale genome-wide association studies (GWAS), researchers tend to collaborate and establish a larger consortium in order to perform large-scale GWAS. Genome-wide association meta-analysis (GWAMA) is a statistical tool that aims to synthesize results from multiple independent studies to increase the statistical power and reduce false-positive findings of GWAS. However, it has been demonstrated that the aggregate data of individual studies are subject to inference attacks, hence privacy concerns arise when researchers share study data in GWAMA. In this article, we propose a secure quality control (SQC) protocol, which enables checking the quality of data in a privacy-preserving way without revealing sensitive information to a potential adversary. SQC employs state-of-the-art cryptographic and statistical techniques for privacy protection. We implement the solution in a meta-analysis pipeline with real data to demonstrate the efficiency and scalability on commodity machines. The distributed execution of SQC on a cluster of 128 cores for one million genetic variants takes less than one hour, which is a modest cost considering the 10-month time span usually observed for the completion of the QC procedure that includes timing of logistics. SQC is implemented in Java and is publicly available at https://github.com/acs6610987/secureqc. jean-pierre.hubaux@epfl.ch. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Risk of large-scale fires in boreal forests of Finland under changing climate
NASA Astrophysics Data System (ADS)
Lehtonen, I.; Venäläinen, A.; Kämäräinen, M.; Peltola, H.; Gregow, H.
2016-01-01
The target of this work was to assess the impact of projected climate change on forest-fire activity in Finland with special emphasis on large-scale fires. In addition, we were particularly interested in examining the inter-model variability of the projected change of fire danger. For this purpose, we utilized fire statistics covering the period 1996-2014 and consisting of almost 20 000 forest fires, as well as daily meteorological data from five global climate models under the representative concentration pathway RCP4.5 and RCP8.5 scenarios. The model data were statistically downscaled onto a high-resolution grid using the quantile-mapping method before performing the analysis. In examining the relationship between weather and fire danger, we applied the Canadian fire weather index (FWI) system. Our results suggest that the number of large forest fires may double or even triple during the present century. This would increase the risk that some of the fires could develop into real conflagrations, which have become almost extinct in Finland due to active and efficient fire suppression. However, the results reveal substantial inter-model variability in the rate of the projected increase of forest-fire danger, emphasizing the large uncertainty related to the climate change signal in fire activity. We moreover showed that the majority of large fires in Finland occur within a relatively short period in May and June due to human activities and that FWI correlates more poorly with fire activity during this time of year than later in summer when lightning is a more important cause of fires.
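The quantile-mapping step used above to downscale the climate-model data can be sketched as an empirical mapping between calibration-period distributions; the sketch below assumes daily series as plain arrays, and values outside the calibration range are simply clipped to its extremes.

```python
# Hedged sketch: empirical quantile mapping for bias correction/downscaling.
import numpy as np

def quantile_map(model_cal, obs_cal, model_future, n_q=100):
    """Map each model value to the observed value with the same quantile
    in the calibration period."""
    q = np.linspace(0.0, 1.0, n_q)
    model_q = np.quantile(model_cal, q)
    obs_q = np.quantile(obs_cal, q)
    return np.interp(model_future, model_q, obs_q)

# Usage with hypothetical daily series:
# corrected = quantile_map(gcm_hist_tas, station_tas, gcm_rcp85_tas)
```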
Periodontal disease and carotid atherosclerosis: A meta-analysis of 17,330 participants.
Zeng, Xian-Tao; Leng, Wei-Dong; Lam, Yat-Yin; Yan, Bryan P; Wei, Xue-Mei; Weng, Hong; Kwong, Joey S W
2016-01-15
The association between periodontal disease and carotid atherosclerosis has been evaluated primarily in single-center studies, and whether periodontal disease is an independent risk factor of carotid atherosclerosis remains uncertain. This meta-analysis aimed to evaluate the association between periodontal disease and carotid atherosclerosis. We searched PubMed and Embase for relevant observational studies up to February 20, 2015. Two authors independently extracted data from included studies, and odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for overall and subgroup meta-analyses. Statistical heterogeneity was assessed by the chi-squared test (P<0.1 for statistical significance) and quantified by the I(2) statistic. Data analysis was conducted using the Comprehensive Meta-Analysis (CMA) software. Fifteen observational studies involving 17,330 participants were included in the meta-analysis. The overall pooled result showed that periodontal disease was associated with carotid atherosclerosis (OR: 1.27, 95% CI: 1.14-1.41; P<0.001) but statistical heterogeneity was substantial (I(2)=78.90%). Subgroup analysis of adjusted smoking and diabetes mellitus showed borderline significance (OR: 1.08; 95% CI: 1.00-1.18; P=0.05). Sensitivity and cumulative analyses both indicated that our results were robust. Findings of our meta-analysis indicated that the presence of periodontal disease was associated with carotid atherosclerosis; however, further large-scale, well-conducted clinical studies are needed to explore the precise risk of developing carotid atherosclerosis in patients with periodontal disease. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
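The pooling and heterogeneity statistics reported above (pooled OR, Cochran's Q, I²) can be sketched with inverse-variance weighting; the fixed-effect pool shown below is a simplification of what the CMA software does, and the example numbers are hypothetical.

```python
# Hedged sketch: fixed-effect inverse-variance meta-analysis of odds ratios.
import numpy as np
from scipy.stats import chi2

def pool_odds_ratios(or_, lo, hi):
    log_or = np.log(or_)
    se = (np.log(hi) - np.log(lo)) / (2 * 1.96)      # SE from the 95% CI width
    w = 1.0 / se**2
    pooled = np.sum(w * log_or) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    q = np.sum(w * (log_or - pooled) ** 2)           # Cochran's Q
    df = len(or_) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"OR": np.exp(pooled),
            "95% CI": (np.exp(pooled - 1.96 * pooled_se),
                       np.exp(pooled + 1.96 * pooled_se)),
            "Q p-value": chi2.sf(q, df),
            "I2 (%)": i2}

# print(pool_odds_ratios(np.array([1.2, 1.4, 1.1]),
#                        np.array([1.0, 1.1, 0.9]),
#                        np.array([1.5, 1.8, 1.4])))
```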
Flaxman, Abraham D; Stewart, Andrea; Joseph, Jonathan C; Alam, Nurul; Alam, Sayed Saidul; Chowdhury, Hafizur; Mooney, Meghan D; Rampatige, Rasika; Remolador, Hazel; Sanvictores, Diozele; Serina, Peter T; Streatfield, Peter Kim; Tallo, Veronica; Murray, Christopher J L; Hernandez, Bernardo; Lopez, Alan D; Riley, Ian Douglas
2018-02-01
There is increasing interest in using verbal autopsy to produce nationally representative population-level estimates of causes of death. However, the burden of processing a large quantity of surveys collected with paper and pencil has been a barrier to scaling up verbal autopsy surveillance. Direct electronic data capture has been used in other large-scale surveys and can be used in verbal autopsy as well, to reduce time and cost of going from collected data to actionable information. We collected verbal autopsy interviews using paper and pencil and using electronic tablets at two sites, and measured the cost and time required to process the surveys for analysis. From these cost and time data, we extrapolated costs associated with conducting large-scale surveillance with verbal autopsy. We found that the median time between data collection and data entry for surveys collected on paper and pencil was approximately 3 months. For surveys collected on electronic tablets, this was less than 2 days. For small-scale surveys, we found that the upfront costs of purchasing electronic tablets was the primary cost and resulted in a higher total cost. For large-scale surveys, the costs associated with data entry exceeded the cost of the tablets, so electronic data capture provides both a quicker and cheaper method of data collection. As countries increase verbal autopsy surveillance, it is important to consider the best way to design sustainable systems for data collection. Electronic data capture has the potential to greatly reduce the time and costs associated with data collection. For long-term, large-scale surveillance required by national vital statistical systems, electronic data capture reduces costs and allows data to be available sooner.
MRMPROBS suite for metabolomics using large-scale MRM assays.
Tsugawa, Hiroshi; Kanazawa, Mitsuhiro; Ogiwara, Atsushi; Arita, Masanori
2014-08-15
We developed a new software environment for the metabolome analysis of large-scale multiple reaction monitoring (MRM) assays. It supports the data formats of four major mass spectrometer vendors as well as the mzML common data format. This program provides a process pipeline from raw-format import to high-dimensional statistical analyses. The novel aspect is its graphical user interface-based visualization to perform peak quantification, to interpolate missing values and to normalize peaks interactively based on quality control samples. Together with the software platform, the MRM standard library of 301 metabolites with 775 transitions is also available, which contributes to reliable peak identification by using retention time and ion abundances. MRMPROBS is available for Windows OS under the creative-commons by-attribution license at http://prime.psc.riken.jp. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Testing alternative ground water models using cross-validation and other methods
Foglia, L.; Mehl, S.W.; Hill, M.C.; Perona, P.; Burlando, P.
2007-01-01
Many methods can be used to test alternative ground water models. Of concern in this work are methods able to (1) rank alternative models (also called model discrimination) and (2) identify observations important to parameter estimates and predictions (equivalent to the purpose served by some types of sensitivity analysis). Some of the measures investigated are computationally efficient; others are computationally demanding. The latter are generally needed to account for model nonlinearity. The efficient model discrimination methods investigated include the information criteria: the corrected Akaike information criterion, Bayesian information criterion, and generalized cross-validation. The efficient sensitivity analysis measures used are dimensionless scaled sensitivity (DSS), composite scaled sensitivity, and parameter correlation coefficient (PCC); the other statistics are DFBETAS, Cook's D, and observation-prediction statistic. Acronyms are explained in the introduction. Cross-validation (CV) is a computationally intensive nonlinear method that is used for both model discrimination and sensitivity analysis. The methods are tested using up to five alternative parsimoniously constructed models of the ground water system of the Maggia Valley in southern Switzerland. The alternative models differ in their representation of hydraulic conductivity. A new method for graphically representing CV and sensitivity analysis results for complex models is presented and used to evaluate the utility of the efficient statistics. The results indicate that for model selection, the information criteria produce similar results at much smaller computational cost than CV. For identifying important observations, the only obviously inferior linear measure is DSS; the poor performance was expected because DSS does not include the effects of parameter correlation and PCC reveals large parameter correlations. © 2007 National Ground Water Association.
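The computationally efficient information-criterion comparison described above can be sketched from each model's residual sum of squares; the Gaussian-error log-likelihood used below is a standard assumption, and the inputs (SSE, number of estimated parameters, number of observations) are placeholders for the calibration results of the alternative models.

```python
# Hedged sketch: AICc and BIC for ranking alternative least-squares models.
import numpy as np

def information_criteria(sse, k, n):
    """sse: residual sum of squares; k: estimated parameters; n: observations."""
    log_lik = -0.5 * n * (np.log(2 * np.pi * sse / n) + 1)   # Gaussian errors
    aic = -2 * log_lik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)               # small-sample correction
    bic = -2 * log_lik + k * np.log(n)
    return {"AICc": aicc, "BIC": bic}

# Hypothetical usage for alternative hydraulic-conductivity parameterizations:
# for model, (sse, k) in fits.items():
#     print(model, information_criteria(sse, k, n_obs))
```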
New Probe of Departures from General Relativity Using Minkowski Functionals.
Fang, Wenjuan; Li, Baojiu; Zhao, Gong-Bo
2017-05-05
The morphological properties of the large scale structure of the Universe can be fully described by four Minkowski functionals (MFs), which provide important complementary information to other statistical observables such as the widely used 2-point statistics in configuration and Fourier spaces. In this work, for the first time, we present the differences in the morphology of the large scale structure caused by modifications to general relativity (to address the cosmic acceleration problem), by measuring the MFs from N-body simulations of modified gravity and general relativity. We find strong statistical power when using the MFs to constrain modified theories of gravity: with a galaxy survey that has survey volume ∼0.125(h^{-1} Gpc)^{3} and galaxy number density ∼1/(h^{-1} Mpc)^{3}, the two normal-branch Dvali-Gabadadze-Porrati models and the F5 f(R) model that we simulated can be discriminated from the ΛCDM model at a significance level ≳5σ with an individual MF measurement. Therefore, the MF of the large scale structure is potentially a powerful probe of gravity, and its application to real data deserves active exploration.
EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.
Tong, Xiaoxiao; Bentler, Peter M
2013-01-01
Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ(2) test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.
Planck data versus large scale structure: Methods to quantify discordance
NASA Astrophysics Data System (ADS)
Charnock, Tom; Battye, Richard A.; Moss, Adam
2017-06-01
Discordance in the Λ cold dark matter cosmological model can be seen by comparing parameters constrained by cosmic microwave background (CMB) measurements to those inferred by probes of large scale structure. Recent improvements in observations, including final data releases from both Planck and SDSS-III BOSS, as well as improved astrophysical uncertainty analysis of CFHTLenS, allows for an update in the quantification of any tension between large and small scales. This paper is intended, primarily, as a discussion on the quantifications of discordance when comparing the parameter constraints of a model when given two different data sets. We consider Kullback-Leibler divergence, comparison of Bayesian evidences and other statistics which are sensitive to the mean, variance and shape of the distributions. However, as a byproduct, we present an update to the similar analysis in [R. A. Battye, T. Charnock, and A. Moss, Phys. Rev. D 91, 103508 (2015), 10.1103/PhysRevD.91.103508], where we find that, considering new data and treatment of priors, the constraints from the CMB and from a combination of large scale structure (LSS) probes are in greater agreement and any tension only persists to a minor degree. In particular, we find the parameter constraints from the combination of LSS probes which are most discrepant with the Planck 2015 +Pol +BAO parameter distributions can be quantified at a ˜2.55 σ tension using the method introduced in [R. A. Battye, T. Charnock, and A. Moss, Phys. Rev. D 91, 103508 (2015), 10.1103/PhysRevD.91.103508]. If instead we use the distributions constrained by the combination of LSS probes which are in greatest agreement with those from Planck 2015 +Pol +BAO this tension is only 0.76 σ .
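Two of the discordance measures discussed above can be sketched for Gaussian approximations of the two parameter posteriors: the Kullback-Leibler divergence between multivariate normals and a simple difference-in-mean tension in units of sigma; the numerical example values are hypothetical and this is not the paper's full treatment of priors and shapes.

```python
# Hedged sketch: KL divergence and a simple sigma-level tension between
# Gaussian approximations of two posteriors.
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL(N0 || N1) for multivariate normal posteriors."""
    k = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = np.asarray(mu1) - np.asarray(mu0)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def tension_sigma(mu0, var0, mu1, var1):
    """Difference-in-mean tension for one shared parameter, in sigma."""
    return abs(mu0 - mu1) / np.sqrt(var0 + var1)

# e.g. hypothetical sigma_8 constraints from CMB vs. LSS:
# print(tension_sigma(0.83, 0.01**2, 0.76, 0.03**2))
```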
Scalable Performance Measurement and Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamblin, Todd
2009-01-01
Concurrency levels in large-scale, distributed-memory supercomputers are rising exponentially. Modern machines may contain 100,000 or more microprocessor cores, and the largest of these, IBM's Blue Gene/L, contains over 200,000 cores. Future systems are expected to support millions of concurrent tasks. In this dissertation, we focus on efficient techniques for measuring and analyzing the performance of applications running on very large parallel machines. Tuning the performance of large-scale applications can be a subtle and time-consuming task because application developers must measure and interpret data from many independent processes. While the volume of the raw data scales linearly with the number of tasks in the running system, the number of tasks is growing exponentially, and data for even small systems quickly becomes unmanageable. Transporting performance data from so many processes over a network can perturb application performance and make measurements inaccurate, and storing such data would require a prohibitive amount of space. Moreover, even if it were stored, analyzing the data would be extremely time-consuming. In this dissertation, we present novel methods for reducing performance data volume. The first draws on multi-scale wavelet techniques from signal processing to compress systemwide, time-varying load-balance data. The second uses statistical sampling to select a small subset of running processes to generate low-volume traces. A third approach combines sampling and wavelet compression to stratify performance data adaptively at run-time and to reduce further the cost of sampled tracing. We have integrated these approaches into Libra, a toolset for scalable load-balance analysis. We present Libra and show how it can be used to analyze data from large scientific applications scalably.
Lensing corrections to the Eg(z) statistics from large scale structure
NASA Astrophysics Data System (ADS)
Moradinezhad Dizgah, Azadeh; Durrer, Ruth
2016-09-01
We study the impact of the often neglected lensing contribution to galaxy number counts on the Eg statistic, which is used to constrain deviations from GR. This contribution affects both the galaxy-galaxy and the convergence-galaxy spectra, and is larger for the latter. At higher redshifts probed by upcoming surveys, for instance at z = 1.5, neglecting this term induces an error of (25-40)% in the spectra and therefore in the Eg statistic, which is constructed from the combination of the two. Moreover, including it renders the Eg statistic scale- and bias-dependent, and hence calls its very purpose into question.
On the spatial distribution of small heavy particles in homogeneous shear turbulence
NASA Astrophysics Data System (ADS)
Nicolai, C.; Jacob, B.; Piva, R.
2013-08-01
We report on a novel experiment aimed at investigating the effects induced by a large-scale velocity gradient on the turbulent transport of small heavy particles. To this purpose, a homogeneous shear flow at Reλ = 540 and shear parameter S* = 4.5 is set-up and laden with glass spheres whose size d is comparable with the Kolmogorov lengthscale η of the flow (d/η ≈ 1). The particle Stokes number is approximately 0.3. The analysis of the instantaneous particle fields by means of Voronoï diagrams confirms the occurrence of intense turbulent clustering at small scales, as observed in homogeneous isotropic flows. It also indicates that the anisotropy of the velocity fluctuations induces a preferential orientation of the particle clusters. In order to characterize the fine-scale features of the dispersed phase, spatial correlations of the particle field are employed in conjunction with statistical tools recently developed for anisotropic turbulence. The scale-by-scale analysis of the particle field clarifies that isotropy of the particle distribution is tendentially recovered at small separations, even though the signatures of the mean shear persist down to smaller scales as compared to the fluid velocity field.
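The Voronoï-diagram diagnostic used above can be sketched with SciPy: compute the normalized areas of the bounded Voronoï cells of the detected particle positions, where clustering shows up as a broader area distribution than for a random (Poisson) arrangement (standard deviation of roughly 0.5 for 2-D Poisson points); the threshold quoted in the comment is an approximate reference value, not a result from this experiment.

```python
# Hedged sketch: Voronoi cell-area statistics of 2-D particle positions.
import numpy as np
from scipy.spatial import Voronoi, ConvexHull

def voronoi_areas(points):
    """Areas of the bounded Voronoi cells, normalized by their mean.
    Unbounded border cells are discarded."""
    vor = Voronoi(points)
    areas = []
    for region_index in vor.point_region:
        region = vor.regions[region_index]
        if len(region) == 0 or -1 in region:          # skip unbounded cells
            continue
        areas.append(ConvexHull(vor.vertices[region]).volume)  # 2-D: area
    areas = np.array(areas)
    return areas / areas.mean()

# particles = np.random.rand(5000, 2)      # replace with detected positions
# print(np.std(voronoi_areas(particles)))  # well above ~0.5 suggests clustering
```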
A New Technique for Personality Scale Construction. Preliminary Findings.
ERIC Educational Resources Information Center
Schaffner, Paul E.; Darlington, Richard B.
Most methods of personality scale construction have clear statistical disadvantages. A hybrid method (Darlington and Bishop, 1966) was found to increase scale validity more than any other method, with large item pools. A simple modification of the Darlington-Bishop method (algebraically and conceptually similar to ridge regression, but…
2009-01-01
Background: Insertional mutagenesis is an effective method for functional genomic studies in various organisms. It can rapidly generate easily tractable mutations. A large-scale insertional mutagenesis with the piggyBac (PB) transposon is currently performed in mice at the Institute of Developmental Biology and Molecular Medicine (IDM), Fudan University in Shanghai, China. This project is carried out via collaborations among multiple groups overseeing interconnected experimental steps and generates a large volume of experimental data continuously. Therefore, the project calls for an efficient database system for recording, management, statistical analysis, and information exchange.
Results: This paper presents a database application called MP-PBmice (insertional mutation mapping system of PB Mutagenesis Information Center), which is developed to serve the on-going large-scale PB insertional mutagenesis project. A lightweight enterprise-level development framework Struts-Spring-Hibernate is used here to ensure constructive and flexible support to the application. The MP-PBmice database system has three major features: strict access-control, efficient workflow control, and good expandability. It supports the collaboration among different groups that enter data and exchange information on daily basis, and is capable of providing real time progress reports for the whole project. MP-PBmice can be easily adapted for other large-scale insertional mutation mapping projects and the source code of this software is freely available at http://www.idmshanghai.cn/PBmice.
Conclusion: MP-PBmice is a web-based application for large-scale insertional mutation mapping onto the mouse genome, implemented with the widely used framework Struts-Spring-Hibernate. This system is already in use by the on-going genome-wide PB insertional mutation mapping project at IDM, Fudan University. PMID:19958505
Large-Scale Optimization for Bayesian Inference in Complex Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Willcox, Karen; Marzouk, Youssef
2013-11-12
The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focused on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. The project was a collaborative effort among MIT, the University of Texas at Austin, Georgia Institute of Technology, and Sandia National Laboratories. The research was directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. The MIT-Sandia component of the SAGUARO Project addressed the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas-Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as "reduce then sample" and "sample then reduce." In fact, these two approaches are complementary, and can be used in conjunction with each other. Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to achieve their speedups.
Final Report: Large-Scale Optimization for Bayesian Inference in Complex Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghattas, Omar
2013-10-15
The SAGUARO (Scalable Algorithms for Groundwater Uncertainty Analysis and Robust Optimization) Project focuses on the development of scalable numerical algorithms for large-scale Bayesian inversion in complex systems that capitalize on advances in large-scale simulation-based optimization and inversion methods. Our research is directed in three complementary areas: efficient approximations of the Hessian operator, reductions in complexity of forward simulations via stochastic spectral approximations and model reduction, and employing large-scale optimization concepts to accelerate sampling. Our efforts are integrated in the context of a challenging testbed problem that considers subsurface reacting flow and transport. The MIT component of the SAGUARO Project addresses the intractability of conventional sampling methods for large-scale statistical inverse problems by devising reduced-order models that are faithful to the full-order model over a wide range of parameter values; sampling then employs the reduced model rather than the full model, resulting in very large computational savings. Results indicate little effect on the computed posterior distribution. On the other hand, in the Texas-Georgia Tech component of the project, we retain the full-order model, but exploit inverse problem structure (adjoint-based gradients and partial Hessian information of the parameter-to-observation map) to implicitly extract lower dimensional information on the posterior distribution; this greatly speeds up sampling methods, so that fewer sampling points are needed. We can think of these two approaches as "reduce then sample" and "sample then reduce." In fact, these two approaches are complementary, and can be used in conjunction with each other. Moreover, they both exploit deterministic inverse problem structure, in the form of adjoint-based gradient and Hessian information of the underlying parameter-to-observation map, to achieve their speedups.
The role of large scale motions on passive scalar transport
NASA Astrophysics Data System (ADS)
Dharmarathne, Suranga; Araya, Guillermo; Tutkun, Murat; Leonardi, Stefano; Castillo, Luciano
2014-11-01
We study a direct numerical simulation (DNS) of turbulent channel flow at Reτ = 394 to investigate the effect of large-scale motions on the fluctuating temperature field, which forms a passive scalar field. A statistical description of the large-scale features of the turbulent channel flow is obtained using two-point correlations of velocity components. Two-point correlations of the fluctuating temperature field are also examined in order to identify possible similarities between velocity and temperature fields. The two-point cross-correlations between the velocity and temperature fluctuations are further analyzed to establish connections between these two fields. In addition, we use proper orthogonal decomposition (POD) to extract the most dominant modes of the fields and discuss the coupling of large-scale features of turbulence and the temperature field.
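The POD step mentioned above is commonly computed by the method of snapshots via an SVD; a minimal sketch follows, in which the snapshot matrix (mean-subtracted velocity or temperature fluctuations) is a hypothetical input reshaped to points-by-snapshots.

```python
# Hedged sketch: snapshot POD of a fluctuating field via the SVD.
import numpy as np

def snapshot_pod(snapshots):
    """snapshots: (n_points, n_snapshots) array of a fluctuating quantity
    with the mean already removed. Returns spatial modes, energy fractions,
    and temporal coefficients."""
    modes, s, vt = np.linalg.svd(snapshots, full_matrices=False)
    energy = s**2 / np.sum(s**2)
    coeffs = np.diag(s) @ vt          # temporal coefficients of each mode
    return modes, energy, coeffs

# modes, energy, coeffs = snapshot_pod(u_prime.reshape(n_points, n_snapshots))
# print("first 4 modes capture", 100 * energy[:4].sum(), "% of the energy")
```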
Metabolomic Modularity Analysis (MMA) to Quantify Human Liver Perfusion Dynamics.
Sridharan, Gautham Vivek; Bruinsma, Bote Gosse; Bale, Shyam Sundhar; Swaminathan, Anandh; Saeidi, Nima; Yarmush, Martin L; Uygun, Korkut
2017-11-13
Large-scale -omics data are now ubiquitously utilized to capture and interpret global responses to perturbations in biological systems, such as the impact of disease states on cells, tissues, and whole organs. Metabolomics data, in particular, are difficult to interpret for providing physiological insight because predefined biochemical pathways used for analysis are inherently biased and fail to capture more complex network interactions that span multiple canonical pathways. In this study, we introduce a novel approach coined Metabolomic Modularity Analysis (MMA) as a graph-based algorithm to systematically identify metabolic modules of reactions enriched with metabolites flagged to be statistically significant. A defining feature of the algorithm is its ability to determine modularity that highlights interactions between reactions mediated by the production and consumption of cofactors and other hub metabolites. As a case study, we evaluated the metabolic dynamics of discarded human livers using time-course metabolomics data and MMA to identify modules that explain the observed physiological changes leading to liver recovery during subnormothermic machine perfusion (SNMP). MMA was performed on a large-scale, liver-specific human metabolic network that was weighted based on metabolomics data and identified cofactor-mediated modules that would not have been discovered by traditional metabolic pathway analyses.
Community health worker programs in India: a rights-based review.
Bhatia, Kavita
2014-09-01
This article presents a historical review of national community health worker (CHW) programs in India using a gender- and rights-based lens. The aim is to derive relevant policy implications to stem attrition and enable sustenance of large-scale CHW programs. For the literature review, relevant government policies, minutes of meetings, reports, newspaper articles and statistics were accessed through official websites and a hand search was conducted for studies on the rights-based aspects of large-scale CHW programs. The analysis shows that the CHWs in three successive Indian national CHW programs have consistently asked for reforms in their service conditions, including increased remuneration. Despite an evolution in stakeholder perspectives regarding the rights of CHWs, service reforms are slow. Performance-based payments do not provide the financial security expected by CHWs as demonstrated in the recent Accredited Social Health Activist (ASHA) program. In most countries, CHWs, who are largely women, have never been integrated into the established, salaried team of health system workers. The two hallmark characteristics of CHWs, namely, their volunteer status and the flexibility of their tasks and timings, impede their rights. The consequences of initiating or neglecting standardization should be considered by all countries with large-scale CHW programs like the ASHA program. © Royal Society for Public Health 2014.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramanathan, Arvind; Pullum, Laura L; Steed, Chad A
In this position paper, we describe the design and implementation of the Oak Ridge Bio-surveillance Toolkit (ORBiT): a collection of novel statistical and machine learning tools implemented for (1) integrating heterogeneous traditional (e.g. emergency room visits, prescription sales data, etc.) and non-traditional (social media such as Twitter and Instagram) data sources, (2) analyzing large-scale datasets and (3) presenting the results from the analytics as a visual interface for the end-user to interact and provide feedback. We present examples of how ORBiT can be used to summarize extremely large-scale datasets effectively and how user interactions can translate into the data analytics process for bio-surveillance. We also present a strategy to estimate parameters relevant to disease spread models from near-real-time data feeds and show how these estimates can be integrated with disease spread models for large-scale populations. We conclude with a perspective on how integrating data and visual analytics could lead to better forecasting and prediction of disease spread as well as improved awareness of disease-susceptible regions.
Quantifying the seismicity on Taiwan
NASA Astrophysics Data System (ADS)
Wu, Yi-Hsuan; Chen, Chien-Chih; Turcotte, Donald L.; Rundle, John B.
2013-07-01
We quantify the seismicity on the island of Taiwan using the frequency-magnitude statistics of earthquakes since 1900. A break in Gutenberg-Richter scaling for large earthquakes in global seismicity has been observed; this break is also observed in our Taiwan study. The seismic data from the Central Weather Bureau Seismic Network are in good agreement with the Gutenberg-Richter relation taking b ≈ 1 when M < 7. For large earthquakes, M ≥ 7, the seismic data fit Gutenberg-Richter scaling with b ≈ 1.5. If the Gutenberg-Richter scaling for M < 7 earthquakes is extrapolated to larger earthquakes, we would expect an M > 8 earthquake in the study region about every 25 yr. However, our analysis shows a lower frequency of occurrence of large earthquakes, so that the expected recurrence interval of M > 8 earthquakes is about 200 yr. The level of seismicity for smaller earthquakes on Taiwan is about 12 times greater than in Southern California and the possibility of an M ≈ 9 earthquake north or south of Taiwan cannot be ruled out. In light of the Fukushima, Japan nuclear disaster, we also discuss the implications of our study for the three operating nuclear power plants on the coast of Taiwan.
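The frequency-magnitude analysis above rests on the Gutenberg-Richter b-value and its extrapolation to large magnitudes; a minimal sketch using the standard maximum-likelihood (Aki) estimator follows, with the catalog, completeness magnitude, and catalog duration as hypothetical placeholders.

```python
# Hedged sketch: maximum-likelihood b-value and extrapolated M >= 8 recurrence.
import numpy as np

def gutenberg_richter(mags, m_c, years):
    """b-value above the completeness magnitude m_c, and the mean return
    period of M >= 8 events implied by extrapolating the same scaling."""
    m = np.asarray(mags)
    m = m[m >= m_c]
    b = np.log10(np.e) / (m.mean() - m_c)           # Aki (1965) estimator
    a = np.log10(len(m) / years) + b * m_c          # annual-rate a-value
    rate_m8 = 10 ** (a - b * 8.0)
    return b, 1.0 / rate_m8                         # b-value, return period (yr)

# b, t_m8 = gutenberg_richter(catalog_magnitudes, m_c=4.5, years=113)
```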
Global maps of the magnetic thickness and magnetization of the Earth's lithosphere
NASA Astrophysics Data System (ADS)
Vervelidou, Foteini; Thébault, Erwan
2015-10-01
We have constructed global maps of the large-scale magnetic thickness and magnetization of Earth's lithosphere. Deriving such large-scale maps based on lithospheric magnetic field measurements faces the challenge of the masking effect of the core field. In this study, the maps were obtained through analyses in the spectral domain by means of a new regional spatial power spectrum based on the Revised Spherical Cap Harmonic Analysis (R-SCHA) formalism. A series of regional spectral analyses were conducted covering the entire Earth. The R-SCHA surface power spectrum for each region was estimated using the NGDC-720 spherical harmonic (SH) model of the lithospheric magnetic field, which is based on satellite, aeromagnetic, and marine measurements. These observational regional spectra were fitted to a recently proposed statistical expression of the power spectrum of Earth's lithospheric magnetic field, whose free parameters include the thickness and magnetization of the magnetic sources. The resulting global magnetic thickness map is compared to other crustal and magnetic thickness maps based upon different geophysical data. We conclude that the large-scale magnetic thickness of the lithosphere is on average confined to a layer that does not exceed the Moho.
Visual analysis of inter-process communication for large-scale parallel computing.
Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu
2009-01-01
In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.
NASA Technical Reports Server (NTRS)
Alexandrov, Mikhail Dmitrievic; Geogdzhayev, Igor V.; Tsigaridis, Konstantinos; Marshak, Alexander; Levy, Robert; Cairns, Brian
2016-01-01
A novel model for the variability in aerosol optical thickness (AOT) is presented. This model is based on the consideration of AOT fields as realizations of a stochastic process, that is, the exponential of an underlying Gaussian process with a specific autocorrelation function. In this approach, AOT fields have lognormal PDFs and structure functions with the correct asymptotic behavior at large scales. The latter is an advantage compared with fractal (scale-invariant) approaches. The simple analytical form of the structure function in the proposed model facilitates its use for the parameterization of AOT statistics derived from remote sensing data. The new approach is illustrated using a month-long global MODIS AOT dataset (over ocean) with 10 km resolution. It was used to compute AOT statistics for sample cells forming a grid with 5° spacing. The observed shapes of the structure functions indicated that in a large number of cases the AOT variability is split into two regimes that exhibit different patterns of behavior: small-scale stationary processes and trends reflecting variations at larger scales. The small-scale patterns are suggested to be generated by local aerosols within the marine boundary layer, while the large-scale trends are indicative of elevated aerosols transported from remote continental sources. This assumption is evaluated by comparison of the geographical distributions of these patterns derived from MODIS data with those obtained from the GISS GCM. This study shows considerable potential to enhance comparisons between remote sensing datasets and climate models beyond regional mean AOTs.
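The core construction above, a lognormal field obtained by exponentiating a correlated Gaussian process and characterized by its structure function, can be sketched in one dimension; the exponential autocorrelation, grid spacing, and parameter values below are illustrative assumptions, not the paper's fitted model.

```python
# Hedged sketch: synthesize a lognormal "AOT" series from a correlated Gaussian
# process and estimate its second-order structure function.
import numpy as np

def lognormal_aot(n=1024, dx=10.0, corr_len=200.0, sigma=0.5, mean_log=-1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = np.arange(n) * dx
    cov = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
    g = rng.multivariate_normal(np.full(n, mean_log), cov)
    return np.exp(g)                                  # lognormal AOT series

def structure_function(aot, lags):
    """S(r) = <(AOT(x + r) - AOT(x))^2> at the given lags (in grid steps)."""
    return np.array([np.mean((aot[lag:] - aot[:-lag]) ** 2) for lag in lags])

# aot = lognormal_aot()
# print(structure_function(aot, lags=[1, 2, 5, 10, 50]))
```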
SNPassoc: an R package to perform whole genome association studies.
González, Juan R; Armengol, Lluís; Solé, Xavier; Guinó, Elisabet; Mercader, Josep M; Estivill, Xavier; Moreno, Víctor
2007-03-01
The popularization of large-scale genotyping projects has led to the widespread adoption of genetic association studies as the tool of choice in the search for single nucleotide polymorphisms (SNPs) underlying susceptibility to complex diseases. Although the analysis of individual SNPs is a relatively trivial task, when the number of SNPs is large and multiple genetic models need to be explored, a tool to automate the analyses becomes necessary. In order to address this issue, we developed SNPassoc, an R package to carry out the most common analyses in whole genome association studies. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Package SNPassoc is available at CRAN from http://cran.r-project.org. A tutorial is available on Bioinformatics online and at http://davinci.crg.es/estivill_lab/snpassoc.
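SNPassoc itself is an R package; as a rough illustration of the kind of per-SNP association scan it automates, here is a hedged Python sketch that runs a chi-square test on a case/control-by-genotype table for each SNP. The genotype and phenotype arrays are simulated, and this is not the SNPassoc implementation.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
n_subjects, n_snps = 500, 200
genotypes = rng.integers(0, 3, size=(n_subjects, n_snps))   # 0/1/2 minor-allele counts
status = rng.integers(0, 2, size=n_subjects)                # 0 = control, 1 = case

def snp_association(geno_col, status):
    """Chi-square test of a 2x3 case/control-by-genotype contingency table."""
    table = np.zeros((2, 3), dtype=int)
    for s, g in zip(status, geno_col):
        table[s, g] += 1
    chi2, p, _, _ = chi2_contingency(table)
    return p

pvals = np.array([snp_association(genotypes[:, j], status) for j in range(n_snps)])
print("smallest p-value:", pvals.min())
```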
Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M; Grün, Sonja
2017-01-01
Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis.
Benstoem, Frank; Nahrstedt, Andreas; Boehler, Marc; Knopp, Gregor; Montag, David; Siegrist, Hansruedi; Pinnekamp, Johannes
2017-10-01
For reducing organic micropollutants (MP) in municipal wastewater effluents, granular activated carbon (GAC) has been tested in various studies. We conducted a systematic literature search and found 44 studies dealing with the adsorption of MPs (carbamazepine, diclofenac, sulfamethoxazole) from municipal wastewater on GAC in pilot- and large-scale plants. Within our meta-analysis we plot the bed volumes (BV [m³ water/m³ GAC]) until the breakthrough criterion of MP-BV20% was reached, as a function of potentially relevant parameters (empty bed contact time EBCT, influent DOC (DOC₀) and manufacturing method). Moreover, we performed statistical tests (ANOVAs) to check the results for significance. The operating time of single adsorbers until breakthrough of diclofenac-BV20% was reached differed by as much as 2500% (800-20,000 BV). There was still elimination of the "very well/well" adsorbable MPs such as carbamazepine and diclofenac even when the equilibrium of DOC had already been reached. No strong statistical significance of EBCT and DOC₀ on MP-BV20% could be found due to lack of data and the high heterogeneity of the studies using GAC of different qualities. In further studies, adsorbers should be operated ≫20,000 BV for exact calculation of breakthrough curves, and the following parameters should be recorded: selected MPs; DOC₀; UVA₂₅₄; EBCT; product name, manufacturing method and raw material of GAC; suspended solids (TSS); backwash interval; backwash program and pressure drop within adsorber. Based on our investigations we generally recommend using reactivated GAC to reduce the environmental impact and to carry out tests on pilot scale to collect reliable data for process design. Copyright © 2017 Elsevier Ltd. All rights reserved.
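As an illustration of the breakthrough criterion used in the meta-analysis, the sketch below interpolates the bed volumes at which an effluent-to-influent concentration ratio first reaches 20%. The breakthrough curve values are hypothetical and not taken from any of the 44 studies.

```python
import numpy as np

def bv_at_breakthrough(bed_volumes, c_over_c0, criterion=0.2):
    """Linearly interpolate the bed volumes at which the effluent/influent
    concentration ratio first reaches the breakthrough criterion (default 20%)."""
    bed_volumes = np.asarray(bed_volumes, dtype=float)
    c_over_c0 = np.asarray(c_over_c0, dtype=float)
    above = np.nonzero(c_over_c0 >= criterion)[0]
    if above.size == 0:
        return None                      # criterion never reached in the data
    i = above[0]
    if i == 0:
        return bed_volumes[0]
    # linear interpolation between the bracketing observations
    x0, x1 = bed_volumes[i - 1], bed_volumes[i]
    y0, y1 = c_over_c0[i - 1], c_over_c0[i]
    return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)

# Illustrative diclofenac-like breakthrough curve (hypothetical numbers).
bv = [0, 2000, 5000, 8000, 12000, 16000, 20000]
ratio = [0.0, 0.02, 0.08, 0.15, 0.28, 0.45, 0.60]
print(bv_at_breakthrough(bv, ratio))     # falls between 8,000 and 12,000 BV
```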
Data series embedding and scale invariant statistics.
Michieli, I; Medved, B; Ristov, S
2010-06-01
Data sequences acquired from bio-systems such as human gait data, heart rate interbeat data, or DNA sequences exhibit complex dynamics that is frequently described by a long-memory or power-law decay of the autocorrelation function. One way of characterizing that dynamics is through scale invariant statistics or "fractal-like" behavior. Several methods have been proposed for quantifying scale invariant parameters of physiological signals. Among them the most common are detrended fluctuation analysis, sample mean variance analyses, power spectral density analysis, R/S analysis, and recently, in the realm of the multifractal approach, wavelet analysis. In this paper it is demonstrated that embedding the time series data in a high-dimensional pseudo-phase space reveals scale invariant statistics in a simple fashion. The procedure is applied to different stride interval data sets from human gait measurement time series (PhysioBank data library). Results show that the introduced mapping adequately separates long-memory from random behavior. Smaller gait data sets were analyzed and scale-free trends for limited scale intervals were successfully detected. The method was verified on artificially produced time series with known scaling behavior and with varying content of noise. The possibility for the method to falsely detect long-range dependence in artificially generated short-range dependence series was investigated. (c) 2009 Elsevier B.V. All rights reserved.
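Of the standard methods listed in the abstract, detrended fluctuation analysis is the easiest to sketch; the following minimal Python implementation (not the embedding procedure proposed in the paper) estimates the scaling exponent of a surrogate stride-interval series.

```python
import numpy as np

def dfa(x, scales, order=1):
    """Detrended fluctuation analysis: returns F(n) for each window size n.
    A log-log slope (scaling exponent alpha) near 0.5 indicates uncorrelated noise,
    alpha > 0.5 persistent long-memory behaviour."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                       # integrated profile
    F = []
    for n in scales:
        n_seg = len(y) // n
        rms = []
        for k in range(n_seg):
            seg = y[k * n:(k + 1) * n]
            t = np.arange(n)
            coef = np.polyfit(t, seg, order)          # local polynomial trend
            rms.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        F.append(np.sqrt(np.mean(rms)))
    return np.array(F)

rng = np.random.default_rng(2)
stride = rng.normal(1.1, 0.05, size=4096)             # surrogate stride-interval series
scales = np.unique(np.logspace(2, 9, 20, base=2).astype(int))
F = dfa(stride, scales)
alpha = np.polyfit(np.log(scales), np.log(F), 1)[0]
print("estimated scaling exponent:", round(alpha, 2))  # ~0.5 for white noise
```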
SOCRAT Platform Design: A Web Architecture for Interactive Visual Analytics Applications
Kalinin, Alexandr A.; Palanimalai, Selvam; Dinov, Ivo D.
2018-01-01
The modern web is a successful platform for large scale interactive web applications, including visualizations. However, there are no established design principles for building complex visual analytics (VA) web applications that could efficiently integrate visualizations with data management, computational transformation, hypothesis testing, and knowledge discovery. This imposes a time-consuming design and development process on many researchers and developers. To address these challenges, we consider the design requirements for the development of a module-based VA system architecture, adopting existing practices of large scale web application development. We present the preliminary design and implementation of an open-source platform for Statistics Online Computational Resource Analytical Toolbox (SOCRAT). This platform defines: (1) a specification for an architecture for building VA applications with multi-level modularity, and (2) methods for optimizing module interaction, re-usage, and extension. To demonstrate how this platform can be used to integrate a number of data management, interactive visualization, and analysis tools, we implement an example application for simple VA tasks including raw data input and representation, interactive visualization and analysis. PMID:29630069
Autocorrelation and cross-correlation in time series of homicide and attempted homicide
NASA Astrophysics Data System (ADS)
Machado Filho, A.; da Silva, M. F.; Zebende, G. F.
2014-04-01
We propose in this paper to establish the relationship between homicides and attempted homicides by a non-stationary time-series analysis. This analysis is carried out by Detrended Fluctuation Analysis (DFA), Detrended Cross-Correlation Analysis (DCCA), and the DCCA cross-correlation coefficient, ρ(n). Through this analysis we can identify a positive cross-correlation between homicides and attempted homicides. At the same time, from the point of view of autocorrelation (DFA), the analysis is more informative depending on the time scale: for short scales (days) we cannot identify autocorrelations, on the scale of weeks DFA presents anti-persistent behavior, and for long time scales (n>90 days) DFA presents persistent behavior. Finally, the application of this new type of statistical analysis proved to be efficient and, in this sense, this paper can contribute to more accurate descriptive statistics of crime.
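A minimal sketch of the DCCA cross-correlation coefficient ρ(n) described above, applied to two synthetic series that share a common component; the real crime data are not reproduced here.

```python
import numpy as np

def detrended_variance(profile_a, profile_b, n):
    """Average detrended covariance of two integrated profiles in boxes of size n."""
    n_seg = len(profile_a) // n
    t = np.arange(n)
    cov = []
    for k in range(n_seg):
        a = profile_a[k * n:(k + 1) * n]
        b = profile_b[k * n:(k + 1) * n]
        a_res = a - np.polyval(np.polyfit(t, a, 1), t)
        b_res = b - np.polyval(np.polyfit(t, b, 1), t)
        cov.append(np.mean(a_res * b_res))
    return np.mean(cov)

def dcca_coefficient(x, y, n):
    """rho_DCCA(n) = F2_DCCA(n) / (F_DFA_x(n) * F_DFA_y(n))."""
    xa = np.cumsum(x - np.mean(x))
    ya = np.cumsum(y - np.mean(y))
    f2_xy = detrended_variance(xa, ya, n)
    f2_xx = detrended_variance(xa, xa, n)
    f2_yy = detrended_variance(ya, ya, n)
    return f2_xy / np.sqrt(f2_xx * f2_yy)

rng = np.random.default_rng(3)
common = rng.normal(size=3000)
homicide = common + rng.normal(size=3000)      # two series sharing a component
attempted = common + rng.normal(size=3000)
for n in (8, 32, 128):
    print(n, round(dcca_coefficient(homicide, attempted, n), 2))
```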
Understanding the origins of uncertainty in landscape-scale variations of emissions of nitrous oxide
NASA Astrophysics Data System (ADS)
Milne, Alice; Haskard, Kathy; Webster, Colin; Truan, Imogen; Goulding, Keith
2014-05-01
Nitrous oxide is a potent greenhouse gas which is over 300 times more radiatively effective than carbon dioxide. In the UK, the agricultural sector is estimated to be responsible for over 80% of nitrous oxide emissions, with these emissions resulting from livestock and farmers adding nitrogen fertilizer to soils. For the purposes of reporting emissions to the IPCC, the estimates are calculated using simple models whereby readily-available national or international statistics are combined with IPCC default emission factors. The IPCC emission factor for direct emissions of nitrous oxide from soils has a very large uncertainty. This is primarily because the variability of nitrous oxide emissions in space is large and this results in uncertainty that may be regarded as sample noise. To both reduce uncertainty through improved modelling, and to communicate an understanding of this uncertainty, we must understand the origins of the variation. We analysed data on nitrous oxide emission rate and some other soil properties collected from a 7.5-km transect across contrasting land uses and parent materials in eastern England. We investigated the scale-dependence and spatial uniformity of the correlations between soil properties and emission rates from farm to landscape scale using wavelet analysis. The analysis revealed a complex pattern of scale-dependence. Emission rates were strongly correlated with a process-specific function of the water-filled pore space at the coarsest scale and nitrate at intermediate and coarsest scales. We also found significant correlations between pH and emission rates at the intermediate scales. The wavelet analysis showed that these correlations were not spatially uniform and that at certain scales changes in parent material coincided with significant changes in correlation. Our results indicate that, at the landscape scale, nitrate content and water-filled pore space are key soil properties for predicting nitrous oxide emissions and should therefore be incorporated into process models and emission factors for inventory calculations.
Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R
2016-12-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
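A hedged sketch of the regularization-plus-cross-validation idea: an L1-penalized logistic regression whose penalty strength is chosen by cross-validation, applied to simulated "item pool" data. This is generic scikit-learn usage, not the authors' mortality analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n, p = 1000, 300                                  # many items, modest sample
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:10] = 0.5                              # only a few items truly predictive
logit = X @ true_beta - 1.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))     # e.g. a mortality indicator

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# L1-penalized logistic regression with the penalty strength chosen by
# cross-validation, i.e. by (approximately) minimizing expected prediction error
# rather than the within-sample likelihood.
model = LogisticRegressionCV(Cs=10, penalty="l1", solver="liblinear", cv=5)
model.fit(X_tr, y_tr)
print("items retained:", int(np.sum(model.coef_ != 0)))
print("held-out accuracy:", round(model.score(X_te, y_te), 3))
```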
Algorithm for computing descriptive statistics for very large data sets and the exa-scale era
NASA Astrophysics Data System (ADS)
Beekman, Izaak
2017-11-01
An algorithm for Single-point, Parallel, Online, Converging Statistics (SPOCS) is presented. It is suited for in situ analysis that traditionally would be relegated to post-processing, and can be used to monitor the statistical convergence and estimate the error/residual in the quantity, which is also useful for uncertainty quantification. Today, data may be generated at an overwhelming rate by numerical simulations and proliferating sensing apparatuses in experiments and engineering applications. Monitoring descriptive statistics in real time lets costly computations and experiments be gracefully aborted if an error has occurred, and monitoring the level of statistical convergence allows them to be run for the shortest amount of time required to obtain good results. This algorithm extends work by Pébay (Sandia Report SAND2008-6212). Pébay's algorithms are recast into a converging delta formulation with provably favorable properties. The mean, variance, covariances and arbitrary higher order statistical moments are computed in one pass. The algorithm is tested using Sillero, Jiménez, & Moser's (2013, 2014) publicly available UPM high Reynolds number turbulent boundary layer data set, demonstrating numerical robustness, efficiency and other favorable properties.
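In the spirit of the one-pass, mergeable moment updates the abstract describes, here is a minimal Python sketch of streaming mean/variance accumulation with a merge step for combining partial results (e.g. from separate processes). It illustrates the general Pébay-style update formulas, not the SPOCS code itself.

```python
import numpy as np

class OnlineStats:
    """One-pass (streaming) mean and variance with a merge operation,
    suitable for in situ, parallel accumulation."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0            # sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine statistics from two independent partial streams (e.g. two MPI ranks)."""
        n = self.n + other.n
        delta = other.mean - self.mean
        self.m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        self.mean = (self.n * self.mean + other.n * other.mean) / n
        self.n = n

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

rng = np.random.default_rng(5)
data = rng.normal(3.0, 2.0, size=100_000)
a, b = OnlineStats(), OnlineStats()
for x in data[:60_000]:
    a.push(x)
for x in data[60_000:]:
    b.push(x)
a.merge(b)
print(a.mean, a.variance)        # matches np.mean(data) and np.var(data, ddof=1)
```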
Xia, Li C; Ai, Dongmei; Cram, Jacob A; Liang, Xiaoyi; Fuhrman, Jed A; Sun, Fengzhu
2015-09-21
Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in the dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its application to high-throughput time series data analysis, e.g., data from studies based on next-generation sequencing technology. By extending the theories for the tail probability of the range of sums of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large-scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and found interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.
Climate Change and Macro-Economic Cycles in Pre-Industrial Europe
Pei, Qing; Zhang, David D.; Lee, Harry F.; Li, Guodong
2014-01-01
Climate change has been proven to be the ultimate cause of social crisis in pre-industrial Europe at a large scale. However, detailed analyses on climate change and macro-economic cycles in the pre-industrial era remain lacking, especially within different temporal scales. Therefore, fine-grained, paleo-climate, and economic data were employed with statistical methods to quantitatively assess the relations between climate change and agrarian economy in Europe during AD 1500 to 1800. In the study, the Butterworth filter was adopted to filter the data series into a long-term trend (low-frequency) and short-term fluctuations (high-frequency). Granger Causality Analysis was conducted to scrutinize the associations between climate change and macro-economic cycle at different frequency bands. Based on quantitative results, climate change can only show significant effects on the macro-economic cycle within the long-term. In terms of the short-term effects, society can relieve the influences from climate variations by social adaptation methods and self-adjustment mechanism. On a large spatial scale, temperature holds higher importance for the European agrarian economy than precipitation. By examining the supply-demand mechanism in the grain market, population during the study period acted as the producer in the long term, whereas as the consumer in the short term. These findings merely reflect the general interactions between climate change and macro-economic cycles at the large spatial region with a long-term study period. The findings neither illustrate individual incidents that can temporarily distort the agrarian economy nor explain some specific cases. In the study, the scale thinking in the analysis is raised as an essential methodological issue for the first time to interpret the associations between climatic impact and macro-economy in the past agrarian society within different temporal scales. PMID:24516601
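A small sketch of the frequency-band decomposition step: a zero-phase Butterworth low-pass filter splits an annual series into a long-term trend and short-term fluctuations, which could then be fed to separate Granger-causality tests. The cutoff period and the synthetic "temperature" series below are illustrative assumptions, not the study's settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def split_trend_fluctuation(series, cutoff_years=40.0, order=4):
    """Split an annual series into a low-frequency trend and high-frequency
    fluctuations with a zero-phase Butterworth low-pass filter."""
    fs = 1.0                                        # one sample per year
    b, a = butter(order, (1.0 / cutoff_years) / (fs / 2.0), btype="low")
    trend = filtfilt(b, a, series)
    return trend, series - trend

rng = np.random.default_rng(6)
years = np.arange(1500, 1801)
temperature = (0.3 * np.sin(2 * np.pi * (years - 1500) / 120)
               + rng.normal(0, 0.1, len(years)))
trend, fluct = split_trend_fluctuation(temperature)
# The trend and fluctuation components could then be paired with the corresponding
# grain-price components in a Granger-causality test.
print(trend[:3], fluct[:3])
```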
Computational Complexity of Bosons in Linear Networks
2017-03-01
photon statistics while strongly reducing emission probabilities: thus leading experimental teams pursuing large-scale BOSONSAMPLING have faced a hard ... Potentially, this could motivate new validation protocols exploiting statistics that include this temporal degree of freedom. The impact of ... photon statistics polluted by higher-order terms, which can be mistakenly interpreted as decreased photon-indistinguishability. In fact, in many cases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kollias, Pavlos
This is a multi-institutional, collaborative project using a three-tier modeling approach to bridge field observations and global cloud-permitting models, with emphases on cloud population structural evolution through various large-scale environments. Our contribution was in data analysis for the generation of high-value cloud and precipitation products and the derivation of cloud statistics for model validation. There are two areas of data analysis to which we contributed: the development of a synergistic cloud and precipitation classification that identifies different cloud types (e.g. shallow cumulus, cirrus) and precipitation types (shallow, deep, convective, stratiform) using profiling ARM observations, and the development of a quantitative precipitation rate retrieval algorithm using profiling ARM observations. Similar efforts have been developed in the past for precipitation (weather radars), but not for the millimeter-wavelength (cloud) radar deployed at the ARM sites.
Inference in the age of big data: Future perspectives on neuroscience.
Bzdok, Danilo; Yeo, B T Thomas
2017-07-15
Neuroscience is undergoing faster changes than ever before. Over 100 years our field qualitatively described and invasively manipulated single or few organisms to gain anatomical, physiological, and pharmacological insights. In the last 10 years neuroscience spawned quantitative datasets of unprecedented breadth (e.g., microanatomy, synaptic connections, and optogenetic brain-behavior assays) and size (e.g., cognition, brain imaging, and genetics). While growing data availability and information granularity have been amply discussed, we direct attention to a less explored question: How will the unprecedented data richness shape data analysis practices? Statistical reasoning is becoming more important to distill neurobiological knowledge from healthy and pathological brain measurements. We argue that large-scale data analysis will use more statistical models that are non-parametric, generative, and mixing frequentist and Bayesian aspects, while supplementing classical hypothesis testing with out-of-sample predictions. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Visual analysis of geocoded twin data puts nature and nurture on the map.
Davis, O S P; Haworth, C M A; Lewis, C M; Plomin, R
2012-09-01
Twin studies allow us to estimate the relative contributions of nature and nurture to human phenotypes by comparing the resemblance of identical and fraternal twins. Variation in complex traits is a balance of genetic and environmental influences; these influences are typically estimated at a population level. However, what if the balance of nature and nurture varies depending on where we grow up? Here we use statistical and visual analysis of geocoded data from over 6700 families to show that genetic and environmental contributions to 45 childhood cognitive and behavioral phenotypes vary geographically in the United Kingdom. This has implications for detecting environmental exposures that may interact with the genetic influences on complex traits, and for the statistical power of samples recruited for genetic association studies. More broadly, our experience demonstrates the potential for collaborative exploratory visualization to act as a lingua franca for large-scale interdisciplinary research.
NASA Astrophysics Data System (ADS)
Nascetti, A.; Di Rita, M.; Ravanelli, R.; Amicuzi, M.; Esposito, S.; Crespi, M.
2017-05-01
The high-performance cloud-computing platform Google Earth Engine has been developed for global-scale analysis based on Earth observation data. In particular, in this work, the geometric accuracy of the two most used nearly-global free DSMs (SRTM and ASTER) has been evaluated on the territories of four American States (Colorado, Michigan, Nevada, Utah) and one Italian Region (Trentino Alto-Adige, Northern Italy), exploiting the potentiality of this platform. These are large areas characterized by different terrain morphology, land cover and slopes. The assessment has been performed using two different reference DSMs: the USGS National Elevation Dataset (NED) and a LiDAR acquisition. The DSM accuracy has been evaluated through computation of standard statistical parameters, both at global scale (considering the whole State/Region) and as a function of the terrain morphology using several slope classes. The geometric accuracy in terms of standard deviation and NMAD for SRTM ranges from 2-3 meters in the first slope class to about 45 meters in the last one, whereas for ASTER the values range from 5-6 to 30 meters. In general, the performed analysis shows a better accuracy for SRTM in the flat areas, whereas the ASTER GDEM is more reliable in the steep areas, where the slopes increase. These preliminary results highlight the potential of GEE to perform DSM assessment on a global scale.
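To illustrate the accuracy statistics mentioned (standard deviation and NMAD per slope class), here is a hedged numpy sketch on synthetic elevation differences; the slope bins and error model are invented for demonstration.

```python
import numpy as np

def accuracy_stats(dsm, reference, slope, slope_bins=(0, 5, 15, 30, 90)):
    """Standard deviation and NMAD of the elevation differences (DSM - reference),
    computed globally and per slope class."""
    dh = dsm - reference
    def nmad(d):
        return 1.4826 * np.median(np.abs(d - np.median(d)))
    results = {"global": (np.std(dh), nmad(dh))}
    for lo, hi in zip(slope_bins[:-1], slope_bins[1:]):
        sel = (slope >= lo) & (slope < hi)
        if np.any(sel):
            results[f"{lo}-{hi} deg"] = (np.std(dh[sel]), nmad(dh[sel]))
    return results

rng = np.random.default_rng(7)
ref = rng.uniform(500, 3000, size=100_000)                 # stand-in for NED/LiDAR heights
slope = rng.uniform(0, 60, size=ref.size)                  # terrain slope in degrees
dsm = ref + rng.normal(0, 2 + 0.5 * slope, size=ref.size)  # error grows with slope
for name, (sd, nm) in accuracy_stats(dsm, ref, slope).items():
    print(name, round(sd, 1), round(nm, 1))
```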
Spatiotemporal Drought Analysis and Drought Indices Comparison in India
NASA Astrophysics Data System (ADS)
Janardhanan, A.
2017-12-01
Droughts and floods are an ever-occurring phenomenon that has been wreaking havoc on humans since the start of time. As droughts occur on a very large scale, studying them within a regional context can minimize confounding factors such as climate change. Droughts and floods are extremely erratic and very difficult to predict and therefore necessitate modeling through advanced statistics. The SPI (Standardized Precipitation Index) and the SPEI (Standardized Precipitation Evapotranspiration Index) are two ways to temporally model drought and flood patterns across each meteorological sub-basin in India over a variety of different time scales. SPI only accounts for precipitation values, while the SPEI accounts for both precipitation and temperature and is commonly regarded as a more reliable drought index. Using monthly rainfall and temperature data from 1871-2016, these two indices were calculated. The results depict the drought and flood severity index, length of drought, and average SPI or SPEI value for each meteorological sub-region in India. A Wilcoxon rank-sum test was then conducted to determine whether these two indices differed over the long term for drought analysis. The drought return periods were analyzed to determine if the population mean differed between the SPI and SPEI values. Our analysis found no statistical difference between SPI and SPEI with regard to long-term drought analysis. This indicates that temperature is not needed when modeling drought on a long-term time scale and that SPI is just as effective as SPEI, which has the potential to save a lot of time and resources when calculating drought indices.
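A rough sketch of this workflow: a simplified SPI (gamma fit mapped to standard-normal quantiles, ignoring zero-precipitation handling) and a Wilcoxon rank-sum comparison of drought durations derived from two indices. The precipitation series and the stand-in "SPEI" are simulated, so the numbers carry no scientific meaning.

```python
import numpy as np
from scipy import stats

def spi(precip_accum):
    """Very simplified SPI: fit a gamma distribution to an accumulated-precipitation
    series and map its CDF to standard-normal quantiles."""
    shape, loc, scale = stats.gamma.fit(precip_accum, floc=0)
    cdf = stats.gamma.cdf(precip_accum, shape, loc=loc, scale=scale)
    return stats.norm.ppf(np.clip(cdf, 1e-6, 1 - 1e-6))

def drought_lengths(index, threshold=-1.0):
    """Lengths of consecutive runs below the drought threshold."""
    lengths, run = [], 0
    for v in index:
        if v < threshold:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths

rng = np.random.default_rng(8)
precip_3mo = rng.gamma(2.0, 60.0, size=1740)           # ~145 years of 3-month sums
spi_series = spi(precip_3mo)
spei_series = spi_series + rng.normal(0, 0.2, size=spi_series.size)  # stand-in SPEI

# Compare drought durations from the two indices, as in the long-term comparison.
stat, p = stats.ranksums(drought_lengths(spi_series), drought_lengths(spei_series))
print("rank-sum p-value:", round(p, 3))
```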
Craven, Stephen; Shirsat, Nishikant; Whelan, Jessica; Glennon, Brian
2013-01-01
A Monod kinetic model, logistic equation model, and statistical regression model were developed for a Chinese hamster ovary cell bioprocess operated under three different modes of operation (batch, bolus fed-batch, and continuous fed-batch) and grown on two different bioreactor scales (3 L bench-top and 15 L pilot-scale). The Monod kinetic model was developed for all modes of operation under study and predicted cell density, glucose glutamine, lactate, and ammonia concentrations well for the bioprocess. However, it was computationally demanding due to the large number of parameters necessary to produce a good model fit. The transferability of the Monod kinetic model structure and parameter set across bioreactor scales and modes of operation was investigated and a parameter sensitivity analysis performed. The experimentally determined parameters had the greatest influence on model performance. They changed with scale and mode of operation, but were easily calculated. The remaining parameters, which were fitted using a differential evolutionary algorithm, were not as crucial. Logistic equation and statistical regression models were investigated as alternatives to the Monod kinetic model. They were less computationally intensive to develop due to the absence of a large parameter set. However, modeling of the nutrient and metabolite concentrations proved to be troublesome due to the logistic equation model structure and the inability of both models to incorporate a feed. The complexity, computational load, and effort required for model development has to be balanced with the necessary level of model sophistication when choosing which model type to develop for a particular application. Copyright © 2012 American Institute of Chemical Engineers (AIChE).
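As an illustration of fitting a logistic growth model with an evolutionary optimizer, the sketch below uses scipy's differential_evolution to minimize the sum of squared errors for hypothetical viable-cell-density data; the parameter bounds and data are assumptions, not the study's values.

```python
import numpy as np
from scipy.optimize import differential_evolution

def logistic(t, x0, xmax, mu):
    """Logistic growth: viable cell density as a function of culture time."""
    return xmax / (1 + (xmax / x0 - 1) * np.exp(-mu * t))

# Hypothetical 3 L batch viable-cell-density data (1e6 cells/mL vs. hours).
t_obs = np.array([0, 24, 48, 72, 96, 120, 144, 168], dtype=float)
x_obs = np.array([0.3, 0.6, 1.3, 2.8, 4.9, 6.8, 7.6, 7.9])

def sse(params):
    return np.sum((logistic(t_obs, *params) - x_obs) ** 2)

# Differential evolution needs only parameter bounds, which makes it convenient
# when a large mechanistic (Monod-type) parameter set would be hard to initialize.
bounds = [(0.05, 1.0),    # x0   (initial density)
          (5.0, 12.0),    # xmax (carrying capacity)
          (0.01, 0.1)]    # mu   (specific growth rate, 1/h)
result = differential_evolution(sse, bounds, seed=0)
print(result.x, result.fun)
```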
Grid-Enabled Quantitative Analysis of Breast Cancer
2009-10-01
large-scale, multi-modality computerized image analysis. The central hypothesis of this research is that large-scale image analysis for breast cancer ... pilot study to utilize large-scale parallel Grid computing to harness the nationwide cluster infrastructure for optimization of medical image ... analysis parameters. Additionally, we investigated the use of cutting-edge data analysis/mining techniques as applied to Ultrasound, FFDM, and DCE-MRI Breast
The cosmological principle is not in the sky
NASA Astrophysics Data System (ADS)
Park, Chan-Gyung; Hyun, Hwasu; Noh, Hyerim; Hwang, Jai-chan
2017-08-01
The homogeneity of matter distribution at large scales, known as the cosmological principle, is a central assumption in the standard cosmological model. The case is testable though, thus no longer needs to be a principle. Here we perform a test for spatial homogeneity using the Sloan Digital Sky Survey Luminous Red Galaxies (LRG) sample by counting galaxies within a specified volume with the radius scale varying up to 300 h-1 Mpc. We directly confront the large-scale structure data with the definition of spatial homogeneity by comparing the averages and dispersions of galaxy number counts with allowed ranges of the random distribution with homogeneity. The LRG sample shows significantly larger dispersions of number counts than the random catalogues up to 300 h-1 Mpc scale, and even the average is located far outside the range allowed in the random distribution; the deviations are statistically impossible to be realized in the random distribution. This implies that the cosmological principle does not hold even at such large scales. The same analysis of mock galaxies derived from the N-body simulation, however, suggests that the LRG sample is consistent with the current paradigm of cosmology, thus the simulation is also not homogeneous in that scale. We conclude that the cosmological principle is neither in the observed sky nor demanded to be there by the standard cosmological world model. This reveals the nature of the cosmological principle adopted in the modern cosmology paradigm, and opens a new field of research in theoretical cosmology.
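A toy version of the counts-in-spheres test described above: count points inside randomly placed spheres for a "clustered" and a random catalogue and compare the means and dispersions of the counts. The catalogues here are small synthetic point sets, not the SDSS LRG sample, and periodic boundaries are ignored.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(13)
box = 1000.0       # side length of the survey cube (e.g. Mpc/h), illustrative only
n_gal = 20000

def counts_in_spheres(points, radius, n_centers=500):
    """Count points inside spheres of a given radius around random interior centers."""
    tree = cKDTree(points)
    centers = rng.uniform(radius, box - radius, size=(n_centers, 3))
    return np.array([len(idx) for idx in tree.query_ball_point(centers, radius)])

random_cat = rng.uniform(0, box, size=(n_gal, 3))              # homogeneous comparison
clustered = random_cat.copy()
clustered[:5000] = rng.normal(box / 2, 60.0, size=(5000, 3))   # crude "structure"

for r in (100.0, 300.0):
    c_obs = counts_in_spheres(clustered, r)
    c_ran = counts_in_spheres(random_cat, r)
    # Excess dispersion of the observed counts relative to the random catalogue
    # is the kind of signal used to argue against homogeneity at that scale.
    print(r, "mean obs/ran:", round(c_obs.mean(), 1), round(c_ran.mean(), 1),
          "std obs/ran:", round(c_obs.std(), 1), round(c_ran.std(), 1))
```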
Broken Symmetries and Magnetic Dynamos
NASA Technical Reports Server (NTRS)
Shebalin, John V.
2007-01-01
Phase space symmetries inherent in the statistical theory of ideal magnetohydrodynamic (MHD) turbulence are known to be broken dynamically to produce large-scale coherent magnetic structure. Here, results of a numerical study of decaying MHD turbulence are presented that show large-scale coherent structure also arises and persists in the presence of dissipation. Dynamically broken symmetries in MHD turbulence may thus play a fundamental role in the dynamo process.
ERIC Educational Resources Information Center
EdSource, 2010
2010-01-01
This appendix focuses on the descriptive statistics of the middle study schools that participated in the "Gaining Ground in the Middle Grades: Why Some Schools Do Better. A Large-Scale Study of Middle Grades Practices and Student Outcomes. Initial Research." This appendix contains the following figures: (1) Student…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Busuioc, A.; Storch, H. von; Schnur, R.
Empirical downscaling procedures relate large-scale atmospheric features with local features such as station rainfall in order to facilitate local scenarios of climate change. The purpose of the present paper is twofold: first, a downscaling technique is used as a diagnostic tool to verify the performance of climate models on the regional scale; second, a technique is proposed for verifying the validity of empirical downscaling procedures in climate change applications. The case considered is regional seasonal precipitation in Romania. The downscaling model is a regression based on canonical correlation analysis between observed station precipitation and European-scale sea level pressure (SLP). The climate models considered here are the T21 and T42 versions of the Hamburg ECHAM3 atmospheric GCM run in time-slice mode. The climate change scenario refers to the expected time of doubled carbon dioxide concentrations around the year 2050. Generally, applications of statistical downscaling to climate change scenarios have been based on the assumption that the empirical link between the large-scale and regional parameters remains valid under a changed climate. In this study, a rationale is proposed for this assumption by showing the consistency of the 2×CO₂ GCM scenarios in winter, derived directly from the gridpoint data, with the regional scenarios obtained through empirical downscaling. Since the skill of the GCMs in regional terms is already established, it is concluded that the downscaling technique is adequate for describing climatically changing regional and local conditions, at least for precipitation in Romania during winter.
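A minimal sketch of a CCA-based downscaling regression in Python (scikit-learn), with random fields standing in for SLP anomalies and station precipitation; it shows the structure of the approach only, not the ECHAM3/Romania analysis.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
n_seasons, n_grid, n_stations = 120, 50, 12
slp = rng.normal(size=(n_seasons, n_grid))                   # large-scale SLP anomalies
precip = slp[:, :n_stations] * 0.8 + rng.normal(size=(n_seasons, n_stations))

# Canonical correlation between the large-scale field and station precipitation,
# followed by a regression of station amounts on the leading canonical variates.
cca = CCA(n_components=3)
slp_cv, pre_cv = cca.fit_transform(slp, precip)
reg = LinearRegression().fit(slp_cv, precip)

# "Downscale" a new large-scale state (here just another random field): transform
# it into canonical space and predict the local precipitation anomalies.
new_slp = rng.normal(size=(1, n_grid))
new_cv = cca.transform(new_slp)
print(reg.predict(new_cv).round(2))
```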
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aartsen, M. G.; Abraham, K.; Ackermann, M.
The IceCube Neutrino Observatory accumulated a total of 318 billion cosmic-ray-induced muon events between 2009 May and 2015 May. This data set was used for a detailed analysis of the sidereal anisotropy in the arrival directions of cosmic rays in the TeV to PeV energy range. The observed global sidereal anisotropy features large regions of relative excess and deficit, with amplitudes of the order of 10⁻³ up to about 100 TeV. A decomposition of the arrival direction distribution into spherical harmonics shows that most of the power is contained in the low-multipole (ℓ ≤ 4) moments. However, higher multipole components are found to be statistically significant down to an angular scale of less than 10°, approaching the angular resolution of the detector. Above 100 TeV, a change in the morphology of the arrival direction distribution is observed, and the anisotropy is characterized by a wide relative deficit whose amplitude increases with primary energy up to at least 5 PeV, the highest energies currently accessible to IceCube. No time dependence of the large- and small-scale structures is observed in the period of six years covered by this analysis. The high-statistics data set reveals more details of the properties of the anisotropy and is potentially able to shed light on the various physical processes that are responsible for the complex angular structure and energy evolution.
Nonlinear detection of large-scale transitions in Plio-Pleistocene African climate
NASA Astrophysics Data System (ADS)
Donges, J. F.; Donner, R. V.; Trauth, M. H.; Marwan, N.; Schellnhuber, H. J.; Kurths, J.
2011-12-01
Potential paleoclimatic driving mechanisms acting on human development present an open problem of cross-disciplinary scientific interest. The analysis of paleoclimate archives encoding the environmental variability in East Africa during the last 5 Ma (million years) has triggered an ongoing debate about possible candidate processes and evolutionary mechanisms. In this work, we apply a novel nonlinear statistical technique, recurrence network analysis, to three distinct marine records of terrigenous dust flux. Our method enables us to identify three epochs with transitions between qualitatively different types of environmental variability in North and East Africa during the (i) Mid-Pliocene (3.35-3.15 Ma BP (before present)), (ii) Early Pleistocene (2.25-1.6 Ma BP), and (iii) Mid-Pleistocene (1.1-0.7 Ma BP). A deeper examination of these transition periods reveals potential climatic drivers, including (i) large-scale changes in ocean currents due to a spatial shift of the Indonesian throughflow in combination with an intensification of Northern Hemisphere glaciation, (ii) a global reorganization of the atmospheric Walker circulation induced in the tropical Pacific and Indian Ocean, and (iii) shifts in the dominating temporal variability pattern of glacial activity during the Mid-Pleistocene, respectively. A statistical reexamination of the available fossil record demonstrates a remarkable coincidence between the detected transition periods and major steps in hominin evolution. This suggests that the observed shifts between more regular and more erratic environmental variability have acted as a trigger for rapid change in the development of humankind in Africa.
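Recurrence network analysis can be sketched compactly: build a graph whose nodes are time points and whose edges connect states closer than a threshold ε, then compare a network measure between segments. The "dust flux" series below is synthetic and one-dimensional, so this is only a schematic of the technique, not the authors' windowed analysis of the marine records.

```python
import numpy as np
import networkx as nx

def recurrence_network(series, eps):
    """Build a recurrence network: nodes are time points, edges connect states
    closer than eps in (here one-dimensional) state space."""
    x = np.asarray(series, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    adj = (dist < eps) & ~np.eye(len(x), dtype=bool)
    return nx.from_numpy_array(adj.astype(int))

rng = np.random.default_rng(10)
# Surrogate "dust flux" record with a regime change halfway through the series.
flux = np.concatenate([rng.normal(0, 1, 500), rng.normal(0, 2, 500) + 1.5])

for name, segment in [("early", flux[:500]), ("late", flux[500:])]:
    g = recurrence_network(segment, eps=0.5)
    # Network transitivity is one measure used to discriminate regimes of more
    # regular vs. more erratic variability.
    print(name, round(nx.transitivity(g), 3))
```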
NASA Astrophysics Data System (ADS)
Aartsen, M. G.; Abraham, K.; Ackermann, M.; Adams, J.; Aguilar, J. A.; Ahlers, M.; Ahrens, M.; Altmann, D.; Anderson, T.; Ansseau, I.; Anton, G.; Archinger, M.; Arguelles, C.; Arlen, T. C.; Auffenberg, J.; Bai, X.; Barwick, S. W.; Baum, V.; Bay, R.; Beatty, J. J.; Becker Tjus, J.; Becker, K.-H.; Beiser, E.; BenZvi, S.; Berghaus, P.; Berley, D.; Bernardini, E.; Bernhard, A.; Besson, D. Z.; Binder, G.; Bindig, D.; Bissok, M.; Blaufuss, E.; Blumenthal, J.; Boersma, D. J.; Bohm, C.; Börner, M.; Bos, F.; Bose, D.; Böser, S.; Botner, O.; Braun, J.; Brayeur, L.; Bretz, H.-P.; Buzinsky, N.; Casey, J.; Casier, M.; Cheung, E.; Chirkin, D.; Christov, A.; Clark, K.; Classen, L.; Coenders, S.; Collin, G. H.; Conrad, J. M.; Cowen, D. F.; Cruz Silva, A. H.; Daughhetee, J.; Davis, J. C.; Day, M.; de André, J. P. A. M.; De Clercq, C.; del Pino Rosendo, E.; Dembinski, H.; De Ridder, S.; Desiati, P.; de Vries, K. D.; de Wasseige, G.; de With, M.; DeYoung, T.; Díaz-Vélez, J. C.; di Lorenzo, V.; Dujmovic, H.; Dumm, J. P.; Dunkman, M.; Eberhardt, B.; Ehrhardt, T.; Eichmann, B.; Euler, S.; Evenson, P. A.; Fahey, S.; Fazely, A. R.; Feintzeig, J.; Felde, J.; Filimonov, K.; Finley, C.; Flis, S.; Fösig, C.-C.; Fuchs, T.; Gaisser, T. K.; Gaior, R.; Gallagher, J.; Gerhardt, L.; Ghorbani, K.; Gier, D.; Gladstone, L.; Glagla, M.; Glüsenkamp, T.; Goldschmidt, A.; Golup, G.; Gonzalez, J. G.; Góra, D.; Grant, D.; Griffith, Z.; Ha, C.; Haack, C.; Haj Ismail, A.; Hallgren, A.; Halzen, F.; Hansen, E.; Hansmann, B.; Hansmann, T.; Hanson, K.; Hebecker, D.; Heereman, D.; Helbing, K.; Hellauer, R.; Hickford, S.; Hignight, J.; Hill, G. C.; Hoffman, K. D.; Hoffmann, R.; Holzapfel, K.; Homeier, A.; Hoshina, K.; Huang, F.; Huber, M.; Huelsnitz, W.; Hulth, P. O.; Hultqvist, K.; In, S.; Ishihara, A.; Jacobi, E.; Japaridze, G. S.; Jeong, M.; Jero, K.; Jones, B. J. P.; Jurkovic, M.; Kappes, A.; Karg, T.; Karle, A.; Katz, U.; Kauer, M.; Keivani, A.; Kelley, J. L.; Kemp, J.; Kheirandish, A.; Kim, M.; Kintscher, T.; Kiryluk, J.; Klein, S. R.; Kohnen, G.; Koirala, R.; Kolanoski, H.; Konietz, R.; Köpke, L.; Kopper, C.; Kopper, S.; Koskinen, D. J.; Kowalski, M.; Krings, K.; Kroll, G.; Kroll, M.; Krückl, G.; Kunnen, J.; Kunwar, S.; Kurahashi, N.; Kuwabara, T.; Labare, M.; Lanfranchi, J. L.; Larson, M. J.; Lennarz, D.; Lesiak-Bzdak, M.; Leuermann, M.; Leuner, J.; Lu, L.; Lünemann, J.; Madsen, J.; Maggi, G.; Mahn, K. B. M.; Mandelartz, M.; Maruyama, R.; Mase, K.; Matis, H. S.; Maunu, R.; McNally, F.; Meagher, K.; Medici, M.; Meier, M.; Meli, A.; Menne, T.; Merino, G.; Meures, T.; Miarecki, S.; Middell, E.; Mohrmann, L.; Montaruli, T.; Morse, R.; Nahnhauer, R.; Naumann, U.; Neer, G.; Niederhausen, H.; Nowicki, S. C.; Nygren, D. R.; Obertacke Pollmann, A.; Olivas, A.; Omairat, A.; O'Murchadha, A.; Palczewski, T.; Pandya, H.; Pankova, D. V.; Paul, L.; Pepper, J. A.; Pérez de los Heros, C.; Pfendner, C.; Pieloth, D.; Pinat, E.; Posselt, J.; Price, P. B.; Przybylski, G. T.; Quinnan, M.; Raab, C.; Rädel, L.; Rameez, M.; Rawlins, K.; Reimann, R.; Relich, M.; Resconi, E.; Rhode, W.; Richman, M.; Richter, S.; Riedel, B.; Robertson, S.; Rongen, M.; Rott, C.; Ruhe, T.; Ryckbosch, D.; Sabbatini, L.; Sander, H.-G.; Sandrock, A.; Sandroos, J.; Sarkar, S.; Schatto, K.; Schimp, M.; Schlunder, P.; Schmidt, T.; Schoenen, S.; Schöneberg, S.; Schönwald, A.; Schumacher, L.; Seckel, D.; Seunarine, S.; Soldin, D.; Song, M.; Spiczak, G. M.; Spiering, C.; Stahlberg, M.; Stamatikos, M.; Stanev, T.; Stasik, A.; Steuer, A.; Stezelberger, T.; Stokstad, R. 
G.; Stössl, A.; Ström, R.; Strotjohann, N. L.; Sullivan, G. W.; Sutherland, M.; Taavola, H.; Taboada, I.; Tatar, J.; Ter-Antonyan, S.; Terliuk, A.; Tešić, G.; Tilav, S.; Toale, P. A.; Tobin, M. N.; Toscano, S.; Tosi, D.; Tselengidou, M.; Turcati, A.; Unger, E.; Usner, M.; Vallecorsa, S.; Vandenbroucke, J.; van Eijndhoven, N.; Vanheule, S.; van Santen, J.; Veenkamp, J.; Vehring, M.; Voge, M.; Vraeghe, M.; Walck, C.; Wallace, A.; Wallraff, M.; Wandkowsky, N.; Weaver, Ch.; Wendt, C.; Westerhoff, S.; Whelan, B. J.; Wiebe, K.; Wiebusch, C. H.; Wille, L.; Williams, D. R.; Wills, L.; Wissing, H.; Wolf, M.; Wood, T. R.; Woschnagg, K.; Xu, D. L.; Xu, X. W.; Xu, Y.; Yanez, J. P.; Yodh, G.; Yoshida, S.; Zoll, M.; IceCube Collaboration
2016-08-01
The IceCube Neutrino Observatory accumulated a total of 318 billion cosmic-ray-induced muon events between 2009 May and 2015 May. This data set was used for a detailed analysis of the sidereal anisotropy in the arrival directions of cosmic rays in the TeV to PeV energy range. The observed global sidereal anisotropy features large regions of relative excess and deficit, with amplitudes of the order of 10⁻³ up to about 100 TeV. A decomposition of the arrival direction distribution into spherical harmonics shows that most of the power is contained in the low-multipole (ℓ ≤ 4) moments. However, higher multipole components are found to be statistically significant down to an angular scale of less than 10°, approaching the angular resolution of the detector. Above 100 TeV, a change in the morphology of the arrival direction distribution is observed, and the anisotropy is characterized by a wide relative deficit whose amplitude increases with primary energy up to at least 5 PeV, the highest energies currently accessible to IceCube. No time dependence of the large- and small-scale structures is observed in the period of six years covered by this analysis. The high-statistics data set reveals more details of the properties of the anisotropy and is potentially able to shed light on the various physical processes that are responsible for the complex angular structure and energy evolution.
SINGH, G. D.; McNAMARA JR, J. A.; LOZANOFF, S.
1997-01-01
This study determines deformations of the midface that contribute to a class III appearance, employing thin-plate spline analysis. A total of 135 lateral cephalographs of prepubertal children of European-American descent with either class III malocclusions or a class I molar occlusion were compared. The cephalographs were traced and checked, and 7 homologous landmarks of the midface were identified and digitised. The data sets were scaled to an equivalent size and subjected to Procrustes analysis. These statistical tests indicated significant differences (P<0.05) between the averaged class I and class III morphologies. Thin-plate spline analysis indicated that both affine and nonaffine transformations contribute towards the total spline for the averaged midfacial configuration. For nonaffine transformations, partial warp 3 had the highest magnitude, indicating the large scale deformations of the midfacial configuration. These deformations affected the palatal landmarks, and were associated with compression of the midfacial complex in the anteroposterior plane predominantly. Partial warp 4 produced some vertical compression of the posterior aspect of the midfacial complex whereas partial warps 1 and 2 indicated localised shape changes of the maxillary alveolus region. Large spatial-scale deformations therefore affect the midfacial complex in an anteroposterior axis, in combination with vertical compression and localised distortions. These deformations may represent a developmental diminution of the palatal complex anteroposteriorly that, allied with vertical shortening of midfacial height posteriorly, results in class III malocclusions with a retrusive midfacial profile. PMID:9449078
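The Procrustes step of the analysis is easy to illustrate with scipy; the two 7-landmark configurations below are invented coordinates, not the study's cephalometric averages.

```python
import numpy as np
from scipy.spatial import procrustes

# Hypothetical 7-landmark midfacial configurations (x, y) for two group averages.
class_i   = np.array([[0, 0], [10, 1], [20, 0], [25, -5], [20, -10], [10, -11], [0, -10]], float)
class_iii = np.array([[0, 0], [ 9, 1], [18, 0], [22, -5], [18, -10], [ 9, -11], [0, -10]], float)

# Procrustes superimposition removes differences in position, scale and rotation,
# leaving a 'disparity' that reflects the residual shape difference between the
# averaged configurations (the quantity whose significance the study assesses).
mtx1, mtx2, disparity = procrustes(class_i, class_iii)
print("Procrustes disparity:", round(disparity, 4))
```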
ERIC Educational Resources Information Center
Burstein, Leigh
Two specific methods of analysis in large-scale evaluations are considered: structural equation modeling and selection modeling/analysis of non-equivalent control group designs. Their utility in large-scale educational program evaluation is discussed. The examination of these methodological developments indicates how people (evaluators,…
Impact analysis of two kinds of failure strategies in Beijing road transportation network
NASA Astrophysics Data System (ADS)
Zhang, Zundong; Xu, Xiaoyang; Zhang, Zhaoran; Zhou, Huijuan
The Beijing road transportation network (BRTN), as a large-scale technological network, exhibits very complex and complicated features during daily periods. It has also been widely highlighted how statistical characteristics (e.g. average path length and global network efficiency) change as the network evolves. In this paper, using different modeling concepts, three kinds of network models of BRTN are constructed according to the topological data and the real detected flow data: the abstract network model, the static network model with road mileage as weights, and the dynamic network model with travel time as weights. The degree distributions of the three network models are analyzed, which shows that the urban road infrastructure network and the dynamic network behave like scale-free networks. By analyzing and comparing the important statistical characteristics of the three models under random attacks and intentional attacks, it is shown that the urban road infrastructure network and the dynamic network of BRTN are both robust and vulnerable.
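A small sketch of the attack analysis: remove a fraction of nodes either at random or in order of decreasing degree and track global network efficiency. A Barabási-Albert graph stands in for the BRTN models, whose underlying data are not reproduced here.

```python
import random
import networkx as nx

def attack(graph, strategy, fraction=0.2):
    """Remove a fraction of nodes either at random or by descending degree
    (intentional attack) and report the global efficiency of what remains."""
    g = graph.copy()
    k = int(fraction * g.number_of_nodes())
    if strategy == "random":
        targets = random.sample(list(g.nodes), k)
    else:  # "intentional": highest-degree nodes first
        targets = [n for n, _ in sorted(g.degree, key=lambda t: t[1], reverse=True)[:k]]
    g.remove_nodes_from(targets)
    return nx.global_efficiency(g)

random.seed(0)
# Scale-free surrogate for a road-transportation-like network.
g = nx.barabasi_albert_graph(1000, 2, seed=0)
print("baseline efficiency   :", round(nx.global_efficiency(g), 3))
print("after random failures :", round(attack(g, "random"), 3))
print("after targeted attacks:", round(attack(g, "intentional"), 3))
```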
Dynamic Quantitative Trait Locus Analysis of Plant Phenomic Data.
Li, Zitong; Sillanpää, Mikko J
2015-12-01
Advanced platforms have recently become available for automatic and systematic quantification of plant growth and development. These new techniques can efficiently produce multiple measurements of phenotypes over time, and introduce time as an extra dimension to quantitative trait locus (QTL) studies. Functional mapping utilizes a class of statistical models for identifying QTLs associated with the growth characteristics of interest. A major benefit of functional mapping is that it integrates information over multiple timepoints, and therefore could increase the statistical power for QTL detection. We review the current development of computationally efficient functional mapping methods which provide invaluable tools for analyzing large-scale timecourse data that are readily available in our post-genome era. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Yang, Y.; Gan, T. Y.; Tan, X.
2017-12-01
In the past few decades, there have been more extreme climate events around the world, and Canada has also suffered from numerous extreme precipitation events. In this paper, trend analysis, change point analysis, probability distribution function, principal component analysis and wavelet analysis were used to investigate the spatial and temporal patterns of extreme precipitation in Canada. Ten extreme precipitation indices were calculated using long-term daily precipitation data from 164 gauging stations. Several large-scale climate patterns such as El Niño-Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), Pacific-North American (PNA), and North Atlantic Oscillation (NAO) were selected to analyze the relationships between extreme precipitation and climate indices. Convective Available Potential Energy (CAPE), specific humidity, and surface temperature were employed to investigate the potential causes of the trends.The results show statistically significant positive trends for most indices, which indicate increasing extreme precipitation. The majority of indices display more increasing trends along the southern border of Canada while decreasing trends dominate in the central Canadian Prairies (CP). In addition, strong connections are found between the extreme precipitation and climate indices and the effects of climate pattern differ for each region. The seasonal CAPE, specific humidity, and temperature are found to be closely related to Canadian extreme precipitation.
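A hedged sketch of a basic Mann-Kendall trend test (no tie or autocorrelation correction) applied to a synthetic annual extreme-precipitation index; the index values are simulated, and operational analyses typically use more complete implementations.

```python
import numpy as np
from scipy import stats

def mann_kendall(series):
    """Basic Mann-Kendall trend test: S statistic, normal-approximation z and p
    (a sketch without tie or serial-correlation corrections)."""
    x = np.asarray(series, dtype=float)
    n = x.size
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return s, z, p

rng = np.random.default_rng(12)
years = np.arange(1950, 2015)
rx1day = 40 + 0.15 * (years - 1950) + rng.normal(0, 5, years.size)  # wettest-day index
print(mann_kendall(rx1day))
```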
Pelvic-floor strength in women with incontinence as assessed by the brink scale.
FitzGerald, Mary P; Burgio, Kathryn L; Borello-France, Diane F; Menefee, Shawn A; Schaffer, Joseph; Kraus, Stephen; Mallett, Veronica T; Xu, Yan
2007-10-01
The purpose of this study was to describe how clinical pelvic-floor muscle (PFM) strength (force-generating capacity) is related to patient characteristics, lower urinary tract symptoms, and fecal incontinence symptoms. Data were obtained from 643 women who were participating in a randomized surgical trial for treatment of stress urinary incontinence. Patient demographic variables, baseline urinary and fecal incontinence symptom questionnaires, urodynamic data and urinary diary data, pad test results, and standardized assessment of pelvic organ support were compared with PFM strength as described by the Brink scoring system. Bivariate analysis of factors associated with the Brink scale score was done using analysis of variance and linear regression. Multivariate analysis included patient variables that were significant on bivariate analysis. The mean Brink scale score was 9 (SD=2) and did not vary widely in this large, but highly select, patient sample. We found a weak, but statistically strong, relationship between age and Brink score. Brink scores were not related to diary and pad test measures of incontinence severity. Overall, PFM strength was good in this sample of women with stress incontinence. Scores tended to be similar, and it is possible that the Brink scale does not reflect real clinical differences in PFM strength.
DOE Office of Scientific and Technical Information (OSTI.GOV)
al-Saffar, Sinan; Joslyn, Cliff A.; Chappell, Alan R.
As semantic datasets grow to be very large and divergent, there is a need to identify and exploit their inherent semantic structure for discovery and optimization. Towards that end, we present here a novel methodology to identify the semantic structures inherent in an arbitrary semantic graph dataset. We first present the concept of an extant ontology as a statistical description of the semantic relations present amongst the typed entities modeled in the graph. This serves as a model of the underlying semantic structure to aid in discovery and visualization. We then describe a method of ontological scaling in which the ontology is employed as a hierarchical scaling filter to infer different resolution levels at which the graph structures are to be viewed or analyzed. We illustrate these methods on three large and publicly available semantic datasets containing more than one billion edges each. Keywords: Semantic Web; Visualization; Ontology; Multi-resolution Data Mining.
Statistical physics approaches to financial fluctuations
NASA Astrophysics Data System (ADS)
Wang, Fengzhong
2009-12-01
Complex systems attract many researchers from various scientific fields. Financial markets are one of these widely studied complex systems. Statistical physics, which was originally developed to study large systems, provides novel ideas and powerful methods to analyze financial markets. The study of financial fluctuations characterizes market behavior, and helps to better understand the underlying market mechanism. Our study focuses on volatility, a fundamental quantity to characterize financial fluctuations. We examine equity data of the entire U.S. stock market during 2001 and 2002. To analyze the volatility time series, we develop a new approach, called return interval analysis, which examines the time intervals between two successive volatilities exceeding a given value threshold. We find that the return interval distribution displays scaling over a wide range of thresholds. This scaling is valid for a range of time windows, from one minute up to one day. Moreover, our results are similar for commodities, interest rates, currencies, and for stocks of different countries. Further analysis shows some systematic deviations from a scaling law, which we can attribute to nonlinear correlations in the volatility time series. We also find a memory effect in return intervals for different time scales, which is related to the long-term correlations in the volatility. To further characterize the mechanism of price movement, we simulate the volatility time series using two different models, fractionally integrated generalized autoregressive conditional heteroscedasticity (FIGARCH) and fractional Brownian motion (fBm), and test these models with the return interval analysis. We find that both models can mimic time memory but only fBm shows scaling in the return interval distribution. In addition, we examine the volatility of daily opening to closing and of closing to opening. We find that each volatility distribution has a power law tail. Using the detrended fluctuation analysis (DFA) method, we show long-term auto-correlations in these volatility time series. We also analyze return, the actual price changes of stocks, and find that the returns over the two sessions are often anti-correlated.
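As a rough illustration of the return interval analysis described above, the sketch below (Python, with an entirely synthetic stand-in for intraday volatility) collects the waiting times between successive threshold exceedances and reports their mean for several thresholds; the scaling hypothesis is that the interval distribution depends on the interval only through its ratio to the mean interval. The thresholds and the synthetic series are illustrative assumptions, not data from the study.

```python
import numpy as np

def return_intervals(volatility, threshold):
    """Times between successive exceedances of a volatility threshold."""
    exceed = np.flatnonzero(volatility > threshold)  # indices where volatility > q
    return np.diff(exceed)                           # waiting times between exceedances

# Illustration with a synthetic log-normal "volatility"; real data would be,
# e.g., one-minute absolute returns of a stock.
rng = np.random.default_rng(0)
vol = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)

for q in [1.5, 2.0, 3.0]:
    tau = return_intervals(vol, q)
    # Scaling hypothesis: P(tau) depends on tau only through tau / <tau>.
    print(f"threshold={q}: mean interval={tau.mean():.1f}, "
          f"std/mean={tau.std() / tau.mean():.2f}")
```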
Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies
Huang, Jim C.; Meek, Christopher; Kadie, Carl; Heckerman, David
2011-01-01
Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs to phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. Our method's runtime scales quadratically with the number of individuals being studied, with only a modest loss in statistical power as compared to LMM-based and PCA-based methods when testing on synthetic data that was generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime relative to LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science. PMID:21765897
Kirchner, James W.; Neal, Colin
2013-01-01
The chemical dynamics of lakes and streams affect their suitability as aquatic habitats and as water supplies for human needs. Because water quality is typically monitored only weekly or monthly, however, the higher-frequency dynamics of stream chemistry have remained largely invisible. To illuminate a wider spectrum of water quality dynamics, rainfall and streamflow were sampled in two headwater catchments at Plynlimon, Wales, at 7-h intervals for 1–2 y and weekly for over two decades, and were analyzed for 45 solutes spanning the periodic table from H+ to U. Here we show that in streamflow, all 45 of these solutes, including nutrients, trace elements, and toxic metals, exhibit fractal 1/f^α scaling on time scales from hours to decades (α = 1.05 ± 0.15, mean ± SD). We show that this fractal scaling can arise through dispersion of random chemical inputs distributed across a catchment. These 1/f time series are non–self-averaging: monthly, yearly, or decadal averages are approximately as variable, one from the next, as individual measurements taken hours or days apart, defying naive statistical expectations. (By contrast, stream discharge itself is nonfractal, and self-averaging on time scales of months and longer.) In the solute time series, statistically significant trends arise much more frequently, on all time scales, than one would expect from conventional t statistics. However, these same trends are poor predictors of future trends—much poorer than one would expect from their calculated uncertainties. Our results illustrate how 1/f time series pose fundamental challenges to trend analysis and change detection in environmental systems. PMID:23842090
NASA Astrophysics Data System (ADS)
Kirchner, James W.; Neal, Colin
2013-07-01
The chemical dynamics of lakes and streams affect their suitability as aquatic habitats and as water supplies for human needs. Because water quality is typically monitored only weekly or monthly, however, the higher-frequency dynamics of stream chemistry have remained largely invisible. To illuminate a wider spectrum of water quality dynamics, rainfall and streamflow were sampled in two headwater catchments at Plynlimon, Wales, at 7-h intervals for 1-2 y and weekly for over two decades, and were analyzed for 45 solutes spanning the periodic table from H+ to U. Here we show that in streamflow, all 45 of these solutes, including nutrients, trace elements, and toxic metals, exhibit fractal 1/f^α scaling on time scales from hours to decades (α = 1.05 ± 0.15, mean ± SD). We show that this fractal scaling can arise through dispersion of random chemical inputs distributed across a catchment. These 1/f time series are non-self-averaging: monthly, yearly, or decadal averages are approximately as variable, one from the next, as individual measurements taken hours or days apart, defying naive statistical expectations. (By contrast, stream discharge itself is nonfractal, and self-averaging on time scales of months and longer.) In the solute time series, statistically significant trends arise much more frequently, on all time scales, than one would expect from conventional t statistics. However, these same trends are poor predictors of future trends-much poorer than one would expect from their calculated uncertainties. Our results illustrate how 1/f time series pose fundamental challenges to trend analysis and change detection in environmental systems.
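A minimal way to estimate the spectral exponent α in a 1/f^α series is to fit the slope of the log periodogram against log frequency. The sketch below, assuming a regularly sampled solute time series, builds a synthetic fractional-noise signal and recovers its exponent; it illustrates the general technique, not the spectral estimator used by the authors.

```python
import numpy as np

def spectral_slope(x, dt=1.0):
    """Estimate alpha in S(f) ~ 1/f^alpha from the periodogram of series x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=dt)[1:]          # drop the zero frequency
    power = np.abs(np.fft.rfft(x))[1:] ** 2
    slope, _ = np.polyfit(np.log(freqs), np.log(power), 1)
    return -slope                                      # alpha

# Synthetic example: fractional noise built by shaping white noise in Fourier space.
rng = np.random.default_rng(1)
n = 8192
white = np.fft.rfft(rng.standard_normal(n))
f = np.fft.rfftfreq(n)
f[0] = f[1]                                            # avoid division by zero
series = np.fft.irfft(white / f**0.5, n)               # target alpha ~ 1
print(f"estimated alpha ~ {spectral_slope(series):.2f}")
```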
Ji, Hong-Fang; Zhuang, Qi-Shuai; Shen, Liang
2016-04-05
Our study investigated the shared genetic etiology underlying type 2 diabetes (T2D) and major depressive disorder (MDD) by analyzing large-scale genome-wide association study statistics. A total of 496 shared SNPs associated with both T2D and MDD were identified at p-value ≤ 1.0E-07. Functional enrichment analysis showed that the enriched pathways pertained to immune responses (Fc gamma R-mediated phagocytosis, T cell and B cell receptors signaling), cell signaling (MAPK, Wnt signaling), lipid metabolism, and cancer associated pathways. The findings will have potential implications for future interventional studies of the two diseases.
NASA Astrophysics Data System (ADS)
Shan, X.; Zhang, K.; Zhuang, Y.; Fu, R.; Hong, Y.
2017-12-01
Seasonal prediction of rainfall during the dry-to-wet transition season in austral spring (September-November) over southern Amazonia is central for improving crop planting and fire mitigation in that region. Previous studies have identified the key large-scale atmospheric dynamic and thermodynamic pre-conditions during the dry season (June-August) that influence the rainfall anomalies during the dry to wet transition season over Southern Amazonia. Based on these key pre-conditions during the dry season, we have evaluated several statistical models and developed a Neural Network based statistical prediction system to predict rainfall during the dry to wet transition for Southern Amazonia (5-15°S, 50-70°W). Multivariate Empirical Orthogonal Function (EOF) Analysis is applied to the following four fields during JJA from the ECMWF Reanalysis (ERA-Interim) spanning from year 1979 to 2015: geopotential height at 200 hPa, surface relative humidity, convective inhibition energy (CIN) index and convective available potential energy (CAPE), to filter out noise and highlight the most coherent spatial and temporal variations. The first 10 EOF modes are retained for inputs to the statistical models, accounting for at least 70% of the total variance in the predictor fields. We have tested several linear and non-linear statistical methods. While the regularized Ridge Regression and Lasso Regression can generally capture the spatial pattern and magnitude of rainfall anomalies, we found that the Neural Network performs best with an accuracy greater than 80%, as expected from the non-linear dependence of the rainfall on the large-scale atmospheric thermodynamic conditions and circulation. Further tests of various prediction skill metrics and hindcasts also suggest this Neural Network prediction approach can significantly improve seasonal prediction skill relative to dynamical predictions and regression-based statistical predictions. Thus, this statistical prediction system shows potential to improve real-time seasonal rainfall predictions in the future.
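A hedged sketch of the workflow described above, with randomly generated stand-ins for the predictor fields and the rainfall target: standardize the dry-season predictors, retain the leading 10 principal components (EOF modes), and fit a small neural-network regressor evaluated with a leave-one-year-out hindcast. The array shapes, layer size, and skill metric are assumptions for illustration, not the authors' configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import LeaveOneOut
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical shapes: 37 JJA seasons (1979-2015) x flattened predictor grid,
# and a rainfall-anomaly target averaged over Southern Amazonia.
rng = np.random.default_rng(0)
X = rng.standard_normal((37, 500))   # stand-in for stacked z200/RH/CIN/CAPE fields
y = rng.standard_normal(37)          # stand-in for SON rainfall anomaly

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),            # retain the leading 10 EOF modes
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0),
)

# Leave-one-year-out hindcast, a common skill test for seasonal prediction.
errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model.fit(X[train_idx], y[train_idx])
    errors.append((y[test_idx] - model.predict(X[test_idx])).item())
print("hindcast RMSE:", np.sqrt(np.mean(np.square(errors))))
```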
NASA Astrophysics Data System (ADS)
Campbell, B. D.; Higgins, S. R.
2008-12-01
Developing a method for bridging the gap between macroscopic and microscopic measurements of reaction kinetics at the mineral-water interface has important implications in geological and chemical fields. Investigating these reactions on the nanometer scale with SPM is often limited by image analysis and data extraction due to the large quantity of data usually obtained in SPM experiments. Here we present a computer algorithm for automated analysis of mineral-water interface reactions. This algorithm automates the analysis of sequential SPM images by identifying the kinetically active surface sites (i.e., step edges), and by tracking the displacement of these sites from image to image. The step edge positions in each image are readily identified and tracked through time by a standard edge detection algorithm followed by statistical analysis on the Hough Transform of the edge-mapped image. By quantifying this displacement as a function of time, the rate of step edge displacement is determined. Furthermore, the total edge length, also determined from analysis of the Hough Transform, combined with the computed step speed, yields the surface area normalized rate of the reaction. The algorithm was applied to a study of the spiral growth of the calcite(104) surface from supersaturated solutions, yielding results almost 20 times faster than performing this analysis by hand, with results being statistically similar for both analysis methods. This advance in analysis of kinetic data from SPM images will facilitate the building of experimental databases on the microscopic kinetics of mineral-water interface reactions.
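The edge-detection-plus-Hough-transform idea can be sketched with standard image-processing tools. The example below (using OpenCV, assumed available) detects near-parallel line features in two frames, treats the change in their Hough rho parameters as step displacement, and converts it to a speed. The calibration constants and synthetic frames are illustrative, and the published algorithm's statistical analysis of the Hough transform is not reproduced here.

```python
import numpy as np
import cv2  # OpenCV, assumed available

def step_edge_positions(image_8bit):
    """Detect step edges and return (rho, theta) line parameters via the Hough transform."""
    edges = cv2.Canny(image_8bit, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 80)
    return [] if lines is None else [tuple(l[0]) for l in lines]

def mean_step_speed(frame_a, frame_b, dt_seconds, nm_per_pixel):
    """Average displacement of nearly parallel step edges between two frames."""
    rho_a = sorted(r for r, _ in step_edge_positions(frame_a))
    rho_b = sorted(r for r, _ in step_edge_positions(frame_b))
    n = min(len(rho_a), len(rho_b))
    if n == 0:
        return float("nan")
    shift_px = np.mean(np.abs(np.array(rho_b[:n]) - np.array(rho_a[:n])))
    return shift_px * nm_per_pixel / dt_seconds   # nm per second

# Usage with two synthetic frames: a bright band shifted by a few pixels.
img1 = np.zeros((256, 256), np.uint8); img1[:, 100:105] = 255
img2 = np.zeros((256, 256), np.uint8); img2[:, 104:109] = 255
print(mean_step_speed(img1, img2, dt_seconds=60.0, nm_per_pixel=2.0), "nm/s")
```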
Statistical characterization of planar two-dimensional Rayleigh-Taylor mixing layers
NASA Astrophysics Data System (ADS)
Sendersky, Dmitry
2000-10-01
The statistical evolution of a planar, randomly perturbed fluid interface subject to Rayleigh-Taylor instability is explored through numerical simulation in two space dimensions. The data set, generated by the front-tracking code FronTier, is highly resolved and covers a large ensemble of initial perturbations, allowing a more refined analysis of closure issues pertinent to the stochastic modeling of chaotic fluid mixing. We closely approach a two-fold convergence of the mean two-phase flow: convergence of the numerical solution under computational mesh refinement, and statistical convergence under increasing ensemble size. Quantities that appear in the two-phase averaged Euler equations are computed directly and analyzed for numerical and statistical convergence. Bulk averages show a high degree of convergence, while interfacial averages are convergent only in the outer portions of the mixing zone, where there is a coherent array of bubble and spike tips. Comparison with the familiar bubble/spike penetration law h = αAgt^2 is complicated by the lack of scale invariance, inability to carry the simulations to late time, the increasing Mach numbers of the bubble/spike tips, and sensitivity to the method of data analysis. Finally, we use the simulation data to analyze some constitutive properties of the mixing process.
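For reference, the growth coefficient in the penetration law h = αAgt^2 is commonly extracted by a least-squares fit of h against Agt^2. A minimal sketch with synthetic bubble-height data and an assumed Atwood number:

```python
import numpy as np

def fit_alpha(t, h, atwood, g=9.81):
    """Least-squares fit of alpha in the self-similar growth law h = alpha * A * g * t**2."""
    x = atwood * g * np.asarray(t) ** 2
    return float(np.sum(x * h) / np.sum(x * x))   # slope of h vs. A*g*t^2 through the origin

# Synthetic bubble-penetration data with alpha = 0.06 plus measurement noise.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 2.0, 50)
h = 0.06 * 0.5 * 9.81 * t**2 + 0.002 * rng.standard_normal(t.size)
print(f"fitted alpha ~ {fit_alpha(t, h, atwood=0.5):.3f}")
```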
Large-scale dynamics associated with clustering of extratropical cyclones affecting Western Europe
NASA Astrophysics Data System (ADS)
Pinto, Joaquim G.; Gómara, Iñigo; Masato, Giacomo; Dacre, Helen F.; Woollings, Tim; Caballero, Rodrigo
2015-04-01
Some recent winters in Western Europe have been characterized by the occurrence of multiple extratropical cyclones following a similar path. The occurrence of such cyclone clusters leads to large socio-economic impacts due to damaging winds, storm surges, and floods. Recent studies have statistically characterized the clustering of extratropical cyclones over the North Atlantic and Europe and hypothesized potential physical mechanisms responsible for their formation. Here we analyze 4 months characterized by multiple cyclones over Western Europe (February 1990, January 1993, December 1999, and January 2007). The evolution of the eddy driven jet stream, Rossby wave-breaking, and upstream/downstream cyclone development are investigated to infer the role of the large-scale flow and to determine if clustered cyclones are related to each other. Results suggest that optimal conditions for the occurrence of cyclone clusters are provided by a recurrent extension of an intensified eddy driven jet toward Western Europe lasting at least 1 week. Multiple Rossby wave-breaking occurrences on both the poleward and equatorward flanks of the jet contribute to the development of these anomalous large-scale conditions. The analysis of the daily weather charts reveals that upstream cyclone development (secondary cyclogenesis, where new cyclones are generated on the trailing fronts of mature cyclones) is strongly related to cyclone clustering, with multiple cyclones developing on a single jet streak. The present analysis permits a deeper understanding of the physical reasons leading to the occurrence of cyclone families over the North Atlantic, enabling a better estimation of the associated cumulative risk over Europe.
Open star clusters and Galactic structure
NASA Astrophysics Data System (ADS)
Joshi, Yogesh C.
2018-04-01
In order to understand the Galactic structure, we perform a statistical analysis of the distribution of various cluster parameters based on an almost complete sample of Galactic open clusters available to date. The geometrical and physical characteristics of a large number of open clusters given in the MWSC catalogue are used to study the spatial distribution of clusters in the Galaxy and determine the scale height, solar offset, local mass density and distribution of reddening material in the solar neighbourhood. We also explored the mass-radius and mass-age relations in the Galactic open star clusters. We find that the estimated parameters of the Galactic disk are largely influenced by the choice of cluster sample.
NASA Astrophysics Data System (ADS)
Ivkin, N.; Liu, Z.; Yang, L. F.; Kumar, S. S.; Lemson, G.; Neyrinck, M.; Szalay, A. S.; Braverman, V.; Budavari, T.
2018-04-01
Cosmological N-body simulations play a vital role in studying models for the evolution of the Universe. To compare to observations and make a scientific inference, statistical analysis of large simulation datasets, e.g., finding halos or obtaining multi-point correlation functions, is crucial. However, traditional in-memory methods for these tasks do not scale to the datasets that are forbiddingly large in modern simulations. Our prior paper (Liu et al., 2015) proposes memory-efficient streaming algorithms that can find the largest halos in a simulation with up to 10^9 particles on a small server or desktop. However, this approach fails when directly scaling to larger datasets. This paper presents a robust streaming tool that leverages state-of-the-art techniques on GPU boosting, sampling, and parallel I/O, to significantly improve performance and scalability. Our rigorous analysis of the sketch parameters improves the previous results from finding the centers of the 10^3 largest halos (Liu et al., 2015) to ∼10^4-10^5, and reveals the trade-offs between memory, running time and number of halos. Our experiments show that our tool can scale to datasets with up to ∼10^12 particles while using less than an hour of running time on a single GPU Nvidia GTX 1080.
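Sketch-based streaming of this kind can be illustrated with a count-min sketch: particles are streamed once, keyed by the grid cell they occupy, and cell occupancies (halo candidates) are estimated from a small fixed-size table. The example below is a generic illustration of the memory-efficient streaming idea, not the authors' GPU pipeline; the cell keying and Zipf-distributed occupancies are assumptions.

```python
import numpy as np

class CountMinSketch:
    """Fixed-memory frequency sketch; overestimates counts with bounded error."""
    def __init__(self, width=2048, depth=4, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.salts = rng.integers(1, 2**31 - 1, size=depth)
        self.width = width

    def _cols(self, key):
        return [hash((int(s), key)) % self.width for s in self.salts]

    def add(self, key):
        for row, col in enumerate(self._cols(key)):
            self.table[row, col] += 1

    def estimate(self, key):
        return min(self.table[row, col] for row, col in enumerate(self._cols(key)))

# Stream particles, keyed by the grid cell they fall into; dense cells ~ halo candidates.
rng = np.random.default_rng(3)
cms = CountMinSketch()
cells = rng.zipf(1.8, size=200_000) % 10_000   # skewed occupancy, as in clustered matter
for c in cells:
    cms.add(int(c))

# For brevity we only rank the first 100 cell ids; a real pipeline would keep a
# small heap of the current heavy hitters while streaming.
top = sorted(range(100), key=cms.estimate, reverse=True)[:5]
print("densest candidate cells:", top)
```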
Effects of multiple-scale driving on turbulence statistics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, Hyunju; Cho, Jungyeon, E-mail: hyunju527@gmail.com, E-mail: jcho@cnu.ac.kr
2014-01-01
Turbulence is ubiquitous in astrophysical fluids such as the interstellar medium and the intracluster medium. In turbulence studies, it is customary to assume that fluid is driven on a single scale. However, in astrophysical fluids, there can be many different driving mechanisms that act on different scales. If there are multiple energy-injection scales, the process of energy cascade and turbulence dynamo will be different compared with the case of the single energy-injection scale. In this work, we perform three-dimensional incompressible/compressible magnetohydrodynamic turbulence simulations. We drive turbulence in Fourier space in two wavenumber ranges, 2 ≤ k ≤ √12 (large scale) and 15 ≲ k ≲ 26 (small scale). We inject different amounts of energy in each range by changing the amplitude of forcing in the range. We present the time evolution of the kinetic and magnetic energy densities and discuss the turbulence dynamo in the presence of energy injections at two scales. We show how kinetic, magnetic, and density spectra are affected by the two-scale energy injections and we discuss the observational implications. In the case ε_L < ε_S, where ε_L and ε_S are energy-injection rates at the large and small scales, respectively, our results show that even a tiny amount of large-scale energy injection can significantly change the properties of turbulence. On the other hand, when ε_L ≳ ε_S, the small-scale driving does not influence the turbulence statistics much unless ε_L ∼ ε_S.
Reuter, H.; Jopp, F.; Blanco-Moreno, J. M.; Damgaard, C.; Matsinos, Y.; DeAngelis, D.L.
2010-01-01
A continuing discussion in applied and theoretical ecology focuses on the relationship of different organisational levels and on how ecological systems interact across scales. We address principal approaches to cope with complex across-level issues in ecology by applying elements of hierarchy theory and the theory of complex adaptive systems. A top-down approach, often characterised by the use of statistical techniques, can be applied to analyse large-scale dynamics and identify constraints exerted on lower levels. Current developments are illustrated with examples from the analysis of within-community spatial patterns and large-scale vegetation patterns. A bottom-up approach allows one to elucidate how interactions of individuals shape dynamics at higher levels in a self-organisation process; e.g., population development and community composition. This may be facilitated by various modelling tools, which provide the distinction between focal levels and resulting properties. For instance, resilience in grassland communities has been analysed with a cellular automaton approach, and the driving forces in rodent population oscillations have been identified with an agent-based model. Both modelling tools illustrate the principles of analysing higher level processes by representing the interactions of basic components. The focus of most ecological investigations on either top-down or bottom-up approaches may not be appropriate, if strong cross-scale relationships predominate. Here, we propose an 'across-scale-approach', closely interweaving the inherent potentials of both approaches. This combination of analytical and synthesising approaches will enable ecologists to establish a more coherent access to cross-level interactions in ecological systems. © 2010 Gesellschaft für Ökologie.
High-resolution simulation of deep pencil beam surveys - analysis of quasi-periodicity
NASA Astrophysics Data System (ADS)
Weiss, A. G.; Buchert, T.
1993-07-01
We carry out pencil beam constructions in a high-resolution simulation of the large-scale structure of galaxies. The initial density fluctuations are taken to have a truncated power spectrum. All the models have Ω = 1. As an example we present the results for the case of "Hot-Dark-Matter" (HDM) initial conditions with scale-free n = 1 power index on large scales as a representative of models with sufficient large-scale power. We use an analytic approximation for particle trajectories of a self-gravitating dust continuum and apply a local dynamical biasing of volume elements to identify luminous matter in the model. Using this method, we are able to resolve formally a simulation box of 1200 h^-1 Mpc (e.g. for HDM initial conditions) down to the scale of galactic halos using 2160^3 particles. We consider this as the minimal resolution necessary for a sensible simulation of deep pencil beam data. Pencil beam probes are taken for a given epoch using the parameters of observed beams. In particular, our analysis concentrates on the detection of a quasi-periodicity in the beam probes using several different methods. The resulting beam ensembles are analyzed statistically using number distributions, pair-count histograms, unnormalized pair-counts, power spectrum analysis and trial-period folding. Periodicities are classified according to their significance level in the power spectrum of the beams. The simulation is designed for application to parameter studies which prepare future observational projects. We find that a large percentage of the beams show quasi-periodicities with periods which cluster at a certain length scale. The periods found range between one and eight times the cutoff length in the initial fluctuation spectrum. At significance levels similar to those of the data of Broadhurst et al. (1990), we find about 15% of the pencil beams to show periodicities, about 30% of which are around the mean separation of rich clusters, while the distribution of scales reaches values of more than 200 h^-1 Mpc. The detection of periodicities larger than the typical void size need not be due to missing "walls" (like the so called "Great Wall" seen in the CfA catalogue of galaxies), but can be due to different clustering properties of galaxies along the beams.
Topology of Neutral Hydrogen within the Small Magellanic Cloud
NASA Astrophysics Data System (ADS)
Chepurnov, A.; Gordon, J.; Lazarian, A.; Stanimirovic, S.
2008-12-01
In this paper, genus statistics have been applied to an H I column density map of the Small Magellanic Cloud in order to study its topology. To learn how topology changes with the scale of the system, we provide topology studies for column density maps at varying resolutions. To evaluate the statistical error of the genus, we randomly reassign the phases of the Fourier modes while keeping the amplitudes. We find that at the smallest scales studied (40 pc <= λ <= 80 pc), the genus shift is negative in all regions, implying a clump topology. At the larger scales (110 pc <= λ <= 250 pc), the topology shift is detected to be negative (a "meatball" topology) in four cases and positive (a "swiss cheese" topology) in two cases. In four regions, there is no statistically significant topology shift at large scales.
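A common discrete approximation to the 2D genus statistic counts, at each threshold, the number of connected regions above the threshold minus the number below; negative values indicate a clump ("meatball") topology and positive values a "swiss cheese" topology. A minimal sketch on a synthetic clumpy map, assuming SciPy is available:

```python
import numpy as np
from scipy import ndimage

def genus_curve(field, thresholds):
    """2D genus proxy: regions above each threshold minus regions below it.

    Negative genus shift -> isolated clumps ('meatball' topology);
    positive shift -> isolated holes ('swiss cheese' topology).
    """
    genus = []
    for nu in thresholds:
        n_high = ndimage.label(field > nu)[1]   # connected high-density regions
        n_low = ndimage.label(field <= nu)[1]   # connected low-density regions
        genus.append(n_high - n_low)
    return np.array(genus)

# Synthetic clumpy map: a few Gaussian blobs on a noisy background.
rng = np.random.default_rng(4)
y, x = np.mgrid[0:128, 0:128]
field = rng.normal(0, 0.3, (128, 128))
for cx, cy in rng.integers(10, 118, size=(6, 2)):
    field += np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / 30.0)

nus = np.linspace(field.mean() - field.std(), field.mean() + 2 * field.std(), 7)
print(genus_curve(field, nus))   # clump-dominated maps give mostly negative values
```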
How big should a mammal be? A macroecological look at mammalian body size over space and time
Smith, Felisa A.; Lyons, S. Kathleen
2011-01-01
Macroecology was developed as a big picture statistical approach to the study of ecology and evolution. By focusing on broadly occurring patterns and processes operating at large spatial and temporal scales rather than on localized and/or fine-scaled details, macroecology aims to uncover general mechanisms operating at organism, population, and ecosystem levels of organization. Macroecological studies typically involve the statistical analysis of fundamental species-level traits, such as body size, area of geographical range, and average density and/or abundance. Here, we briefly review the history of macroecology and use the body size of mammals as a case study to highlight current developments in the field, including the increasing linkage with biogeography and other disciplines. Characterizing the factors underlying the spatial and temporal patterns of body size variation in mammals is a daunting task and moreover, one not readily amenable to traditional statistical analyses. Our results clearly illustrate remarkable regularities in the distribution and variation of mammalian body size across both geographical space and evolutionary time that are related to ecology and trophic dynamics and that would not be apparent without a broader perspective. PMID:21768152
Effects of the water level on the flow topology over the Bolund island
NASA Astrophysics Data System (ADS)
Cuerva-Tejero, A.; Yeow, T. S.; Gallego-Castillo, C.; Lopez-Garcia, O.
2014-06-01
We have analyzed the influence of the actual height of Bolund island above water level on different full-scale statistics of the velocity field over the peninsula. Our analysis is focused on the database of 10-minute statistics provided by Risø-DTU for the Bolund Blind Experiment. We have considered 10-minute periods with near-neutral atmospheric conditions, mean wind speed values in the interval [5,20] m/s, and westerly wind directions. As expected, statistics such as speed-up, normalized increase of turbulent kinetic energy and probability of recirculating flow show a large dependence on the emerged height of the island for the locations close to the escarpment. For the published ensemble mean values of speed-up and normalized increase of turbulent kinetic energy in these locations, we propose that some amount of uncertainty could be explained as a deterministic dependence of the flow field statistics upon the actual height of the Bolund island above the sea level.
Statistical mechanics of soft-boson phase transitions
NASA Technical Reports Server (NTRS)
Gupta, Arun K.; Hill, Christopher T.; Holman, Richard; Kolb, Edward W.
1991-01-01
The existence of structure on large (100 Mpc) scales, and limits to anisotropies in the cosmic microwave background radiation (CMBR), have imperiled models of structure formation based solely upon the standard cold dark matter scenario. Novel scenarios, which may be compatible with large scale structure and small CMBR anisotropies, invoke nonlinear fluctuations in the density appearing after recombination, accomplished via the use of late time phase transitions involving ultralow mass scalar bosons. Herein, the statistical mechanics of such phase transitions are studied in several models involving naturally ultralow mass pseudo-Nambu-Goldstone bosons (pNGB's). These models can exhibit several interesting effects at high temperature and are believed to represent the most general possibilities for pNGB's.
NASA Astrophysics Data System (ADS)
Hoffman, A.; Forest, C. E.; Kemanian, A.
2016-12-01
A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields has not yet been thoroughly investigated. This research aims to develop the data and tools to progress our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant and by modifying local temperature and precipitation, whereas dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on the national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.
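A minimal sketch of the statistical comparison described above, contrasting a linear model with a random forest on a synthetic yield table that includes a technology trend, a temperature threshold, and a dust penalty. The predictor names, functional forms, and noise levels are illustrative assumptions, not the study's crop-climate dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical predictor table: seasonal temperature, precipitation, an
# aerosol-optical-depth proxy for dust, and a year index for the technology trend.
rng = np.random.default_rng(5)
n_years = 200
year = np.linspace(0, 1, n_years)
temp = rng.normal(27, 2, n_years)
precip = rng.gamma(4, 100, n_years)
dust = rng.gamma(2, 0.2, n_years)

# Yield with a technology trend, a temperature threshold, and a dust penalty.
yield_t = (2.0 + 1.5 * year - 0.4 * np.maximum(temp - 29, 0)
           + 0.002 * precip - 0.8 * dust + rng.normal(0, 0.2, n_years))

X = np.column_stack([year, temp, precip, dust])
for name, model in [("linear", LinearRegression()),
                    ("random forest", RandomForestRegressor(n_estimators=300, random_state=0))]:
    r2 = cross_val_score(model, X, yield_t, cv=5).mean()   # default scoring is R^2
    print(f"{name}: cross-validated R^2 = {r2:.2f}")
```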
NASA Astrophysics Data System (ADS)
Bundel, A.; Kulikova, I.; Kruglova, E.; Muravev, A.
2003-04-01
The scope of the study is to estimate the relationship between large-scale circulation regimes, various instability indices and global precipitation with different boundary conditions, considered as external forcing. The experiments were carried out in the ensemble-prediction framework of the dynamic-statistical monthly forecast scheme run in the Hydrometeorological Research Center of Russia every ten days. The extension to seasonal intervals makes it necessary to investigate the role of slowly changing boundary conditions among which the sea surface temperature (SST) may be defined as the most effective factor. Continuous integrations of the global spectral T41L15 model for the whole year 2000 (starting from January 1) were performed with the climatic SST and the Reynolds Archive SSTs. Monthly values of the SST were projected on the year days using a spline interpolation technique. First, the global precipitation values in experiments were compared to the GPCP (Global Precipitation Climate Program) daily observation data. Although the global mean precipitation is underestimated by the model, some large-scale regional amounts correspond to the real ones (e.g. for Europe) fairly well. On the whole, however, anomaly phases failed to be reproduced. The precipitation averaged over the whole land revealed a greater sensitivity to the SSTs than that over the oceans. The wavelet analysis was applied to separate the low- and high-frequency signal of the SST influence on the large-scale circulation and precipitation. A derivative of the Wallace-Gutzler teleconnection index for the East-Atlantic oscillation was taken as the circulation characteristic. The daily oscillation index values and precipitation amounts averaged over Europe were decomposed using a wavelet approach with different “mother wavelets” up to approximation level 3. It was demonstrated that an increase in the precipitation amount over Europe was associated with the zonal flow intensification over the Northern Atlantic when the real SSTs were used. Blocking structures in the circulation caused decreasing precipitation amounts. The wavelet approach gave a more distinctive discrimination in the modeled circulation and precipitation patterns versus different external forcing than a number of other statistical techniques. Several atmospheric instability indices (e.g. Phillips-like parameters, the Richardson number, etc.) were additionally used in post-processing for a more detailed validation of the modeled large-scale and total precipitation amounts. It was shown that a reasonable variety of instability indices must be used for such validations and for precipitation output corrections. Their statistical stability may be substantiated only on the ensemble modeling basis. This work was performed with the financial support of the Russian Foundation for Basic Research (02-05-64655).
Multi-scale statistical analysis of coronal solar activity
Gamborino, Diana; del-Castillo-Negrete, Diego; Martinell, Julio J.
2016-07-08
Multi-filter images from the solar corona are used to obtain temperature maps that are analyzed using techniques based on proper orthogonal decomposition (POD) in order to extract dynamical and structural information at various scales. Exploring active regions before and after a solar flare and comparing them with quiet regions, we show that the multi-scale behavior presents distinct statistical properties for each case that can be used to characterize the level of activity in a region. Information about the nature of heat transport can also be extracted from the analysis.
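POD of a snapshot sequence is commonly computed via the singular value decomposition of the mean-removed data matrix. The sketch below, run on a synthetic stack of "temperature maps", returns the leading spatial modes, their temporal coefficients, and the fraction of variance each captures; it illustrates only the generic POD step, not the paper's multi-scale analysis.

```python
import numpy as np

def pod_modes(snapshots, n_modes=3):
    """Proper orthogonal decomposition of a (time, ny, nx) stack of maps via SVD."""
    nt, ny, nx = snapshots.shape
    data = snapshots.reshape(nt, ny * nx)
    mean = data.mean(axis=0)
    u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    energy = s**2 / np.sum(s**2)                 # fraction of variance per mode
    modes = vt[:n_modes].reshape(n_modes, ny, nx)
    amplitudes = u[:, :n_modes] * s[:n_modes]    # temporal coefficients
    return modes, amplitudes, energy[:n_modes]

# Synthetic "temperature map" sequence: a coherent oscillating pattern plus noise.
rng = np.random.default_rng(6)
t = np.arange(40)
y, x = np.mgrid[0:64, 0:64]
pattern = np.sin(2 * np.pi * x / 64) * np.exp(-((y - 32) ** 2) / 400.0)
stack = pattern[None] * np.cos(0.3 * t)[:, None, None] + 0.1 * rng.standard_normal((40, 64, 64))

modes, amps, energy = pod_modes(stack)
print("variance captured by leading modes:", np.round(energy, 3))
```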
A High-Resolution WRF Tropical Channel Simulation Driven by a Global Reanalysis
NASA Astrophysics Data System (ADS)
Holland, G.; Leung, L.; Kuo, Y.; Hurrell, J.
2006-12-01
Since 2003, NCAR has invested in the development and application of the Nested Regional Climate Model (NRCM), based on the Weather Research and Forecasting (WRF) model and the Community Climate System Model, as a key component of the Prediction Across Scales Initiative. A prototype tropical channel model has been developed to investigate scale interactions and the influence of tropical convection on large-scale circulation and tropical modes. The model was developed based on the NCAR Weather Research and Forecasting Model (WRF), configured as a tropical channel between 30 ° S and 45 ° N, wide enough to allow teleconnection effects over the mid-latitudes. Compared to the limited-area domain that WRF is typically applied over, the channel mode alleviates issues with reflection of tropical modes that could result from imposing east/west boundaries. Using a large amount of available computing resources on a supercomputer (Blue Vista) during its bedding-in period, a simulation has been completed with the tropical channel applied at 36 km horizontal resolution for 5 years from 1996 to 2000, with large-scale circulation provided by the NCEP/NCAR global reanalysis at the north/south boundaries. Shorter simulations of 2 years and 6 months have also been performed to include two-way nests at 12 km and 4 km resolution, respectively, over the western Pacific warm pool, to explicitly resolve tropical convection in the Maritime Continent. The simulations realistically captured the large-scale circulation including the trade winds over the tropical Pacific and Atlantic, the Australian and Asian monsoon circulation, and hurricane statistics. Preliminary analysis and evaluation of the simulations will be presented.
Basin-scale heterogeneity in Antarctic precipitation and its impact on surface mass variability
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fyke, Jeremy; Lenaerts, Jan T. M.; Wang, Hailong
Annually averaged precipitation in the form of snow, the dominant term of the Antarctic Ice Sheet surface mass balance, displays large spatial and temporal variability. Here we present an analysis of spatial patterns of regional Antarctic precipitation variability and their impact on integrated Antarctic surface mass balance variability simulated as part of a preindustrial 1800-year global, fully coupled Community Earth System Model simulation. Correlation and composite analyses based on this output allow for a robust exploration of Antarctic precipitation variability. We identify statistically significant relationships between precipitation patterns across Antarctica that are corroborated by climate reanalyses, regional modeling and ice core records. These patterns are driven by variability in large-scale atmospheric moisture transport, which itself is characterized by decadal- to centennial-scale oscillations around the long-term mean. We suggest that this heterogeneity in Antarctic precipitation variability has a dampening effect on overall Antarctic surface mass balance variability, with implications for regulation of Antarctic-sourced sea level variability, detection of an emergent anthropogenic signal in Antarctic mass trends and identification of Antarctic mass loss accelerations.
A functional model for characterizing long-distance movement behaviour
Buderman, Frances E.; Hooten, Mevin B.; Ivan, Jacob S.; Shenk, Tanya M.
2016-01-01
Advancements in wildlife telemetry techniques have made it possible to collect large data sets of highly accurate animal locations at a fine temporal resolution. These data sets have prompted the development of a number of statistical methodologies for modelling animal movement. Telemetry data sets are often collected for purposes other than fine-scale movement analysis. These data sets may differ substantially from those that are collected with technologies suitable for fine-scale movement modelling and may consist of locations that are irregular in time, are temporally coarse or have large measurement error. These data sets are time-consuming and costly to collect but may still provide valuable information about movement behaviour. We developed a Bayesian movement model that accounts for error from multiple data sources as well as movement behaviour at different temporal scales. The Bayesian framework allows us to calculate derived quantities that describe temporally varying movement behaviour, such as residence time, speed and persistence in direction. The model is flexible, easy to implement and computationally efficient. We apply this model to data from Colorado Canada lynx (Lynx canadensis) and use derived quantities to identify changes in movement behaviour.
Basin-scale heterogeneity in Antarctic precipitation and its impact on surface mass variability
Fyke, Jeremy; Lenaerts, Jan T. M.; Wang, Hailong
2017-11-15
Annually averaged precipitation in the form of snow, the dominant term of the Antarctic Ice Sheet surface mass balance, displays large spatial and temporal variability. Here we present an analysis of spatial patterns of regional Antarctic precipitation variability and their impact on integrated Antarctic surface mass balance variability simulated as part of a preindustrial 1800-year global, fully coupled Community Earth System Model simulation. Correlation and composite analyses based on this output allow for a robust exploration of Antarctic precipitation variability. We identify statistically significant relationships between precipitation patterns across Antarctica that are corroborated by climate reanalyses, regional modeling and ice core records. These patterns are driven by variability in large-scale atmospheric moisture transport, which itself is characterized by decadal- to centennial-scale oscillations around the long-term mean. We suggest that this heterogeneity in Antarctic precipitation variability has a dampening effect on overall Antarctic surface mass balance variability, with implications for regulation of Antarctic-sourced sea level variability, detection of an emergent anthropogenic signal in Antarctic mass trends and identification of Antarctic mass loss accelerations.
Scaling Laws in Canopy Flows: A Wind-Tunnel Analysis
NASA Astrophysics Data System (ADS)
Segalini, Antonio; Fransson, Jens H. M.; Alfredsson, P. Henrik
2013-08-01
An analysis of velocity statistics and spectra measured above a wind-tunnel forest model is reported. Several measurement stations downstream of the forest edge have been investigated and it is observed that, while the mean velocity profile adjusts quickly to the new canopy boundary condition, the turbulence lags behind and shows a continuous penetration towards the free stream along the canopy model. The statistical profiles illustrate this growth and do not collapse when plotted as a function of the vertical coordinate. However, when the statistics are plotted as a function of the local mean velocity (normalized with a characteristic velocity scale), they do collapse, independently of the streamwise position and freestream velocity. A new scaling for the spectra of all three velocity components is proposed based on the velocity variance and integral time scale. This normalization improves the collapse of the spectra compared to existing scalings adopted in atmospheric measurements, and allows the determination of a universal function that provides the velocity spectrum. Furthermore, a comparison of the proposed scaling laws for two different canopy densities is shown, demonstrating that the vertical velocity variance is the statistical quantity most sensitive to the characteristics of the canopy roughness.
Ghaffari, Mahsa; Tangen, Kevin; Alaraj, Ali; Du, Xinjian; Charbel, Fady T; Linninger, Andreas A
2017-12-01
In this paper, we present a novel technique for automatic parametric mesh generation of subject-specific cerebral arterial trees. This technique generates high-quality and anatomically accurate computational meshes for fast blood flow simulations, extending the scope of 3D vascular modeling to a large portion of cerebral arterial trees. For this purpose, a parametric meshing procedure was developed to automatically decompose the vascular skeleton, extract geometric features and generate hexahedral meshes using a body-fitted coordinate system that optimally follows the vascular network topology. To validate the anatomical accuracy of the reconstructed vasculature, we performed statistical analysis to quantify the alignment between parametric meshes and raw vascular images using a receiver operating characteristic curve. Geometric accuracy evaluation showed agreement with an area-under-the-curve value of 0.87 between the constructed mesh and raw MRA data sets. Parametric meshing yielded, on average, 36.6% and 21.7% orthogonal and equiangular skew quality improvements over the unstructured tetrahedral meshes. The parametric meshing and processing pipeline constitutes an automated technique to reconstruct and simulate blood flow throughout a large portion of the cerebral arterial tree down to the level of pial vessels. This study is the first step towards fast large-scale subject-specific hemodynamic analysis for clinical applications. Copyright © 2017 Elsevier Ltd. All rights reserved.
An accurate test for homogeneity of odds ratios based on Cochran's Q-statistic.
Kulinskaya, Elena; Dollinger, Michael B
2015-06-10
A frequently used statistic for testing homogeneity in a meta-analysis of K independent studies is Cochran's Q. For a standard test of homogeneity the Q statistic is referred to a chi-square distribution with K-1 degrees of freedom. For the situation in which the effects of the studies are logarithms of odds ratios, the chi-square distribution is much too conservative for moderate size studies, although it may be asymptotically correct as the individual studies become large. Using a mixture of theoretical results and simulations, we provide formulas to estimate the shape and scale parameters of a gamma distribution to fit the distribution of Q. Simulation studies show that the gamma distribution is a good approximation to the distribution for Q. Use of the gamma distribution instead of the chi-square distribution for Q should eliminate inaccurate inferences in assessing homogeneity in a meta-analysis. (A computer program for implementing this test is provided.) This hypothesis test is competitive with the Breslow-Day test both in accuracy of level and in power.
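For concreteness, Cochran's Q for K studies with effects θ_i (here log odds ratios) and variances v_i is Q = Σ w_i (θ_i − θ̂)^2 with w_i = 1/v_i and θ̂ the inverse-variance weighted mean. The sketch below computes Q and the conventional chi-square p-value for hypothetical studies; the gamma-distribution reference proposed in the paper requires shape and scale estimates derived from the study sizes, which are not reproduced here.

```python
import numpy as np
from scipy import stats

def cochran_q(effects, variances):
    """Cochran's Q for K study effects (here, log odds ratios) with known variances."""
    effects = np.asarray(effects)
    w = 1.0 / np.asarray(variances)
    theta_hat = np.sum(w * effects) / np.sum(w)        # inverse-variance pooled effect
    return float(np.sum(w * (effects - theta_hat) ** 2))

# Example: log odds ratios and variances from 6 hypothetical studies.
log_or = np.array([0.10, 0.35, -0.05, 0.22, 0.48, 0.15])
var = np.array([0.08, 0.05, 0.10, 0.07, 0.12, 0.06])
q = cochran_q(log_or, var)
k = len(log_or)
p_chi2 = stats.chi2.sf(q, df=k - 1)                    # standard (often conservative) reference
print(f"Q = {q:.2f}, chi-square p = {p_chi2:.3f}")
# Kulinskaya & Dollinger instead refer Q to a fitted gamma distribution,
# e.g. stats.gamma.sf(q, a=shape, scale=scale) with estimated shape and scale.
```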
NASA Astrophysics Data System (ADS)
Glover, David M.; Doney, Scott C.; Oestreich, William K.; Tullo, Alisdair W.
2018-01-01
Mesoscale (10-300 km, weeks to months) physical variability strongly modulates the structure and dynamics of planktonic marine ecosystems via both turbulent advection and environmental impacts upon biological rates. Using structure function analysis (geostatistics), we quantify the mesoscale biological signals within global 13 year SeaWiFS (1998-2010) and 8 year MODIS/Aqua (2003-2010) chlorophyll a ocean color data (Level-3, 9 km resolution). We present geographical distributions, seasonality, and interannual variability of key geostatistical parameters: unresolved variability or noise, resolved variability, and spatial range. Resolved variability is nearly identical for both instruments, indicating that geostatistical techniques isolate a robust measure of biophysical mesoscale variability largely independent of measurement platform. In contrast, unresolved variability in MODIS/Aqua is substantially lower than in SeaWiFS, especially in oligotrophic waters where previous analysis identified a problem for the SeaWiFS instrument likely due to sensor noise characteristics. Both records exhibit a statistically significant relationship between resolved mesoscale variability and the low-pass filtered chlorophyll field horizontal gradient magnitude, consistent with physical stirring acting on large-scale gradient as an important factor supporting observed mesoscale variability. Comparable horizontal length scales for variability are found from tracer-based scaling arguments and geostatistical decorrelation. Regional variations between these length scales may reflect scale dependence of biological mechanisms that also create variability directly at the mesoscale, for example, enhanced net phytoplankton growth in coastal and frontal upwelling and convective mixing regions. Global estimates of mesoscale biophysical variability provide an improved basis for evaluating higher resolution, coupled ecosystem-ocean general circulation models, and data assimilation.
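Structure function (semivariogram) analysis of a transect can be sketched in a few lines: the semivariance at lag h is half the mean squared difference of values separated by h, with the lag-1 value approximating the unresolved "nugget" (noise) variability, the plateau approximating the sill, and the lag at which the curve levels off giving the spatial range. A synthetic illustration; the pixel spacing and noise level are assumptions.

```python
import numpy as np

def structure_function(values, max_lag):
    """Empirical semivariogram gamma(h) = 0.5 * mean[(z(x+h) - z(x))^2] along a transect."""
    gammas = []
    for lag in range(1, max_lag + 1):
        diffs = values[lag:] - values[:-lag]
        gammas.append(0.5 * np.nanmean(diffs**2))
    return np.array(gammas)

# Synthetic chlorophyll transect: smooth mesoscale signal plus sensor noise
# (1000 pixels at 9 km spacing, roughly a 9000 km track).
rng = np.random.default_rng(7)
signal = np.convolve(rng.standard_normal(1000), np.ones(30) / 30, mode="same")
chl = signal + 0.05 * rng.standard_normal(1000)

gamma = structure_function(chl, max_lag=100)
nugget = gamma[0]                 # unresolved (noise) variability
sill = gamma[-20:].mean()         # total variance at large separation
print(f"nugget ~ {nugget:.4f}, sill ~ {sill:.4f}, resolved ~ {sill - nugget:.4f}")
```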
Hierarchical modeling and robust synthesis for the preliminary design of large scale complex systems
NASA Astrophysics Data System (ADS)
Koch, Patrick Nathan
Large-scale complex systems are characterized by multiple interacting subsystems and the analysis of multiple disciplines. The design and development of such systems inevitably requires the resolution of multiple conflicting objectives. The size of complex systems, however, prohibits the development of comprehensive system models, and thus these systems must be partitioned into their constituent parts. Because simultaneous solution of individual subsystem models is often not manageable, iteration is inevitable and often excessive. In this dissertation these issues are addressed through the development of a method for hierarchical robust preliminary design exploration to facilitate concurrent system and subsystem design exploration, for the concurrent generation of robust system and subsystem specifications for the preliminary design of multi-level, multi-objective, large-scale complex systems. This method is developed through the integration and expansion of current design techniques: (1) Hierarchical partitioning and modeling techniques for partitioning large-scale complex systems into more tractable parts, and allowing integration of subproblems for system synthesis, (2) Statistical experimentation and approximation techniques for increasing both the efficiency and the comprehensiveness of preliminary design exploration, and (3) Noise modeling techniques for implementing robust preliminary design when approximate models are employed. The method developed and associated approaches are illustrated through their application to the preliminary design of a commercial turbofan turbine propulsion system; the turbofan system-level problem is partitioned into engine cycle and configuration design and a compressor module is integrated for more detailed subsystem-level design exploration, improving system evaluation.
Long, Zhiliang; Duan, Xujun; Xie, Bing; Du, Handan; Li, Rong; Xu, Qiang; Wei, Luqing; Zhang, Shao-xiang; Wu, Yi; Gao, Qing; Chen, Huafu
2013-09-25
Post-traumatic stress disorder (PTSD) is characterized by dysfunction of several discrete brain regions such as medial prefrontal gyrus with hypoactivation and amygdala with hyperactivation. However, alterations of large-scale whole brain topological organization of structural networks remain unclear. Seventeen patients with PTSD in motor vehicle accident survivors and 15 normal controls were enrolled in our study. Large-scale structural connectivity network (SCN) was constructed using diffusion tensor tractography, followed by thresholding the mean fractional anisotropy matrix of 90 brain regions. Graph theory analysis was then employed to investigate their aberrant topological properties. Both patient and control groups showed small-world topology in their SCNs. However, patients with PTSD exhibited abnormal global properties characterized by significantly decreased characteristic shortest path length and normalized characteristic shortest path length. Furthermore, the patient group showed enhanced nodal centralities predominately in the salience network including bilateral anterior cingulate and pallidum, and hippocampus/parahippocampal gyrus, and decreased nodal centralities mainly in the medial orbital part of the superior frontal gyrus. The main limitation of this study is the small sample of PTSD patients, which may decrease the statistical power. Consequently, this study should be considered an exploratory analysis. These results are consistent with the notion that PTSD can be understood by investigating the dysfunction of large-scale, spatially distributed neural networks, and also provide structural evidence for further exploration of neurocircuitry models in PTSD. © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Shih, Hong-Yan; Goldenfeld, Nigel
Experiments on transitional turbulence in pipe flow seem to show that turbulence is a transient metastable state since the measured mean lifetime of turbulence puffs does not diverge asymptotically at a critical Reynolds number. Yet measurements reveal that the lifetime scales with Reynolds number in a super-exponential way reminiscent of extreme value statistics, and simulations and experiments in Couette and channel flow exhibit directed percolation type scaling phenomena near a well-defined transition. This universality class arises from the interplay between small-scale turbulence and a large-scale collective zonal flow, which exhibit predator-prey behavior. Why is asymptotically divergent behavior not observed? Using directed percolation and a stochastic individual level model of predator-prey dynamics related to transitional turbulence, we investigate the relation between extreme value statistics and power law critical behavior, and show that the paradox is resolved by carefully defining what is measured in the experiments. We theoretically derive the super-exponential scaling law, and using finite-size scaling, show how the same data can give both super-exponential behavior and power-law critical scaling.
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O. (Editor); Housner, Jerrold M. (Editor)
1993-01-01
Computing speed is leaping forward by several orders of magnitude each decade. Engineers and scientists gathered at a NASA Langley symposium to discuss these exciting trends as they apply to parallel computational methods for large-scale structural analysis and design. Among the topics discussed were: large-scale static analysis; dynamic, transient, and thermal analysis; domain decomposition (substructuring); and nonlinear and numerical methods.
Basu, Sumanta; Duren, William; Evans, Charles R; Burant, Charles F; Michailidis, George; Karnovsky, Alla
2017-05-15
Recent technological advances in mass spectrometry, development of richer mass spectral libraries and data processing tools have enabled large scale metabolic profiling. Biological interpretation of metabolomics studies heavily relies on knowledge-based tools that contain information about metabolic pathways. Incomplete coverage of different areas of metabolism and lack of information about non-canonical connections between metabolites limits the scope of applications of such tools. Furthermore, the presence of a large number of unknown features, which cannot be readily identified, but nonetheless can represent bona fide compounds, also considerably complicates biological interpretation of the data. Leveraging recent developments in the statistical analysis of high-dimensional data, we developed a new Debiased Sparse Partial Correlation algorithm (DSPC) for estimating partial correlation networks and implemented it as a Java-based CorrelationCalculator program. We also introduce a new version of our previously developed tool Metscape that enables building and visualization of correlation networks. We demonstrate the utility of these tools by constructing biologically relevant networks and in aiding identification of unknown compounds. http://metscape.med.umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
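Partial correlation networks are built from the precision (inverse covariance) matrix: the partial correlation between metabolites i and j is −Ω_ij / sqrt(Ω_ii Ω_jj). The sketch below shows this basic construction on a toy metabolite table; DSPC additionally imposes sparsity and debiases the estimates for high-dimensional data, which this simple pseudoinverse version does not do.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix from the precision (inverse covariance) matrix."""
    precision = np.linalg.pinv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy metabolite table (samples x metabolites): m2 depends on m1, m3 on m2.
rng = np.random.default_rng(8)
m1 = rng.standard_normal(500)
m2 = m1 + 0.5 * rng.standard_normal(500)
m3 = m2 + 0.5 * rng.standard_normal(500)
data = np.column_stack([m1, m2, m3])

pcor = partial_correlations(data)
print(np.round(pcor, 2))   # m1-m3 partial correlation ~ 0 once m2 is conditioned on
```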
The Math Problem: Advertising Students' Attitudes toward Statistics
ERIC Educational Resources Information Center
Fullerton, Jami A.; Kendrick, Alice
2013-01-01
This study used the Students' Attitudes toward Statistics Scale (STATS) to measure attitude toward statistics among a national sample of advertising students. A factor analysis revealed four underlying factors make up the attitude toward statistics construct--"Interest & Future Applicability," "Confidence," "Statistical Tools," and "Initiative."…
Ariew, André
2007-03-01
Charles Darwin, James Clerk Maxwell, and Francis Galton were all aware, by various means, of Adolphe Quetelet's pioneering work in statistics. Darwin, Maxwell, and Galton all had reason to be interested in Quetelet's work: they were all working on some instance of how large-scale regularities emerge from individual events that vary from one another; all were rejecting the divine interventionistic theories of their contemporaries; and Quetelet's techniques provided them with a way forward. Maxwell and Galton both explicitly endorse Quetelet's techniques in their work; Darwin does not incorporate any of the statistical ideas of Quetelet, although natural selection post-twentieth century synthesis has. Why not Darwin? My answer is that by the time Darwin encountered Malthus's law of excess reproduction he had all he needed to account for large-scale regularities in extinctions, speciation, and adaptation. He didn't need Quetelet.
Chou, C P; Bentler, P M; Satorra, A
1991-11-01
Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.
Water Masses in the Eastern Mediterranean Sea: An Analysis of Measured Isotopic Oxygen
NASA Astrophysics Data System (ADS)
de Ruggiero, Paola; Zanchettin, Davide; Bensi, Manuel; Hainbucher, Dagmar; Stenni, Barbara; Pierini, Stefano; Rubino, Angelo
2018-04-01
We investigate aspects of the water mass structure of the Adriatic and Ionian basins (Eastern Mediterranean Sea) and their interdecadal variability through statistical analyses focused on δ18Ο measurements carried out in 1985, 1990, and 2011. In particular, the more recent δ18Ο measurements extend throughout the entire water column and constitute, to the best of our knowledge, the largest synoptic dataset encompassing different sub-basins of the Mediterranean Sea. We study the statistical linkages between temperature, salinity, dissolved oxygen and δ18Ο. We find that δ18Ο is largely independent from the other parameters, and it can be used to trace major water masses that are typically found in the basins, including the Adriatic Dense Water, the Levantine Intermediate Water, and the Cretan Intermediate and Dense Waters. Finally, we explore the possibility of using δ18Ο concentration as a proxy for dominant modes of large-scale oceanic variability in the Mediterranean Sea.
NASA Astrophysics Data System (ADS)
Tega, Naoki; Miki, Hiroshi; Mine, Toshiyuki; Ohmori, Kenji; Yamada, Keisaku
2014-03-01
It is demonstrated from a statistical perspective that the generation of random telegraph noise (RTN) changes before and after the application of negative-bias temperature instability (NBTI) stress. The NBTI stress generates a large number of permanent interface traps and, at the same time, a large number of RTN traps causing temporary RTN and one-time RTN. The interface trap and the RTN trap show different features in the recovery process. That is, re-passivation of interface states is a minor cause of the recovery after the NBTI stress; in contrast, rapid disappearance of the temporary RTN and the one-time RTN is the main cause of the recovery. The RTN traps are less likely to become permanent. This two-trap model, comprising the interface trap and the RTN trap, simply explains NBTI degradation and recovery in scaled p-channel metal-oxide-semiconductor field-effect transistors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kaurov, Alexander A., E-mail: kaurov@uchicago.edu
The methods for studying the epoch of cosmic reionization vary from full radiative transfer simulations to purely analytical models. While numerical approaches are computationally expensive and are not suitable for generating many mock catalogs, analytical methods are based on assumptions and approximations. We explore the interconnection between both methods. First, we ask how the analytical framework of excursion set formalism can be used for statistical analysis of numerical simulations and visual representation of the morphology of ionization fronts. Second, we explore the methods of training the analytical model on a given numerical simulation. We present a new code which emerged from this study. Its main application is to match the analytical model with a numerical simulation. Then, it allows one to generate mock reionization catalogs with volumes exceeding the original simulation quickly and computationally inexpensively, meanwhile reproducing large-scale statistical properties. These mock catalogs are particularly useful for cosmic microwave background polarization and 21 cm experiments, where large volumes are required to simulate the observed signal.
NASA Astrophysics Data System (ADS)
Bergant, Klemen; Kajfež-Bogataj, Lučka; Črepinšek, Zalika
2002-02-01
Phenological observations are a valuable source of information for investigating the relationship between climate variation and plant development. Potential climate change in the future will shift the occurrence of phenological phases, and information about future climate conditions is needed in order to estimate this shift. General circulation models (GCMs) provide the best information about future climate change. They are able to simulate reliably the most important mean features on a large scale, but they fail on a regional scale because of their low spatial resolution. A common approach to bridging the scale gap is statistical downscaling, which was used here to relate the beginning of flowering of Taraxacum officinale in Slovenia with the monthly mean near-surface air temperature for January, February and March in Central Europe. Statistical models were developed and tested with NCAR/NCEP Reanalysis predictor data and EARS predictand data for the period 1960-1999. Prior to developing the statistical models, empirical orthogonal function (EOF) analysis was applied to the predictor data. Multiple linear regression was then used to relate the beginning of flowering with the expansion coefficients of the first three EOFs for the January, February and March air temperatures, and a strong correlation was found between them. The developed statistical models were applied to the results of two GCMs (HadCM3 and ECHAM4/OPYC3) to estimate the potential shifts in the beginning of flowering for the periods 1990-2019 and 2020-2049 in comparison with the period 1960-1989. The HadCM3 model predicts, on average, a 4-day earlier occurrence of flowering in the period 1990-2019 and ECHAM4/OPYC3 a 5-day earlier occurrence. The analogous results for the period 2020-2049 are 10- and 11-day earlier occurrences.
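As a rough illustration of the downscaling recipe described above (EOF analysis of the predictor field followed by multiple linear regression on the leading expansion coefficients), a sketch in Python is given below. The variable names and the use of scikit-learn's PCA as an EOF stand-in are assumptions, not the authors' code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def fit_downscaling_model(temp_anomalies, flowering_doy, n_eof=3):
    """temp_anomalies: (years x grid points); flowering_doy: (years,) day of year."""
    pca = PCA(n_components=n_eof)
    pcs = pca.fit_transform(temp_anomalies)        # expansion coefficients of the leading EOFs
    reg = LinearRegression().fit(pcs, flowering_doy)
    return pca, reg

def projected_shift(pca, reg, temps_future, temps_baseline):
    """Mean change in flowering onset (days) between two periods of GCM output."""
    return (reg.predict(pca.transform(temps_future)).mean()
            - reg.predict(pca.transform(temps_baseline)).mean())
```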
Seminar presentation on the economic evaluation of the space shuttle system
NASA Technical Reports Server (NTRS)
1973-01-01
The proceedings of a seminar on the economic aspects of the space shuttle system are presented. Emphasis was placed on the problems of economic analysis of large-scale public investments, the state of the art of cost estimation, the statistical data base for estimating costs of new technological systems, and the role of the main economic parameters affecting the results of the analyses. The system components of a space program and the choices then available for launch vehicles, spacecraft, and instruments were also explained.
Phage-bacteria infection networks: From nestedness to modularity
NASA Astrophysics Data System (ADS)
Flores, Cesar O.; Valverde, Sergi; Weitz, Joshua S.
2013-03-01
Bacteriophages (viruses that infect bacteria) are the most abundant biological life-forms on Earth. However, very little is known regarding the structure of phage-bacteria infections. In a recent study we re-evaluated 38 prior studies and demonstrated that phage-bacteria infection networks tend to be statistically nested in small-scale communities (Flores et al. 2011). Nestedness is consistent with a hierarchy of infection and resistance within phages and bacteria, respectively. However, we predicted that at large scales, phage-bacteria infection networks should be typified by a modular structure. We evaluate and confirm this hypothesis using the most extensive study of phage-bacteria infections (Moebus and Nattkemper 1981). In this study, cross-infections were evaluated between 215 marine phages and 286 marine bacteria. We develop a novel multi-scale network analysis and find that the Moebus and Nattkemper (1981) study is highly modular (at the whole-network scale), yet also exhibits nestedness and modularity at the within-module scale. We examine the role of geography in driving these modular patterns and find evidence that phage-bacteria interactions can exhibit strong similarity despite large distances between sites. CFG acknowledges the support of CONACyT Foundation. JSW holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund and acknowledges the support of the James S. McDonnell Foundation.
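For readers wanting to reproduce the flavor of a modularity check on an infection matrix, here is a hedged sketch using generic greedy modularity optimization from networkx. It is not the authors' multi-scale method, and the matrix M (phages by bacteria, 1 = infection) is an assumed input.

```python
import networkx as nx
from networkx.algorithms import community

def infection_network_modularity(M):
    """M: binary numpy array (phages x bacteria), 1 where a phage infects a host."""
    G = nx.Graph()
    n_phage, n_bact = M.shape
    G.add_nodes_from((f"P{i}" for i in range(n_phage)), bipartite=0)
    G.add_nodes_from((f"B{j}" for j in range(n_bact)), bipartite=1)
    G.add_edges_from((f"P{i}", f"B{j}")
                     for i in range(n_phage) for j in range(n_bact) if M[i, j])
    parts = community.greedy_modularity_communities(G)   # generic community detection
    return community.modularity(G, parts), parts
```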
Meteorological Contribution to Variability in Particulate Matter Concentrations
NASA Astrophysics Data System (ADS)
Woods, H. L.; Spak, S. N.; Holloway, T.
2006-12-01
Local concentrations of fine particulate matter (PM) are driven by a number of processes, including emissions of aerosols and gaseous precursors, atmospheric chemistry, and meteorology at local, regional, and global scales. We apply statistical downscaling methods, typically used for regional climate analysis, to estimate the contribution of regional-scale meteorology to PM mass concentration variability at a range of sites in the Upper Midwestern U.S. Multiple years of daily PM10 and PM2.5 data, reported by the U.S. Environmental Protection Agency (EPA), are correlated with large-scale meteorology over the region from the National Centers for Environmental Prediction (NCEP) reanalysis data. We use two statistical downscaling methods (multiple linear regression, MLR, and analog) to identify which processes have the greatest impact on aerosol concentration variability. Empirical Orthogonal Functions of the NCEP meteorological data are correlated with PM time series at measurement sites. We examine which meteorological variables exert the greatest influence on PM variability, and which sites exhibit the greatest response to regional meteorology. To evaluate model performance, measurement data are withheld for limited periods, and compared with model results. Preliminary results suggest that regional meteorological processes account for over 50% of aerosol concentration variability at study sites.
ERIC Educational Resources Information Center
Andrich, David; Marais, Ida; Humphry, Stephen Mark
2016-01-01
Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The…
Effect of equilibration on primitive path analyses of entangled polymers.
Hoy, Robert S; Robbins, Mark O
2005-12-01
We use recently developed primitive path analysis (PPA) methods to study the effect of equilibration on entanglement density in model polymeric systems. Values of Ne for two commonly used equilibration methods differ by a factor of 2-4 even though the methods produce similar large-scale chain statistics. We find that local chain stretching in poorly equilibrated samples increases entanglement density. The evolution of Ne with time shows that many entanglements are lost through fast processes such as chain retraction as the local stretching relaxes. Quenching a melt state into a glass has little effect on Ne. Equilibration-dependent differences in short-scale structure affect the craze extension ratio much less than expected from the differences in PPA values of Ne.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Makarashvili, Vakhtang; Merzari, Elia; Obabko, Aleksandr
We analyze the potential performance benefits of estimating expected quantities in large eddy simulations of turbulent flows using true ensembles rather than ergodic time averaging. Multiple realizations of the same flow are simulated in parallel, using slightly perturbed initial conditions to create unique instantaneous evolutions of the flow field. Each realization is then used to calculate statistical quantities. Provided each instance is sufficiently de-correlated, this approach potentially allows considerable reduction in the time to solution beyond the strong scaling limit for a given accuracy. This study focuses on the theory and implementation of the methodology in Nek5000, a massively parallel open-source spectral element code.
Makarashvili, Vakhtang; Merzari, Elia; Obabko, Aleksandr; ...
2017-06-07
We analyze the potential performance benefits of estimating expected quantities in large eddy simulations of turbulent flows using true ensembles rather than ergodic time averaging. Multiple realizations of the same flow are simulated in parallel, using slightly perturbed initial conditions to create unique instantaneous evolutions of the flow field. Each realization is then used to calculate statistical quantities. Provided each instance is sufficiently de-correlated, this approach potentially allows considerable reduction in the time to solution beyond the strong scaling limit for a given accuracy. This study focuses on the theory and implementation of the methodology in Nek5000, a massively parallel open-source spectral element code.
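A toy numpy illustration of the ensemble idea described above is sketched below: several short, independently seeded realizations are averaged in place of one long ergodic time average. The AR(1) surrogate signal is purely an assumption standing in for an LES realization.

```python
import numpy as np

def realization(n_steps, seed, phi=0.9):
    """Stand-in for one flow realization: a unit-variance AR(1) signal."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = rng.normal()
    for t in range(1, n_steps):
        x[t] = phi * x[t - 1] + np.sqrt(1.0 - phi**2) * rng.normal()
    return x

# One long run (ergodic time average) versus ten short, independently perturbed runs
time_average = realization(100_000, seed=0).mean()
ensemble_average = np.mean([realization(10_000, seed=s).mean() for s in range(1, 11)])
print(time_average, ensemble_average)   # both estimate the same expected value (0 here)
```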
Comparative analysis of marine ecosystems: workshop on predator-prey interactions.
Bailey, Kevin M; Ciannelli, Lorenzo; Hunsicker, Mary; Rindorf, Anna; Neuenfeldt, Stefan; Möllmann, Christian; Guichard, Frederic; Huse, Geir
2010-10-23
Climate and human influences on marine ecosystems are largely manifested by changes in predator-prey interactions. It follows that ecosystem-based management of the world's oceans requires a better understanding of food web relationships. An international workshop on predator-prey interactions in marine ecosystems was held at the Oregon State University, Corvallis, OR, USA on 16-18 March 2010. The meeting brought together scientists from diverse fields of expertise including theoretical ecology, animal behaviour, fish and seabird ecology, statistics, fisheries science and ecosystem modelling. The goals of the workshop were to critically examine the methods of scaling-up predator-prey interactions from local observations to systems, the role of shifting ecological processes with scale changes, and the complexity and organizational structure in trophic interactions.
OH maser proper motions in Cepheus A
NASA Astrophysics Data System (ADS)
Migenes, V.; Cohen, R. J.; Brebner, G. C.
1992-02-01
MERLIN measurements made between 1982 and 1989 reveal proper motions of OH masers in the source Cepheus A. The proper motions are typically a few milliarcsec per year, and are mainly directed away from the central H II regions. Statistical analysis of the data suggests an expansion time-scale of some 300 yr. The distance of the source implied by the proper motions is 320 (+140/-80) pc, assuming that the expansion is isotropic. The proper motions can be reconciled with the larger distance of 730 pc which is generally accepted, provided that the masers are moving at large angles to the line of sight. The expansion time-scale agrees with that of the magnetic field decay recently reported by Cohen et al. (1990).
Non-linear matter power spectrum covariance matrix errors and cosmological parameter uncertainties
NASA Astrophysics Data System (ADS)
Blot, L.; Corasaniti, P. S.; Amendola, L.; Kitching, T. D.
2016-06-01
The covariance of the matter power spectrum is a key element of the analysis of galaxy clustering data. Independent realizations of observational measurements can be used to sample the covariance; nevertheless, statistical sampling errors will propagate into the cosmological parameter inference, potentially limiting the capabilities of the upcoming generation of galaxy surveys. The impact of these errors as a function of the number of realizations has been previously evaluated for Gaussian distributed data. However, non-linearities in the late-time clustering of matter cause departures from Gaussian statistics. Here, we address the impact of non-Gaussian errors on the sample covariance and precision matrix errors using a large ensemble of N-body simulations. In the range of modes where finite volume effects are negligible (0.1 ≲ k [h Mpc-1] ≲ 1.2), we find deviations of the variance of the sample covariance with respect to Gaussian predictions above ˜10 per cent at k > 0.3 h Mpc-1. Over the entire range these reduce to about ˜5 per cent for the precision matrix. Finally, we perform a Fisher analysis to estimate the effect of covariance errors on the cosmological parameter constraints. In particular, assuming Euclid-like survey characteristics we find that a number of independent realizations larger than 5000 is necessary to reduce the contribution of sampling errors to the cosmological parameter uncertainties at the subpercent level. We also show that restricting the analysis to large scales k ≲ 0.2 h Mpc-1 results in a considerable loss in constraining power, while using the linear covariance to include smaller scales leads to an underestimation of the errors on the cosmological parameters.
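A minimal sketch of the quantities discussed above, assuming an array of P(k) measurements from independent realizations: the sample covariance and the precision matrix with the standard Hartlap debiasing factor, which is derived under the Gaussian assumption whose limits the paper examines.

```python
import numpy as np

def sample_covariance_and_precision(pk_samples):
    """pk_samples: (n_realizations x n_bins) array of P(k) measurements."""
    n_s, n_b = pk_samples.shape
    cov = np.cov(pk_samples, rowvar=False)          # unbiased sample covariance
    hartlap = (n_s - n_b - 2) / (n_s - 1)           # debiasing factor for Gaussian data
    precision = hartlap * np.linalg.inv(cov)
    return cov, precision

# e.g. cov, prec = sample_covariance_and_precision(pk_from_nbody_realizations)
```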
Visually Exploring Transportation Schedules.
Palomo, Cesar; Guo, Zhan; Silva, Cláudio T; Freire, Juliana
2016-01-01
Public transportation schedules are designed by agencies to optimize service quality under multiple constraints. However, real service usually deviates from the plan. Therefore, transportation analysts need to identify, compare and explain both eventual and systemic performance issues that must be addressed so that better timetables can be created. The purely statistical tools commonly used by analysts pose many difficulties due to the large number of attributes at trip- and station-level for planned and real service. Also challenging is the need for models at multiple scales to search for patterns at different times and stations, since analysts do not know exactly where or when relevant patterns might emerge and need to compute statistical summaries for multiple attributes at different granularities. To aid in this analysis, we worked in close collaboration with a transportation expert to design TR-EX, a visual exploration tool developed to identify, inspect and compare spatio-temporal patterns for planned and real transportation service. TR-EX combines two new visual encodings inspired by Marey's Train Schedule: Trips Explorer for trip-level analysis of frequency, deviation and speed; and Stops Explorer for station-level study of delay, wait time, reliability and performance deficiencies such as bunching. To tackle overplotting and to provide a robust representation for a large number of trips and stops at multiple scales, the system supports variable kernel bandwidths to achieve the level of detail required by users for different tasks. We justify our design decisions based on specific analysis needs of transportation analysts. We provide anecdotal evidence of the efficacy of TR-EX through a series of case studies that explore NYC subway service, which illustrate how TR-EX can be used to confirm hypotheses and derive new insights through visual exploration.
Large-eddy simulations of turbulent flow for grid-to-rod fretting in nuclear reactors
Bakosi, J.; Christon, M. A.; Lowrie, R. B.; ...
2013-07-12
The grid-to-rod fretting (GTRF) problem in pressurized water reactors is a flow-induced vibration problem that results in wear and failure of the fuel rods in nuclear assemblies. In order to understand the fluid dynamics of GTRF and to build an archival database of turbulence statistics for various configurations, implicit large-eddy simulations of time-dependent single-phase turbulent flow have been performed in 3 × 3 and 5 × 5 rod bundles with a single grid spacer. To assess the computational mesh and resolution requirements, a method for quantitative assessment of unstructured meshes with no-slip walls is described. The calculations have been carried out using Hydra-TH, a thermal-hydraulics code developed at Los Alamos for the Consortium for Advanced Simulation of Light Water Reactors, a United States Department of Energy Innovation Hub. Hydra-TH uses a second-order implicit incremental projection method to solve the single-phase incompressible Navier-Stokes equations. The simulations explicitly resolve the large-scale motions of the turbulent flow field using first principles and rely on a monotonicity-preserving numerical technique to represent the unresolved scales. Each series of simulations for the 3 × 3 and 5 × 5 rod-bundle geometries is an analysis of the flow field statistics combined with a mesh-refinement study and validation with available experimental data. Our primary focus is the time history and statistics of the forces loading the fuel rods. These hydrodynamic forces are believed to be the key player resulting in rod vibration and GTRF wear, one of the leading causes of leaking nuclear fuel which costs power utilities millions of dollars in preventive measures. As a result, we demonstrate that implicit large-eddy simulation of rod-bundle flows is a viable way to calculate the excitation forces for the GTRF problem.
Automatic initialization and quality control of large-scale cardiac MRI segmentations.
Albà, Xènia; Lekadir, Karim; Pereañez, Marco; Medrano-Gracia, Pau; Young, Alistair A; Frangi, Alejandro F
2018-01-01
Continuous advances in imaging technologies enable ever more comprehensive phenotyping of human anatomy and physiology. Concomitant reduction of imaging costs has resulted in widespread use of imaging in large clinical trials and population imaging studies. Magnetic Resonance Imaging (MRI), in particular, offers one-stop-shop multidimensional biomarkers of cardiovascular physiology and pathology. A wide range of analysis methods offer sophisticated cardiac image assessment and quantification for clinical and research studies. However, most methods have only been evaluated on relatively small databases often not accessible for open and fair benchmarking. Consequently, published performance indices are not directly comparable across studies and their translation and scalability to large clinical trials or population imaging cohorts is uncertain. Most existing techniques still rely on considerable manual intervention for the initialization and quality control of the segmentation process, becoming prohibitive when dealing with thousands of images. The contributions of this paper are three-fold. First, we propose a fully automatic method for initializing cardiac MRI segmentation, by using image features and random forests regression to predict an initial position of the heart and key anatomical landmarks in an MRI volume. In processing a full imaging database, the technique predicts the optimal corrective displacements and positions in relation to the initial rough intersections of the long and short axis images. Second, we introduce for the first time a quality control measure capable of identifying incorrect cardiac segmentations with no visual assessment. The method uses statistical, pattern and fractal descriptors in a random forest classifier to detect failures to be corrected or removed from subsequent statistical analysis. Finally, we validate these new techniques within a full pipeline for cardiac segmentation applicable to large-scale cardiac MRI databases. The results obtained based on over 1200 cases from the Cardiac Atlas Project show the promise of fully automatic initialization and quality control for population studies. Copyright © 2017 Elsevier B.V. All rights reserved.
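The initialization step described above maps image features to an approximate heart position with random forest regression. The sketch below shows that idea with scikit-learn under assumed inputs (a generic feature matrix and landmark coordinates); the actual feature design and corrective-displacement scheme of the paper are not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_landmark_regressor(features, landmark_xyz):
    """features: (n_volumes x n_features) image descriptors; landmark_xyz: (n_volumes x 3)."""
    rf = RandomForestRegressor(n_estimators=200, random_state=0)
    rf.fit(features, landmark_xyz)                 # multi-output regression to (x, y, z)
    return rf

# rf = train_landmark_regressor(train_features, train_positions)
# xyz_init = rf.predict(new_features)   # rough position used to seed the segmentation
```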
NASA Astrophysics Data System (ADS)
Bousserez, Nicolas; Henze, Daven; Bowman, Kevin; Liu, Junjie; Jones, Dylan; Keller, Martin; Deng, Feng
2013-04-01
This work presents improved analysis error estimates for 4D-Var systems. From operational NWP models to top-down constraints on trace gas emissions, many of today's data assimilation and inversion systems in atmospheric science rely on variational approaches. This success is due to both the mathematical clarity of these formulations and the availability of computationally efficient minimization algorithms. However, unlike Kalman Filter-based algorithms, these methods do not provide an estimate of the analysis or forecast error covariance matrices, these error statistics being propagated only implicitly by the system. From both a practical (cycling assimilation) and scientific perspective, assessing uncertainties in the solution of the variational problem is critical. For large-scale linear systems, deterministic or randomization approaches can be considered based on the equivalence between the inverse Hessian of the cost function and the covariance matrix of analysis error. For perfectly quadratic systems, like incremental 4D-Var, Lanczos/Conjugate-Gradient algorithms have proven to be most efficient in generating low-rank approximations of the Hessian matrix during the minimization. For weakly non-linear systems though, the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm, a quasi-Newton descent method, is usually considered the best choice for the minimization. Suitable for large-scale optimization, this method allows one to generate an approximation to the inverse Hessian using the latest m vector/gradient pairs generated during the minimization, m depending upon the available core memory. At each iteration, an initial low-rank approximation to the inverse Hessian has to be provided, which is called preconditioning. The ability of the preconditioner to retain useful information from previous iterations largely determines the efficiency of the algorithm. Here we assess the performance of different preconditioners to estimate the inverse Hessian of a large-scale 4D-Var system. The impact of using the diagonal preconditioners proposed by Gilbert and Lemaréchal (1989) instead of the usual Oren-Spedicato scalar will be presented first. We will also introduce new hybrid methods that combine randomization estimates of the analysis error variance with L-BFGS diagonal updates to improve the inverse Hessian approximation. Results from these new algorithms will be evaluated against standard large ensemble Monte-Carlo simulations. The methods explored here are applied to the problem of inferring global atmospheric CO2 fluxes using remote sensing observations, and are intended to be integrated with the future NASA Carbon Monitoring System.
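For context on how L-BFGS builds its low-rank inverse-Hessian approximation from stored vector/gradient pairs, a standard two-loop recursion is sketched below. The scalar initial matrix H0 = gamma * I stands in for the preconditioners discussed above, and the function is illustrative rather than taken from the authors' system.

```python
import numpy as np

def lbfgs_apply_inverse_hessian(v, s_list, y_list, gamma=1.0):
    """Approximate H^{-1} v from m stored pairs s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k."""
    q = v.copy()
    rho = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alpha = [0.0] * len(s_list)
    for k in reversed(range(len(s_list))):          # first loop: newest to oldest
        alpha[k] = rho[k] * np.dot(s_list[k], q)
        q = q - alpha[k] * y_list[k]
    r = gamma * q                                   # initial approximation H0 = gamma * I
    for k in range(len(s_list)):                    # second loop: oldest to newest
        beta = rho[k] * np.dot(y_list[k], r)
        r = r + s_list[k] * (alpha[k] - beta)
    return r
```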
Statistical Measures of Large-Scale Structure
NASA Astrophysics Data System (ADS)
Vogeley, Michael; Geller, Margaret; Huchra, John; Park, Changbom; Gott, J. Richard
1993-12-01
To quantify clustering in the large-scale distribution of galaxies and to test theories for the formation of structure in the universe, we apply statistical measures to the CfA Redshift Survey. This survey is complete to m_B(0) = 15.5 over two contiguous regions which cover one-quarter of the sky and include ~11,000 galaxies. The salient features of these data are voids with diameters of 30-50 h⁻¹ Mpc and coherent dense structures with a scale of ~100 h⁻¹ Mpc. Comparison with N-body simulations rules out the "standard" CDM model (Ω = 1, b = 1.5, σ₈ = 1) at the 99% confidence level because this model has insufficient power on scales λ > 30 h⁻¹ Mpc. An unbiased open-universe CDM model (Ωh = 0.2) and a biased CDM model with non-zero cosmological constant (Ωh = 0.24, λ₀ = 0.6) match the observed power spectrum. The amplitude of the power spectrum depends on the luminosity of galaxies in the sample; bright (L > L*) galaxies are more strongly clustered than faint galaxies. The paucity of bright galaxies in low-density regions may explain this dependence. To measure the topology of large-scale structure, we compute the genus of isodensity surfaces of the smoothed density field. On scales in the "non-linear" regime, ≤ 10 h⁻¹ Mpc, the high- and low-density regions are multiply connected over a broad range of density thresholds, as in a filamentary net. On smoothing scales > 10 h⁻¹ Mpc, the topology is consistent with the statistics of a Gaussian random field. Simulations of CDM models fail to produce the observed coherence of structure on non-linear scales (> 95% confidence level). The underdensity probability (the frequency of regions with density contrast δρ/ρ̄ = -0.8) depends strongly on the luminosity of galaxies; underdense regions are significantly more common (> 2σ) in bright (L > L*) galaxy samples than in samples which include fainter galaxies.
NASA Marshall Space Flight Center Controls Systems Design and Analysis Branch
NASA Technical Reports Server (NTRS)
Gilligan, Eric
2014-01-01
Marshall Space Flight Center maintains a critical national capability in the analysis of launch vehicle flight dynamics and flight certification of GN&C algorithms. MSFC analysts are domain experts in the areas of flexible-body dynamics and control-structure interaction, thrust vector control, sloshing propellant dynamics, and advanced statistical methods. Marshall's modeling and simulation expertise has supported manned spaceflight for over 50 years. Marshall's unparalleled capability in launch vehicle guidance, navigation, and control technology stems from its rich heritage in developing, integrating, and testing launch vehicle GN&C systems dating to the early Mercury-Redstone and Saturn vehicles. The Marshall team is continuously developing novel methods for design, including advanced techniques for large-scale optimization and analysis.
de la Osa, Nuria; Granero, Roser; Trepat, Esther; Domenech, Josep Maria; Ezpeleta, Lourdes
2016-01-01
This paper studies the discriminative capacity of the CBCL/1½-5 (Manual for the ASEBA Preschool-Age Forms & Profiles, University of Vermont, Research Center for Children, Youth, & Families, Burlington, 2000) DSM5 scales for attention deficit and hyperactivity disorder (ADHD), oppositional defiant disorder (ODD), anxiety and depressive problems in detecting the presence of DSM5 (DSM5 diagnostic and statistical manual of mental disorders, APA, Arlington, 2013) disorders, namely ADHD, ODD, anxiety and mood disorders, assessed through diagnostic interview, in children aged 3-5. Additionally, we compare the clinical utility of the CBCL/1½-5-DSM5 scales with respect to the analogous CBCL/1½-5 syndrome scales. A large community sample of 616 preschool children was longitudinally assessed across the stated age range. Statistical analysis was based on ROC procedures and binary logistic regressions. The ADHD and ODD CBCL/1½-5-DSM5 scales achieved good discriminative ability to identify interview-based ADHD and ODD diagnoses at any age. The discriminative capacity of the CBCL/1½-5-DSM5 Anxiety scale was fair for unspecific anxiety disorders in all age groups. The CBCL/1½-5-DSM5 depressive problems scale showed the poorest discriminative capacity for mood disorders (including depressive episode with insufficient symptoms), oscillating in the poor-to-fair range. As a whole, the DSM5-oriented scales generally did not show better discriminative capacity than the syndrome scales in identifying DSM5 diagnoses. The CBCL/1½-5-DSM5 scales discriminate externalizing disorders better than internalizing disorders for ages 3-5. Scores on the ADHD and ODD CBCL/1½-5-DSM5 scales can be used to screen for DSM5 ADHD and ODD disorders in general populations of preschool children.
2013-01-01
Background As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. Results We performed 12 microarray meta-analysis methods for combining multiple simulated expression profiles, and such methods can be categorized for different hypothesis setting purposes: (1) HS(A): DE genes with non-zero effect sizes in all studies, (2) HS(B): DE genes with non-zero effect sizes in one or more studies and (3) HS(r): DE genes with non-zero effect in "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated hypothesis settings behind the methods and further applied multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. Conclusions The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HS(A), HS(B), and HS(r)). Evaluation in real data and results from MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application. All source files for simulation and real data are available on the author's publication website. PMID:24359104
Chang, Lun-Ching; Lin, Hui-Min; Sibille, Etienne; Tseng, George C
2013-12-21
As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. We performed 12 microarray meta-analysis methods for combining multiple simulated expression profiles, and such methods can be categorized for different hypothesis setting purposes: (1) HS(A): DE genes with non-zero effect sizes in all studies, (2) HS(B): DE genes with non-zero effect sizes in one or more studies and (3) HS(r): DE genes with non-zero effect in "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated hypothesis settings behind the methods and further applied multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HS(A), HS(B), and HS(r)). Evaluation in real data and results from MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application. All source files for simulation and real data are available on the author's publication website.
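As a simple illustration of two of the hypothesis settings above, the sketch below combines per-gene p-values across K studies with Fisher's method (sensitive to a signal in at least one study, i.e. HS(B)) and with the maximum-p statistic (requiring a signal in all studies, i.e. HS(A)). This is generic p-value combination, not the paper's full set of twelve methods.

```python
import numpy as np
from scipy import stats

def fisher_combined_p(pvals):
    """Fisher's method: sensitive to a non-zero effect in at least one study (HS(B))."""
    stat = -2.0 * np.sum(np.log(pvals))
    return stats.chi2.sf(stat, df=2 * len(pvals))

def max_p_combined(pvals):
    """Maximum-p statistic: requires a non-zero effect in every study (HS(A))."""
    return float(np.max(pvals)) ** len(pvals)

# per-gene p-values from three studies:
# fisher_combined_p([0.01, 0.20, 0.03]), max_p_combined([0.01, 0.20, 0.03])
```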
Chalise, D. R.; Haj, Adel E.; Fontaine, T.A.
2018-01-01
The Hydrological Simulation Program Fortran (HSPF) [Hydrological Simulation Program Fortran version 12.2 (Computer software). USEPA, Washington, DC] and the Precipitation Runoff Modeling System (PRMS) [Precipitation Runoff Modeling System version 4.0 (Computer software). USGS, Reston, VA] models are semidistributed, deterministic hydrological tools for simulating the impacts of precipitation, land use, and climate on basin hydrology and streamflow. Both models have been applied independently to many watersheds across the United States. This paper reports the statistical results assessing various temporal (daily, monthly, and annual) and spatial (small versus large watershed) scale biases in HSPF and PRMS simulations using two watersheds in the Black Hills, South Dakota. The Nash-Sutcliffe efficiency (NSE), Pearson correlation coefficient (r), and coefficient of determination (R2) statistics for the daily, monthly, and annual flows were used to evaluate the models' performance. Results from the HSPF models showed that the HSPF consistently simulated the annual flows for both large and small basins better than the monthly and daily flows, and the simulated flows for the small watershed better than flows for the large watershed. In comparison, the PRMS model results show that the PRMS simulated the monthly flows for both the large and small watersheds better than the daily and annual flows, and the range of statistical error in the PRMS models was greater than that in the HSPF models. Moreover, it can be concluded that the statistical error in the HSPF and the PRMS daily, monthly, and annual flow estimates for watersheds in the Black Hills was influenced by both temporal and spatial scale variability.
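The goodness-of-fit statistics named above are standard; a brief sketch of the Nash-Sutcliffe efficiency and Pearson correlation follows, with observed and simulated flow arrays as assumed inputs.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pearson_r(obs, sim):
    return np.corrcoef(obs, sim)[0, 1]

# The same functions apply to daily, monthly, or annual flow series, e.g.
# nash_sutcliffe(q_obs_monthly, q_sim_monthly), pearson_r(q_obs_monthly, q_sim_monthly)
```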
From sexless to sexy: Why it is time for human genetics to consider and report analyses of sex.
Powers, Matthew S; Smith, Phillip H; McKee, Sherry A; Ehringer, Marissa A
2017-01-01
Science has come a long way with regard to the consideration of sex differences in clinical and preclinical research, but one field remains behind the curve: human statistical genetics. The goal of this commentary is to raise awareness and discussion about how to best consider and evaluate possible sex effects in the context of large-scale human genetic studies. Over the course of this commentary, we reinforce the importance of interpreting genetic results in the context of biological sex, establish evidence that sex differences are not being considered in human statistical genetics, and discuss how best to conduct and report such analyses. Our recommendation is to run stratified analyses by sex no matter the sample size or the result and report the findings. Summary statistics from stratified analyses are helpful for meta-analyses, and patterns of sex-dependent associations may be hidden in a combined dataset. In the age of declining sequencing costs, large consortia efforts, and a number of useful control samples, it is now time for the field of human genetics to appropriately include sex in the design, analysis, and reporting of results.
Renosh, P R; Schmitt, Francois G; Loisel, Hubert
2015-01-01
Satellite remote sensing observations allow the ocean surface to be sampled synoptically over large spatio-temporal scales. The images provided by visible and thermal infrared satellite observations are widely used in physical, biological, and ecological oceanography. The present work proposes a method for understanding the rarely studied multi-scaling properties of satellite products such as Chlorophyll-a (Chl-a) and Sea Surface Temperature (SST). The specific objective of this study is to show how the small-scale heterogeneities of satellite images can be characterised using tools borrowed from the field of turbulence. For that purpose, we show how the structure function, which is classically used in scaling analysis of time series, can also be used in 2D. The main advantage of this method is that it can be applied to images which have missing data. Based on both simulated and real images, we demonstrate that coarse-graining (CG) of a gradient modulus transform of the original image does not provide correct scaling exponents. We show, using a fractional Brownian simulation in 2D, that the structure function (SF) can be used with randomly sampled couples of points, and verify that one million couples of points provide enough statistics.
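A hedged sketch of the pair-sampling idea described above follows: a second-order (or order-q) structure function estimated from randomly sampled point pairs on a 2D field containing gaps. The bin layout and pair count are illustrative choices, not the authors' settings.

```python
import numpy as np

def structure_function_2d(field, n_pairs=1_000_000, n_bins=30, q=2, seed=0):
    """field: 2D array with np.nan for missing pixels; returns S_q(r) in log-spaced bins."""
    rng = np.random.default_rng(seed)
    ny, nx = field.shape
    i1, i2 = rng.integers(0, ny, n_pairs), rng.integers(0, ny, n_pairs)
    j1, j2 = rng.integers(0, nx, n_pairs), rng.integers(0, nx, n_pairs)
    v1, v2 = field[i1, j1], field[i2, j2]
    keep = ~(np.isnan(v1) | np.isnan(v2))            # discard pairs that hit missing data
    r = np.hypot(i1 - i2, j1 - j2)[keep]
    dv_q = np.abs(v1 - v2)[keep] ** q
    bins = np.logspace(0.0, np.log10(r.max()), n_bins)
    idx = np.digitize(r, bins)
    sf = np.array([dv_q[idx == b].mean() if np.any(idx == b) else np.nan
                   for b in range(1, n_bins)])
    return bins[:-1], sf    # scaling exponents come from the log-log slope of S_q(r)
```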
Statistical model of exotic rotational correlations in emergent space-time
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Craig; Kwon, Ohkyung; Richardson, Jonathan
2017-06-06
A statistical model is formulated to compute exotic rotational correlations that arise as inertial frames and causal structure emerge on large scales from entangled Planck scale quantum systems. Noncommutative quantum dynamics are represented by random transverse displacements that respect causal symmetry. Entanglement is represented by covariance of these displacements in Planck scale intervals defined by future null cones of events on an observer's world line. Light that propagates in a nonradial direction inherits a projected component of the exotic rotational correlation that accumulates as a random walk in phase. A calculation of the projection and accumulation leads to exact predictions for statistical properties of exotic Planck scale correlations in an interferometer of any configuration. The cross-covariance for two nearly co-located interferometers is shown to depart only slightly from the autocovariance. Specific examples are computed for configurations that approximate realistic experiments, and show that the model can be rigorously tested.
Multiscale pore structure and constitutive models of fine-grained rocks
NASA Astrophysics Data System (ADS)
Heath, J. E.; Dewers, T. A.; Shields, E. A.; Yoon, H.; Milliken, K. L.
2017-12-01
A foundational concept of continuum poromechanics is the representative elementary volume or REV: an amount of material large enough that pore- or grain-scale fluctuations in relevant properties are dissipated to a definable mean, but smaller than length scales of heterogeneity. We determine 2D-equivalent representative elementary areas (REAs) of pore areal fraction of three major types of mudrocks by applying multi-beam scanning electron microscopy (mSEM) to obtain terapixel image mosaics. Image analysis obtains pore areal fraction and pore size and shape as a function of progressively larger measurement areas. Using backscattering imaging and mSEM data, pores are identified by the components within which they occur, such as in organics or the clastic matrix. We correlate pore areal fraction with nano-indentation, micropillar compression, and axisymmetric testing at multiple length scales on a terrigenous-argillaceous mudrock sample. The combined data set is used to: investigate representative elementary volumes (and areas for the 2D images); determine if scale separation occurs; and determine if transport and mechanical properties at a given length scale can be statistically defined. Clear scale separation occurs between REAs and observable heterogeneity in two of the samples. A highly-laminated sample exhibits fine-scale heterogeneity and an overlap in scales, in which case typical continuum assumptions on statistical variability may break down. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.
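One simple way to picture the representative-elementary-area idea above is to track pore areal fraction against window size on a segmented binary image, as in the sketch below; the window sizes and sampling scheme are assumptions for illustration only.

```python
import numpy as np

def pore_fraction_vs_window(binary_img, window_sizes, n_windows=50, seed=0):
    """binary_img: 2D array with pore pixels = 1; returns mean and spread per window size."""
    rng = np.random.default_rng(seed)
    ny, nx = binary_img.shape
    results = {}
    for w in window_sizes:
        fractions = []
        for _ in range(n_windows):
            i, j = rng.integers(0, ny - w), rng.integers(0, nx - w)
            fractions.append(binary_img[i:i + w, j:j + w].mean())
        results[w] = (float(np.mean(fractions)), float(np.std(fractions)))
    return results   # a plateau in the mean with shrinking spread suggests an REA

# e.g. pore_fraction_vs_window(segmented_mosaic, window_sizes=[64, 128, 256, 512, 1024])
```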
NASA Astrophysics Data System (ADS)
Cardall, Christian Y.; Budiardja, Reuben D.
2017-05-01
GenASiS Basics provides Fortran 2003 classes furnishing extensible object-oriented utilitarian functionality for large-scale physics simulations on distributed memory supercomputers. This functionality includes physical units and constants; display to the screen or standard output device; message passing; I/O to disk; and runtime parameter management and usage statistics. This revision, Version 2 of Basics, makes mostly minor additions to functionality and includes some simplifying name changes.
Empirical Laws in Economics Uncovered Using Methods in Statistical Mechanics
NASA Astrophysics Data System (ADS)
Stanley, H. Eugene
2001-06-01
In recent years, statistical physicists and computational physicists have determined that physical systems which consist of a large number of interacting particles obey universal "scaling laws" that serve to demonstrate an intrinsic self-similarity operating in such systems. Further, the parameters appearing in these scaling laws appear to be largely independent of the microscopic details. Since economic systems also consist of a large number of interacting units, it is plausible that scaling theory can be usefully applied to economics. To test this possibility using realistic data sets, a number of scientists have begun analyzing economic data using methods of statistical physics [1]. We have found evidence for scaling (and data collapse), as well as universality, in various quantities, and these recent results will be reviewed in this talk--starting with the most recent study [2]. We also propose models that may lead to some insight into these phenomena. These results will be discussed, as well as the overall rationale for why one might expect scaling principles to hold for complex economic systems. This work on which this talk is based is supported by BP, and was carried out in collaboration with L. A. N. Amaral S. V. Buldyrev, D. Canning, P. Cizeau, X. Gabaix, P. Gopikrishnan, S. Havlin, Y. Lee, Y. Liu, R. N. Mantegna, K. Matia, M. Meyer, C.-K. Peng, V. Plerou, M. A. Salinger, and M. H. R. Stanley. [1.] See, e.g., R. N. Mantegna and H. E. Stanley, Introduction to Econophysics: Correlations & Complexity in Finance (Cambridge University Press, Cambridge, 1999). [2.] P. Gopikrishnan, B. Rosenow, V. Plerou, and H. E. Stanley, "Identifying Business Sectors from Stock Price Fluctuations," e-print cond-mat/0011145; V. Plerou, P. Gopikrishnan, L. A. N. Amaral, X. Gabaix, and H. E. Stanley, "Diffusion and Economic Fluctuations," Phys. Rev. E (Rapid Communications) 62, 3023-3026 (2000); P. Gopikrishnan, V. Plerou, X. Gabaix, and H. E. Stanley, "Statistical Properties of Share Volume Traded in Financial Markets," Phys. Rev. E (Rapid Communications) 62, 4493-4496 (2000).
NASA Astrophysics Data System (ADS)
Wakazuki, Yasutaka; Hara, Masayuki; Fujita, Mikiko; Ma, Xieyao; Kimura, Fujio
2013-04-01
Regional-scale climate change projections play an important role in assessments of the influences of global warming and include statistical (SD) and dynamical downscaling (DD) approaches. In this study, a DD method is developed based on the pseudo-global-warming (PGW) method of Kimura and Kitoh (2007). In general, DD runs a regional climate model (RCM) with lateral boundary data. In the PGW method, the climatological mean differences estimated by GCMs are added to the objective analysis data (ANAL), and the resulting data are used as the lateral boundary data in the future climate simulations; ANAL is also used as the lateral boundary conditions of the present climate simulation. One merit of the PGW method is that the influence of GCM biases on RCM simulations is reduced. However, the PGW method does not treat climate changes in relative humidity, year-to-year variation, or short-term disturbances. The new downscaling method developed here is named the incremental dynamical downscaling and analysis system (InDDAS); it treats climate changes in relative humidity and year-to-year variation. At the same time, uncertainties in the climate change projections of the many available GCMs are large and not negligible, so stochastic regional-scale climate change projections are desirable for assessments of the influences of global warming. Many RCM runs must be performed to produce stochastic information, yet the computational costs are huge because the grid size of RCM runs must be small enough to resolve heavy-rainfall phenomena. Therefore, the number of runs needed to produce stochastic information must be reduced. In InDDAS, the climatological differences added to ANAL are statistically pre-analyzed: the climatological differences of many GCMs are divided into a mean climatological difference (MD) and departures from MD, the departures are analyzed by principal component analysis, and positive and negative perturbations (plus or minus one standard deviation multiplied by the departure patterns, i.e. the eigenvectors) of multiple modes are added to MD. Consequently, the most likely future state is calculated with the climatological difference MD; for example, future states in which the temperature increase is large or small are calculated with MD plus the positive or negative perturbation of the first mode.
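The perturbation construction described above can be pictured with a short sketch: the multi-GCM mean difference MD plus or minus one standard deviation along the leading principal components of the inter-GCM departures yields a small set of boundary-condition increments. The function below is illustrative and assumes a (GCMs x grid points) array; it is not the InDDAS code.

```python
import numpy as np
from sklearn.decomposition import PCA

def pgw_increments(gcm_diffs, n_modes=2):
    """gcm_diffs: (n_gcms x n_grid) climatological differences from an ensemble of GCMs."""
    md = gcm_diffs.mean(axis=0)                          # mean climatological difference (MD)
    pca = PCA(n_components=n_modes).fit(gcm_diffs - md)  # departures analyzed by PCA
    increments = [md]
    for k in range(n_modes):
        sigma = np.sqrt(pca.explained_variance_[k])      # standard deviation of mode k
        increments.append(md + sigma * pca.components_[k])
        increments.append(md - sigma * pca.components_[k])
    return increments    # each increment would be added to ANAL as lateral boundary data
```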
Exploring Rating Quality in Rater-Mediated Assessments Using Mokken Scale Analysis
ERIC Educational Resources Information Center
Wind, Stefanie A.; Engelhard, George, Jr.
2016-01-01
Mokken scale analysis is a probabilistic nonparametric approach that offers statistical and graphical tools for evaluating the quality of social science measurement without placing potentially inappropriate restrictions on the structure of a data set. In particular, Mokken scaling provides a useful method for evaluating important measurement…
Identification of the underlying factor structure of the Derriford Appearance Scale 24
Lawson, Victoria; White, Paul
2015-01-01
Background. The Derriford Appearance Scale24 (DAS24) is a widely used measure of distress and dysfunction in relation to self-consciousness of appearance. It has been used in clinical and research settings, and translated into numerous European and Asian languages. Hitherto, no study has conducted an analysis to determine the underlying factor structure of the scale. Methods. A large (n = 1,265) sample of community and hospital patients with a visible difference was recruited face to face or by post and completed the DAS24. Results. A two-factor solution was generated. An evaluation of the congruence of the factor solutions in the hospital and the community samples, using Tucker's Coefficient of Congruence (rc = .979) and confirmatory factor analysis, demonstrated a consistent factor structure. A main factor, general self consciousness (GSC), was represented by 18 items. Six items comprised a second factor, sexual and body self-consciousness (SBSC). The SBSC scale demonstrated greater sensitivity and specificity in identifying distress for sexually significant areas of the body. Discussion. The factor structure of the DAS24 facilitates a more nuanced interpretation of scores using this scale. Two conceptually and statistically coherent sub-scales were identified. The SBSC sub-scale offers a means of identifying distress and dysfunction around sexually significant areas of the body not previously possible with this scale. PMID:26157633
Identification of the underlying factor structure of the Derriford Appearance Scale 24.
Moss, Timothy P; Lawson, Victoria; White, Paul
2015-01-01
Background. The Derriford Appearance Scale24 (DAS24) is a widely used measure of distress and dysfunction in relation to self-consciousness of appearance. It has been used in clinical and research settings, and translated into numerous European and Asian languages. Hitherto, no study has conducted an analysis to determine the underlying factor structure of the scale. Methods. A large (n = 1,265) sample of community and hospital patients with a visible difference was recruited face to face or by post and completed the DAS24. Results. A two-factor solution was generated. An evaluation of the congruence of the factor solutions in the hospital and the community samples, using Tucker's Coefficient of Congruence (rc = .979) and confirmatory factor analysis, demonstrated a consistent factor structure. A main factor, general self consciousness (GSC), was represented by 18 items. Six items comprised a second factor, sexual and body self-consciousness (SBSC). The SBSC scale demonstrated greater sensitivity and specificity in identifying distress for sexually significant areas of the body. Discussion. The factor structure of the DAS24 facilitates a more nuanced interpretation of scores using this scale. Two conceptually and statistically coherent sub-scales were identified. The SBSC sub-scale offers a means of identifying distress and dysfunction around sexually significant areas of the body not previously possible with this scale.
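Tucker's Coefficient of Congruence used above has a simple closed form; a sketch for a single factor's loading vectors from the two samples is given below, with the variable names assumed.

```python
import numpy as np

def tucker_congruence(loadings_a, loadings_b):
    """Congruence between one factor's loadings estimated in two samples."""
    a = np.asarray(loadings_a, dtype=float)
    b = np.asarray(loadings_b, dtype=float)
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

# e.g. tucker_congruence(gsc_loadings_hospital, gsc_loadings_community)  # near 1 for congruent factors
```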
Extracting Primordial Non-Gaussianity from Large Scale Structure in the Post-Planck Era
NASA Astrophysics Data System (ADS)
Dore, Olivier
Astronomical observations have become a unique tool to probe fundamental physics. Cosmology, in particular, has emerged as a data-driven science whose phenomenological modeling has achieved great success: in the post-Planck era, key cosmological parameters are measured to percent precision. A single model reproduces a wealth of astronomical observations involving very distinct physical processes at different times. This success leads to fundamental physical questions. One of the most salient is the origin of the primordial perturbations that grew to form the large-scale structures we now observe. More and more cosmological observables point to inflationary physics as the origin of the structure observed in the universe. Inflationary physics predicts the statistical properties of the primordial perturbations, which are thought to be slightly non-Gaussian. The detection of this small deviation from Gaussianity represents the next frontier in early-Universe physics. Measuring it would provide direct, unique and quantitative insights into the physics at play when the Universe was only a fraction of a second old, thus probing energies untouchable otherwise. On par with the well-known relic gravitational wave radiation, the famous "B-modes", it is one of the few probes of inflation. This departure from Gaussianity leads to a very specific signature in the large-scale clustering of galaxies. Observing large-scale structure, we can thus establish a direct connection with fundamental theories of the early universe. In the post-Planck era, large-scale structures are our most promising pathway to measuring this primordial signal. Current estimates suggest that the next generation of space- or ground-based large-scale structure surveys (e.g. the ESA EUCLID or NASA WFIRST missions) might enable a detection of this signal. This potentially huge payoff requires us to solidify the theoretical predictions supporting these measurements. Even if the exact signal we are looking for is of unknown amplitude, we must measure it as well as these groundbreaking data sets will permit. We propose to develop the supporting theoretical work to the point where the complete non-Gaussian signature can be extracted from these data sets. We will do so by developing three complementary directions: (1) develop the appropriate formalism to measure and model galaxy clustering on the largest scales; (2) study the impact of non-Gaussianity on higher-order statistics, the most promising statistics for our purpose; and (3) make explicit the connection between these observables and the microphysics of a large class of inflation models, while also identifying fundamental limitations to this interpretation.
Fast numerical methods for simulating large-scale integrate-and-fire neuronal networks.
Rangan, Aaditya V; Cai, David
2007-02-01
We discuss numerical methods for simulating large-scale, integrate-and-fire (I&F) neuronal networks. Important elements in our numerical methods are (i) a neurophysiologically inspired integrating factor which casts the solution as a numerically tractable integral equation, and allows us to obtain stable and accurate individual neuronal trajectories (i.e., voltage and conductance time-courses) even when the I&F neuronal equations are stiff, such as in strongly fluctuating, high-conductance states; (ii) an iterated process of spike-spike corrections within groups of strongly coupled neurons to account for spike-spike interactions within a single large numerical time-step; and (iii) a clustering procedure of firing events in the network to take advantage of localized architectures, such as spatial scales of strong local interactions, which are often present in large-scale computational models-for example, those of the primary visual cortex. (We note that the spike-spike corrections in our methods are more involved than the correction of single neuron spike-time via a polynomial interpolation as in the modified Runge-Kutta methods commonly used in simulations of I&F neuronal networks.) Our methods can evolve networks with relatively strong local interactions in an asymptotically optimal way such that each neuron fires approximately once in [Formula: see text] operations, where N is the number of neurons in the system. We note that quantifications used in computational modeling are often statistical, since measurements in a real experiment to characterize physiological systems are typically statistical, such as firing rate, interspike interval distributions, and spike-triggered voltage distributions. We emphasize that it takes much less computational effort to resolve statistical properties of certain I&F neuronal networks than to fully resolve trajectories of each and every neuron within the system. For networks operating in realistic dynamical regimes, such as strongly fluctuating, high-conductance states, our methods are designed to achieve statistical accuracy when very large time-steps are used. Moreover, our methods can also achieve trajectory-wise accuracy when small time-steps are used.
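As a toy illustration of the integrating-factor idea mentioned above, the step below advances a single leaky integrate-and-fire neuron by treating its conductance as frozen over the time-step, which gives a stable exponential update even for stiff, high-conductance states. All parameter values are illustrative, and none of the spike-spike correction or clustering machinery of the paper is included.

```python
import numpy as np

def lif_step(v, g_exc, dt, g_leak=1.0, v_rest=0.0, e_exc=4.67, v_thresh=1.0, v_reset=0.0):
    """One step of dv/dt = -g_leak*(v - v_rest) - g_exc*(v - e_exc), with g_exc frozen over dt."""
    g_tot = g_leak + g_exc
    v_inf = (g_leak * v_rest + g_exc * e_exc) / g_tot       # steady state for this step
    v_new = v_inf + (v - v_inf) * np.exp(-g_tot * dt)       # exact (integrating-factor) update
    spiked = v_new >= v_thresh
    return (v_reset if spiked else v_new), spiked

# v, fired = lif_step(v=0.2, g_exc=5.0, dt=0.1)   # stable even for large conductance g_exc
```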
A large-scale multicentre cerebral diffusion tensor imaging study in amyotrophic lateral sclerosis.
Müller, Hans-Peter; Turner, Martin R; Grosskreutz, Julian; Abrahams, Sharon; Bede, Peter; Govind, Varan; Prudlo, Johannes; Ludolph, Albert C; Filippi, Massimo; Kassubek, Jan
2016-06-01
Damage to the cerebral tissue structural connectivity associated with amyotrophic lateral sclerosis (ALS), which extends beyond the motor pathways, can be visualised by diffusion tensor imaging (DTI). The effective translation of DTI metrics as biomarkers requires their application across multiple MRI scanners and patient cohorts. A multicentre study was undertaken to assess structural connectivity in ALS within a large sample size. 442 DTI data sets from patients with ALS (N=253) and controls (N=189) were collected for this retrospective study, from eight international ALS-specialist clinic sites. Equipment and DTI protocols varied across the centres. Fractional anisotropy (FA) maps of the control participants were used to establish correction matrices to pool data, and correction algorithms were applied to the FA maps of the control and ALS patient groups. Analysis of data pooled from all centres, using whole-brain-based statistical analysis of FA maps, confirmed the most significant alterations in the corticospinal tracts, and captured additional significant white matter tract changes in the frontal lobe, brainstem and hippocampal regions of the ALS group that coincided with postmortem neuropathological stages. Stratification of the ALS group for disease severity (ALS functional rating scale) confirmed these findings. This large-scale study overcomes the challenges associated with processing and analysis of multiplatform, multicentre DTI data, and effectively demonstrates the anatomical fingerprint patterns of changes in a DTI metric that reflect distinct ALS disease stages. This success paves the way for the use of DTI-based metrics as a read-out in natural history, prognostic stratification and multisite disease-modifying studies in ALS.
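A highly simplified sketch of the pooling idea, assuming a per-site linear rescaling of FA derived from the control groups (the study's actual correction matrices are more elaborate; site labels and arrays here are hypothetical):

import numpy as np

def harmonize_fa(fa_by_site, controls_by_site, reference_site):
    """Rescale FA values from each site so its control-group mean/SD match those of a
    reference site -- a crude stand-in for site-specific correction matrices."""
    ref_mu = controls_by_site[reference_site].mean()
    ref_sd = controls_by_site[reference_site].std()
    corrected = {}
    for site, fa in fa_by_site.items():
        mu, sd = controls_by_site[site].mean(), controls_by_site[site].std()
        corrected[site] = (fa - mu) / sd * ref_sd + ref_mu   # z-score, then map to reference scale
    return corrected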
Next Generation Analytic Tools for Large Scale Genetic Epidemiology Studies of Complex Diseases
Mechanic, Leah E.; Chen, Huann-Sheng; Amos, Christopher I.; Chatterjee, Nilanjan; Cox, Nancy J.; Divi, Rao L.; Fan, Ruzong; Harris, Emily L.; Jacobs, Kevin; Kraft, Peter; Leal, Suzanne M.; McAllister, Kimberly; Moore, Jason H.; Paltoo, Dina N.; Province, Michael A.; Ramos, Erin M.; Ritchie, Marylyn D.; Roeder, Kathryn; Schaid, Daniel J.; Stephens, Matthew; Thomas, Duncan C.; Weinberg, Clarice R.; Witte, John S.; Zhang, Shunpu; Zöllner, Sebastian; Feuer, Eric J.; Gillanders, Elizabeth M.
2012-01-01
Over the past several years, genome-wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled “Next Generation Analytic Tools for Large-Scale Genetic Epidemiology Studies of Complex Diseases” on September 15–16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large-scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene-gene and gene-environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized. PMID:22147673
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bartoli, B.; Catalanotti, S.; Piazzoli, B. D’Ettorre
2015-08-10
This paper reports on the measurement of the large-scale anisotropy in the distribution of cosmic-ray arrival directions using the data collected by the air shower detector ARGO-YBJ from 2008 January to 2009 December, during the minimum of solar activity between cycles 23 and 24. In this period, more than 2 × 10¹¹ showers were recorded with energies between ∼1 and 30 TeV. The observed two-dimensional distribution of cosmic rays is characterized by two wide regions of excess and deficit, respectively, both of relative intensity ∼10⁻³ with respect to a uniform flux, superimposed on smaller size structures. The harmonic analysis shows that the large-scale cosmic-ray relative intensity as a function of R.A. can be described by the first and second terms of a Fourier series. The high event statistics allow the study of the energy dependence of the anisotropy, showing that the amplitude increases with energy, with a maximum intensity at ∼10 TeV, and then decreases while the phase slowly shifts toward lower values of R.A. with increasing energy. The ARGO-YBJ data provide accurate observations over more than a decade of energy around this feature of the anisotropy spectrum.
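A minimal sketch of such a harmonic analysis, fitting the first and second harmonics of a Fourier series in right ascension to a relative-intensity profile (the input arrays here are placeholders, not ARGO-YBJ data):

import numpy as np

def fit_harmonics(ra_deg, intensity):
    """Least-squares fit of the first and second Fourier harmonics in R.A. to a
    relative-intensity profile I(ra) ~ 1 + first harmonic + second harmonic."""
    ra = np.radians(ra_deg)
    # Design matrix: constant term plus cos/sin of the first and second harmonics.
    A = np.column_stack([np.ones_like(ra),
                         np.cos(ra), np.sin(ra),
                         np.cos(2 * ra), np.sin(2 * ra)])
    coeffs, *_ = np.linalg.lstsq(A, intensity, rcond=None)
    amp1 = np.hypot(coeffs[1], coeffs[2])                    # amplitude of the first harmonic
    phase1 = np.degrees(np.arctan2(coeffs[2], coeffs[1]))    # phase of the first harmonic (deg)
    amp2 = np.hypot(coeffs[3], coeffs[4])                    # amplitude of the second harmonic
    return amp1, phase1, amp2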
Computers as an Instrument for Data Analysis. Technical Report No. 11.
ERIC Educational Resources Information Center
Muller, Mervin E.
A review of statistical data analysis involving computers as a multi-dimensional problem provides the perspective for consideration of the use of computers in statistical analysis and the problems associated with large data files. An overall description of STATJOB, a particular system for doing statistical data analysis on a digital computer,…
Park, Y.; Krause, E.; Dodelson, S.; ...
2016-09-30
The joint analysis of galaxy-galaxy lensing and galaxy clustering is a promising method for inferring the growth function of large scale structure. Our analysis will be carried out on data from the Dark Energy Survey (DES), with its measurements of both the distribution of galaxies and the tangential shears of background galaxies induced by these foreground lenses. We develop a practical approach to modeling the assumptions and systematic effects affecting small scale lensing, which provides halo masses, and large scale galaxy clustering. Introducing parameters that characterize the halo occupation distribution (HOD), photometric redshift uncertainties, and shear measurement errors, we study how external priors on different subsets of these parameters affect our growth constraints. Degeneracies within the HOD model, as well as between the HOD and the growth function, are identified as the dominant source of complication, with other systematic effects sub-dominant. The impact of HOD parameters and their degeneracies necessitate the detailed joint modeling of the galaxy sample that we employ. Finally, we conclude that DES data will provide powerful constraints on the evolution of structure growth in the universe, conservatively/optimistically constraining the growth function to 7.9%/4.8% with its first-year data that covered over 1000 square degrees, and to 3.9%/2.3% with its full five-year data that will survey 5000 square degrees, including both statistical and systematic uncertainties.
Hypothesis exploration with visualization of variance
2014-01-01
Background The Consortium for Neuropsychiatric Phenomics (CNP) at UCLA was an investigation into the biological bases of traits such as memory and response inhibition phenotypes—to explore whether they are linked to syndromes including ADHD, Bipolar disorder, and Schizophrenia. An aim of the consortium was to move from traditional categorical approaches for psychiatric syndromes towards more quantitative approaches based on large-scale analysis of the space of human variation. It represented an application of phenomics—wide-scale, systematic study of phenotypes—to neuropsychiatry research. Results This paper reports on a system for exploration of hypotheses in data obtained from the LA2K, LA3C, and LA5C studies in CNP. ViVA is a system for exploratory data analysis using novel mathematical models and methods for visualization of variance. An example of these methods is called VISOVA, a combination of visualization and analysis of variance, with the flavor of exploration associated with ANOVA in biomedical hypothesis generation. It permits visual identification of phenotype profiles—patterns of values across phenotypes—that characterize groups. Visualization enables screening and refinement of hypotheses about the variance structure of sets of phenotypes. Conclusions The ViVA system was designed for exploration of neuropsychiatric hypotheses by interdisciplinary teams. Automated visualization in ViVA supports ‘natural selection’ on a pool of hypotheses, and permits deeper understanding of the statistical architecture of the data. A large-scale perspective of this kind could lead to better neuropsychiatric diagnostics. PMID:25097666
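For orientation, the classical one-way ANOVA that VISOVA builds its visual exploration on can be sketched as follows (this is not the VISOVA method itself; the group labels and scores are hypothetical):

import numpy as np
from scipy import stats

# Hypothetical response-inhibition scores for three diagnostic groups.
adhd = np.array([0.61, 0.55, 0.70, 0.58, 0.66])
bipolar = np.array([0.72, 0.69, 0.75, 0.64, 0.71])
controls = np.array([0.80, 0.77, 0.83, 0.74, 0.79])

f_stat, p_value = stats.f_oneway(adhd, bipolar, controls)   # one-way ANOVA across groups
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")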
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anovitz, Lawrence; Cole, David; Rother, Gernot
2013-01-01
Small- and Ultra-Small Angle Neutron Scattering (SANS and USANS) provide powerful tools for quantitative analysis of porous rocks, yielding bulk statistical information over a wide range of length scales. This study utilized (U)SANS to characterize shallowly buried quartz arenites from the St. Peter Sandstone. Backscattered electron imaging was also used to extend the data to larger scales. These samples contain significant volumes of large-scale porosity, modified by quartz overgrowths, and neutron scattering results show significant sub-micron porosity. While previous scattering data from sandstones suggest scattering is dominated by surface fractal behavior over many orders of magnitude, careful analysis of our data shows both fractal and pseudo-fractal behavior. The scattering curves are composed of subtle steps, modeled as polydispersed assemblages of pores with log-normal distributions. However, in some samples an additional surface-fractal overprint is present, while in others there is no such structure, and scattering can be explained by summation of non-fractal structures. Combined with our work on other rock-types, these data suggest that microporosity is more prevalent, and may play a much more important role than previously thought in fluid/rock interactions.
A statistical study of decaying kink oscillations detected using SDO/AIA
NASA Astrophysics Data System (ADS)
Goddard, C. R.; Nisticò, G.; Nakariakov, V. M.; Zimovets, I. V.
2016-01-01
Context. Despite intensive studies of kink oscillations of coronal loops in the last decade, a large-scale statistically significant investigation of the oscillation parameters has not been made using data from the Solar Dynamics Observatory (SDO). Aims: We carry out a statistical study of kink oscillations using extreme ultraviolet imaging data from a previously compiled catalogue. Methods: We analysed 58 kink oscillation events observed by the Atmospheric Imaging Assembly (AIA) on board SDO during its first four years of operation (2010-2014). Parameters of the oscillations, including the initial apparent amplitude, period, length of the oscillating loop, and damping are studied for 120 individual loop oscillations. Results: Analysis of the initial loop displacement and oscillation amplitude leads to the conclusion that the initial loop displacement prescribes the initial amplitude of oscillation in general. The period is found to scale with the loop length, and a linear fit of the data cloud gives a kink speed of Ck = (1330 ± 50) km s-1. The main body of the data corresponds to kink speeds in the range Ck = (800-3300) km s-1. Measurements of 52 exponential damping times were made, and it was noted that at least 21 of the damping profiles may be better approximated by a combination of non-exponential and exponential profiles rather than a purely exponential damping envelope. There are nine additional cases where the profile appears to be purely non-exponential and no damping time was measured. A scaling of the exponential damping time with the period is found, following the previously established linear scaling between these two parameters.
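A minimal sketch of how a kink speed can be extracted from the period-length scaling, assuming the standard fundamental standing kink-mode relation P = 2L/Ck and a zero-intercept linear fit (simpler than the fit to the data cloud described above); the input arrays are placeholders:

import numpy as np

def kink_speed(loop_length_km, period_s):
    """Estimate the kink speed from the fundamental standing-mode relation
    P = 2 L / C_k by a least-squares straight-line fit through the origin."""
    L = np.asarray(loop_length_km, dtype=float)
    P = np.asarray(period_s, dtype=float)
    slope = np.sum(L * P) / np.sum(L * L)    # least-squares slope of P vs. L with zero intercept
    return 2.0 / slope                        # C_k in km/s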
Intrinsic alignments of galaxies in the MassiveBlack-II simulation: analysis of two-point statistics
NASA Astrophysics Data System (ADS)
Tenneti, Ananth; Singh, Sukhdeep; Mandelbaum, Rachel; di Matteo, Tiziana; Feng, Yu; Khandai, Nishikanta
2015-04-01
The intrinsic alignment of galaxies with the large-scale density field is an important astrophysical contaminant in upcoming weak lensing surveys. We present detailed measurements of the galaxy intrinsic alignments and associated ellipticity-direction (ED) and projected shape (wg+) correlation functions for galaxies in the cosmological hydrodynamic MassiveBlack-II simulation. We carefully assess the effects of iterative, weighted (by mass and luminosity) definitions of the (reduced and unreduced) inertia tensor on galaxy shapes, misalignment of the stellar component with the dark matter shape, and two-point statistics. We find that iterative procedures must be adopted for a reliable measurement of the reduced tensor but that luminosity versus mass weighting has only negligible effects. Both ED and wg+ correlations increase in amplitude with subhalo mass (in the range of 10¹⁰-6.0 × 10¹⁴ h⁻¹ M⊙), with a weak redshift dependence (from z = 1 to 0.06) at fixed mass. At z ˜ 0.3, we predict a wg+ that is in reasonable agreement with Sloan Digital Sky Survey luminous red galaxy measurements and that decreases in amplitude by a factor of ˜5-18 for galaxies in the Large Synoptic Survey Telescope survey. We also compared the intrinsic alignments of centrals and satellites, with clear detection of satellite radial alignments within their host haloes. Finally, we show that wg+ (using subhaloes as tracers of density) and wδ+ (using dark matter density) predictions from the simulations agree with that of non-linear alignment (NLA) models at scales where the two-halo term dominates in the correlations (and tabulate associated NLA fitting parameters). The one-halo term induces a scale-dependent bias at small scales which is not modelled in the NLA model.
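A minimal sketch of the simple (unreduced) mass-weighted inertia tensor underlying such shape measurements; the iterative reduced-tensor variant discussed above is not reproduced here, and the function signature is an assumption:

import numpy as np

def shape_tensor(pos, mass):
    """Mass-weighted (unreduced) inertia tensor of a particle distribution and the
    projected axis ratios from its eigenvalues; pos is (N, 3) relative to the subhalo centre."""
    w = mass / mass.sum()
    I = np.einsum('n,ni,nj->ij', w, pos, pos)        # I_ij = sum_n w_n x_ni x_nj
    eigvals = np.sort(np.linalg.eigvalsh(I))[::-1]   # largest to smallest eigenvalue
    a, b, c = np.sqrt(eigvals)                       # principal semi-axis lengths (up to a constant)
    return I, b / a, c / a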
NASA Technical Reports Server (NTRS)
Toll, T. A.
1980-01-01
A parametric analysis was made to investigate the relationship between current cargo airplanes and possible future designs that may differ greatly in both size and configuration. The method makes use of empirical scaling laws developed from statistical studies of data from current and advanced airplanes and, in addition, accounts for payload density, effects of span distributed load, and variations in tail area ratio. The method is believed to be particularly useful for exploratory studies of design and technology options for large airplanes. The analysis predicts somewhat more favorable variations of the ratios of payload to gross weight and block fuel to payload as the airplane size is increased than has been generally understood from interpretations of the cube-square law. In terms of these same ratios, large all wing (spanloader) designs show an advantage over wing-fuselage designs.
Roccuzzo, Sebastiana; Beckerman, Andrew P; Pandhal, Jagroop
2016-12-01
Open raceway ponds are regarded as the most economically viable option for large-scale cultivation of microalgae for low to mid-value bio-products, such as biodiesel. However, improvements are required including reducing the costs associated with harvesting biomass. There is now a growing interest in exploiting natural ecological processes within biotechnology. We review how chemical cues produced by algal grazers induce colony formation in algal cells, which subsequently leads to their sedimentation. A statistical meta-analysis of more than 80 studies reveals that Daphnia grazers can induce high levels of colony formation and sedimentation in Scenedesmus obliquus and that these natural, infochemical induced sedimentation rates are comparable to using commercial chemical equivalents. These data suggest that natural ecological interactions can be co-opted in biotechnology as part of a promising, low energy and clean harvesting method for use in large raceway systems.
Thin-plate spline analysis of the cranial base in subjects with Class III malocclusion.
Singh, G D; McNamara, J A; Lozanoff, S
1997-08-01
The role of the cranial base in the emergence of Class III malocclusion is not fully understood. This study determines deformations that contribute to a Class III cranial base morphology, employing thin-plate spline analysis on lateral cephalographs. A total of 73 children of European-American descent aged between 5 and 11 years with Class III malocclusion were compared with an equivalent group of subjects with a normal, untreated, Class I molar occlusion. The cephalographs were traced, checked and subdivided into seven age- and sex-matched groups. Thirteen points on the cranial base were identified and digitized. The datasets were scaled to an equivalent size, and statistical analysis indicated significant differences between average Class I and Class III cranial base morphologies for each group. Thin-plate spline analysis indicated that both affine (uniform) and non-affine transformations contribute toward the total spline for each average cranial base morphology at each age group analysed. For non-affine transformations, Partial warps 10, 8 and 7 had high magnitudes, indicating large-scale deformations affecting Bolton point, basion, pterygo-maxillare, Ricketts' point and articulare. In contrast, high eigenvalues associated with Partial warps 1-3, indicating localized shape changes, were found at tuberculum sellae, sella, and the frontonasomaxillary suture. It is concluded that large spatial-scale deformations affect the occipital complex of the cranial base and sphenoidal region, in combination with localized distortions at the frontonasal suture. These deformations may contribute to reduced orthocephalization or deficient flattening of the cranial base antero-posteriorly that, in turn, leads to the formation of a Class III malocclusion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Terrana, Alexandra; Johnson, Matthew C.; Harris, Mary-Jean, E-mail: aterrana@perimeterinstitute.ca, E-mail: mharris8@perimeterinstitute.ca, E-mail: mjohnson@perimeterinstitute.ca
Due to cosmic variance we cannot learn any more about large-scale inhomogeneities from the primary cosmic microwave background (CMB) alone. More information on large scales is essential for resolving large angular scale anomalies in the CMB. Here we consider cross correlating the large-scale kinetic Sunyaev Zel'dovich (kSZ) effect and probes of large-scale structure, a technique known as kSZ tomography. The statistically anisotropic component of the cross correlation encodes the CMB dipole as seen by free electrons throughout the observable Universe, providing information about long wavelength inhomogeneities. We compute the large angular scale power asymmetry, constructing the appropriate transfer functions, and estimate the cosmic variance limited signal to noise for a variety of redshift bin configurations. The signal to noise is significant over a large range of power multipoles and numbers of bins. We present a simple mode counting argument indicating that kSZ tomography can be used to estimate more modes than the primary CMB on comparable scales. A basic forecast indicates that a first detection could be made with next-generation CMB experiments and galaxy surveys. This paper motivates a more systematic investigation of how close to the cosmic variance limit it will be possible to get with future observations.
The HI Content of Galaxies as a Function of Local Density and Large-Scale Environment
NASA Astrophysics Data System (ADS)
Thoreen, Henry; Cantwell, Kelly; Maloney, Erin; Cane, Thomas; Brough Morris, Theodore; Flory, Oscar; Raskin, Mark; Crone-Odekon, Mary; ALFALFA Team
2017-01-01
We examine the HI content of galaxies as a function of environment, based on a catalogue of 41527 galaxies that are part of the 70% complete Arecibo Legacy Fast-ALFA (ALFALFA) survey. We use nearest-neighbor methods to characterize local environment, and a modified version of the algorithm developed for the Galaxy and Mass Assembly (GAMA) survey to classify large-scale environment as group, filament, tendril, or void. We compare the HI content in these environments using statistics that include both HI detections and the upper limits on detections from ALFALFA. The large size of the sample allows us to statistically compare the HI content in different environments for early-type galaxies as well as late-type galaxies. This work is supported by NSF grants AST-1211005 and AST-1637339, the Skidmore Faculty-Student Summer Research program, and the Schupf Scholars program.
Astrophysical data analysis with information field theory
NASA Astrophysics Data System (ADS)
Enßlin, Torsten
2014-12-01
Non-parametric imaging and data analysis in astrophysics and cosmology can be addressed by information field theory (IFT), a means of Bayesian, data-based inference on spatially distributed signal fields. IFT is a statistical field theory, which permits the construction of optimal signal recovery algorithms. It exploits spatial correlations of the signal fields even for nonlinear and non-Gaussian signal inference problems. In particular, the alleviation of a perception threshold for recovering signals of unknown correlation structure using IFT will be discussed, as well as a novel improvement on instrumental self-calibration schemes. IFT can be applied to many areas. Here, applications in cosmology (cosmic microwave background, large-scale structure) and astrophysics (galactic magnetism, radio interferometry) are presented.
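In the linear, Gaussian special case, the optimal IFT reconstruction reduces to the generalized Wiener filter; a minimal statement of that limit, for data d = Rs + n with signal covariance S and noise covariance N:

% Wiener-filter limit of IFT for d = R s + n with Gaussian s ~ G(0,S) and n ~ G(0,N):
m = D\,j, \qquad
D = \left(S^{-1} + R^{\dagger} N^{-1} R\right)^{-1}, \qquad
j = R^{\dagger} N^{-1} d,
% where m is the posterior mean field, D the propagator (posterior covariance),
% and j the information source.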
Impact of satellite-based data on FGGE general circulation statistics
NASA Technical Reports Server (NTRS)
Salstein, David A.; Rosen, Richard D.; Baker, Wayman E.; Kalnay, Eugenia
1987-01-01
The NASA Goddard Laboratory for Atmospheres (GLA) analysis/forecast system was run in two different parallel modes in order to evaluate the influence that data from satellites and other FGGE observation platforms can have on analyses of large scale circulation; in the first mode, data from all observation systems were used, while in the second only conventional upper air and surface reports were used. The GLA model was also integrated for the same period without insertion of any data; an independent objective analysis based only on rawinsonde and pilot balloon data is also performed. A small decrease in the vigor of the general circulation is noted to follow from the inclusion of satellite observations.
Heidelberg Retina Tomography Analysis in Optic Disks with Anatomic Particularities
Alexandrescu, C; Pascu, R; Ilinca, R; Popescu, V; Ciuluvica, R; Voinea, L; Celea, C
2010-01-01
Due to its objectivity, reproducibility and predictive value confirmed by many large-scale statistical clinical studies, Heidelberg Retina Tomography has become one of the most widely used computerized image analysis methods for the optic disc in glaucoma. It has been noted, though, that the diagnostic value of the Moorfields Regression Analysis and the Glaucoma Probability Score decreases when analyzing optic discs of extreme size. The number of false positive results increases in cases of megalopapillae and the number of false negative results increases in cases of small optic discs. The present paper is a review of the aspects one should take into account when analyzing an HRT result of an optic disc with anatomic particularities. PMID:21254731
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.
2008-12-01
NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, and perform pairwise instrument matchups for A-Train datasets. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.
2009-04-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow. A scientist visually authors the graph of operation in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, perform pairwise instrument matchups for A-Train datasets, and compute fused products. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). 
The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
Data processing, multi-omic pathway mapping, and metabolite activity analysis using XCMS Online
Forsberg, Erica M; Huan, Tao; Rinehart, Duane; Benton, H Paul; Warth, Benedikt; Hilmers, Brian; Siuzdak, Gary
2018-01-01
Systems biology is the study of complex living organisms, and as such, analysis on a systems-wide scale involves the collection of information-dense data sets that are representative of an entire phenotype. To uncover dynamic biological mechanisms, bioinformatics tools have become essential to facilitating data interpretation in large-scale analyses. Global metabolomics is one such method for performing systems biology, as metabolites represent the downstream functional products of ongoing biological processes. We have developed XCMS Online, a platform that enables online metabolomics data processing and interpretation. A systems biology workflow recently implemented within XCMS Online enables rapid metabolic pathway mapping using raw metabolomics data for investigating dysregulated metabolic processes. In addition, this platform supports integration of multi-omic (such as genomic and proteomic) data to garner further systems-wide mechanistic insight. Here, we provide an in-depth procedure showing how to effectively navigate and use the systems biology workflow within XCMS Online without a priori knowledge of the platform, including uploading liquid chromatography (LC)–mass spectrometry (MS) data from metabolite-extracted biological samples, defining the job parameters to identify features, correcting for retention time deviations, conducting statistical analysis of features between sample classes and performing predictive metabolic pathway analysis. Additional multi-omics data can be uploaded and overlaid with previously identified pathways to enhance systems-wide analysis of the observed dysregulations. We also describe unique visualization tools to assist in elucidation of statistically significant dysregulated metabolic pathways. Parameter input takes 5–10 min, depending on user experience; data processing typically takes 1–3 h, and data analysis takes ~30 min. PMID:29494574
Lindsey, Bruce D.; Rupert, Michael G.
2012-01-01
Decadal-scale changes in groundwater quality were evaluated by the U.S. Geological Survey National Water-Quality Assessment (NAWQA) Program. Samples of groundwater collected from wells during 1988-2000 - a first sampling event representing the decade ending the 20th century - were compared on a pair-wise basis to samples from the same wells collected during 2001-2010 - a second sampling event representing the decade beginning the 21st century. The data set consists of samples from 1,236 wells in 56 well networks, representing major aquifers and urban and agricultural land-use areas, with analytical results for chloride, dissolved solids, and nitrate. Statistical analysis was done on a network basis rather than by individual wells. Although spanning slightly more or less than a 10-year period, the two-sample comparison between the first and second sampling events is referred to as an analysis of decadal-scale change based on a step-trend analysis. The 22 principal aquifers represented by these 56 networks account for nearly 80 percent of the estimated withdrawals of groundwater used for drinking-water supply in the Nation. Well networks where decadal-scale changes in concentrations were statistically significant were identified using the Wilcoxon-Pratt signed-rank test. For the statistical analysis of chloride, dissolved solids, and nitrate concentrations at the network level, more than half revealed no statistically significant change over the decadal period. However, for networks that had statistically significant changes, increased concentrations outnumbered decreased concentrations by a large margin. Statistically significant increases of chloride concentrations were identified for 43 percent of 56 networks. Dissolved solids concentrations increased significantly in 41 percent of the 54 networks with dissolved solids data, and nitrate concentrations increased significantly in 23 percent of 56 networks. At least one of the three - chloride, dissolved solids, or nitrate - had a statistically significant increase in concentration in 66 percent of the networks. Statistically significant decreases in concentrations were identified in 4 percent of the networks for chloride, 2 percent of the networks for dissolved solids, and 9 percent of the networks for nitrate. A larger percentage of urban land-use networks had statistically significant increases in chloride, dissolved solids, and nitrate concentrations than agricultural land-use networks. In order to assess the magnitude of statistically significant changes, the median of the differences between constituent concentrations from the first full-network sampling event and those from the second full-network sampling event was calculated using the Turnbull method. The largest median decadal increases in chloride concentrations were in networks in the Upper Illinois River Basin (67 mg/L) and in the New England Coastal Basins (34 mg/L), whereas the largest median decadal decrease in chloride concentrations was in the Upper Snake River Basin (1 mg/L). The largest median decadal increases in dissolved solids concentrations were in networks in the Rio Grande Valley (260 mg/L) and the Upper Illinois River Basin (160 mg/L). The largest median decadal decrease in dissolved solids concentrations was in the Apalachicola-Chattahoochee-Flint River Basin (6.0 mg/L). The largest median decadal increases in nitrate as nitrogen (N) concentrations were in networks in the South Platte River Basin (2.0 mg/L as N) and the San Joaquin-Tulare Basins (1.0 mg/L as N). 
The largest median decadal decrease in nitrate concentrations was in the Santee River Basin and Coastal Drainages (0.63 mg/L). The magnitude of change in networks with statistically significant increases typically was much larger than the magnitude of change in networks with statistically significant decreases. The magnitude of change was greatest for chloride in the urban land-use networks and greatest for dissolved solids and nitrate in the agricultural land-use networks. Analysis of data from all networks combined indicated statistically significant increases for chloride, dissolved solids, and nitrate. Although chloride, dissolved solids, and nitrate concentrations were typically less than the drinking-water standards and guidelines, a statistical test was used to determine whether or not the proportion of samples exceeding the drinking-water standard or guideline changed significantly between the first and second full-network sampling events. The proportion of samples exceeding the U.S. Environmental Protection Agency (USEPA) Secondary Maximum Contaminant Level for dissolved solids (500 milligrams per liter) increased significantly between the first and second full-network sampling events when evaluating all networks combined at the national level. Also, for all networks combined, the proportion of samples exceeding the USEPA Maximum Contaminant Level (MCL) of 10 mg/L as N for nitrate increased significantly. One network in the Delmarva Peninsula had a significant increase in the proportion of samples exceeding the MCL for nitrate. A subset of 261 wells was sampled every other year (biennially) to evaluate decadal-scale changes using a time-series analysis. The analysis of the biennial data set showed that changes were generally similar to the findings from the analysis of decadal-scale change that was based on a step-trend analysis. Because of the small number of wells in a network with biennial data (typically 4-5 wells), the time-series analysis is more useful for understanding water-quality responses to changes in site-specific conditions rather than as an indicator of the change for the entire network.
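A minimal sketch of the paired decadal comparison described above, using the Wilcoxon signed-rank test with Pratt's treatment of zero differences; the concentrations are hypothetical, and the plain median shown is a simplification of the Turnbull estimator used in the study:

import numpy as np
from scipy import stats

# Hypothetical chloride concentrations (mg/L) from the same wells in one network,
# first (1988-2000) vs. second (2001-2010) sampling events.
first_event = np.array([12.0, 30.5, 8.2, 55.0, 19.7, 41.3, 6.9, 25.4])
second_event = np.array([15.1, 36.0, 8.0, 71.2, 24.5, 48.9, 7.4, 33.0])

stat, p = stats.wilcoxon(first_event, second_event, zero_method="pratt")  # paired signed-rank test
median_change = np.median(second_event - first_event)                      # crude magnitude of decadal change
print(f"p = {p:.3f}, median change = {median_change:+.1f} mg/L")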
Muths, Delphine; Le Couls, Sarah; Evano, Hugues; Grewe, Peter; Bourjea, Jerome
2013-01-01
Genetic population structure of swordfish Xiphias gladius was examined based on 2231 individual samples, collected mainly between 2009 and 2010, among three major sampling areas within the Indian Ocean (IO; twelve distinct sites), Atlantic (two sites) and Pacific (one site) Oceans using analysis of nineteen microsatellite loci (n = 2146) and mitochondrial ND2 sequences (n = 2001). Sample collection was stratified in time and space in order to investigate the stability of the genetic structure observed, with a special focus on the South West Indian Ocean. Significant AMOVA variance was observed for both markers, indicating that genetic population subdivision was present between oceans. The overall value of the F-statistics for ND2 sequences confirmed that Atlantic and Indian Ocean swordfish represent two distinct genetic stocks. Indo-Pacific differentiation was also significant but lower than that observed between the Atlantic and Indian Oceans. However, microsatellite F-statistics failed to reveal structure even at the inter-oceanic scale, indicating that the resolving power of our microsatellite loci was insufficient for detecting population subdivision. At the scale of the Indian Ocean, results obtained from both markers are consistent with swordfish belonging to a single panmictic population. Analyses partitioned by sampling area, season, or sex also failed to identify any clear structure within this ocean. Such large spatial and temporal homogeneity of genetic structure, observed for such a large, highly mobile pelagic species, suggests that it is satisfactory to consider swordfish as a single panmictic population in the Indian Ocean. PMID:23717447
The linkage between geopotential height and monthly precipitation in Iran
NASA Astrophysics Data System (ADS)
Shirvani, Amin; Fadaei, Amir Sabetan; Landman, Willem A.
2018-04-01
This paper investigates the linkage between large-scale atmospheric circulation and monthly precipitation from November to April over Iran. Canonical correlation analysis (CCA) is used to set up the statistical linkage between the 850 hPa geopotential height large-scale circulation and monthly precipitation over Iran for the period 1968-2010. The monthly precipitation dataset for 50 synoptic stations distributed in different climate regions of Iran is considered as the response variable in the CCA. The monthly geopotential height reanalysis dataset over an area between 10° N and 60° N and from 20° E to 80° E is utilized as the explanatory variable in the CCA. Principal component analysis (PCA) is used as a pre-filter for data reduction for both explanatory and response variables before applying CCA. The optimal number of principal components and canonical variables to be retained in the CCA equations is determined using the highest average cross-validated Kendall's tau value. The 850 hPa geopotential height pattern over the Red Sea, Saudi Arabia, and Persian Gulf is found to be the major pattern related to Iranian monthly precipitation. The Pearson correlations between the area-averaged observed and predicted precipitation over the study area for January, February, March, April, November, and December are statistically significant at the 5% level, with values of 0.78, 0.80, 0.82, 0.74, 0.79, and 0.61, respectively. The relative operating characteristic (ROC) indicates that the highest scores for the above- and below-normal precipitation categories are, respectively, for February and April, and the lowest scores are found for December.
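A minimal sketch of a PCA-prefiltered CCA pipeline of this kind using scikit-learn; the arrays are random placeholders, the component counts are illustrative, and the cross-validated Kendall's tau selection used in the study is not reproduced:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

# X: monthly 850 hPa geopotential height fields (n_months, n_gridpoints)
# Y: monthly precipitation at the stations (n_months, n_stations)
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 500))
Y = rng.standard_normal((120, 50))

X_pc = PCA(n_components=10).fit_transform(X)   # pre-filter the explanatory field
Y_pc = PCA(n_components=8).fit_transform(Y)    # pre-filter the response field

cca = CCA(n_components=3).fit(X_pc, Y_pc)      # statistical linkage between the two fields
U, V = cca.transform(X_pc, Y_pc)               # canonical variates
print(np.corrcoef(U[:, 0], V[:, 0])[0, 1])     # leading canonical correlation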
NASA Astrophysics Data System (ADS)
Fosas de Pando, Miguel; Schmid, Peter J.; Sipp, Denis
2016-11-01
Nonlinear model reduction for large-scale flows is an essential component in many fluid applications such as flow control, optimization, parameter space exploration and statistical analysis. In this article, we generalize the POD-DEIM method, introduced by Chaturantabut & Sorensen [1], to address nonlocal nonlinearities in the equations without loss of performance or efficiency. The nonlinear terms are represented by nested DEIM-approximations using multiple expansion bases based on the Proper Orthogonal Decomposition. These extensions are imperative, for example, for applications of the POD-DEIM method to large-scale compressible flows. The efficient implementation of the presented model-reduction technique follows our earlier work [2] on linearized and adjoint analyses and takes advantage of the modular structure of our compressible flow solver. The efficacy of the nonlinear model-reduction technique is demonstrated on the flow around an airfoil and its acoustic footprint. We obtain an accurate and robust low-dimensional model that captures the main features of the full flow.
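For reference, a compact sketch of the standard (non-nested) DEIM greedy index selection that the above generalizes; the nested multi-basis variant described in the abstract is not reproduced here:

import numpy as np

def deim_indices(U):
    """Standard DEIM selection of interpolation indices from a POD basis U
    (columns = modes) of nonlinear-term snapshots."""
    n, m = U.shape
    idx = [int(np.argmax(np.abs(U[:, 0])))]        # first index: largest entry of the first mode
    for i in range(1, m):
        # Coefficients that match the first i modes at the already-selected points.
        c = np.linalg.solve(U[np.ix_(idx, range(i))], U[idx, i])
        r = U[:, i] - U[:, :i] @ c                 # residual of the next mode
        idx.append(int(np.argmax(np.abs(r))))      # new index: largest residual entry
    return np.array(idx)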
Mining influence on underground water resources in arid and semiarid regions
NASA Astrophysics Data System (ADS)
Luo, A. K.; Hou, Y.; Hu, X. Y.
2018-02-01
The coordinated mining of coal and water resources in arid and semiarid regions has long been a focus issue. This research takes the Energy and Chemical Base in Northern Shaanxi as an example and conducts a statistical analysis of coal yield and drainage volume from several large-scale mines in the mining area. The research also determines the average water volume per ton of coal and calculates the drainage volume for four typical years under different mining intensities. Combining precipitation observations from the past two decades with water-level data from observation wells, the groundwater table, precipitation infiltration recharge, and evaporation capacity during mine drainage are then calculated. Moreover, the research analyzes the transformation relationships between surface water, mine water, and groundwater. The results show that massive mine drainage, caused by large-scale coal mining in the research area, is the main reason for the reduction in water resources and for the changed transformation relationships between surface water, groundwater, and mine water.
NASA Technical Reports Server (NTRS)
Ganga, Ken; Page, Lyman; Cheng, Edward; Meyer, Stephan
1994-01-01
In many cosmological models, the large angular scale anisotropy in the cosmic microwave background is parameterized by a spectral index, n, and a quadrupolar amplitude, Q. For a Harrison-Peebles-Zel'dovich spectrum, n = 1. Using data from the Far Infrared Survey (FIRS) and a new statistical measure, a contour plot of the likelihood for cosmological models with -1 < n < 3 and 0 ≤ Q ≤ 50 μK is obtained. Depending upon the details of the analysis, the maximum likelihood occurs at n between 0.8 and 1.4 and Q between 18 and 21 μK. Regardless of Q, the likelihood is always less than half its maximum for n < -0.4 and for n > 2.2, as it is for Q < 8 μK and Q > 44 μK.
Single cell Hi-C reveals cell-to-cell variability in chromosome structure
Schoenfelder, Stefan; Yaffe, Eitan; Dean, Wendy; Laue, Ernest D.; Tanay, Amos; Fraser, Peter
2013-01-01
Large-scale chromosome structure and spatial nuclear arrangement have been linked to control of gene expression and DNA replication and repair. Genomic techniques based on chromosome conformation capture assess contacts for millions of loci simultaneously, but do so by averaging chromosome conformations from millions of nuclei. Here we introduce single cell Hi-C, combined with genome-wide statistical analysis and structural modeling of single copy X chromosomes, to show that individual chromosomes maintain domain organisation at the megabase scale, but show variable cell-to-cell chromosome territory structures at larger scales. Despite this structural stochasticity, localisation of active gene domains to boundaries of territories is a hallmark of chromosomal conformation. Single cell Hi-C data bridge current gaps between genomics and microscopy studies of chromosomes, demonstrating how modular organisation underlies dynamic chromosome structure, and how this structure is probabilistically linked with genome activity patterns. PMID:24067610
Robert, Donatien; Douillard, Thierry; Boulineau, Adrien; Brunetti, Guillaume; Nowakowski, Pawel; Venet, Denis; Bayle-Guillemaud, Pascale; Cayron, Cyril
2013-12-23
LiFePO4 and FePO4 phase distributions of entire cross-sectioned electrodes with various Li content are investigated from nanoscale to mesoscale, by transmission electron microscopy and by the new electron forward scattering diffraction technique. The distributions of the fully delithiated (FePO4) or lithiated particles (LiFePO4) are mapped on large fields of view (>100 × 100 μm²). Heterogeneities in thin and thick electrodes are highlighted at different scales. At the nanoscale, the statistical analysis of 64 000 particles unambiguously shows that the small particles delithiate first. At the mesoscale, the phase maps reveal a core-shell mechanism at the scale of the agglomerates with a preferential pathway along the electrode porosities. At larger scale, lithiation occurs in thick electrodes "stratum by stratum" from the surface in contact with electrolyte toward the current collector.
Scaling Phenomenology in Meson Photoproduction from CLAS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biplab Dey, Curtis A. Meyer
2010-08-01
In the high energy limit, perturbative QCD predicts that hard scattering amplitudes should follow simple scaling laws. For hard scattering at 90°, we show that experiments support this prediction even in the “medium energy” regime of 2.3 GeV ≤ √s ≤ 2.84 GeV, as long as there are no s-channel resonances present. Our data consists of high statistics measurements for five different exclusive meson photoproduction channels (pω, pη, pη′, K⁺Λ and K⁺Σ⁰) recently obtained from CLAS at Jefferson Lab. The same power-law scaling also leads to “saturated” Regge trajectories at high energies. That is, at large -t and -u, Regge trajectories must approach constant negative integers. We demonstrate the application of saturated Regge phenomenology by performing a partial wave analysis fit to the γp → pη′ differential cross sections.
(Finite) statistical size effects on compressive strength.
Weiss, Jérôme; Girard, Lucas; Gimbert, Florent; Amitrano, David; Vandembroucq, Damien
2014-04-29
The larger structures are, the lower their mechanical strength. Already discussed by Leonardo da Vinci and Edmé Mariotte several centuries ago, size effects on strength remain of crucial importance in modern engineering for the elaboration of safety regulations in structural design or the extrapolation of laboratory results to geophysical field scales. Under tensile loading, statistical size effects are traditionally modeled with a weakest-link approach. One of its prominent results is a prediction of vanishing strength at large scales that can be quantified in the framework of extreme value statistics. Despite a frequent use outside its range of validity, this approach remains the dominant tool in the field of statistical size effects. Here we focus on compressive failure, which concerns a wide range of geophysical and geotechnical situations. We show on historical and recent experimental data that weakest-link predictions are not obeyed. In particular, the mechanical strength saturates at a nonzero value toward large scales. Accounting explicitly for the elastic interactions between defects during the damage process, we build a formal analogy of compressive failure with the depinning transition of an elastic manifold. This critical transition interpretation naturally entails finite-size scaling laws for the mean strength and its associated variability. Theoretical predictions are in remarkable agreement with measurements reported for various materials such as rocks, ice, coal, or concrete. This formalism, which can also be extended to the flowing instability of granular media under multiaxial compression, has important practical consequences for future design rules.
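For reference, a minimal statement of the classical weakest-link (Weibull) prediction that the abstract argues is violated under compression:

% Weakest-link (Weibull) statistics for tensile failure of a volume V:
P_f(\sigma, V) = 1 - \exp\!\left[-\frac{V}{V_0}\left(\frac{\sigma}{\sigma_0}\right)^{m}\right],
% so the mean strength decays algebraically with size,
\langle \sigma_f \rangle \propto V^{-1/m},
% i.e. it vanishes at large scales, in contrast with the saturation at a nonzero
% value reported here for compressive failure.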
Implementation of Dynamic Extensible Adaptive Locally Exchangeable Measures (IDEALEM) v 0.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sim, Alex; Lee, Dongeun; Wu, K. John
2016-03-04
Handling large streaming data is essential for various applications such as network traffic analysis, social networks, energy cost trends, and environment modeling. However, it is in general intractable to store, compute, search, and retrieve large streaming data. This software addresses a fundamental issue, which is to reduce the size of large streaming data and still obtain accurate statistical analysis. As an example, when a high-speed network such as a 100 Gbps network is monitored, the collected measurement data rapidly grows so that polynomial time algorithms (e.g., Gaussian processes) become intractable. One possible solution to reduce the storage of vast amounts of measured data is to store a random sample, such as one out of 1000 network packets. However, such static sampling methods (linear sampling) have drawbacks: (1) they are not scalable for high-rate streaming data, and (2) there is no guarantee of reflecting the underlying distribution. In this software, we implemented a dynamic sampling algorithm, based on the recent technology from the relational dynamic bayesian online locally exchangeable measures, that reduces the storage of data records in a large scale, and still provides accurate analysis of large streaming data. The software can be used for both online and offline data records.
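The IDEALEM algorithm itself is not reproduced here; as a contrast with the static "1-in-1000" scheme criticized above, a minimal sketch of classical reservoir sampling, which keeps a fixed-size uniform random sample of a stream of unknown length:

import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of size k from a stream (Algorithm R);
    after n records, every record seen so far has probability k/n of being retained."""
    rng = random.Random(seed)
    reservoir = []
    for n, record in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(record)          # fill the reservoir first
        else:
            j = rng.randrange(n)              # uniform index in [0, n)
            if j < k:
                reservoir[j] = record         # replace a random slot with probability k/n
    return reservoir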
NASA Astrophysics Data System (ADS)
Wang, Ke; Testi, Leonardo; Burkert, Andreas; Walmsley, C. Malcolm; Beuther, Henrik; Henning, Thomas
2016-09-01
Large-scale gaseous filaments with lengths up to the order of 100 pc are on the upper end of the filamentary hierarchy of the Galactic interstellar medium (ISM). Their association with the Galactic structure and their role in Galactic star formation are of great interest from both an observational and a theoretical point of view. Previous “by-eye” searches, combined together, have started to uncover the Galactic distribution of large filaments, yet inherent bias and small sample size prevent conclusive statistical results from being drawn. Here, we present (1) a new, automated method for identifying large-scale velocity-coherent dense filaments, and (2) the first statistics and the Galactic distribution of these filaments. We use a customized minimum spanning tree algorithm to identify filaments by connecting voxels in position-position-velocity space, using the Bolocam Galactic Plane Survey spectroscopic catalog. In the range 7.5° ≤ l ≤ 194°, we have identified 54 large-scale filaments and derived their mass (~10³-10⁵ M⊙), length (10-276 pc), linear mass density (54-8625 M⊙ pc⁻¹), aspect ratio, linearity, velocity gradient, temperature, fragmentation, Galactic location, and orientation angle. The filaments concentrate along major spiral arms. They are widely distributed across the Galactic disk, with 50% located within ±20 pc of the Galactic mid-plane and 27% running in the centers of spiral arms. On the order of 1% of the molecular ISM is confined in large filaments. Massive star formation is more favorable in large filaments than elsewhere. This is the first comprehensive catalog of large filaments and can be useful for quantitative comparison with spiral structures and numerical simulations.
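A minimal sketch of the core idea (a minimum spanning tree over position-position-velocity points), assuming scipy; the customized pruning, velocity weighting, and catalog handling of the actual method are omitted, and the velocity scaling is an assumption:

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def ppv_mst(l_deg, b_deg, v_kms, v_scale=1.0):
    """Minimum spanning tree over sources in (l, b, v) space; v_scale converts the
    velocity separation into an equivalent angular distance (an assumed weighting)."""
    pts = np.column_stack([l_deg, b_deg, np.asarray(v_kms) / v_scale])
    dist = squareform(pdist(pts))                 # dense pairwise-distance matrix
    mst = minimum_spanning_tree(dist)             # sparse matrix holding the tree edges
    edges = np.array(mst.nonzero()).T             # (i, j) index pairs of connected sources
    return edges, mst.data                        # edge list and edge lengths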
NASA Astrophysics Data System (ADS)
Protat, A.; Delanoë, J.; May, P. T.; Haynes, J.; Jakob, C.; O'Connor, E.; Pope, M.; Wheeler, M. C.
2011-08-01
The high complexity of the cloud parameterizations now included in models puts more pressure on observational studies to provide useful means to evaluate them. One approach to the problem put forth in the modelling community is to evaluate under what atmospheric conditions the parameterizations fail to simulate the cloud properties and under what conditions they do a good job. It is the ambition of this paper to characterize the variability of the statistical properties of tropical ice clouds in different tropical "regimes" recently identified in the literature, to aid the development of better process-oriented parameterizations in models. For this purpose, the statistical properties of non-precipitating tropical ice clouds over Darwin, Australia, are characterized using ground-based radar-lidar observations from the Atmospheric Radiation Measurement (ARM) Program. The ice cloud properties analysed are the frequency of ice cloud occurrence, the morphological properties (cloud top height and thickness), and the microphysical and radiative properties (ice water content, visible extinction, effective radius, and total concentration). The variability of these tropical ice cloud properties is then studied as a function of the large-scale cloud regimes derived from the International Satellite Cloud Climatology Project (ISCCP), the amplitude and phase of the Madden-Julian Oscillation (MJO), and the large-scale atmospheric regime as derived from a long-term record of radiosonde observations over Darwin. The vertical variability of ice cloud occurrence and microphysical properties is largest in all regimes (typically 1.5 orders of magnitude for ice water content and extinction, a factor of 3 in effective radius, and three orders of magnitude in concentration). 98% of the ice clouds in our dataset are characterized by either a small cloud fraction (smaller than 0.3) or a very large cloud fraction (larger than 0.9). In the ice part of the troposphere, three distinct layers characterized by different statistically dominant microphysical processes are identified. The variability of the ice cloud properties as a function of the large-scale atmospheric regime, cloud regime, and MJO phase is large, producing mean differences of up to a factor of 8 in the frequency of ice cloud occurrence between large-scale atmospheric regimes and mean differences of typically a factor of 2 in all microphysical properties. Finally, the diurnal cycle of the frequency of occurrence of ice clouds is also very different between regimes and MJO phases, with diurnal amplitudes of the vertically integrated frequency of ice cloud occurrence ranging from as low as 0.2 (weak diurnal amplitude) to values in excess of 2.0 (very large diurnal amplitude). Modellers should now use these results to check whether their cloud parameterizations are capable of translating a given atmospheric forcing into the correct statistical ice cloud properties.
Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Hua, H.
2012-12-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel Python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within SciReduce a versatile set of Python operators for data lookup, access, subsetting, co-registration, mining, fusion, and statistical analysis. All operators take in sets of geo-located arrays and generate more arrays. Large, multi-year satellite and model datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of granules) can be compared or fused in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP or webification URLs, thereby minimizing the size of the stored input and intermediate datasets. A typical map function might assemble and quality control AIRS Level-2 water vapor profiles for a year of data in parallel, then a reduce function would average the profiles in lat/lon bins (again, in parallel), and a final reduce would aggregate the climatology and write it to output files. We are using SciReduce to automate the production of multiple versions of a multi-year water vapor climatology (AIRS & MODIS), stratified by CloudSat cloud classification, and compare it to models (ECMWF & MERRA reanalysis). We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing huge datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer.
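The map/reduce pattern described above can be illustrated with a toy pipeline in plain Python; the granule layout, quality-flag convention, and 5-degree bins are invented for the example, and the code is not the SciReduce API.

```python
from collections import defaultdict
import numpy as np

def map_granule(granule):
    """Emit (lat_bin, lon_bin) -> quality-controlled water-vapor profile."""
    for lat, lon, profile, qc in granule:
        if qc == 0:                                   # keep only 'good' retrievals
            yield (int(lat // 5), int(lon // 5)), np.asarray(profile)

def reduce_bins(mapped):
    """Average all profiles that fall in the same 5x5 degree bin."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, profile in mapped:
        sums[key] = sums[key] + profile
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

granule = [(10.2, -40.1, [1.0, 0.8, 0.3], 0),
           (11.0, -41.5, [1.2, 0.9, 0.4], 0),
           (11.3, -42.0, [9.9, 9.9, 9.9], 1)]         # rejected by the QC flag
climatology = reduce_bins(map_granule(granule))
```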
Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud
NASA Astrophysics Data System (ADS)
Wilson, B.; Manipon, G.; Hua, H.
2012-04-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel Python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within SciReduce a versatile set of Python operators for data lookup, access, subsetting, co-registration, mining, fusion, and statistical analysis. All operators take in sets of geo-located arrays and generate more arrays. Large, multi-year satellite and model datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of granules) can be compared or fused in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP or webification URLs, thereby minimizing the size of the stored input and intermediate datasets. A typical map function might assemble and quality control AIRS Level-2 water vapor profiles for a year of data in parallel, then a reduce function would average the profiles in bins (again, in parallel), and a final reduce would aggregate the climatology and write it to output files. We are using SciReduce to automate the production of multiple versions of a multi-year water vapor climatology (AIRS & MODIS), stratified by CloudSat cloud classification, and compare it to models (ECMWF & MERRA reanalysis). We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing huge datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer.
Mapping the Energy Cascade in the North Atlantic Ocean: The Coarse-graining Approach
Aluie, Hussein; Hecht, Matthew; Vallis, Geoffrey K.
2017-11-14
A coarse-graining framework is implemented to analyze nonlinear processes, measure energy transfer rates, and map out the energy pathways from simulated global ocean data. Traditional tools to measure the energy cascade from turbulence theory, such as spectral flux or spectral transfer, rely on the assumption of statistical homogeneity, or at least a large separation between the scales of motion and the scales of statistical inhomogeneity. The coarse-graining framework allows for probing the fully nonlinear dynamics simultaneously in scale and in space, and is not restricted by those assumptions. This study describes how the framework can be applied to ocean flows.
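A minimal sketch of the coarse-graining diagnostic on a doubly periodic 2-D velocity field is shown below; the Gaussian filter, unit grid spacing, and random test field are assumptions, not the configuration used for the ocean simulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def energy_flux(u, v, sigma):
    """Cross-scale kinetic-energy flux Pi(x, y) at a filter scale of ~sigma grid cells.

    Pi = -[bar(u_i u_j) - bar(u_i) bar(u_j)] * d bar(u_i)/dx_j, evaluated on a
    2-D, doubly periodic field with a Gaussian filter (an illustrative choice).
    """
    bar = lambda f: gaussian_filter(f, sigma, mode="wrap")
    ub, vb = bar(u), bar(v)
    # subfilter (subscale) stress tensor components
    tau_uu = bar(u * u) - ub * ub
    tau_uv = bar(u * v) - ub * vb
    tau_vv = bar(v * v) - vb * vb
    # gradients of the coarse velocity (unit grid spacing assumed)
    dudy, dudx = np.gradient(ub)
    dvdy, dvdx = np.gradient(vb)
    return -(tau_uu * dudx + tau_uv * (dudy + dvdx) + tau_vv * dvdy)

rng = np.random.default_rng(0)
u, v = rng.standard_normal((2, 128, 128))
print(energy_flux(u, v, sigma=4).mean())
```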
Mapping the Energy Cascade in the North Atlantic Ocean: The Coarse-graining Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aluie, Hussein; Hecht, Matthew; Vallis, Geoffrey K.
A coarse-graining framework is implemented to analyze nonlinear processes, measure energy transfer rates, and map out the energy pathways from simulated global ocean data. Traditional tools to measure the energy cascade from turbulence theory, such as spectral flux or spectral transfer, rely on the assumption of statistical homogeneity, or at least a large separation between the scales of motion and the scales of statistical inhomogeneity. The coarse-graining framework allows for probing the fully nonlinear dynamics simultaneously in scale and in space, and is not restricted by those assumptions. This study describes how the framework can be applied to ocean flows.
Spectral kinetic energy transfer in turbulent premixed reacting flows.
Towery, C A Z; Poludnenko, A Y; Urzay, J; O'Brien, J; Ihme, M; Hamlington, P E
2016-05-01
Spectral kinetic energy transfer by advective processes in turbulent premixed reacting flows is examined using data from a direct numerical simulation of a statistically planar turbulent premixed flame. Two-dimensional turbulence kinetic-energy spectra conditioned on the planar-averaged reactant mass fraction are computed through the flame brush and variations in the spectra are connected to terms in the spectral kinetic energy transport equation. Conditional kinetic energy spectra show that turbulent small-scale motions are suppressed in the burnt combustion products, while the energy content of the mean flow increases. An analysis of spectral kinetic energy transfer further indicates that, contrary to the net down-scale transfer of energy found in the unburnt reactants, advective processes transfer energy from small to large scales in the flame brush close to the products. Triadic interactions calculated through the flame brush show that this net up-scale transfer of energy occurs primarily at spatial scales near the laminar flame thermal width. The present results thus indicate that advective processes in premixed reacting flows contribute to energy backscatter near the scale of the flame.
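A generic sketch of the underlying diagnostic, a shell-averaged two-dimensional kinetic-energy spectrum computed from a velocity plane, is given below; the conditioning on the planar-averaged reactant mass fraction and the spectral transport-equation terms are not reproduced.

```python
import numpy as np

def ke_spectrum_2d(u, v):
    """Shell-averaged 2-D kinetic-energy spectrum E(k) from a square velocity plane."""
    n = u.shape[0]
    uk, vk = np.fft.fft2(u) / u.size, np.fft.fft2(v) / v.size
    e2d = 0.5 * (np.abs(uk) ** 2 + np.abs(vk) ** 2)          # energy density per mode
    kx, ky = np.meshgrid(np.fft.fftfreq(n, d=1.0 / n), np.fft.fftfreq(n, d=1.0 / n))
    kmag = np.rint(np.hypot(kx, ky)).astype(int)              # integer wavenumber shells
    return np.bincount(kmag.ravel(), weights=e2d.ravel())[: n // 2]

rng = np.random.default_rng(1)
spec = ke_spectrum_2d(*rng.standard_normal((2, 64, 64)))      # synthetic test plane
```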
A Large-Scale Analysis of Variance in Written Language.
Johns, Brendan T; Jamieson, Randall K
2018-01-22
The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers, & Tenenbaum; Jones & Mewhort; Landauer & Dumais; Mikolov, Sutskever, Chen, Corrado, & Dean). The models treat knowledge as an interaction of processing mechanisms and the structure of language experience. But language experience is often treated agnostically. We report a distributional semantic analysis showing that written language in fiction books varies appreciably between books from different genres, between books from the same genre, and even between books written by the same author. Given that current theories assume that word knowledge reflects an interaction between processing mechanisms and the language environment, the analysis shows the need for the field to engage in a more deliberate consideration and curation of the corpora used in computational studies of natural language processing. Copyright © 2018 Cognitive Science Society, Inc.
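A toy version of this kind of corpus comparison is sketched below: each text is reduced to a word-count vector and the vectors are compared pairwise; the snippets stand in for full books, and the choice of cosine similarity is an illustrative assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical snippets standing in for full book texts.
books = {
    "romance_a": "her heart raced as the letter arrived at the manor",
    "romance_b": "the letter broke her heart and she left the manor at dawn",
    "scifi_a":   "the ship dropped from warp beside the silent alien station",
}
X = CountVectorizer().fit_transform(books.values())   # word-count vectors
print(cosine_similarity(X))                            # pairwise lexical similarity
```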
NASA Astrophysics Data System (ADS)
Mizukami, N.; Clark, M. P.; Newman, A. J.; Wood, A.; Gutmann, E. D.
2017-12-01
Estimating spatially distributed model parameters is a grand challenge for large-domain hydrologic modeling, especially in the context of hydrologic model applications such as streamflow forecasting. Multi-scale Parameter Regionalization (MPR) is a promising technique that accounts for the effects of fine-scale geophysical attributes (e.g., soil texture, land cover, topography, climate) on model parameters, as well as nonlinear scaling effects. MPR computes model parameters with transfer functions (TFs) that relate geophysical attributes to model parameters at the native input data resolution and then scales them, using scaling functions, to the spatial resolution of the model implementation. One of the biggest challenges in the use of MPR is the identification of TFs for each model parameter: both the functional forms and the geophysical predictors. TFs used to estimate the parameters of hydrologic models typically rely on previous studies or are derived in an ad hoc, heuristic manner, potentially not exploiting the maximum information content of the geophysical attributes for optimal parameter identification. Thus, it is necessary to first uncover relationships among geophysical attributes, model parameters, and hydrologic processes (i.e., hydrologic signatures) to obtain insight into which geophysical attributes are related to model parameters, and to what extent. We perform multivariate statistical analysis on a large-sample catchment data set including various geophysical attributes as well as constrained VIC model parameters at 671 unimpaired basins over the CONUS. We first calibrate the VIC model at each catchment to obtain constrained parameter sets. Additionally, parameter sets sampled during the calibration process are used for sensitivity analysis, with various hydrologic signatures as objectives, to understand the relationships among geophysical attributes, parameters, and hydrologic processes.
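The MPR idea can be illustrated with a minimal sketch: a transfer function maps a fine-scale attribute to a parameter value, and a scaling operator aggregates it to the model grid. The logistic form, its coefficients, and the harmonic-mean upscaling below are assumptions, not choices from the study.

```python
import numpy as np

def transfer_function(clay_fraction, a=-1.0, b=4.0):
    """Map a fine-scale clay fraction to a (hypothetical) soil hydraulic parameter."""
    return 1.0 / (1.0 + np.exp(-(a + b * clay_fraction)))

def upscale(fine_values, block=10):
    """Harmonic-mean scaling from the attribute grid to the (coarser) model grid."""
    v = fine_values.reshape(-1, block)
    return block / np.sum(1.0 / v, axis=1)

clay = np.random.default_rng(2).uniform(0.05, 0.6, size=100)   # 100 fine-scale cells
model_parameter = upscale(transfer_function(clay), block=10)    # 10 model-grid cells
```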
Learning a Living: First Results of the Adult Literacy and Life Skills Survey
ERIC Educational Resources Information Center
OECD Publishing (NJ1), 2005
2005-01-01
The Adult Literacy and Life Skills Survey (ALL) is a large-scale co-operative effort undertaken by governments, national statistics agencies, research institutions and multi-lateral agencies. The development and management of the study were co-ordinated by Statistics Canada and the Educational Testing Service (ETS) in collaboration with the…
Detection of Test Collusion via Kullback-Leibler Divergence
ERIC Educational Resources Information Center
Belov, Dmitry I.
2013-01-01
The development of statistical methods for detecting test collusion is a new research direction in the area of test security. Test collusion may be described as large-scale sharing of test materials, including answers to test items. Current methods of detecting test collusion are based on statistics also used in answer-copying detection.…
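A minimal sketch of the statistic named in the title, the Kullback-Leibler divergence between two answer-choice distributions, is given below; the distributions are invented for illustration and the sketch is not the detection procedure of the paper.

```python
import numpy as np
from scipy.stats import entropy

# Hypothetical answer-choice frequencies for one item: the reference population
# versus a flagged group suspected of sharing test materials.
reference = np.array([0.25, 0.25, 0.25, 0.25])
suspect   = np.array([0.70, 0.10, 0.10, 0.10])

kl = entropy(suspect, reference)   # D_KL(suspect || reference), in nats
print(f"KL divergence: {kl:.3f}")
```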
Kimura, Rie; Saiki, Akiko; Fujiwara-Tsukamoto, Yoko; Sakai, Yutaka; Isomura, Yoshikazu
2017-01-01
There have been few systematic population-wide analyses of relationships between spike synchrony within a period of several milliseconds and behavioural functions. In this study, we obtained a large amount of spike data from > 23,000 neuron pairs by multiple single-unit recording from deep layer neurons in motor cortical areas in rats performing a forelimb movement task. The temporal changes of spike synchrony in the whole neuron pairs were statistically independent of behavioural changes during the task performance, although some neuron pairs exhibited correlated changes in spike synchrony. Mutual information analyses revealed that spike synchrony made a smaller contribution than spike rate to behavioural functions. The strength of spike synchrony between two neurons was statistically independent of the spike rate-based preferences of the pair for behavioural functions. Spike synchrony within a period of several milliseconds in presynaptic neurons enables effective integration of functional information in the postsynaptic neuron. However, few studies have systematically analysed the population-wide relationships between spike synchrony and behavioural functions. Here we obtained a sufficiently large amount of spike data among regular-spiking (putatively excitatory) and fast-spiking (putatively inhibitory) neuron subtypes (> 23,000 pairs) by multiple single-unit recording from deep layers in motor cortical areas (caudal forelimb area, rostral forelimb area) in rats performing a forelimb movement task. After holding a lever, rats pulled the lever either in response to a cue tone (external-trigger trials) or spontaneously without any cue (internal-trigger trials). Many neurons exhibited functional spike activity in association with forelimb movements, and the preference of regular-spiking neurons in the rostral forelimb area was more biased toward externally triggered movement than that in the caudal forelimb area. We found that a population of neuron pairs with spike synchrony does exist, and that some neuron pairs exhibit a dependence on movement phase during task performance. However, the population-wide analysis revealed that spike synchrony was statistically independent of the movement phase and the spike rate-based preferences of the pair for behavioural functions, whereas spike rates were clearly dependent on the movement phase. In fact, mutual information analyses revealed that the contribution of spike synchrony to the behavioural functions was small relative to the contribution of spike rate. Our large-scale analysis revealed that cortical spike rate, rather than spike synchrony, contributes to population coding for movement. © 2016 The Authors. The Journal of Physiology © 2016 The Physiological Society.
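The rate-versus-phase comparison can be illustrated with a small mutual-information sketch on synthetic spike counts; the discretization and the Poisson rate model are assumptions made only for the example.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(3)
phase = rng.integers(0, 4, size=5000)                 # four movement phases per time bin
rate = rng.poisson(lam=2 + 3 * (phase == 2))          # spike counts modulated in one phase
mi_rate = mutual_info_score(phase, np.clip(rate, 0, 10))
print(f"MI(rate; phase) = {mi_rate:.3f} nats")
```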
Groups of galaxies in the Center for Astrophysics redshift survey
NASA Technical Reports Server (NTRS)
Ramella, Massimo; Geller, Margaret J.; Huchra, John P.
1989-01-01
By applying the Huchra and Geller (1982) objective group identification algorithm to the Center for Astrophysics' redshift survey, a catalog of 128 groups with three or more members is extracted, and 92 of these are used as a statistical sample. A comparison of the distribution of group centers with the distribution of all galaxies in the survey indicates qualitatively that groups trace the large-scale structure of the region. The physical properties of groups may be related to the details of large-scale structure, and it is concluded that differences among group catalogs may be due to the properties of large-scale structures and their location relative to the survey limits.
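A simplified friends-of-friends linkage in the spirit of the Huchra and Geller (1982) algorithm is sketched below; it uses a single fixed linking length, whereas the original method scales the projected and velocity links with the magnitude-limited selection function.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def find_groups(positions, linking_length):
    """Label galaxies belonging to the same linked group (3-D positions in Mpc)."""
    tree = cKDTree(positions)
    pairs = np.array(list(tree.query_pairs(linking_length)))
    n = len(positions)
    if len(pairs) == 0:
        return np.arange(n)                              # every galaxy isolated
    adj = coo_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n))
    _, labels = connected_components(adj, directed=False)
    return labels

galaxies = np.random.default_rng(4).uniform(0, 50, size=(500, 3))   # synthetic catalog
labels = find_groups(galaxies, linking_length=1.5)
```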
A large scale analysis of genetic variants within putative miRNA binding sites in prostate cancer
Stegeman, Shane; Amankwah, Ernest; Klein, Kerenaftali; O’Mara, Tracy A.; Kim, Donghwa; Lin, Hui-Yi; Permuth-Wey, Jennifer; Sellers, Thomas A.; Srinivasan, Srilakshmi; Eeles, Rosalind; Easton, Doug; Kote-Jarai, Zsofia; Olama, Ali Amin Al; Benlloch, Sara; Muir, Kenneth; Giles, Graham G.; Wiklund, Fredrik; Gronberg, Henrik; Haiman, Christopher A.; Schleutker, Johanna; Nordestgaard, Børge G.; Travis, Ruth C.; Neal, David; Pharoah, Paul; Khaw, Kay-Tee; Stanford, Janet L.; Blot, William J.; Thibodeau, Stephen; Maier, Christiane; Kibel, Adam S.; Cybulski, Cezary; Cannon-Albright, Lisa; Brenner, Hermann; Kaneva, Radka; Teixeira, Manuel R.; Consortium, PRACTICAL; Spurdle, Amanda B.; Clements, Judith A.; Park, Jong Y.; Batra, Jyotsna
2015-01-01
Prostate cancer is the second most common malignancy among men worldwide. Genome-wide association studies (GWAS) have identified 100 risk variants for prostate cancer, which can explain ~33% of the familial risk of the disease. We hypothesized that a comprehensive analysis of genetic variations found within the 3′ UTR of genes predicted to affect miRNA binding (miRSNPs) could identify additional prostate cancer risk variants. We investigated the association between 2,169 miRSNPs and prostate cancer risk in a large-scale analysis of 22,301 cases and 22,320 controls of European ancestry from 23 participating studies. Twenty-two miRSNPs were associated (p < 2.3×10−5) with risk of prostate cancer, 10 of which were within 7 genes not previously mapped by GWAS. Further, using miRNA mimics and reporter gene assays, we showed that miR-3162-5p has specific affinity for the KLK3 rs1058205 miRSNP T-allele, whilst miR-370 has greater affinity for the VAMP8 rs1010 miRSNP A-allele, validating their functional role. Significance: Findings from this large association study suggest that a focus on miRSNPs, including functional evaluation, can identify candidate risk loci below currently accepted statistical levels of genome-wide significance. Studies of miRNAs and their interactions with SNPs could provide further insights into the mechanisms of prostate cancer risk. PMID: 25691096
NASA Astrophysics Data System (ADS)
Forootan, Ehsan; Kusche, Jürgen
2016-04-01
Geodetic/geophysical observations, such as time series of global terrestrial water storage change or of sea level and temperature change, represent samples of physical processes and therefore contain information about complex physical interactions with many inherent time scales. Extracting relevant information from these samples, for example quantifying the seasonality of a physical process or its variability due to large-scale ocean-atmosphere interactions, is not possible with simple time series approaches. In recent decades, decomposition techniques have found increasing interest for extracting patterns from geophysical observations. Traditionally, principal component analysis (PCA) and, more recently, independent component analysis (ICA) are common techniques to extract statistically orthogonal (uncorrelated) and independent modes that represent the maximum variance of observations, respectively. PCA and ICA can be classified as stationary signal decomposition techniques, since they are based on decomposing the auto-covariance matrix or diagonalizing higher-order (greater than two) statistical tensors from centered time series. However, the stationarity assumption is obviously not justifiable for many geophysical and climate variables, even after removing cyclic components, e.g., the seasonal cycles. In this paper, we present a new decomposition method, complex independent component analysis (CICA; Forootan, PhD 2014), which can be applied to extract non-stationary (changing in space and time) patterns from geophysical time series. Here, CICA is derived as an extension of real-valued ICA (Forootan and Kusche, JoG 2012), in which we (i) define a new complex data set using a Hilbert transformation, where the complex time series contain the observed values in their real part and the temporal rate of variability in their imaginary part; (ii) apply an ICA algorithm based on diagonalization of fourth-order cumulants to decompose the new complex data set from (i); and (iii) recognize dominant non-stationary patterns as independent complex patterns that can be used to represent amplitude and phase propagation in space and time. We present results of CICA on simulated and real cases, e.g., for quantifying the impact of large-scale ocean-atmosphere interaction on global mass changes. Forootan (2014) Statistical signal decomposition techniques for analyzing time-variable satellite gravimetry data, PhD Thesis, University of Bonn, http://hss.ulb.uni-bonn.de/2014/3766/3766.htm. Forootan and Kusche (2012) Separation of global time-variable gravity signals into maximally independent components, Journal of Geodesy 86(7), 477-497, doi:10.1007/s00190-011-0532-5.
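The preprocessing step (i) can be illustrated with a short sketch that builds the analytic (complex) signal from a centred time series via the Hilbert transform; the synthetic series is an assumption, and the cumulant-based complex ICA itself is not reproduced here.

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(5)
t = np.arange(240) / 12.0                                     # e.g. monthly samples, 20 years
obs = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(240)  # one grid point's time series

centred = obs - obs.mean()
complex_series = hilbert(centred)            # analytic signal: real = data, imag = quadrature
amplitude = np.abs(complex_series)           # instantaneous amplitude
phase = np.unwrap(np.angle(complex_series))  # instantaneous phase propagation
```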