Sample records for em gpu para

  1. Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

    NASA Astrophysics Data System (ADS)

    Schultz, A.

    2010-12-01

    3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth and the latency required to send and receive network packets are limiting factors in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading graphics processing unit (GPU) companies now produce GPUs with hundreds of processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVIDIA Fermi) provides 3 billion transistors per die, with nearly 500 processor cores and up to 6 GB of fast (GDDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (24 parallel computational threads) and 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVIDIA Tesla GPU cores with 16 GB of dedicated GDDR3 GPU memory, and a second interleaved bank of 4 GPU subsystems containing in aggregate 1792 NVIDIA Fermi GPU cores with 12 GB of dedicated GDDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source, GPL licensed Fortran 90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers on GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than the iterative Krylov subspace sparse solvers currently applied to the whole domain.
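
    The closing idea, direct solves on small subdomains stitched together by matching boundary values, is easy to sketch. Below is a minimal, hypothetical NumPy illustration of an alternating Schwarz iteration for a 1D frequency-domain EM diffusion equation; the grid size, complex shift and two-subdomain split are invented for the example and are not taken from the OpenEM code.

    ```python
    import numpy as np

    # Minimal sketch (not the authors' code): alternating Schwarz domain
    # decomposition for a 1D frequency-domain EM diffusion equation
    #   -u'' + i*omega*mu*sigma * u = f,   u(0) = u(1) = 0,
    # with a direct solve on each overlapping subdomain, mimicking the
    # per-GPU subdomain direct solves described above. All values invented.
    n = 401
    h = 1.0 / (n - 1)
    shift = 1j * 50.0                      # i*omega*mu*sigma, assumed value
    A = (np.diag((2.0 / h**2 + shift) * np.ones(n))
         + np.diag(-np.ones(n - 1) / h**2, 1)
         + np.diag(-np.ones(n - 1) / h**2, -1))
    f = np.ones(n, dtype=complex)

    overlap = 25                           # shared points between subdomains
    domains = [np.arange(0, n // 2 + overlap), np.arange(n // 2 - overlap, n)]

    u = np.zeros(n, dtype=complex)
    all_idx = np.arange(n)
    for sweep in range(100):
        u_prev = u.copy()
        for idx in domains:                # each subdomain -> one GPU in spirit
            comp = np.setdiff1d(all_idx, idx)
            # neighbour's current boundary values move to the right-hand side
            rhs = f[idx] - A[np.ix_(idx, comp)] @ u[comp]
            u[idx] = np.linalg.solve(A[np.ix_(idx, idx)], rhs)  # direct solve
        if np.linalg.norm(u - u_prev) < 1e-10 * np.linalg.norm(u):
            break

    print(sweep, np.linalg.norm(A @ u - f))   # converged residual
    ```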

  2. AKDNR - DNR Business Reporting System (DBRS)

    Science.gov Websites

    The DNR Business Reporting System (DBRS) allows users to generate reports from the DNR Business databases and maps.

  3. Peregrine Software Toolchains | High-Performance Computing | NREL

    Science.gov Websites

    The GCC toolchain is an open-source alternative against which many technical applications are natively developed and tested. The Portland Group (PGI) C/C++ and Fortran compilers are partially supported but available to the HPC community; the PGI Accelerator compilers include NVIDIA GPU support.

  4. GPU-Accelerated Optical Coherence Tomography Signal Processing and Visualization

    NASA Astrophysics Data System (ADS)

    Darbrazi, Seyed Hamid Hosseiny

    Pyroxenes are a vast group of silicate minerals found in many igneous and metamorphic rocks. In their simplest form, these silicates consist of SiO3 chains linking SiO4 tetrahedral groups. The general chemical formula of pyroxenes is M2M1T2O6, where M2 refers to cations generally in a distorted octahedral coordination (Mg2+, Fe2+, Mn2+, Li+, Ca2+, Na+), M1 to cations in a regular octahedral coordination (Al3+, Fe3+, Ti4+, Cr3+, V3+, Ti3+, Zr4+, Sc3+, Zn2+, Mg2+, Fe2+, Mn2+), and T to cations in tetrahedral coordination (Si4+, Al3+, Fe3+). Pyroxenes with a monoclinic structure are called clinopyroxenes. The stability of clinopyroxenes over a wide range of chemical compositions, together with the possibility of tuning their physical and chemical properties and their chemical durability, has generated worldwide interest owing to their applications in materials science and technology. This work deals with the development of clinopyroxene-based glasses and glass-ceramics for functional applications. The study had both scientific and technological objectives: to acquire fundamental knowledge about the formation of crystalline phases and solid solutions in selected glass-ceramic systems, and to assess the feasibility of applying the new materials in different technological areas, with special emphasis on sealing in solid oxide fuel cells (SOFC). To this end, several glasses and glass-ceramics were prepared along the enstatite (MgSiO3) - diopside (CaMgSi2O6) and diopside (CaMgSi2O6) - Ca-Tschermak (CaAlSi2O6) joins and characterized by a wide range of techniques. All glasses were prepared by melt-quenching, while the glass-ceramics were obtained either by sintering and crystallization of frits or by nucleation and crystallization of monolithic glasses. The effects of various ionic substitutions in Al-containing diopside compositions on the structure, sintering and crystallization behaviour of the glasses and on the properties of the glass-ceramics were also studied, with relevance to their application as SOFC sealants. Enstatite-based glasses/glass-ceramics were found not to exhibit the characteristics required of SOFC sealing materials, whereas the superior properties of the diopside-based glass-ceramics qualified them for further study in this type of application. Besides investigating the suitability of clinopyroxene-based glass-ceramics as sealants, this thesis also aimed to study the influence of nucleating agents on the bulk nucleation of the resulting diopside-based glass-ceramics, so as to qualify them as potential host materials for radioactive nuclear waste.

  5. Accelerating Computation of DCM for ERP in MATLAB by External Function Calls to the GPU

    PubMed

    Wang, Wei-Jen; Hsieh, I-Fan; Chen, Chun-Chuan

    2013-01-01

    This study aims to improve the performance of Dynamic Causal Modelling for Event Related Potentials (DCM for ERP) in MATLAB by using external function calls to a graphics processing unit (GPU). DCM for ERP is an advanced method for studying neuronal effective connectivity. DCM utilizes an iterative procedure, the expectation maximization (EM) algorithm, to find the optimal parameters given a set of observations and the underlying probability model. As the EM algorithm is computationally demanding and the analysis faces a possible combinatorial explosion of models to be tested, we propose a parallel computing scheme using the GPU to achieve a fast estimation of DCM for ERP. The computation of DCM for ERP is dynamically partitioned and distributed to threads for parallel processing, according to the DCM model complexity and the hardware constraints. The performance efficiency of this hardware-dependent thread arrangement strategy was evaluated using synthetic data. Experimental data were used to validate the accuracy of the proposed computing scheme and quantify the time saving in practice. The simulation results show that the proposed scheme can accelerate the computation by a factor of 155 for the parallel part. For experimental data, the speedup factor is about 7 per model on average, depending on the model complexity and the data. This GPU-based implementation of DCM for ERP gives qualitatively the same results as the original MATLAB implementation at the group-level analysis. In conclusion, we believe that the proposed GPU-based implementation is very useful as a fast screening tool to select the most likely model and may provide implementation guidance for possible future clinical applications such as online diagnosis. PMID:23840507
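
    As a rough illustration of the E-step/M-step structure that the paper parallelizes (DCM's own EM operates on a far richer model), here is a minimal NumPy sketch of EM for a toy two-component Gaussian mixture; the data and starting values are invented. The E-step is one large elementwise computation over all observations, the kind of data-parallel work that maps naturally onto GPU threads, for instance by swapping NumPy for a GPU array library such as CuPy.

    ```python
    import numpy as np

    # Toy stand-in for the paper's EM loop: two-component 1D Gaussian mixture.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 0.5, 500)])

    mu = np.array([-1.0, 1.0]); var = np.array([1.0, 1.0]); w = np.array([0.5, 0.5])
    for it in range(200):
        # E-step: responsibilities, one row per component (data-parallel)
        pdf = np.exp(-(x[None, :] - mu[:, None])**2 / (2 * var[:, None])) \
              / np.sqrt(2 * np.pi * var[:, None])
        r = w[:, None] * pdf
        r /= r.sum(axis=0, keepdims=True)
        # M-step: closed-form updates from weighted sums
        nk = r.sum(axis=1)
        mu_new = (r @ x) / nk
        var = (r * (x[None, :] - mu_new[:, None])**2).sum(axis=1) / nk
        w = nk / x.size
        if np.max(np.abs(mu_new - mu)) < 1e-8:
            mu = mu_new; break
        mu = mu_new

    print(mu, var, w)   # should recover roughly (-2, 3), (1, 0.25), (0.5, 0.5)
    ```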

  6. PIA Program | CTIO

    Science.gov Websites

    ... the necessary funds are available to carry out the PIA Program for 2018. We do, on occasion ... The Cerro Tololo Inter-American Observatory is the United States' national center for the development of ..., some 400 km north of Santiago. The telescopes are located on the summit of Cerro Tololo, about 90 km ...

  7. GASPACHO: a generic automatic solver using proximal algorithms for convex huge optimization problems

    NASA Astrophysics Data System (ADS)

    Goossens, Bart; Luong, Hiêp; Philips, Wilfried

    2017-08-01

    Many inverse problems (e.g., demosaicking, deblurring, denoising, image fusion, HDR synthesis) share various similarities: degradation operators are often modeled by a specific data-fitting function, while image prior knowledge (e.g., sparsity) is incorporated through additional regularization terms. In this paper, we investigate automatic algorithmic techniques for evaluating proximal operators. These techniques also enable efficient calculation of adjoints of linear operators in a general matrix-free setting. In particular, we study the simultaneous-direction method of multipliers (SDMM) and the parallel proximal algorithm (PPXA) solvers and show that the automatically derived implementations are well suited for both single-GPU and multi-GPU processing. We demonstrate this approach on an electron microscopy (EM) deconvolution problem.
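
    SDMM and PPXA are built from proximal operators. As a minimal stand-in (not the GASPACHO solver itself), the sketch below uses the best-known prox, soft-thresholding for the L1 norm, inside a proximal-gradient loop for a toy sparse denoising problem; the problem size and regularization weight are invented.

    ```python
    import numpy as np

    def prox_l1(v, t):
        """Proximal operator of t*||.||_1: soft-thresholding."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    # Toy problem: min_x 0.5*||x - y||^2 + lam*||x||_1 via proximal gradient.
    rng = np.random.default_rng(1)
    y = rng.normal(size=1000) + (rng.random(1000) < 0.05) * 5.0  # noise + spikes
    lam, step = 0.8, 1.0
    x = np.zeros_like(y)
    for _ in range(100):
        grad = x - y                          # gradient of the data-fit term
        x = prox_l1(x - step * grad, step * lam)

    print("nonzeros kept:", np.count_nonzero(x))   # mostly the large spikes
    ```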

  8. A grid of theoretical profiles for massive stars in transition

    NASA Astrophysics Data System (ADS)

    Nascimento, C. M. P.; Machado, M. A.

    2003-08-01

    At the XXVIII Annual Meeting of the Sociedade Astronômica Brasileira (2002) we presented a grid of profiles computed at the points of the solar-metallicity (Z = 0.02), standard mass-loss-rate evolutionary tracks of stars with initial masses of 25, 40, 60, 85 and 120 solar masses. These profiles were computed with a numerical code suited to describing the winds of massive objects, assuming spherical symmetry, stationarity and homogeneity. In the present work we complete the grid with the theoretical profiles for the Z = 0.02 tracks with twice the standard mass-loss rate, and for metallicity Z = 0.008. For each point of the three tracks we obtain the theoretical profiles of Hα, Hβ, Hγ and Hδ, which, as expected, appear in pure emission, in pure absorption, or as P Cygni profiles. For very low mass-loss rates (~10^-7) no lines form, as seen in the first points of all tracks. In general, for a given point the emission component weakens and the absorption strengthens from Hα to Hδ. The tracks with Z = 0.02 and standard mass loss show fewer loops than those with Z = 0.02 and twice the standard rate, and their profiles are generally less intense. The Z = 0.008 track shows fewer loops and larger luminosity variations, and its profiles are in some tracks more intense. We also find that distinct points along the same track yield different profiles for similar values of luminosity and effective temperature. A grid of theoretical profiles therefore appears useful for providing preliminary information on the evolutionary stage of a massive star.

  9. Gctf: Real-time CTF determination and correction

    PubMed Central

    Zhang, Kai

    2016-01-01

    Accurate estimation of the contrast transfer function (CTF) is critical for a near-atomic resolution cryo-electron microscopy (cryoEM) reconstruction. Here, a GPU-accelerated computer program, Gctf, for accurate and robust real-time CTF determination is presented. The main target of Gctf is to maximize the cross-correlation of a simulated CTF with the logarithmic amplitude spectra (LAS) of observed micrographs after background subtraction. Novel approaches in Gctf improve both speed and accuracy. In addition to GPU acceleration (e.g. 10–50×), a fast '1-dimensional search plus 2-dimensional refinement (1S2R)' procedure further speeds up Gctf. Based on the global CTF determination, the local defocus for each particle and for single frames of movies is accurately refined, which improves the CTF parameters of all particles for subsequent image processing. Novel diagnostic methods using equiphase averaging (EPA) and self-consistency verification procedures have also been implemented in the program for practical use, especially for near-atomic reconstruction. Gctf is an independent program whose outputs can be easily imported into other cryoEM software such as Relion (Scheres, 2012) and Frealign (Grigorieff, 2007). Results from several representative datasets are shown and discussed in this paper. PMID:26592709
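
    The '1-dimensional search' step can be illustrated with a toy defocus scan: simulate a 1D CTF for each candidate defocus and keep the one that best correlates with a background-subtracted radial spectrum. The sketch below uses a common CTF phase convention and invented optical constants; it is not Gctf's actual model.

    ```python
    import numpy as np

    lam = 0.025      # electron wavelength in Angstrom (~200 kV), assumed
    cs = 2.0e7       # spherical aberration in Angstrom (2.0 mm), assumed
    k = np.linspace(1 / 200.0, 1 / 4.0, 2000)   # spatial frequency, 1/Angstrom

    def ctf1d(defocus):
        # Common convention: gamma = pi*lam*dz*k^2 - (pi/2)*Cs*lam^3*k^4
        gamma = np.pi * lam * defocus * k**2 - 0.5 * np.pi * cs * lam**3 * k**4
        return -np.sin(gamma)

    true_z = 18000.0                            # 1.8 um underfocus, synthetic
    spec = np.abs(ctf1d(true_z)) + 0.05 * np.random.default_rng(2).normal(size=k.size)

    def ncc(a, b):
        a = a - a.mean(); b = b - b.mean()
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    zs = np.arange(5000.0, 40000.0, 100.0)      # 1D defocus scan
    scores = [ncc(np.abs(ctf1d(z)), spec) for z in zs]
    print("best defocus:", zs[int(np.argmax(scores))])   # ~18000 Angstrom
    ```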

  10. Fiber-optic components for optical communications and sensing

    NASA Astrophysics Data System (ADS)

    Marques, Carlos Alberto Ferreira

    In recent years, optoelectronics has established itself as a research field capable of leading to new technological solutions. The abundant achievements in optics and lasers, as well as in optical communications, have been of great importance and have triggered a series of innovations. Among the large number of existing optical components, fiber-optic components are particularly relevant because of their simplicity and the high data-carrying capacity of optical fiber. This work focused on one of these optical components: fiber gratings, which have unique optical processing properties. This class of optical components is extremely attractive for the development of optical communication devices and sensors. The work began with a theoretical analysis applied to fiber gratings, and the most widely used grating fabrication methods were reviewed. Grating inscription was also addressed: an automated inscription system was implemented for silica optical fiber, and the experimental results showed good agreement with the simulation study. A system for inscribing fiber Bragg gratings in plastic optical fiber was also developed. A detailed study of acousto-optic modulation in gratings in silica and plastic optical fibers is presented. A detailed analysis of the mechanical excitation modes applied to the acousto-optic modulator showed that two predominant acoustic excitation modes can be established in the fiber, depending on the applied acoustic frequency. Through this characterization it was possible to develop new applications for optical communications. Different grating-based devices were studied and implemented, using the acousto-optic effect and the fiber regeneration process, for several applications such as a fast optical add-drop multiplexer, tunable group delay of Bragg gratings, tunable phase-shifted Bragg gratings, a method for inscribing Bragg gratings with complex profiles, a tunable filter for gain equalization, and adjustable optical notch filters.

  11. Elastography using multi-stream GPU: an application to online tracked ultrasound elastography, in-vivo and the da Vinci Surgical System

    PubMed

    Deshmukh, Nishikant P; Kang, Hyun Jae; Billings, Seth D; Taylor, Russell H; Hager, Gregory D; Boctor, Emad M

    2014-01-01

    A system for real-time ultrasound (US) elastography will advance interventions for the diagnosis and treatment of cancer by advancing methods such as thermal monitoring of tissue ablation. A multi-stream graphics processing unit (GPU) based accelerated normalized cross-correlation (NCC) elastography, with a maximum frame rate of 78 frames per second, is presented in this paper. A study of NCC window size is undertaken to determine the effect on frame rate and the quality of output elastography images. This paper also presents a novel system for Online Tracked Ultrasound Elastography (O-TRuE), which extends prior work on an offline method. By tracking the US probe with an electromagnetic (EM) tracker, the system selects in-plane radio frequency (RF) data frames for generating high quality elastograms. A novel method for evaluating the quality of an elastography output stream is presented, suggesting that O-TRuE generates more stable elastograms than those produced by untracked, free-hand palpation. Since EM tracking cannot be used in all systems, an integration of real-time elastography with the da Vinci Surgical System is presented and evaluated for elastography stream quality based on our metric. The da Vinci surgical robot is outfitted with a laparoscopic US probe, and palpation motions are autonomously generated by customized software. It is found that a stable output stream can be achieved, affected by both the frequency and amplitude of palpation. The GPU framework is validated using data from in-vivo pig liver ablation; the generated elastography images identify the ablated region, outlined more clearly than in the corresponding B-mode US images. PMID:25541954
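
    The NCC kernel at the core of such a pipeline is compact. Below is a hedged single-window sketch: for one window of a synthetic pre-compression RF line, search the axial lag that maximizes normalized cross-correlation in the post-compression line. The window size and search range are invented; the GPU version runs many such windows concurrently across streams.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    pre = rng.normal(size=4000)                 # synthetic RF line
    true_shift = 7                              # axial displacement in samples
    post = np.roll(pre, true_shift) + 0.1 * rng.normal(size=4000)

    win, start, max_lag = 128, 2000, 20
    a = pre[start:start + win]
    # NCC over candidate lags; np.corrcoef returns the normalized correlation
    best = max(range(-max_lag, max_lag + 1),
               key=lambda lag: np.corrcoef(a, post[start + lag:start + lag + win])[0, 1])
    print("estimated displacement:", best)      # expect ~7 samples
    ```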

  12. Software Accelerates Computing Time for Complex Math

    NASA Technical Reports Server (NTRS)

    2014-01-01

    Ames Research Center awarded Newark, Delaware-based EM Photonics Inc. SBIR funding to utilize graphics processing unit (GPU) technology, traditionally used for computer video games, to develop high-performance computing software called CULA. The software gives users the ability to run complex algorithms on personal computers with greater speed. As a result of the NASA collaboration, the number of employees at the company has increased 10 percent.

  13. Research on large area VUV-sensitive gaseous photomultipliers for cryogenic applications

    NASA Astrophysics Data System (ADS)

    Coimbra, Artur Emanuel Cardoso

    The scientific community realized early on that liquid noble gases are excellent radiation detection media, combining high density, a high degree of homogeneity and a high scintillation yield. Beyond these inherent characteristics, they can provide both ionization signals, through the creation of free electrons, and scintillation in response to interactions with ionizing radiation; and, in view of their application in rare-event experiments related to neutrino physics or dark matter, their self-shielding capability ensures the exclusion of background-induced events. Because they do not absorb their own scintillation light, detectors of this type can be scaled to large volumes, with the most recent collaborations proposing detectors holding tens of tonnes of liquid xenon. Current experiments using liquid noble gases employ xenon or argon in single phase (liquid) or dual phase (liquid + gas), and their applications range from the aforementioned rare-event searches, through medical imaging such as gamma-ray detectors for PET or "3-gamma" Compton cameras combined with PET, to security applications such as inspection systems for the detection of fissile material and, finally, Compton cameras for astrophysics. In both configurations, the scintillation signals are usually read out by a large number of expensive vacuum photomultipliers grouped together. This doctoral thesis is devoted to the large-area gaseous photomultipliers for cryogenic applications developed within the doctoral programme, in view of their eventual use as a device complementary to existing scintillation detection methods in future large-scale experiments. The research aimed at developing efficient large-area gaseous photomultipliers, potentially more economical per unit area, based on Thick Gas Electron Multipliers (THGEMs). Combining high-efficiency photocathodes with gaseous electron multipliers capable of reaching high charge gain yielded a device with high sensitivity for single-photon detection, position discrimination with sub-millimetre spatial resolution, and time resolution of the order of a few nanoseconds. In contrast to current vacuum technology, this device performs position localization of photons over large areas in a single unit, integrating electronics commonly used in particle-tracking experiments. The gaseous photomultiplier developed in this work consists of a cascade of THGEMs combined with a UV-sensitive cesium iodide (CsI) photocathode, while the cryogenic tests were performed in the dual-phase liquid-xenon Time Projection Chamber (TPC) recently developed at the Weizmann Institute of Science (WILiX). (Abstract shortened by ProQuest.)

  14. Implementation of collisions on GPU architecture in the Vorpal code

    NASA Astrophysics Data System (ADS)

    Leddy, Jarrod; Averkin, Sergey; Cowan, Ben; Sides, Scott; Werner, Greg; Cary, John

    2017-10-01

    The Vorpal code contains a variety of collision operators allowing for the simulation of plasmas containing multiple charge species interacting with neutrals, background gas, and EM fields. These existing algorithms have been improved and reimplemented to take advantage of the massive parallelization allowed by GPU architecture. The use of GPUs is most effective when algorithms are single-instruction multiple-data, so particle collisions are an ideal candidate for this parallelization technique, since they are a series of independent processes with the same underlying operation. The refactoring required reorganizing data in memory and careful consideration of device/host data allocation to minimize memory access and data communication per operation. Successful implementation has resulted in an order of magnitude increase in simulation speed for a test case involving multiple binary collisions using the null collision method. Work supported by DARPA under contract W31P4Q-16-C-0009.
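
    The null collision method mentioned above is what makes the operation uniform across particles: tentative collision times are drawn with a constant majorant rate, and each tentative event is accepted as a real collision with probability nu(v)/nu_max, so every particle follows the same branch-free recipe. A minimal NumPy sketch, with an invented rate model:

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    def nu(v):                       # velocity-dependent collision rate, assumed
        return 1.0 + 0.5 * np.abs(v)

    v = rng.normal(size=100000)      # particle velocities
    nu_max = 1.0 + 0.5 * np.abs(v).max()           # constant majorant rate
    dt = -np.log(rng.random(v.size)) / nu_max      # time to next tentative event
    real = rng.random(v.size) < nu(v) / nu_max     # accept real vs. null collision
    v[real] = rng.normal(size=real.sum())          # e.g., re-draw velocity on collision

    print("real-collision fraction:", real.mean(), "mean free time:", dt.mean())
    ```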

  15. VIII Olimpíada Brasileira de Astronomia e Astronáutica

    NASA Astrophysics Data System (ADS)

    Garcia Canalle, João Batista; Villas da Rocha, Jaime Fernando; Wuensche de Souza, Carlos Alexandre; Pereira Ortiz, Roberto; Aguilera, Nuricel Villalonga; Padilha, Maria De Fátima Catta Preta; Pessoa Filho, José Bezerra; Soares Rodrigues, Ivette Maria

    2007-07-01

    In this paper we present the motivations that led us to organize, jointly and for the first time, the Brazilian Astronomy Olympiad with astronautics included, in collaboration with the Brazilian Space Agency. This broadening helped attract even more students, teachers, schools and sponsors to the Olympiad. In 2005, 187,726 students from 3,229 schools in every Brazilian state, including the Federal District, took part in the VIII Brazilian Olympiad of Astronomy and Astronautics (VIII OBA); the number of participating students grew 52.4% over 2004. In April 2005 we organized, in Itapecerica da Serra, SP, a course for the 50 students previously selected from among the participants of the VII OBA, at the end of which we selected a team of 5 students who represented Brazil at the X International Astronomy Olympiad in China in October 2005, where we won a gold medal for the first time. In August 2005 we organized the VIII August School for 50 students and their teachers in Águas de Lindóia, SP, together with the XXXI annual meeting of the Sociedade Astronômica Brasileira (SAB). In November 2005 we held the I Space Journey in São José dos Campos, with the 22 students and 22 teachers who obtained the best results on the astronautics questions of the VIII OBA. In this paper we detail the results of the VIII OBA as well as the follow-up actions.

  16. Scalable and Interactive Segmentation and Visualization of Neural Processes in EM Datasets

    PubMed Central

    Jeong, Won-Ki; Beyer, Johanna; Hadwiger, Markus; Vazquez, Amelio; Pfister, Hanspeter; Whitaker, Ross T.

    2011-01-01

    Recent advances in scanning technology provide high resolution EM (Electron Microscopy) datasets that allow neuroscientists to reconstruct complex neural connections in a nervous system. However, due to the enormous size and complexity of the resulting data, segmentation and visualization of neural processes in EM data is usually a difficult and very time-consuming task. In this paper, we present NeuroTrace, a novel EM volume segmentation and visualization system that consists of two parts: a semi-automatic multiphase level set segmentation with 3D tracking for reconstruction of neural processes, and a specialized volume rendering approach for visualization of EM volumes. It employs view-dependent on-demand filtering and evaluation of a local histogram edge metric, as well as on-the-fly interpolation and ray-casting of implicit surfaces for segmented neural structures. Both methods are implemented on the GPU for interactive performance. NeuroTrace is designed to be scalable to large datasets and data-parallel hardware architectures. A comparison of NeuroTrace with a commonly used manual EM segmentation tool shows that our interactive workflow is faster and easier to use for the reconstruction of complex neural processes. PMID:19834227
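
    As a toy illustration of the level-set stage (NeuroTrace's multiphase formulation with its local histogram edge metric is considerably richer), the sketch below evolves an implicit contour with a Chan-Vese-style region term to segment a bright square in a synthetic image; the image and all parameters are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    img = 0.1 * rng.normal(size=(64, 64))
    img[20:44, 20:44] += 1.0                       # bright square to segment

    y, x = np.mgrid[:64, :64]
    phi = np.hypot(x - 32, y - 32) - 10.0          # initial circle (signed distance)

    dt = 0.2
    for _ in range(200):
        inside = phi < 0
        c1, c2 = img[inside].mean(), img[~inside].mean()   # region means
        # region competition: pixels closer to c1 are pushed inside (phi < 0)
        phi += dt * ((img - c1)**2 - (img - c2)**2)

    print("segmented pixels:", int((phi < 0).sum()))       # ~576, the square area
    ```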

  17. Tribosystems based on multilayered micro/nanocrystalline CVD diamond coatings

    NASA Astrophysics Data System (ADS)

    Shabani, Mohammadmehdi

    The combination of the characteristics of microcrystalline (MCD) and nanocrystalline (NCD) diamond, such as the high adhesion of MCD and the low surface roughness and low friction coefficient of NCD, is ideal for demanding tribological applications. The present work therefore focused on the development of MCD/NCD multilayer coatings. Ten-layer films were deposited on Si3N4 ceramic samples by hot-filament chemical vapour deposition (HFCVD). The microstructure, diamond quality and adhesion were investigated using techniques such as SEM, AFM, Raman spectroscopy, XRD, Brale indentation and 3D optical profilometry. Several geometries for distinct applications were coated: discs and balls for laboratory-scale tribological tests and, for in-service tests, mechanical-seal rings and cutting inserts for turning. In reciprocating ball-on-flat tribological tests at 10-90% relative humidity (RH), the mean values of the maximum and steady-state friction coefficients are 0.32 and 0.09, respectively. Regarding wear coefficients, a minimum of about 5.2x10^-8 mm3.N^-1.m^-1 was observed at intermediate RH values of 20-25%. Relative humidity has a strong effect on the critical load, which triples from 40 N at 10% RH to 120 N at 90% RH. In the 50-100 °C temperature range, the critical loads are similar to those obtained at low RH (10-25%). The tool life of inserts coated with ten alternating MCD/NCD layers, 24 um in total thickness, in the turning of an Al-15 vol% Al2O3 metal-matrix composite (Al-MMC) is better than that of most CVD diamond tools reported in the literature and similar to that of most polycrystalline diamond (PCD) tools. Crater formation occurs by successive wear of the individual layers, delaying full delamination of the diamond coating from the substrate, contrary to what happens with monolayer coatings. Seal rings tested with biodiesel showed wear coefficients (4.1x10^-10 mm3.N^-1.m^-1) two orders of magnitude lower than in reciprocating ball-on-flat tests (k = 5.0x10^-8 mm3.N^-1.m^-1), but complete sealing could not be obtained owing to overheating of the fluid. That condition was achieved with pressurized water, for P.V conditions in the range 0.72-5.3 MPa.m.s^-1. A steady-state friction coefficient of 0.04 and a wear coefficient of 6.0x10^-10 mm3.N^-1.m^-1, characteristic of an ultra-mild wear regime, reveal the high performance of this tribosystem.

  18. Visualization and Analytics Software Tools for Peregrine System | NREL

    Science.gov Websites

    Learn about the visualization and analytics software tools available for the Peregrine system. R is a language and environment for statistical computing and graphics; go to the R web site for more information. FastX is available for OpenGL-based applications; for more information, please go to the FastX page. ParaView, an open ...

  19. Astronomy for and with underprivileged children in Limeira

    NASA Astrophysics Data System (ADS)

    Bretones, P. S.; Oliveira, V. C.

    2003-08-01

    In 2001, the Instituto Superior de Ciências Aplicadas (ISCA Faculdades de Limeira) started a project through which the Observatório do Morro Azul entered into a partnership with the Centro de Promoção Social Municipal (CEPROSOM), an institution maintained by the Limeira municipal government to serve underprivileged children and adolescents. CEPROSOM ran two projects, the Projeto Centro de Convivência Infantil (CCI) and the Programa Criança e Adolescente (PCA), serving children and adolescents at community centres in several areas of the city. These projects give priority to offering the children enjoyable activities that keep them off the streets, and the children thus gained one more kind of activity: visits to the observatory. This panel describes the several phases of the project, which involved planning meetings, an astronomy course for the CCI and PCA instructors, activities related to the children's visits to the observatory, a proposal to build gnomons and sundials at the various community centres of Limeira, and publicity for the project in the press. The panel includes discussions of how underprivileged children learn, accounts showing the instructors' views on the relevance of astronomy teaching, accounts by the monitor who received the children at the observatory, and what the number of children served has meant for the institution's activities since they began and, in particular, in 2001. The results are based on the analysis of reports by the instructors and by the observatory monitor, visit records and stories in the local press. It concludes with an assessment of what the project meant for the participating institutions. For the observatory in particular, an analysis was made in relation to its other kinds of service, which involve school students and the general public. The question of the observatory's social commitment to the education of this public is also addressed.

  20. Predicting solar activity with fuzzy neural networks

    NASA Astrophysics Data System (ADS)

    Martin, V. A. F.; Poppe, P. C. R.

    2003-08-01

    The integration of neural networks with fuzzy-set techniques is nowadays used robustly to make predictions in several physical systems. This work continues the contribution presented at the XXVII Annual Meeting of the SAB, where we explored the application of neural networks to predicting time series. Here we emphasize the ANFIS (Adaptive Neuro-Fuzzy Inference System) technique, a back-propagation network in which the data are processed in an intermediate layer, with numerical data at an output layer. For the prediction to succeed using suitable mathematical techniques, a reasonably long series is essential so that the dynamics it contains can be properly extracted by the neural network. With this in mind, we again used the historical sunspot record (1818-2002) to forecast the future behaviour of solar activity (Schwabe cycles) with the technique described above. Predictions made for the previous cycle (no. 22, maximum of 158.5 in July 1989) and for the current one (no. 23, maximum of 153 in September 2000) give values quite consistent with those published in the literature, taking into account the associated error bars: 166±18 and 160±14, respectively. For the next Schwabe cycle (2006-2017), our prediction gives a maximum of 172±23 in the first half of 2011 (April ± 3 months). ANFIS tracks the studied series satisfactorily during training and during verification (smaller dispersion of the membership functions), with an absolute error below 20 percent.
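
    The inference step inside an ANFIS network is simple to sketch: Gaussian membership functions gate a weighted mix of linear (first-order Sugeno) rule outputs. The snippet below shows only that forward pass with hand-picked parameters; the back-propagation training and the sunspot preprocessing used in the paper are omitted.

    ```python
    import numpy as np

    def gauss(x, c, s):
        """Gaussian membership function centred at c with width s."""
        return np.exp(-0.5 * ((x - c) / s)**2)

    def anfis_forward(x, centers, sigmas, lin_a, lin_b):
        w = np.array([gauss(x, c, s) for c, s in zip(centers, sigmas)])  # rule firing
        w = w / w.sum()                          # normalized firing strengths
        rule_out = lin_a * x + lin_b             # first-order Sugeno consequents
        return float(w @ rule_out)               # weighted average of rule outputs

    # Hand-picked illustrative parameters (three rules over a scaled input)
    centers = np.array([0.2, 0.5, 0.8]); sigmas = np.array([0.15, 0.15, 0.15])
    lin_a = np.array([0.5, 1.2, -0.3]); lin_b = np.array([0.0, -0.2, 1.0])
    print(anfis_forward(0.6, centers, sigmas, lin_a, lin_b))
    ```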

  21. A code for indirect imaging of stars in binary systems: simulating ellipsoidal variations and line profiles

    NASA Astrophysics Data System (ADS)

    Souza, T. R.; Baptista, R.

    2003-08-01

    The secondary stars in cataclysmic variables (CVs) and low-mass X-ray binaries (LMXBs) are crucial for understanding the origin, evolution and behaviour of these interacting binaries. They are magnetically active stars subjected to extreme environmental conditions (e.g., they are very close to a hot, irradiating source; they rotate extremely fast and have distorted shapes; they lose mass at rates of 10^-8-10^-10 solar masses per year) which make their properties distinct from those of main-sequence stars of the same mass. On the other hand, the irradiation pattern on the face of the secondary carries information about the geometry of the accretion structures around the primary star. Imaging the surfaces of these stars is therefore of great astrophysical interest. Roche tomography uses the variations in the profiles of the secondary star's emission/absorption lines as a function of orbital phase to map the brightness distribution across its surface. In this work we present the first results of the development of a program for mapping the brightness distribution on the surfaces of the secondary stars in CVs and LMXBs with astro-tomography techniques. We currently have in operation a code that simulates the line-profile variations caused by the Doppler effect resulting from the combined rotation and translation of a Roche-lobe-filling star around the binary's centre of mass, as a function of the brightness distribution on the star's surface. The code also produces the light curve resulting from the star's changing aspect as a function of orbital phase (ellipsoidal variations).
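
    The forward step described, line profiles from the Doppler shifts of a rotating, orbiting surface, can be approximated in a few lines. The sketch below samples a spherical surface (a stand-in for the Roche-lobe geometry), assigns each visible element a line-of-sight velocity from solid-body rotation plus orbital motion, and histograms projected-area-weighted flux into velocity bins; all values are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    n = 200000
    # Random points on a unit sphere (proxy for the Roche-lobe surface)
    u, v = rng.random(n), rng.random(n)
    theta, phi = np.arccos(2 * u - 1), 2 * np.pi * v
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)

    visible = x > 0                     # observer along +x sees one hemisphere
    v_rot = 100.0 * y[visible]          # km/s, solid-body rotation about the pole
    v_orb = 50.0                        # km/s, orbital velocity at this phase
    los_v = v_rot + v_orb
    brightness = np.ones(visible.sum()) # uniform map (the unknown in tomography)
    weights = brightness * x[visible]   # foreshortening by projected area

    profile, edges = np.histogram(los_v, bins=101, range=(-200, 200), weights=weights)
    centers = 0.5 * (edges[1:] + edges[:-1])
    print("profile centroid (km/s):", np.average(centers, weights=profile))  # ~50
    ```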

  22. Microstructure design of titanate-based electroceramics

    NASA Astrophysics Data System (ADS)

    Amaral, Luis Miguel de Almeida

    Electroceramics are a class of advanced materials with electrical properties valuable for applications, and these properties generally depend strongly on the materials' microstructure. The general aim of this work is therefore to investigate the design of the dielectric response of thick films obtained by electrophoretic deposition (EPD), and of monolithic ceramics, through control of microstructure evolution during the sintering of titanate-based electroceramics. Rapidly growing wireless applications in the microelectronics and communications industry have become an important market for semiconductor manufacturers. Owing to the constant need for miniaturization, cost reduction, and greater functionality and integration, thick-film technology is becoming an increasingly important processing approach for functional materials, and EPD is a suitable technique in this context. The resulting thick films require a subsequent sintering step, which is affected by the underlying substrate, with a strong effect on microstructure evolution. In connection with miniaturization and signal discrimination, dielectric materials used as components operating at microwave frequencies in communications microelectronics must exhibit low dielectric losses together with high dielectric permittivity and temperature stability. Materials of the BaO-Ln2O3-TiO2 system (BLnT: Ln = La or Nd), such as BaLa4Ti4O15 (BLT) and Ba4.5Nd9Ti18O54 (BNT), meet these requirements and are of interest for applications, for example, in base stations for mobile communications or in resonators for mobile phones, where device miniaturization is very important. Strontium titanate (SrTiO3, STO), in turn, is an incipient ferroelectric with a high dielectric constant and low losses, which finds application in, for example, internal-layer capacitors, taking advantage of highly resistive grain boundaries. The dependence of its dielectric permittivity on the applied electric field makes this material very interesting for tunable microwave devices. STO-based materials are also of interest for thermoelectric applications, which can help reduce the current dependence on fossil fuels through power generation from waste heat. However, the same resistive grain boundaries are an obstacle to the efficiency of STO for thermoelectric applications. (Abstract shortened by ProQuest.)

  23. Oxygen and sulfur abundances in solar-type stars of the solar neighborhood

    NASA Astrophysics Data System (ADS)

    Requeijo, F.; Porto de Mello, G. F.

    2003-08-01

    Some results suggest that the Sun is 58% more abundant in oxygen than the local interstellar medium, an anomaly that seems to extend to carbon and krypton. Possible explanations of this phenomenon include: a type II supernova that enriched the protosolar nebula, making it overabundant in oxygen; an episode of infall of metal-poor material onto the Galactic disk, diluting the local interstellar medium; or a dynamical migration of the Sun from a more internal Galactic orbit to its present position. Choosing among these scenarios requires precise knowledge of the solar abundance relative to the neighbouring G dwarfs. In this context oxygen and sulfur are key elements, since both are produced by type II supernovae and should therefore share the same abundance pattern. This project aims to establish the Sun's position in the local abundance distribution of sulfur and oxygen for a sample of solar-type stars with well-determined ages and metallicities. To this end we analysed high-resolution, high signal-to-noise spectra in the spectral regions of λλ6300, 7774 (O) and λ8695 (S). For sulfur we find that the Sun appears to be a typical star among its neighbours, and that this element does not show the overabundance at low metallicities that is already well established for oxygen. We discuss the sulfur abundances in the context of the chemical evolution of the Galaxy. We present very precise preliminary results for the forbidden oxygen line λ6300 and compare them with those obtained for the λ7774 triplet, quantifying the non-LTE effects present in the triplet as a function of the stellar atmospheric parameters.

  24. So close to home, so far from us: an ethnography of the new margins in the city center

    PubMed Central

    Fernandes, Luís

    2011-01-01

    Philippe Bourgois has been, since 2007, Richard Perry University Professor in the Department of Anthropology and of Family Medicine and Community Practice at the University of Pennsylvania. For many years he was attached to the Department of Anthropology, History and Social Medicine of the University of California, San Francisco. The 1995 publication of In Search of Respect: Selling Crack in El Barrio projected his name far beyond the United States: an ethnography in the Puerto Rican heart of Harlem, centred on the actors and settings of the crack trade. There followed long fieldwork in the encampments of heroin users in San Francisco, directing his work toward the most radical forms of poverty and marginality in the USA. From that fieldwork comes his latest book, Righteous Dopefiend. In June 2007 he was in Lisbon to take part in the 3rd edition of the Ethnografeast. We took the opportunity to hear an unusual career told by the man himself: a long conversation at the Hotel Zurique, whose name matters only because it evokes the country where he spent part of his childhood. PMID:22013286

  25. Biofunctionality and immunocompatibility of starch-based biomaterials

    NASA Astrophysics Data System (ADS)

    Marques, Alexandra Margarida Pinto

    The search for new biomaterials that perform specific functions without triggering negative host responses is a permanent and topical challenge in this field. Degradable biomaterials were one of the proposed solutions and are currently in use but, despite their undeniable advantages, they also present some problems, namely concerning their degradation products and the negative effects these can cause. Other biomaterials, including polymers of natural origin, have been proposed on the grounds that their degradation products can be incorporated into normal metabolic pathways, avoiding side effects in the host. To date, and despite all efforts and the large number of biomedical devices developed, the ideal biomaterial for a specific application has yet to be found. Studies with starch-based biodegradable polymers have shown that these materials possess promising properties, opening new perspectives for their possible use in a variety of biomedical applications. Thus, to demonstrate that these materials do have the potential to be used in, for example, bone replacement, controlled-release systems, bone cements and tissue engineering, it was imperative to evaluate in greater depth the biological response they elicit. A work plan was designed with three main objectives: i) to evaluate the cytocompatibility of the starch-based polymers and composites, monitoring cytotoxicity and analysing cell adhesion and proliferation on their surfaces, with particular attention to osteoblasts in view of a possible orthopaedic application; ii) to establish in vitro models to analyse and predict, as far as possible, a real inflammatory-response situation; iii) to validate the in vitro results with an in vivo model of the inflammatory response to biomaterials already established in other studies. In summary, the cytocompatibility and immunocompatibility studies showed that the starch-based polymers and composites are promising biomaterials. Compared with the degradable biomaterials currently in use, they have properties that induce similar, or even better, behaviour in terms of cytotoxicity. These data were confirmed by the adhesion and proliferation of osteoblast-like cells on the surfaces of some of the starch-based materials, shown to be comparable to those observed on PLLA, demonstrating the possibility of using these materials in orthopaedic applications. The conclusions drawn from the in vitro and in vivo immunocompatibility studies reinforce the observations of the cytocompatibility experiments and, together, demonstrate the possibility of using these starch-based biomaterials, with their weak capacity to trigger an inflammatory reaction, in biomedical applications. (Abstract shortened by ProQuest.)

  8. NOAA Weather Radio - County Coverage by State

    Science.gov Websites

    Programación Español Listado de estación Explicacion de SAME Coverage Station Listing County Listing ±ol Condado de cobertura Listado de estación Lista de Emisora y Cobertura Acerca de NWR ESTACIONES NACIONAL Información General Información Para el consumidor receptor NWR Recepción Explicacion de NWR

  9. NOAA Weather Radio - Deaf and Hard of Hearing

    Science.gov Websites

  10. NOAA Weather Radio

    Science.gov Websites

    Part of the Federal Communications Commission's Emergency Alert System (EAS), NOAA Weather Radio is an all-hazards network and thus the most comprehensive source of weather and emergency information available (including, for example, chemical accidents or oil spills). It is known as "The Voice of the National Weather Service."

  11. NOAA Weather Radio - Viewing Outages

    Science.gov Websites

  12. San Juan Ultra (Mooklabs)

    Science.gov Websites

  13. NOAA Weather Radio - Station Search

    Science.gov Websites

  14. NOAA Weather Radio

    Science.gov Websites

    An automated system will support broadcasting in Spanish: synthesized Spanish-language voice will be provided at some offices, where staffing permits and the local population warrants it.

  15. NOAA Weather Radio - Cobertura de Condado a Condado

    Science.gov Websites

    NWR coverage for a county depends on reliable signal reception, which typically extends over a 40-mile radius from the transmitter, assuming flat terrain. Counties with no or only partial NWR coverage suffer from signal obstructions and/or excessive distance from the transmitter.

  16. Inmetro - Instituto Nacional de Metrologia, Qualidade e Tecnologia

    Science.gov Websites

  17. Cosmology in geography teaching

    NASA Astrophysics Data System (ADS)

    Santos, S. C.; Chiaradia, A. P. M.

    2003-08-01

    The main objective of this work is to help Geography teachers in the classroom with topics related to Cosmology. The idea for this work arose when it was found that Geography teachers have difficulty teaching this topic; this finding was made by one of the authors while teaching the topic at the elementary-school level and in discussions with other Geography teachers. Just as has been the case since ancient times, students are very interested in understanding the phenomena that occur in the Cosmos, but the Geography textbooks used in the classroom are not rich in information on this subject. Geography teachers thus have little information with which to discuss the subject in class and do not give the topic its due importance. A support package was therefore developed for Geography teachers on the origin of the Universe, its evolution and its possible evolutionary future according to the most recent theories, based on questions asked by elementary-school students and on the information found in textbooks. It is not the purpose of this material to innovate, nor to propose a methodology for teaching Cosmology. In it, the Geography teacher will find a bank of information establishing concepts, theories and hypotheses about Cosmology, in simple, easily understood language. To develop it, non-exhaustive searches of books and scientific journals were carried out, together with a compilation and chronological discussion of the accepted theories on cosmological models. This material is presented in this work.

  18. Prefeitura Municipal de Amparo

    Science.gov Websites

  19. NOAA Weather Radio

    Science.gov Websites

    A television antenna outside or inside the house, among other options, can improve reception for any FM radio, including NWR. Keep reading if you are within the nominal reception area for an NWR receiver but still have difficulty with reception.

  20. Using ParaView Software on the Peregrine System | High-Performance

    Science.gov Websites

    The ssh and terminal programs come pre-installed on most Linux and Mac systems. On Windows, the ssh and terminal functions are provided by plink.exe and cmd.exe, of which only cmd.exe comes pre-installed.

  1. Alternative Fuels Data Center: Recursos en español

    Science.gov Websites

    A government-industry collaboration sponsored by the Vehicle Technologies Program. Use these web resources to find information on: your vehicle's fuel economy and carbon footprint; how to increase fuel economy; and why fuel economy matters.

  2. NOAA Weather Radio

    Science.gov Websites

    The nominal reception area for an NWR receiver can be affected by many factors; large expanses of salt water, for example, affect radio-signal reception. Keep reading if you are within the coverage area but still have difficulty with reception.

  3. SMARTINIT DOWNSCALING GRAPHICS

    Science.gov Websites

    Interactive verification page: SMARTINIT Verification, NAM vs NEST graphics (MMB web site), selectable by month and day; model type (PROD vs PARA DNG), plot type (comparison maps, difference plots) and region (CONUS Nest 2.5 km, South West U.S., New York, NY).

  4. NCEP Air Quality Forecast(AQF) Verification. NOAA/NWS/NCEP/EMC

    Science.gov Websites

    Interactive verification page: selectable region (e.g., Southwest Desert, all regions PROD, all regions PARA), averaged hour (8 hr or 1 hr surface average), forecast period (diurnal, 01-24 hr by day, 25-48 hr by day) and statistic type (BIAS, RMSE).

  5. Biaxial seismic behaviour of reinforced concrete columns

    NASA Astrophysics Data System (ADS)

    Rodrigues, Hugo Filipe Pinheiro

    Analysis of earthquake effects shows that research in earthquake engineering must pay special attention to assessing the vulnerability of existing constructions, which frequently lack adequate seismic resistance, as is the case of reinforced concrete (RC) buildings in many cities of southern European countries, including Portugal. Since columns are structural elements fundamental to the seismic resistance of buildings, special attention must be given to their response under cyclic loading. Moreover, an earthquake is a type of action whose effects on buildings require the consideration of two horizontal components, which places more severe demands on columns than unidirectional loading. This thesis therefore focuses on evaluating the structural response of reinforced concrete columns subjected to biaxial horizontal cyclic loading, along three main lines. First, a testing campaign was carried out to study the uniaxial and biaxial cyclic behaviour of RC columns under constant axial force. Four series of rectangular RC columns (24 in total) with different geometric characteristics and amounts of longitudinal reinforcement were built, and the columns were tested under different load histories. The experimental results are analysed and discussed with particular attention to damage evolution, stiffness and strength degradation with increasing deformation demands, dissipated energy and equivalent viscous damping; finally, a damage index for biaxially loaded columns is proposed. Next, different nonlinear modelling strategies were applied to represent the biaxial behaviour of the tested columns, considering nonlinearity either distributed along the elements or concentrated at their ends. The various modelling strategies proved to represent the response adequately in terms of force-displacement envelope curves, but some difficulties were found in representing strength degradation and the evolution of dissipated energy. Finally, a global model is proposed to represent the nonlinear flexural behaviour of RC elements under biaxial cyclic loading. This model builds on a well-known uniaxial model, combined with an interaction function based on the Bouc-Wen model. The interaction function was calibrated using optimisation techniques and the results of a series of numerical analyses with a refined model. The ability of the simplified model to reproduce the experimental results of biaxial column tests is also demonstrated.
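
    The interaction function mentioned above builds on the Bouc-Wen hysteresis model. As a rough sketch of the kind of law involved (with illustrative parameter values, not the thesis' calibrated biaxial formulation), the classical uniaxial Bouc-Wen equation can be integrated as follows:

        import numpy as np

        # Classical uniaxial Bouc-Wen law, displacement-driven form:
        #   dz = A*dx - beta*|dx|*|z|^(n-1)*z - gamma*dx*|z|^n
        # Parameter values below are hypothetical, chosen only for illustration.
        A, beta, gamma, n = 1.0, 0.5, 0.5, 1.0

        def bouc_wen_z(x):
            z = np.zeros_like(x)
            for i in range(1, len(x)):
                dx = x[i] - x[i - 1]
                zi = z[i - 1]
                z[i] = zi + (A * dx
                             - beta * abs(dx) * abs(zi) ** (n - 1) * zi
                             - gamma * dx * abs(zi) ** n)
            return z

        # Cyclic displacement history of increasing amplitude, as in column
        # tests; the restoring force is then modelled as k*(a*x + (1 - a)*z).
        t = np.linspace(0.0, 10.0, 2001)
        x = 0.1 * t * np.sin(2.0 * np.pi * t)
        z = bouc_wen_z(x)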

  6. Orbital control of artificial satellites with propulsion and the use of lunar gravity

    NASA Astrophysics Data System (ADS)

    Torres, K. S.; de Almeida Prado, A. F. B.

    2003-08-01

    Reducing the fuel cost of a maneuver is currently the top priority of every space program in the world. Gravity-assisted maneuvers are an excellent way around the problem, since they provide savings with a large impact on the final cost of the mission. In this work a particular study is made of the orbital control of an artificial Earth satellite using the Moon's gravity. The objective is to study an economical technique for a plane change of a satellite orbiting the Earth. The main idea of this approach is first to send the spacecraft towards the Moon using a single-impulse maneuver, so that the Moon's gravitational field performs the desired plane change (at no fuel cost), and only then return the vehicle to the initial values of semi-major axis and eccentricity using a bi-impulsive Hohmann-type maneuver. To this end, the spacecraft is assumed to start in a circular orbit coplanar with the Moon's orbit around the Earth, and the goal is to place it in a similar orbit that differs from the initial one only in inclination. Analytical equations based on the patched-conics approach are used to calculate the variation in velocity, angular momentum, energy and inclination of the spacecraft performing this maneuver. Several simulations are carried out to evaluate the fuel savings involved.
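
    The bookkeeping behind this comparison rests on standard two-body formulas. A minimal worked example (with illustrative orbit values, not the paper's cases) contrasting the cost of a direct single-impulse plane change with the departure burn of a Hohmann leg towards the Moon:

        import math

        # Standard two-body delta-v estimates; values are illustrative only.
        mu = 398600.4418        # Earth's GM, km^3/s^2
        r  = 7000.0             # initial circular orbit radius, km
        v  = math.sqrt(mu / r)  # circular speed, ~7.55 km/s

        # Direct single-impulse plane change of di:
        di = math.radians(30.0)
        dv_direct = 2.0 * v * math.sin(di / 2.0)     # ~3.9 km/s for 30 deg

        # Departure burn of a Hohmann transfer from r to the Moon's orbital
        # radius (the first leg of the lunar-gravity-assist strategy):
        r_moon = 384400.0
        a_t = 0.5 * (r + r_moon)                     # transfer semi-major axis
        v_peri = math.sqrt(mu * (2.0 / r - 1.0 / a_t))
        dv_depart = v_peri - v                       # ~3.0 km/s

        print(f"direct 30 deg plane change: {dv_direct:.2f} km/s")
        print(f"Hohmann departure burn toward the Moon: {dv_depart:.2f} km/s")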

  7. Determination of Jupiter's mass from the orbits of its satellites: a didactic experiment

    NASA Astrophysics Data System (ADS)

    Schlickmann, M. S.; Saito, R. K.; Becker, D. A.; Rezende, M. F., Jr.; Cid Fernandes, R.

    2003-08-01

    This work presents the pilot script of an observational astronomy exercise, together with the first results obtained in this implementation phase. The project, to be carried out in two stages, aims to introduce notions of Astronomy to high-school students and to beginners in physics courses. The experiment consists of measuring the orbits of the Galilean satellites and, from the analysis of the collected data, verifying the validity of Kepler's law of orbits and determining the mass of the planet Jupiter. In a first stage, observations will be made using a Meade LX200 10" telescope and a CCD camera to obtain a sequence of images of the planet, from which the motion of its satellites can be measured. The second stage will begin once the telescope operates in robotic mode, allowing observations over the internet by educational institutions. To develop this experiment, several images of Jupiter were first collected with the instruments mentioned above. These images served as the basis for preparing the scripts for the experiment at the high-school and university levels. The scripts will initially be presented on a home page, which will also seek to provide historical context for the experiment, establish relationships with teachers and students, present methodological proposals and make available the computer programs needed for online use. The project is supported by the VITAE Foundation.
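
    The heart of the exercise is Kepler's third law, M = 4*pi^2*a^3 / (G*T^2). As a check of the method using well-known orbital values for Io (not data from the project itself):

        import math

        # Kepler's third law applied to a Galilean satellite gives Jupiter's
        # mass. Io's published orbital elements are used here as a sanity check.
        G = 6.674e-11            # m^3 kg^-1 s^-2
        a = 4.217e8              # Io's semi-major axis, m
        T = 1.769 * 86400.0      # Io's orbital period, s

        M_jup = 4.0 * math.pi**2 * a**3 / (G * T**2)
        print(f"M_Jupiter ~ {M_jup:.3e} kg")   # ~1.9e27 kg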

  8. Cultural astronomy and the environment from a holistic approach

    NASA Astrophysics Data System (ADS)

    Jafelice, L. C.

    2003-08-01

    In this work we broaden the discussion of the holistic approach to astronomy teaching that we have developed in recent years, analyse new results and present practical examples for those interested in trying it. The basic observation guiding this approach is that introductory astronomy courses tend to be excessively and prematurely technical, besides assuming a traditional, very narrow view of what science education should be, inherited from the Cartesian and positivist era of science. We argue why it is important that elements of cultural astronomy provide the theme and the guiding thread of such courses, and why it is urgent to revise our view of the relationship between astronomy and the environment. A central point of this approach is to explore ways of reactivating and updating an organic relationship with the environment and of awakening awareness of our inevitable and profound interdependence with it at the cosmic level. In this work we exemplify how this proposal can be put into practice in three different situations: courses in geography and physics teacher-training programmes; elementary schools; and, in a case yet to be implemented, underprivileged communities. These cases involve different audiences and settings for formal and non-formal education. From the cases already implemented, we highlight the results achieved by the students: cultural enrichment; meaningful learning of traditional astronomical content; changes in behaviour, incorporating daily contact with the sky; and frequent experiences of empathic feelings that redirect their relationship with nature and their global ecological awareness. In addition, for those interested in applying this proposal, we also share procedures and precautions for implementing alternative actions consonant with it. (PPGECNM/UFRN; PRONEX/FINEP; NUPA/USP; Temáticos/FAPESP)

  9. A spectrophotometric study of the cataclysmic variable V3885 Sgr

    NASA Astrophysics Data System (ADS)

    Ribeiro, F. M. A.; Diaz, M. P.

    2003-08-01

    Cataclysmic variables are close binary systems composed of a red dwarf that transfers matter to a white dwarf; in non-magnetic systems an accretion disc forms around the white dwarf. V3885 Sgr is a cataclysmic variable classified as a nova-like. We present a spectrophotometric study of V3885 Sgr at high time resolution in the visible region. The observed region is centred on Hα and also covers the HeI 6678 line. The first result of this study is the determination of the orbital period, from radial-velocity measurements of the Hα line, as 0.20716071(22) days, resolving inconsistencies in the literature regarding this value and defining a long-term ephemeris for the system. With this period and the radial-velocity measurements from the Hα line profile, a mass diagram was constructed, from which we constrain the masses of the stellar components and limit the orbital inclination of the system. Greenstein diagrams were constructed for the Hα and HeI lines, in which the mean spectra in each phase interval are displayed side by side in grey scale, indicating intense emission from the rear part of the disc. From Doppler tomography we obtained radial emissivity profiles of the disc for both the Hα and HeI lines. The results are compared with those of other systems studied with the same technique. Results from flickering tomography of the system will also be presented.
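
    One generic way to extract an orbital period from radial-velocity measurements of a line such as Hα is a brute-force least-squares sine fit over trial periods. The sketch below illustrates that idea on synthetic data built with the period reported above; it is not the authors' actual reduction pipeline:

        import numpy as np

        # For each trial period P, fit v(t) = g + a*sin(2*pi*t/P) + b*cos(2*pi*t/P)
        # by linear least squares and keep the period with the smallest residual.
        def best_period(t, v, periods):
            chi2 = np.empty(len(periods))
            for i, P in enumerate(periods):
                ph = 2.0 * np.pi * t / P
                A = np.column_stack([np.ones_like(t), np.sin(ph), np.cos(ph)])
                coef, *_ = np.linalg.lstsq(A, v, rcond=None)
                chi2[i] = np.sum((v - A @ coef) ** 2)
            return periods[np.argmin(chi2)]

        # Synthetic radial-velocity curve with the period found for V3885 Sgr:
        rng = np.random.default_rng(0)
        t = np.sort(rng.uniform(0.0, 5.0, 120))                   # days
        v = 50.0 * np.sin(2.0 * np.pi * t / 0.20716071) \
            + rng.normal(0.0, 5.0, t.size)                        # km/s
        print(best_period(t, v, np.linspace(0.15, 0.25, 2001)))   # ~0.207 d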

  10. Accelerating three-dimensional FDTD calculations on GPU clusters for electromagnetic field simulation.

    PubMed

    Nagaoka, Tomoaki; Watanabe, Soichi

    2012-01-01

    Electromagnetic simulation with an anatomically realistic computational human model using the finite-difference time-domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and enable large-scale computing with the computational human model, we adapted a three-dimensional FDTD code to a multi-GPU cluster environment using the Compute Unified Device Architecture and the Message Passing Interface. Our multi-GPU cluster system consists of three nodes, with seven GPU boards (NVIDIA Tesla C2070) mounted on each node. We examined the performance of the FDTD calculation in this multi-GPU cluster environment and confirmed that it is faster than on a single multi-GPU workstation; we also found that the GPU cluster system calculates faster than a vector supercomputer. In addition, our GPU cluster system allowed us to perform large-scale FDTD calculations because we were able to use over 100 GB of GPU memory.
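
    For reference, the cell update that such a code parallelizes, one GPU thread per Yee cell, has the following shape. This NumPy sketch shows only the Ex component of the standard Yee scheme, with illustrative grid and material constants rather than the paper's configuration; in a multi-GPU version the grid is typically split into subdomains whose boundary (halo) layers are exchanged over MPI after each step:

        import numpy as np

        # Illustrative grid and constants (not the paper's configuration).
        nx = ny = nz = 32
        Ex = np.zeros((nx, ny, nz))
        Hy = np.zeros_like(Ex)
        Hz = np.zeros_like(Ex)
        dt, dx, eps = 1.0e-12, 1.0e-3, 8.854e-12

        def update_Ex(Ex, Hy, Hz):
            # Ex += dt/eps * (dHz/dy - dHy/dz): central differences on the
            # staggered Yee grid; a CUDA kernel applies this per cell/thread.
            Ex[:, 1:, 1:] += (dt / (eps * dx)) * (
                (Hz[:, 1:, 1:] - Hz[:, :-1, 1:]) -
                (Hy[:, 1:, 1:] - Hy[:, 1:, :-1]))
            return Ex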

  11. New tools for subsurface imaging of 3D seismic Node data in hydrocarbon exploration

    NASA Astrophysics Data System (ADS)

    Benazzouz, Omar

    The acquisition of 3D/4D multichannel seismic reflection data using 4-component Ocean Bottom NODES is currently a sector of growing importance in the marine seismic acquisition market of the oil industry. This type of data yields high-quality subsurface images with low noise levels, broad bandwidth, good azimuthal illumination, long offsets, high resolution and the recording of both P and S waves. Data acquisition is highly repeatable and therefore ideal for 4D surveys. However, there are significant differences in acquisition geometry and wavefield sampling relative to conventional towed-streamer methods, so new tools must be developed for processing this type of data. This thesis investigates three aspects of OBS/NODE data processing that have not yet been fully and satisfactorily resolved: the random drift of the internal clocks, the precise positioning of the OBSs, and the implementation of efficient 3D prestack depth-migration algorithms to obtain accurate subsurface images. New procedures were developed to address these problems and were applied to synthetic and real data. A new method was developed to detect and correct random internal-clock drift using high-order derivatives. A new method for precise OBS positioning using multilateration was also developed, and tools were created to interpolate/extrapolate 3D velocity models so that they cover the full extent of the acquisition area. Robust filtering algorithms were implemented to prepare the velocity field for ray tracing and to minimise artefacts in 3D prestack Kirchhoff depth migration. The results obtained show a significant improvement in all the situations analysed. The necessary software was developed and efficient computational solutions were created. These were integrated into a standard seismic processing package (SPW) used in industry so as to create, together with the existing tools, an integrated processing workflow for OBS/NODE data, from acquisition and quality control to the production of prestack depth-migrated seismic volumes.
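
    Of the three problems addressed, OBS positioning by multilateration is the easiest to illustrate. A minimal linearized least-squares solver for the generic problem (known anchor positions and measured ranges; a sketch of the technique, not the thesis' exact relocation algorithm):

        import numpy as np

        # Subtracting the first range equation |x - p_0|^2 = d_0^2 from the
        # others linearizes the system: 2*(p_i - p_0).x = d_0^2 - d_i^2
        #                                               + |p_i|^2 - |p_0|^2.
        def multilaterate(anchors, ranges):
            p0, d0 = anchors[0], ranges[0]
            A = 2.0 * (anchors[1:] - p0)
            b = (d0**2 - ranges[1:]**2
                 + np.sum(anchors[1:]**2, axis=1) - np.sum(p0**2))
            x, *_ = np.linalg.lstsq(A, b, rcond=None)
            return x

        # Hypothetical geometry: four acoustic anchors and one OBS.
        anchors = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0],
                            [0.0, 100.0, 0.0], [0.0, 0.0, 50.0]])
        true = np.array([30.0, 40.0, 10.0])
        ranges = np.linalg.norm(anchors - true, axis=1)
        print(multilaterate(anchors, ranges))   # ~[30, 40, 10]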

  12. gpuPOM: a GPU-based Princeton Ocean Model

    NASA Astrophysics Data System (ADS)

    Xu, S.; Huang, X.; Zhang, Y.; Fu, H.; Oey, L.-Y.; Xu, F.; Yang, G.

    2014-11-01

    Rapid advances in the performance of the graphics processing unit (GPU) have made the GPU a compelling solution for a range of scientific applications. However, most existing GPU-acceleration work on climate models ports only selected hot spots and therefore achieves limited speedup for the model as a whole. In this work, we take the mpiPOM (a parallel version of the Princeton Ocean Model) as our starting point and design and implement a GPU-based Princeton Ocean Model. Carefully considering the architectural features of state-of-the-art GPU devices, we rewrite the full mpiPOM model from the original Fortran version into a new Compute Unified Device Architecture C (CUDA-C) version. We apply several acceleration methods to further improve the performance of gpuPOM, including optimizing memory access within a single GPU, overlapping communication and boundary operations among multiple GPUs, and overlapping input/output (I/O) between the host Central Processing Unit (CPU) and the GPU. Our experimental results indicate that the performance of gpuPOM on a workstation containing 4 GPUs is comparable to that of a powerful cluster with 408 CPU cores, while reducing energy consumption by a factor of 6.8.
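
    The "overlapping communication and boundary operations" idea follows a standard pattern: post non-blocking halo exchanges, update the interior while the messages are in flight, then finish the edges. A hedged sketch of that pattern with mpi4py, where process ranks stand in for devices (gpuPOM itself overlaps among GPUs with CUDA):

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        left, right = (rank - 1) % size, (rank + 1) % size

        # Each rank owns one subdomain; rows 0 and -1 are halo rows.
        field = np.random.rand(1024, 1024)
        send_l, send_r = field[1].copy(), field[-2].copy()
        recv_l, recv_r = np.empty(1024), np.empty(1024)

        # 1) Start non-blocking halo exchange.
        reqs = [comm.Isend(send_l, dest=left,  tag=0),
                comm.Isend(send_r, dest=right, tag=1),
                comm.Irecv(recv_l, source=left,  tag=1),
                comm.Irecv(recv_r, source=right, tag=0)]

        # 2) Do interior work that needs no halo data while messages travel.
        field[2:-2] = 0.25 * (field[1:-3] + field[3:-1] + 2.0 * field[2:-2])

        # 3) Wait for the halos, then update the boundary rows.
        MPI.Request.Waitall(reqs)
        field[0], field[-1] = recv_l, recv_r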

  13. Albumin and fibronectin adsorption and osteoblast adhesion on titanium oxides

    NASA Astrophysics Data System (ADS)

    Freitas, Susana Maria Ribeiro e. Sousa Mendes de

  14. Birefringence and Bragg grating control in femtosecond laser written optical circuits

    NASA Astrophysics Data System (ADS)

    Fernandes, Luis A.

  15. Single Point Incremental Forming and Multi-Stage Incremental Forming on Aluminium Alloy 1050

    NASA Astrophysics Data System (ADS)

    Suriyaprakan, Premika

  16. Magnetism at the nanoscale: Nanoparticles, nanowires, nanotubes and their ordered arrays

    NASA Astrophysics Data System (ADS)

    Proenca, Mariana Jesus Paiva

  17. Improving the characteristics of foundry alloys AlSiCuMg during manufacturing

    NASA Astrophysics Data System (ADS)

    Fragoso, Bruno Filipe Marques

  18. Seismic assessment of reinforced concrete frame structures with a new flexibility based element

    NASA Astrophysics Data System (ADS)

    Arede, Antonio Jose Coelho Dias

  19. Viscoelastic nanocapsules under flow in microdevices

    NASA Astrophysics Data System (ADS)

    Cordeiro, Ana Lucinda Teixeira

  20. Stellar activity in high-precision photometric and spectroscopic transit observations

    NASA Astrophysics Data System (ADS)

    Oshagh, Mahmoudreza

  1. Starch and polyethylene based bone-analogue composite biomaterials

    NASA Astrophysics Data System (ADS)

    Reis, Rui Luis Goncalves dos

  2. Clinopyroxene based glasses and glass-ceramics for functional applications

    NASA Astrophysics Data System (ADS)

    Goel, Ashutosh

    Pyroxenes are a vast group of silicate minerals found in many igneous and metamorphic rocks. In their simplest form, these silicates consist of SiO3 chains linking SiO4 tetrahedral groups. The general chemical formula of pyroxenes is M2M1T2O6, where M2 refers to cations generally in a distorted octahedral coordination (Mg2+, Fe2+, Mn2+, Li+, Ca2+, Na+), M1 refers to cations in a regular octahedral coordination (Al3+, Fe3+, Ti4+, Cr3+, V3+, Ti3+, Zr4+, Sc3+, Zn2+, Mg2+, Fe2+, Mn2+), and T to cations in tetrahedral coordination (Si4+, Al3+, Fe3+). Pyroxenes with a monoclinic structure are called clinopyroxenes. The stability of clinopyroxenes over a wide range of chemical compositions, combined with the possibility of tuning their physical and chemical properties and their chemical durability, has generated worldwide interest owing to their applications in materials science and technology. This work deals with the development of clinopyroxene-based glasses and glass-ceramics for functional applications. The study had both scientific and technological objectives: to acquire fundamental knowledge about the formation of crystalline phases and solid solutions in selected glass-ceramic systems, and to assess the feasibility of applying the new materials in different technological areas, with special emphasis on sealing in solid oxide fuel cells (SOFC). To this end, several glasses and glass-ceramic materials were prepared along the enstatite (MgSiO3) - diopside (CaMgSi2O6) and diopside (CaMgSi2O6) - Ca-Tschermak (CaAlSi2O6) joins and characterised by a wide range of techniques. All the glasses were prepared by melt-quenching, while the glass-ceramics were obtained either by sintering and crystallisation of frits or by nucleation and crystallisation of monolithic glasses. The effects of various ionic substitutions in Al-containing diopside compositions on the structure, the sintering and crystallisation behaviour of the glasses and the properties of the glass-ceramic materials were also studied, with relevance to their application as sealants in SOFC. It was observed that the enstatite-based glasses/glass-ceramics did not exhibit the characteristics required for use as SOFC sealing materials, whereas the better properties shown by the diopside-based glass-ceramics qualified them for further study in this type of application. Besides investigating the suitability of clinopyroxene-based glass-ceramics as sealants, this thesis also aims to study the influence of nucleating agents on the bulk nucleation of the resulting diopside-based glass-ceramics, in order to qualify them as potential host materials for radioactive nuclear waste.

  3. Ultrasonic location system

    NASA Astrophysics Data System (ADS)

    Albuquerque, Daniel Filipe

    This thesis presents a location system based exclusively on ultrasound, with no need for any other technology. The system was designed to operate in environments where other technologies cannot be used or where their use is restricted, such as underwater applications or hospital environments. The proposed location system uses a network of fixed beacons that allows mobile stations to locate themselves. Given the need to transmit data and measure distances, an echo-robust ultrasonic pulse was developed that performs both tasks successfully. The system allows mobile stations to locate themselves merely by listening to the information in the ultrasonic pulses sent by the beacons, using an algorithm based on time differences of arrival. User privacy is thus guaranteed and the system becomes completely independent of the number of users. To simplify deployment of the beacon network, only the positions of a few beacons, called anchor beacons, need to be determined manually. These allow the remaining, fully autonomous beacons to locate themselves through an iterative localisation algorithm based on the minimisation of a cost function. For the system to work as intended, the beacons must be able to synchronise their clocks and measure the distances between them. To that end, this thesis proposes a clock-synchronisation protocol that also yields the inter-beacon distance measurements with the exchange of only three ultrasonic messages. Additionally, the system allows damaged beacons to be replaced without compromising network operability, reducing maintenance complexity. An indoor ultrasound simulator was also implemented, which proved to be quite accurate and a valuable tool for simulating the behaviour of the location system under controlled conditions.
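
    The mobile-station side reduces to hyperbolic, time-difference-of-arrival (TDoA) positioning: the station knows only differences of distances to the beacons, d_i - d_0 = c*(t_i - t_0). A generic Gauss-Newton TDoA solver (a sketch of the technique, not the thesis' algorithm or cost function):

        import numpy as np

        # Solve for x minimizing the TDoA residuals
        #   r_i = (|x - b_i| - |x - b_0|) - measured_diff_i
        # by Gauss-Newton iterations on the Jacobian of r.
        def tdoa_locate(beacons, ddiffs, x0, iters=20):
            x = x0.astype(float)
            for _ in range(iters):
                d = np.linalg.norm(beacons - x, axis=1)
                r = (d[1:] - d[0]) - ddiffs
                J = (x - beacons[1:]) / d[1:, None] - (x - beacons[0]) / d[0]
                dx, *_ = np.linalg.lstsq(J, -r, rcond=None)
                x += dx
            return x

        # Hypothetical 2D beacon layout and mobile-station position.
        beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
        true = np.array([3.0, 4.0])
        d = np.linalg.norm(beacons - true, axis=1)
        print(tdoa_locate(beacons, d[1:] - d[0], x0=np.array([5.0, 5.0])))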

  4. Unexpected detection of weak-lensing effects in X-ray-faint galaxy groups

    NASA Astrophysics Data System (ADS)

    Carrasco, R.; Mendes de Oliveira, C.; Sodré, L., Jr.; Lima Neto, G. B.; Cypriano, E. S.; Lengruber, L. L.; Cuevas, H.; Ramirez, A.

    2003-08-01

    As part of the GMOS South science verification programme, we obtained deep images of three galaxy groups: G97 and G102 (z~0.4) and G124 (z = 0.17). These targets were selected from the extended-source catalogue of Vikhlinin (1998) for having X-ray luminosities below 3x10^43 erg s^-1, about one to two orders of magnitude lower than those of galaxy clusters. The primary goal of these observations is to study galaxy evolution in groups. Groups are less dense environments than clusters and contain the large majority of the galaxies in the Universe but, until now, they have been studied in detail only in the local Universe (z~0). With these data we performed a statistical analysis of the distortion of the shapes of background galaxies (weak gravitational lensing) to infer the mass content and distribution in these groups, even though, in principle, this effect should not be detected, since the adopted selection criteria favour low-mass systems. Indeed, for G124 we obtained only an upper limit on its mass, compatible with its X-ray luminosity. Surprisingly, by contrast, G102 and G097 appear to have masses that would imply velocity dispersions above 1000 km s^-1, much higher than expected for galaxy groups. In fact, for G097 we obtained, from XMM satellite data, an estimate of the intragroup gas temperature of kT = 2.6 keV, typical of systems with velocity dispersions of ~600 km s^-1, characteristic of groups. These apparent contradictions between weak lensing and X-rays can be explained in two ways: i) the lensing mass may be overestimated owing to the superposition of massive structures along the line of sight, or ii) the intragroup gas temperature reflects the gravitational potential of smaller structures that are merging to form a larger one.
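
    The quoted correspondence between gas temperature and velocity dispersion follows from assuming that gas and galaxies trace the same isothermal potential (beta_spec = 1), i.e. mu*m_p*sigma^2 ~ kT. A quick check of the numbers:

        import math

        kT_keV = 2.6                     # measured intragroup gas temperature
        mu, m_p = 0.6, 1.6726e-27        # mean molecular weight, proton mass (kg)
        kT_J = kT_keV * 1.602e-16        # keV -> joules

        sigma = math.sqrt(kT_J / (mu * m_p)) / 1.0e3
        print(f"sigma ~ {sigma:.0f} km/s")   # ~640 km/s, consistent with ~600 km/s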

  5. The 8 μm emission and the Merrill-Sanford bands in carbon stars

    NASA Astrophysics Data System (ADS)

    de Mello, A. B.; Lorenz-Martins, S.

    2003-08-01

    Carbon stars show molecular absorption bands in the visible, while in the infrared (IR) their main spectral features are due to grain emission. The presence of SiC2 (Merrill-Sanford, MS) bands in emission was recently detected and attributed to a dust-rich disk. In this work we analyse a sample of 14 carbon stars, observed with the ESO 1.52 m telescope in 4 different spectral regions, in order to detect the MS bands in emission. Our sample consists of stars that present, besides the 11.3 μm emission, another at 8 μm. The latter emission, unusual in these objects, has been attributed either to C2H2 molecules or to a still unidentified solid compound. The simultaneous detection of MS and IR emissions would reveal a more complex scenario than usually expected for the winds of these objects. As a first result, however, we find that the Merrill-Sanford bands are in absorption, revealing no connection with the 8 μm emission. We are thus left with two hypotheses: (a) the 8 μm emission is due to the C2H2 molecule, or (b) it results from thermal emission of grains. We tested the second hypothesis by modelling the sample with non-homogeneous grains of SiC and quartz, the latter emitting at approximately 8 μm. Such grains would be produced in an evolutionary phase preceding the carbon-star phase (S stars) and, having a crystalline structure, are destroyed only in the presence of very intense ultraviolet radiation fields. The envelope models use the Monte Carlo method to describe the radiative transfer problem. The conclusions of this work are: (1) the Merrill-Sanford bands are in absorption, suggesting the usual scenario for the winds of the sample stars; (2) in this scenario, the 8 μm emission would result from quartz grains with SiC mantles, indicating that quartz could survive the S evolutionary phase.

  6. The finite-difference time-domain numerical method applied to Alfvén waves in astrophysical plasma

    NASA Astrophysics Data System (ADS)

    Dos Santos, L. C.; Kintopp, J. A.; Jatenco-Pereira, V.; Opher, R.

    2003-08-01

    Alfvén waves in astrophysical plasmas have been the object of intense study in recent decades, since they play an important role in many areas of astrophysical research. They are particularly important in the heating mechanism of the solar corona, in stellar winds, in galactic and extragalactic jets, in protostellar disks, etc. A finite-difference time-domain (FDTD) formulation applied to magnetized plasma is developed to study the properties of Alfvén waves in three dimensions (3D-FDTD). The method is first applied to a homogeneous, isothermal plasma immersed in a region with an external magnetic field B0 that undergoes a small perturbation. Once the wave is generated, the perturbation is removed and we then analyse the temporal evolution of the waves, as well as the form of their damping.
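    To make the time-stepping idea concrete, here is a minimal one-dimensional FDTD-style leapfrog update for a linear wave equation. It is a toy analogue under assumed grid and wave-speed values, not the authors' 3D magnetized-plasma code.

      import numpy as np

      # Explicit leapfrog update for u_tt = v**2 * u_xx on a periodic grid.
      nx, nt = 400, 600
      v, dx = 1.0, 1.0
      dt = 0.5 * dx / v                  # respects the CFL stability limit

      x = np.arange(nx) * dx
      u_prev = np.exp(-((x - nx * dx / 2) / 10.0) ** 2)  # initial perturbation
      u = u_prev.copy()                  # zero initial velocity

      c2 = (v * dt / dx) ** 2
      for _ in range(nt):
          lap = np.roll(u, -1) - 2 * u + np.roll(u, 1)   # discrete u_xx
          u_next = 2 * u - u_prev + c2 * lap
          u_prev, u = u, u_next

      print(u.max())  # amplitude stays bounded (~0.5 once the pulse splits)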

  7. GPU COMPUTING FOR PARTICLE TRACKING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nishimura, Hiroshi; Song, Kai; Muriki, Krishna

    2011-03-25

    This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize an accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performance, issues, and challenges arising from the introduction of the GPU are also discussed. General Purpose computation on Graphics Processing Units (GPGPU) brings massive parallel computing capabilities to numerical calculation. However, the unique architecture of the GPU requires a comprehensive understanding of the hardware and of the programming model in order to optimize existing applications well. In the field of accelerator physics, the dynamic aperture calculation of a storage ring, which is often the most time-consuming part of accelerator modeling and simulation, can benefit from the GPU because it is embarrassingly parallel, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU, which consists of 14 multiprocessors (MPs) with 32 cores each, for a total of 448 cores, to host thousands of threads dynamically. A thread is a logical execution unit of the program on the GPU. In the GPU programming model, threads are grouped into a collection of blocks. Within each block, multiple threads share the same code and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using the GPU to speed up the dynamic aperture calculation by having each thread track a particle.
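    The one-thread-per-particle mapping described above is easy to reproduce in miniature. The sketch below assumes the numba package and a CUDA-capable GPU, and uses an invented linear one-turn map purely for illustration; it is not Tracy++ physics.

      import numpy as np
      from numba import cuda

      @cuda.jit
      def track(x, xp, n_turns):
          i = cuda.grid(1)          # global thread index: one thread per particle
          if i < x.shape[0]:
              for _ in range(n_turns):
                  # Toy one-turn map: a stable rotation in phase space.
                  x_new = 0.8 * x[i] + 0.6 * xp[i]
                  xp_new = -0.6 * x[i] + 0.8 * xp[i]
                  x[i] = x_new
                  xp[i] = xp_new

      n = 100_000
      x = cuda.to_device(np.random.uniform(-1e-3, 1e-3, n))
      xp = cuda.to_device(np.zeros(n))

      threads_per_block = 256
      blocks = (n + threads_per_block - 1) // threads_per_block  # cover all particles
      track[blocks, threads_per_block](x, xp, 1000)
      print(x.copy_to_host()[:3])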

  8. High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System

    PubMed Central

    Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram

    2014-01-01

    We have developed a graphics processing unit (GPU) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in the execution time of the 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses the Broyden approach for updating the Jacobian matrix, and thereby the parameter matrix, and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of the DOT programs were developed in this study: (1) a conventional C language program augmented by GPU CUDA and CULA routines (C GPU) and (2) a MATLAB program supported by the MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation times of the algorithm on the host CPU and on the GPU system are presented for the C and MATLAB implementations. The forward computation uses the finite element method (FEM), and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time achieved for one iteration of the DOT reconstruction with 14610 elements is 0.52 seconds for the C-based GPU program with 2-plane measurements. The corresponding MATLAB-based GPU program took 0.86 seconds. The maximum number of reconstructed frames thus achieved is 2 frames per second. PMID:24891848
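    For context, the "Broyden approach" mentioned above replaces repeated Jacobian computations with a rank-1 update. The sketch below shows the generic method on an invented 2x2 system; it is not the paper's DOT-specific code.

      import numpy as np

      def numerical_jacobian(F, x, eps=1e-6):
          # One-time finite-difference Jacobian; Broyden updates then avoid
          # repeating this expensive step on later iterations.
          f0, n = F(x), x.size
          J = np.empty((n, n))
          for j in range(n):
              xp = x.copy()
              xp[j] += eps
              J[:, j] = (F(xp) - f0) / eps
          return J

      def broyden_solve(F, x0, tol=1e-10, max_iter=50):
          x = x0.astype(float)
          f = F(x)
          J = numerical_jacobian(F, x)       # built once, then only updated
          for _ in range(max_iter):
              dx = np.linalg.solve(J, -f)    # Newton-like step
              x = x + dx
              f_new = F(x)
              # Broyden rank-1 update: J += (df - J dx) dx^T / (dx . dx)
              J += np.outer(f_new - f - J @ dx, dx) / (dx @ dx)
              f = f_new
              if np.linalg.norm(f) < tol:
                  break
          return x

      # Illustrative 2x2 nonlinear system, not from the paper.
      F = lambda x: np.array([x[0]**2 + x[1] - 3.0, x[0] - x[1]**2 + 1.0])
      print(broyden_solve(F, np.array([1.0, 1.0])))  # converges to ≈ [1.23, 1.49]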

  9. High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System.

    PubMed

    Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram

    2014-01-01

    We have developed a graphics processing unit (GPU) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in the execution time of the 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses the Broyden approach for updating the Jacobian matrix, and thereby the parameter matrix, and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of the DOT programs were developed in this study: (1) a conventional C language program augmented by GPU CUDA and CULA routines (C GPU) and (2) a MATLAB program supported by the MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation times of the algorithm on the host CPU and on the GPU system are presented for the C and MATLAB implementations. The forward computation uses the finite element method (FEM), and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time achieved for one iteration of the DOT reconstruction with 14610 elements is 0.52 seconds for the C-based GPU program with 2-plane measurements. The corresponding MATLAB-based GPU program took 0.86 seconds. The maximum number of reconstructed frames thus achieved is 2 frames per second.

  10. Chemical evolution in blue compact galaxies (BCGs)

    NASA Astrophysics Data System (ADS)

    Lanfranchi, G. A.; Matteucci, F.

    2003-08-01

    In this work, the star formation and chemical evolution of Blue Compact Galaxies (BCGs) were studied by comparing the predictions of chemical evolution models with several chemical abundance ratios observed in these galaxies. Detailed models, incorporating recent nucleosynthesis data and taking into account the role played by supernovae of both types (II and Ia) in galactic evolution, were developed for the BCGs, making it possible to follow the evolution of several chemical elements (H, D, He, C, N, O, Mg, Si, S, Ca, and Fe). The model is characterized by the prescriptions adopted for star formation, which occurs in several bursts of activity separated by long quiescent periods. After fitting the best models to the observational data, the model predictions were also compared with abundance ratios observed in Damped Lyman-alpha systems (DLAs), and the origin of N (primary or secondary) was discussed. Some of the results obtained are: i) the abundance ratios observed in BCGs are reproduced by models with 2 to 7 bursts of star formation with efficiencies between ν = 0.2-0.9 Gyr^-1; ii) the low N/O values observed in these galaxies are a natural result of bursting star formation; iii) the BCG models can reproduce the DLA data; iv) a "low" amount of primary N produced in high-mass stars may explain the low [N/α] values observed in DLAs.

  11. Estimation of SOHO solar images using artificial neural networks

    NASA Astrophysics Data System (ADS)

    Andrade, M. C.; Fernandes, F. C. R.; Cecatto, J. R.; Rios Neto, A.; Rosa, R. R.; Sawant, H. S.

    2003-08-01

    Within computational theory, Artificial Neural Networks (ANNs) constitute an emergent approach which, owing to its ability to learn from input data, finds applications in many different areas. One example is the use of ANNs to characterize patterns associated with the dynamics of spatio-temporal processes related to nonlinear physical phenomena. To obtain information about the behaviour of these physical phenomena, sequences of digitized images are used in many cases, where the characterization of spatio-temporal phenomena is the most viable procedure for describing the dynamics of the active regions of the Sun. Based on images observed by telescopes aboard satellites, studies of solar event forecasting can be planned, making it possible to predict subsequent effects in near-Earth regions (geomagnetic storms and ionospheric irregularities). In this work we evaluate the performance of ANNs in estimating spatio-temporal patterns, namely ultraviolet solar images obtained with the telescope aboard the SOHO satellite. The results show that ANNs can generalize the patterns satisfactorily, without significant loss of the main features of the global configuration of the solar atmosphere, demonstrating the effectiveness of ANNs as a tool for this type of application. This work therefore confirms the feasibility of using this tool in projects devoted to the study of solar behaviour, in work by the Interplanetary Medium Physics (FMI) group at DAS, and in programs developed by the Núcleo de Simulação e Análise de Sistemas Complexos (NUSASC) of the Laboratório Associado de Computação e Matemática Aplicada (LAC) at INPE.

  12. MIGS-GPU: Microarray Image Gridding and Segmentation on the GPU.

    PubMed

    Katsigiannis, Stamos; Zacharia, Eleni; Maroulis, Dimitris

    2017-05-01

    Complementary DNA (cDNA) microarray is a powerful tool for simultaneously studying the expression level of thousands of genes. Nevertheless, the analysis of microarray images remains an arduous and challenging task due to the poor quality of the images that often suffer from noise, artifacts, and uneven background. In this study, the MIGS-GPU [Microarray Image Gridding and Segmentation on Graphics Processing Unit (GPU)] software for gridding and segmenting microarray images is presented. MIGS-GPU's computations are performed on the GPU by means of the compute unified device architecture (CUDA) in order to achieve fast performance and increase the utilization of available system resources. Evaluation on both real and synthetic cDNA microarray images showed that MIGS-GPU provides better performance than state-of-the-art alternatives, while the proposed GPU implementation achieves significantly lower computational times compared to the respective CPU approaches. Consequently, MIGS-GPU can be an advantageous and useful tool for biomedical laboratories, offering a user-friendly interface that requires minimum input in order to run.

  13. Implementation of an algorithm for cleaning CMB maps

    NASA Astrophysics Data System (ADS)

    Souza, C. L.; Wuensche, C. A.

    2003-08-01

    The Cosmic Microwave Background (CMB), discovered by Penzias and Wilson in 1965, is one of the most powerful tools for the study of cosmology. With the discovery by COBE (1992) of temperature fluctuations in the CMB, of the order of one part in 10^5, a new era began. Over the last eleven years, several instruments have made new high-precision measurements, refining the results presented by COBE and culminating in the recent results from the WMAP satellite. The analysis of CMB data, especially for experiments with small sky coverage, presents a series of difficulties due to emission from external contaminants, such as Galactic emission and point sources, and to noise intrinsic both to the detection system and to the sky observation strategy. A typical solution for filtering the raw data of an experiment measuring temperature fluctuations is to apply a template and a high-pass filter when producing simplified maps (without considering correlation or covariance matrices). For experiments using HEMT detectors, this combination of filters satisfactorily removes 1/f-type noise generated by instability in the detector gain coupled to the motion of the instrument, as defined by the observation strategy. However, the resulting measured signal, both in simulations and in real time series, suggests that part of the cosmological signal may be removed along with the detector noise. This work describes the steps in the production of a typical (simulated) map and preliminary tests of an algorithm to remove the 1/f-type noise introduced by the observation strategy without degrading the quality of the cosmological signal present in the map.
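    A toy version of the high-pass filtering step described above, assuming scipy and a synthetic 1/f-contaminated time stream; the sampling rate, filter order, and cutoff are illustrative choices, not the experiment's pipeline settings.

      import numpy as np
      from scipy.signal import butter, filtfilt

      fs = 100.0                          # sampling rate, Hz (assumed)
      t = np.arange(0, 60, 1 / fs)

      rng = np.random.default_rng(0)
      # Synthetic 1/f noise: shape white noise in the frequency domain.
      white = rng.standard_normal(t.size)
      freqs = np.fft.rfftfreq(t.size, 1 / fs)
      spectrum = np.fft.rfft(white)
      spectrum[1:] /= np.sqrt(freqs[1:])  # 1/f amplitude profile
      pink = np.fft.irfft(spectrum, n=t.size)

      signal = 0.1 * np.sin(2 * np.pi * 5.0 * t)  # stand-in "sky" signal at 5 Hz
      stream = signal + pink

      # High-pass Butterworth filter, applied zero-phase with filtfilt.
      b, a = butter(4, 1.0, btype="highpass", fs=fs)
      cleaned = filtfilt(b, a, stream)

      # Most of the low-frequency 1/f drift is gone; the 5 Hz signal survives.
      print(np.std(stream), np.std(cleaned))

    The trade-off the abstract worries about is visible here: raising the cutoff removes more of the 1/f drift but begins to attenuate the signal itself.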

  14. Properties of fast submillimeter temporal structures during a large solar flare

    NASA Astrophysics Data System (ADS)

    Raulin, J.-P.; Kaufmann, P.; Gimenez de Castro, C. G.; Pacini, A. A.; Makhmutov, V.; Levato, H.; Rovira, M.

    2003-08-01

    We present new properties of fast variations of the submillimeter emission during one of the largest solar flares of solar cycle 23. The data analysed in this study were obtained with the Submillimeter Solar Telescope (SST), which observes the Sun at 212 GHz and 405 GHz, and were compared with hard X-ray and gamma-ray emissions (photon energies > 10 MeV) obtained by the GRS experiment on Yohkoh. We applied different methodologies to detect and characterize, throughout the event, the submillimeter pulses (50-300 ms duration) detected above a slower component (a few minutes). The results show that during the impulsive phase, near the time of the event maximum, there was an increase in the occurrence of larger and faster temporal structures. We also identified a good correlation with the X-ray and gamma-ray emissions (up to the 10-100 MeV energy range), indicating that the fast submillimeter pulses reflected primary injections of energy during the event. The flux spectrum of these pulses increases with frequency between 212 and 405 GHz in most cases, contrary to what is observed for the gradual component. The positions calculated for the fast structures are discrete, compact, and spread across the whole area of the active region, as predicted by models of solar flares arising from multiple instabilities in different small regions. In contrast, the position calculated for the slow component is stable during the impulsive phase. The comparison between the flux spectrum characteristics and the emission locations, for the fast pulses and for the gradual component, thus suggests that the respective emissions are of a different nature.

  15. SU-D-BRD-03: A Gateway for GPU Computing in Cancer Radiotherapy Research

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jia, X; Folkerts, M; Shi, F

    Purpose: Graphics Processing Units (GPUs) have become increasingly important in radiotherapy. However, it is still difficult for general clinical researchers to access GPU codes developed by other researchers, and for developers to objectively benchmark their codes. Moreover, it is quite common to see repeated effort spent on developing low-quality GPU codes. The goal of this project is to establish an infrastructure for testing GPU codes, cross-comparing them, and facilitating code distribution in the radiotherapy community. Methods: We developed a system called Gateway for GPU Computing in Cancer Radiotherapy Research (GCR2). A number of GPU codes developed by our group and other developers can be accessed via a web interface. To use the services, researchers first upload their test data or use the standard data provided by our system. Then they can select the GPU device on which the code will be executed. Our system offers all mainstream GPU hardware for code benchmarking purposes. After the code run is complete, the system automatically summarizes and displays the computing results. We also released an SDK to allow developers to build their own algorithm implementations and submit their binary codes to the system. The submitted code is then systematically benchmarked using a variety of GPU hardware and representative data provided by our system. The developers can also compare their codes with others and generate benchmarking reports. Results: The developed system is fully functioning. Through a user-friendly web interface, researchers are able to test various GPU codes. Developers also benefit from this platform by comprehensively benchmarking their codes on various GPU platforms and representative clinical data sets. Conclusion: We have developed an open platform allowing clinical researchers and developers to access GPUs and GPU codes. This development will facilitate the utilization of GPUs in the radiation therapy field.

  16. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.

  17. GPU Implementation of High Rayleigh Number Three-Dimensional Mantle Convection

    NASA Astrophysics Data System (ADS)

    Sanchez, D. A.; Yuen, D. A.; Wright, G. B.; Barnett, G. A.

    2010-12-01

    Although we have entered the age of petascale computing, many factors are still prohibiting high-performance computing (HPC) from infiltrating all suitable scientific disciplines. For this reason and others, application of GPUs to HPC is gaining traction in the scientific world. With its low price point, high performance potential, and competitive scalability, the GPU has been an option well worth considering for the last few years. Moreover, with the advent of NVIDIA's Fermi architecture, which brings ECC memory, better double-precision performance, and more RAM to the GPU, there is a strong message of corporate support for GPUs in HPC. However, many doubts linger concerning the practicality of using GPUs for scientific computing. In particular, the GPU has a reputation for being difficult to program and suitable for only a small subset of problems. Although inroads have been made in addressing these concerns, for many scientists the GPU still has hurdles to clear before becoming an acceptable choice. We explore the applicability of GPUs to geophysics by implementing a three-dimensional, second-order finite-difference model of Rayleigh-Benard thermal convection on an NVIDIA GPU using C for CUDA. Our code reaches sufficient resolution, on the order of 500x500x250 evenly spaced finite-difference gridpoints, on a single GPU. We make extensive use of highly optimized CUBLAS routines, allowing us to achieve performance on the order of 0.1 µs per timestep per gridpoint at this resolution. This performance has allowed us to study simulations at high Rayleigh numbers, on the order of 2x10^7, on a single GPU.
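    As a flavour of the stencil work such a finite-difference code performs at every timestep, here is a sketch of one explicit diffusion update on a 3D grid in plain NumPy. The grid size and diffusivity are invented, and the actual code solves the full Rayleigh-Benard system with CUDA/CUBLAS rather than this toy.

      import numpy as np

      def diffuse_step(T, kappa, dt, dx):
          # One explicit finite-difference step of dT/dt = kappa * laplacian(T)
          # on a periodic 3D grid, using a 7-point stencil.
          lap = (np.roll(T, 1, 0) + np.roll(T, -1, 0) +
                 np.roll(T, 1, 1) + np.roll(T, -1, 1) +
                 np.roll(T, 1, 2) + np.roll(T, -1, 2) - 6 * T) / dx**2
          return T + kappa * dt * lap

      T = np.random.rand(64, 64, 32)   # toy temperature field
      for _ in range(100):
          T = diffuse_step(T, kappa=1e-2, dt=0.1, dx=1.0)
      print(T.mean())                  # the mean is conserved under diffusion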

  18. GPU: the biggest key processor for AI and parallel processing

    NASA Astrophysics Data System (ADS)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market. One is the conventional CPU and the other is the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavy parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when the GPU started to adopt general-purpose cores, it was recognized that this architecture could be used as a general-purpose massively parallel processor. NVIDIA developed the software framework Compute Unified Device Architecture (CUDA), which makes it possible to easily program the GPU for these applications. With CUDA, GPUs came to be widely used in workstations and supercomputers. Recently, two key technologies have been highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massive parallel operations to train many-layered neural networks. With the CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with the P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization, and path-planning processing, and again SoCs with integrated GPUs will play a key role there. In this paper, the evolution of the GPU, one of the biggest commercial devices requiring state-of-the-art fabrication technology, is introduced, together with an overview of the key GPU-demanding applications described above.

  19. A bone tissue engineering strategy based on starch scaffolds and bone marrow cells cultured in a flow perfusion bioreactor

    NASA Astrophysics Data System (ADS)

    Gomes, Maria Manuela Estima

    Tissue engineering is a scientific field in continuous expansion. The developments achieved in this area have contributed significantly to several advances in the field of regenerative medicine. This interdisciplinary science combines knowledge from several other areas, as distinct as materials engineering and biology, with the objective of developing synthetic substitutes for human tissues. To reach this objective, specific combinations of cells and of a three-dimensional support material with adequate properties are generally used, generating a hybrid material whose characteristics can be further modulated through the culture system used. This thesis is centred on the development of bone tissue engineering strategies based on the in vitro culture of cells previously seeded onto a three-dimensional support ("scaffold"). This strategy allows the cells to adhere to the scaffold, proliferate, and secrete extracellular matrix specific to bone tissue, until a functional artificial substitute with the characteristics of the original tissue is obtained, which can finally be transplanted to treat the defect in question. For such a strategy to succeed, at least three fundamental components must be carefully studied: the support material (scaffold), the cells to be used, and the in vitro culture system. Hence the main objectives of this thesis relate to these three aspects, namely:
    • Development of biodegradable scaffolds from corn-starch-based polymers that induce cell adhesion and proliferation and present adequate properties, such as porosity and pore interconnectivity, so as to provide an environment that favours the in vitro development of a hybrid material with characteristics similar to human bone.
    • Study of the use of bone marrow cells as a potential cell source for bone tissue engineering, since these cells can easily be harvested from the patient to be treated by non-invasive methods (biopsy) and in sufficient quantities. Moreover, being a source of autologous cells (obtained from the patient), they avoid the risks of transmission of contagious diseases and/or rejection by the immune system.
    • Study of the influence of the in vitro culture conditions generated by a flow perfusion bioreactor (in comparison with traditional static culture methods) on the development of the hybrid materials composed of cells and scaffolds, as well as of the interactions of the environment provided by this culture system with the different structures/architectures and porosities of the scaffolds used.
    These objectives converge on the general aim of this thesis, which was the development of a bone tissue engineering therapy that offers an alternative to existing ones and has the potential to be used later in clinical practice. This aim was assessed by studying the functionality of the hybrid materials obtained under different in vitro culture conditions (and using different scaffolds), on the premise that the perfusion system could overcome the diffusion limitations typical of static culture systems while simultaneously providing the cells with mechanical stimuli similar to those found under physiological conditions. (Abstract shortened by ProQuest.)

  20. Particulate matter and polycyclic aromatic hydrocarbons from forest fires: impacts on air quality and occupational risks assessment

    NASA Astrophysics Data System (ADS)

    Oliveira, Marta Madalena Marques de

    Pyroxenes are a vast group of silicate minerals found in many igneous and metamorphic rocks. In their simplest form, these silicates consist of SiO3 chains linking tetrahedral SiO4 groups. The general chemical formula of pyroxenes is M2M1T2O6, where M2 refers to cations generally in a distorted octahedral coordination (Mg2+, Fe2+, Mn2+, Li+, Ca2+, Na+), M1 refers to cations in a regular octahedral coordination (Al3+, Fe3+, Ti4+, Cr3+, V3+, Ti3+, Zr4+, Sc3+, Zn2+, Mg2+, Fe2+, Mn2+), and T to cations in tetrahedral coordination (Si4+, Al3+, Fe3+). Pyroxenes with a monoclinic structure are called clinopyroxenes. The stability of clinopyroxenes over a wide range of chemical compositions, together with the possibility of tuning their physical and chemical properties and their chemical durability, has generated worldwide interest owing to their applications in materials science and technology. This work deals with the development of clinopyroxene-based glasses and glass-ceramics for functional applications. The study had both scientific and technological objectives: to acquire fundamental knowledge about the formation of crystalline phases and solid solutions in selected glass-ceramic systems, and to assess the feasibility of applying the new materials in different technological areas, with special emphasis on sealing in solid oxide fuel cells (SOFC). To this end, several glasses and glass-ceramic materials were prepared along the enstatite (MgSiO3) - diopside (CaMgSi2O6) and diopside (CaMgSi2O6) - Ca-Tschermak (CaAlSi2O6) joins and characterized by a wide range of techniques. All glasses were prepared by melt-quenching, while the glass-ceramics were obtained either by sintering and crystallization of frits or by nucleation and crystallization of monolithic glasses. The effects of various ionic substitutions in Al-containing diopside compositions on the structure, sintering, and crystallization behaviour of the glasses, and on the properties of the glass-ceramic materials, were also studied, with relevance to their application as sealants in SOFC. It was observed that the enstatite-based glasses/glass-ceramics did not exhibit the characteristics required for use as SOFC sealing materials, while the better properties shown by the diopside-based glass-ceramics qualified them for further studies in this type of application. Besides investigating the suitability of clinopyroxene-based glass-ceramics as sealants, this thesis also aimed to study the influence of nucleating agents on the bulk nucleation of the resulting diopside-based glass-ceramics, so as to qualify them as potential host materials for radioactive nuclear waste.

  1. Reciprocal interaction between human microvascular endothelial cells and mesenchymal stem cells on macroporous granules of nanostructured-hydroxyapatite agglomerates

    NASA Astrophysics Data System (ADS)

    Laranjeira, Marta de Sousa


  2. Analysis of vegetation dynamics using time-series vegetation index data from Earth observation satellites

    NASA Astrophysics Data System (ADS)

    Rodrigues, Arlete da Silva


  3. Biological effects of polyacrylic acid-coated and non-coated superparamagnetic iron oxide nanoparticles in in vitro and in vivo experimental models

    NASA Astrophysics Data System (ADS)

    Couto, Diana Manuel Mocho de Bastos


  4. Impact evaluation of the large scale integration of electric vehicles in the security of supply

    NASA Astrophysics Data System (ADS)

    Bremermann, Leonardo Elizeire


  5. Mid-infrared observations of young stellar objects in NGC 3576

    NASA Astrophysics Data System (ADS)

    Barbosa, C.; Damineli, A.; Blum, R.; Conti, P.

    2003-08-01

    We present the results of mid-infrared observations of massive young stellar object candidates in NGC 3576. The high-resolution images were obtained at the Gemini South observatory using filters at 10.8, 7.9, 9.8, 12.5, and 18.2 μm. Our images resolve the source IRS 1 into 4 objects for the first time at 10 μm. For each object we obtained the spectral energy distribution from 1.2 to 18 μm, as well as the colour temperature, the spatial distribution, and the 9.8 μm optical depth of the circumstellar dust. We present an estimate of the masses of the objects studied, based on the luminosity emitted in the mid-infrared, as well as a model to explain the different observed characteristics of each object. Finally, we discuss the possible location of the ionizing source(s) of NGC 3576.

  6. Accretion disks in Be-X systems

    NASA Astrophysics Data System (ADS)

    Lopes de Oliveira, R.; Janot-Pacheco, E.

    2003-08-01

    Some outburst phenomena in Be-X systems suggest the existence, even if temporary, of an accretion disk when the compact object passes through the orbital periastron. In this work we evaluate the possibility of accretion-disk formation in Be + neutron star and Be + white dwarf systems, and the influence of the orbital eccentricity on the occurrence of this phenomenon. We use the analytical expression proposed by Wang (1981) for the specific angular momentum of the matter in a slowly expanding medium, such as the circumstellar disk of Be stars, under the basic condition that the circularization radius must be larger than the Alfvén radius. We conclude that there is a limit on the orbital period of the system above which the formation of an accretion disk is not possible, and that this limit increases for systems with higher orbital eccentricity.
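    The disk-formation condition can be illustrated numerically with the standard textbook expressions r_circ = l^2/(GM) for the circularization radius and r_A = (mu^4 / (2 G M Mdot^2))^(1/7) for the Alfvén radius, rather than Wang's (1981) specific angular-momentum formula used in the work; all parameter values below are invented for illustration.

      import numpy as np

      G = 6.674e-8                      # gravitational constant, cgs
      M = 1.4 * 1.989e33                # neutron star mass, g
      B, R = 1e12, 1e6                  # surface field (G) and radius (cm), assumed
      mu = B * R**3                     # magnetic dipole moment
      mdot = 1e-9 * 1.989e33 / 3.15e7   # accretion rate: 1e-9 Msun/yr in g/s

      def r_circ(l):
          # Circularization radius for specific angular momentum l (cm^2/s).
          return l**2 / (G * M)

      def r_alfven(mdot):
          # Alfvén radius for spherical accretion onto a magnetized star.
          return (mu**4 / (2 * G * M * mdot**2)) ** (1 / 7)

      # A disk can form only if the captured material circularizes outside
      # the magnetospheric boundary: r_circ > r_A.
      l = 1e17                          # cm^2/s, illustrative value
      print(r_circ(l), r_alfven(mdot))
      print("disk possible:", r_circ(l) > r_alfven(mdot))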

  7. Indicators to assess the quality of programs to prevent occupational risk for tuberculosis: are they feasible?

    PubMed

    Santos, Talita Raquel Dos; Padoveze, Maria Clara; Nichiata, Lúcia Yasuko Izumi; Takahashi, Renata Ferreira; Ciosak, Suely Itsuko; Gryschek, Anna Luiza de Fátima Pinho Lins

    2016-06-07

    Objective: to analyze the feasibility of quality indicators for the evaluation of hospital programs for preventing occupational tuberculosis. Method: a descriptive cross-sectional study. We tested indicators for evaluating occupational tuberculosis prevention programs in six hospitals. The criterion used to define feasibility was the time spent calculating the indicators. Results: the time spent evaluating the indicators ranged from 2h 52min to 15h 11min 24s. The indicator for structure evaluation required the least time; the longest time was spent on the process indicators, including the observation of healthcare workers' practices regarding the use of N95 masks. One of the indicators for tuberculosis outcomes could not be tested in five situations, owing to the lack of use of the tuberculin skin test in those facilities. The time required to calculate the outcome indicators for occupational tuberculosis depends largely on the level of organization of the administrative structure for gathering data. Conclusion: indicators to evaluate the structure for occupational tuberculosis prevention are highly feasible. Nevertheless, the feasibility of the process and outcome indicators is limited, owing to relevant variations in administrative issues at healthcare facilities.

  8. Characterisation of gas and particle emissions from wildfires

    NASA Astrophysics Data System (ADS)

    Vicente, Ana Margarida Proenca

    Forest fires are an important source of emission of gaseous compounds and aerosols. In Portugal, where most fires occur in the north and centre of the country, fires destroy thousands of hectares every year, with important losses in economic terms, human lives, and environmental quality. The emissions can considerably alter atmospheric chemistry, degrade air quality, and change the climate. However, information on the characteristics of forest fire emissions in Mediterranean countries is limited. Both nationally and internationally, there is growing interest in the compilation of emission inventories and in regulations on carbon emissions to the atmosphere. From the standpoint of atmospheric monitoring, fires are considered a challenge, given their temporal and spatial variability, and an increase in their frequency, size, and severity is to be expected; moreover, emission estimates depend on the characteristics of the biofuels and on the combustion phase. The objective of this study was to quantify and characterize the gas and aerosol emissions of some of the most representative forest fires that occurred in central Portugal in the summers of 2009 and 2010. Samples of gases and of two particle fractions (PM2.5 and PM2.5-10) were collected from the smoke plumes in Tedlar bags and on quartz filters coupled to a high-volume sampler, respectively. Total hydrocarbons (THC) and carbon oxides (CO and CO2) in the gaseous samples were analysed with automatic flame ionization instruments and non-dispersive infrared detectors, respectively. For some samples, carbonyl compounds were also quantified after re-sampling the gas from the Tedlar bags onto silica gel cartridges coated with 2,4-dinitrophenylhydrazine (DNPH), followed by analysis by high-performance liquid chromatography. In the particles, organic and elemental carbon (thermo-optical technique), water-soluble ions (ion chromatography), and elements (inductively coupled plasma mass spectrometry or instrumental neutron activation analysis) were analysed. Organic speciation was obtained by gas chromatography coupled to mass spectrometry, after extraction with several solvents and separation of the organic extracts into classes of different polarities by silica gel fractionation. Given that estimating forest fire emissions requires knowledge of emission factors appropriate to each biofuel, the comprehensive database obtained in this study is potentially useful for updating emission inventories. It has been observed that the smouldering combustion phase, which can occur simultaneously with the flaming phase and last several hours or days, can contribute a considerable amount of atmospheric pollutants, so the corresponding emission factors should be considered when calculating the global emissions of forest fires. Owing to the lack of detailed information on chemical emission profiles, the database obtained in this study may also be useful for the application of receptor models in southern Europe. (Abstract shortened by ProQuest.)

  9. Clinical implementation of a GPU-based simplified Monte Carlo method for a treatment planning system of proton beam therapy.

    PubMed

    Kohno, R; Hotta, K; Nishioka, S; Matsubara, K; Tansho, R; Suzuki, T

    2011-11-21

    We implemented the simplified Monte Carlo (SMC) method on a graphics processing unit (GPU) architecture under the compute unified device architecture platform developed by NVIDIA. The GPU-based SMC was clinically applied to four patients with head and neck, lung, or prostate cancer. The results were compared with those obtained by a traditional CPU-based SMC with respect to computation time and discrepancy. In the CPU- and GPU-based SMC calculations, the estimated mean statistical errors of the calculated doses in the planning target volume region were within 0.5% rms. The dose distributions calculated by the GPU- and CPU-based SMCs were similar within statistical errors. The GPU-based SMC showed 12.30-16.00 times faster performance than the CPU-based SMC. The computation time per beam arrangement using the GPU-based SMC for the clinical cases ranged from 9 to 67 s. The results demonstrate the successful application of the GPU-based SMC to clinical proton treatment planning.

  10. GPU Optimizations for a Production Molecular Docking Code*

    PubMed Central

    Landaverde, Raphael; Herbordt, Martin C.

    2015-01-01

    Modeling molecular docking is critical to both understanding life processes and designing new drugs. In previous work we created the first published GPU-accelerated docking code (PIPER), which achieved a roughly 5× speed-up over a contemporaneous 4-core CPU. Advances in GPU architecture and in the CPU code, however, have since reduced this relative performance by a factor of 10. In this paper we describe the upgrade of GPU PIPER. This required an entire rewrite, including algorithm changes and moving most remaining non-accelerated CPU code onto the GPU. The result is a 7× improvement in GPU performance and a 3.3× speedup over the CPU-only code. We find that this difference in time is almost entirely due to the difference in run times of the 3D FFT library functions on the CPU (MKL) and GPU (cuFFT), respectively. The GPU code has been integrated into the ClusPro docking server, which has over 4000 active users. PMID:26594667
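    The run-time gap attributed to MKL versus cuFFT reflects how central FFT-based correlation is to docking codes of this kind. Below is a generic sketch of FFT-accelerated exhaustive translational scoring, not PIPER's actual energy function; the occupancy grids are invented for illustration.

      import numpy as np

      def fft_correlate(receptor, ligand):
          # Score every translational placement of the ligand grid against
          # the receptor grid at once, via the convolution theorem.
          R = np.fft.rfftn(receptor)
          L = np.fft.rfftn(ligand, s=receptor.shape)  # zero-pad the ligand
          return np.fft.irfftn(R * np.conj(L), s=receptor.shape)

      receptor = np.random.rand(64, 64, 64)           # toy occupancy grids
      ligand = np.random.rand(16, 16, 16)
      scores = fft_correlate(receptor, ligand)
      best = np.unravel_index(np.argmax(scores), scores.shape)
      print(best)   # best-scoring translation of the ligand

    Replacing the NumPy transforms with a GPU FFT library is the kind of change from which such codes draw most of their speed-up, which is consistent with the paper's observation.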

  11. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

    PubMed Central

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  12. GPU Optimizations for a Production Molecular Docking Code.

    PubMed

    Landaverde, Raphael; Herbordt, Martin C

    2014-09-01

    Modeling molecular docking is critical to both understanding life processes and designing new drugs. In previous work we created the first published GPU-accelerated docking code (PIPER), which achieved a roughly 5× speed-up over a contemporaneous 4-core CPU. Advances in GPU architecture and in the CPU code, however, have since reduced this relative performance by a factor of 10. In this paper we describe the upgrade of GPU PIPER. This required an entire rewrite, including algorithm changes and moving most remaining non-accelerated CPU code onto the GPU. The result is a 7× improvement in GPU performance and a 3.3× speedup over the CPU-only code. We find that this difference in time is almost entirely due to the difference in run times of the 3D FFT library functions on the CPU (MKL) and GPU (cuFFT), respectively. The GPU code has been integrated into the ClusPro docking server, which has over 4000 active users.

  13. Employing multi-GPU power for molecular dynamics simulation: an extension of GALAMOST

    NASA Astrophysics Data System (ADS)

    Zhu, You-Liang; Pan, Deng; Li, Zhan-Wei; Liu, Hong; Qian, Hu-Jun; Zhao, Yang; Lu, Zhong-Yuan; Sun, Zhao-Yan

    2018-04-01

    We describe the algorithm for employing multi-GPU power on the basis of Message Passing Interface (MPI) domain decomposition in a molecular dynamics code, GALAMOST, which is designed for the coarse-grained simulation of soft matter. The multi-GPU version of the code is developed from our previous single-GPU version. In multi-GPU runs, each GPU takes charge of one domain and runs the single-GPU code path. The communication between neighbouring domains follows an algorithm similar to that of the CPU-based code LAMMPS, but is optimised specifically for GPUs. We employ a memory-saving design that enlarges the maximum system size attainable on the same device. An optimisation algorithm is employed to prolong the update period of the neighbour list. We demonstrate good performance of multi-GPU runs on a workstation for the simulation of a Lennard-Jones liquid, a dissipative particle dynamics liquid, a polymer-nanoparticle composite, and two-patch particles, and good scaling over many cluster nodes for two-patch particles.
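    As a rough illustration of this decomposition, the sketch below shows one halo-exchange step with one GPU per MPI rank: boundary particles are packed on the device, exchanged with the neighbouring ranks through host buffers, and unpacked as ghost data. All names and the buffer layout are illustrative assumptions, not GALAMOST's actual interface, and the receive count is assumed to have been agreed with the neighbour in an earlier step.

      /* Hypothetical halo exchange for an MPI+CUDA domain decomposition.
       * Each rank gathers its boundary particles on the GPU, then swaps
       * them with its neighbours via host staging buffers. */
      #include <mpi.h>
      #include <cuda_runtime.h>
      #include <stdlib.h>

      __global__ void pack_boundary(const float4 *pos, const int *ids,
                                    int n_send, float4 *sendbuf) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n_send) sendbuf[i] = pos[ids[i]];  /* gather boundary */
      }

      void exchange_halo(const float4 *d_pos, const int *d_send_ids,
                         int n_send, float4 *d_ghost, int n_recv,
                         int left, int right, MPI_Comm comm) {
          float4 *d_sendbuf;
          cudaMalloc(&d_sendbuf, n_send * sizeof(float4));
          pack_boundary<<<(n_send + 255) / 256, 256>>>(d_pos, d_send_ids,
                                                       n_send, d_sendbuf);
          /* stage through the host; CUDA-aware MPI could skip these copies */
          float4 *h_send = (float4 *)malloc(n_send * sizeof(float4));
          float4 *h_recv = (float4 *)malloc(n_recv * sizeof(float4));
          cudaMemcpy(h_send, d_sendbuf, n_send * sizeof(float4),
                     cudaMemcpyDeviceToHost);
          MPI_Sendrecv(h_send, 4 * n_send, MPI_FLOAT, right, 0,
                       h_recv, 4 * n_recv, MPI_FLOAT, left, 0,
                       comm, MPI_STATUS_IGNORE);
          cudaMemcpy(d_ghost, h_recv, n_recv * sizeof(float4),
                     cudaMemcpyHostToDevice);
          free(h_send); free(h_recv); cudaFree(d_sendbuf);
      }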

  14. Optimization of a maneuver procedure for inducing the reentry of a recoverable satellite

    NASA Astrophysics Data System (ADS)

    Schulz, W.; Suarez, M.

    2003-08-01

    Space vehicles returning to Earth pass through distinct velocity regimes and flight conditions. These differences complicate their aerodynamic design and the planning of their return. The proposal of a recoverable orbital vehicle (the SARA satellite, under development at IAE/CTA) for carrying out scientific and technological experiments in a low-gravity environment creates the need for studies of the aspects related to its aerodynamics. After launch, the vehicle must remain in orbit for the time needed to conduct the experiments, and is then directed back to Earth and recovered on the ground. The aerodynamic design is important for the various phases of flight and must consider aspects related to aerodynamic stabilization and atmospheric drag, the latter being of crucial importance in the analysis of the heating to be faced. The return maneuver includes considerations of atmospheric conditions and reentry dynamics, and must be calculated as precisely as possible. The proposed work evaluates studies of the flight dynamics of a recoverable satellite, considering aspects related to orbit determination with GPS, a technique used successfully at CONAE, and its aerodynamic behaviour in a ballistic return flight, with emphasis on the atmospheric reentry phase. The aim is to optimize the reentry maneuver in such a way that the use of the GPS system guarantees a minimal ground impact area.

  15. Silver segregation in Ag/a-C nanocomposite coatings for potential application as antibacterial surfaces

    NASA Astrophysics Data System (ADS)

    Manninen, Noora Kristiina Alves de Sousa

    The development of antibacterial surfaces represents a current challenge in different industrial applications, namely medical devices, food packaging, textiles, and water treatment systems. Most bacteria exist in biofilms that adhere strongly to different types of surfaces, since this adhesion represents a strategic survival mechanism. The phenomenon of microbial adhesion and colonization results in the failure of different devices and components used in the above-mentioned applications, leading to high economic losses and also representing a public health problem in applications such as medical devices or food packaging. Over the last decades, therefore, the development of antibacterial surfaces has been considered an emerging strategy for the development of more efficient materials to be applied in different sectors. The objective of this thesis is the development and characterization of multifunctional nanocomposite coatings based on amorphous carbon doped with silver nanoparticles (Ag/a-C) for potential application as antibacterial surfaces. Ag is currently considered the most promising and efficient bactericidal agent, and silver nanoparticles are the most commercialized material in the field of nanotechnology. The strategy of surface modification with amorphous carbon (a-C) based coatings has become popular from an industrial point of view essentially because of, among other properties, its exceptional resistance to tribological wear, which combines high hardness with a low friction coefficient, high chemical stability, corrosion resistance, and biocompatibility in different biomedical applications. At present, a-C coatings are used in different industrial applications, namely medical devices, razor blades, and various mechanical components subject to heavy tribological wear. The combination of the intrinsic properties of these materials can thus be considered a promising approach for the development of multifunctional coatings, which can be applied in different products, namely medical devices. In this thesis, the Ag/a-C nanocomposite coatings are deposited by two distinct methods: (i) magnetron sputtering and (ii) a combination of magnetron sputtering for the deposition of the a-C layer with inert-gas condensation for the simultaneous incorporation of Ag nanoparticles into the carbon matrix. The two methods are compared with respect to the uniformity of the deposited coatings, allowing the more effective deposition method (magnetron sputtering) to be chosen. The Ag/a-C nanocomposite coatings are characterized with respect to their structure, their thermodynamic stability under ambient conditions, and their functional properties (tribological behaviour and antibacterial activity). The central work of the thesis focuses on the characterization of Ag/a-C coatings containing 20 at.% Ag, with different thicknesses and different multilayer structures. The results suggest that Ag/a-C coatings are unstable even under ambient conditions: the Ag forms nanofibres between the column boundaries, which cover the coating surface a few weeks after production. The nanofibre formation process is promoted by humidity, with the particles growing through a coalescence process. The functional properties suggest that the Ag/a-C coatings are promising from the point of view of antibacterial activity, which is related to their ionization. The tribological tests reveal that in an unlubricated environment the presence of Ag promotes the degradation of the a-C coatings; however, in biological media simulating the synovial fluid present in hip joints, the tribological behaviour is similar to that of a-C coatings.

  16. NMF-mGPU: non-negative matrix factorization on multi-GPU systems.

    PubMed

    Mejía-Roa, Edgardo; Tabas-Madrid, Daniel; Setoain, Javier; García, Carlos; Tirado, Francisco; Pascual-Montano, Alberto

    2015-02-13

    In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained great interest in the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets. However, the computing time required to process large data matrices may become impractical, even for a parallel application running on a multiprocessor cluster. In this paper, we present NMF-mGPU, an efficient and easy-to-use implementation of the NMF algorithm that takes advantage of the high computing performance delivered by Graphics Processing Units (GPUs). Driven by the ever-growing demands of the video-games industry, the graphics cards provided in PCs and laptops have evolved from simple graphics-drawing platforms into high-performance programmable systems that can be used as coprocessors for linear-algebra operations. However, these devices may have a limited amount of on-board memory, which is not considered by other NMF implementations on the GPU. NMF-mGPU is based on CUDA (Compute Unified Device Architecture), NVIDIA's framework for GPU computing. On devices with little available memory, large input matrices are blockwise transferred from the system's main memory to the GPU's memory and processed accordingly. In addition, NMF-mGPU has been explicitly optimized for the different CUDA architectures. Finally, platforms with multiple GPUs can be synchronized through MPI (Message Passing Interface). In a four-GPU system, this implementation is about 120 times faster than a single conventional processor, and more than four times faster than a single GPU device (i.e., a super-linear speedup). Applications of GPUs in Bioinformatics are getting more and more attention due to their outstanding performance when compared to traditional processors. In addition, their relatively low price represents a highly cost-effective alternative to conventional clusters. In the life sciences, this results in an excellent opportunity to facilitate the daily work of bioinformaticians who are trying to extract biological meaning out of hundreds of gigabytes of experimental information. NMF-mGPU can be used "out of the box" by researchers with little or no expertise in GPU programming on a variety of platforms, such as PCs, laptops, or high-end GPU clusters. NMF-mGPU is freely available at https://github.com/bioinfo-cnb/bionmf-gpu.
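    The blockwise transfer idea is simple enough to sketch: a matrix too large for device memory is streamed through the GPU in row blocks sized to the available memory. The kernel below merely scales each element as a stand-in for a real NMF update step; all names are illustrative, not NMF-mGPU's API.

      /* Stream a large host matrix through limited GPU memory in blocks. */
      #include <cuda_runtime.h>

      __global__ void scale_block(float *block, size_t n, float alpha) {
          size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
          if (i < n) block[i] *= alpha;   /* stand-in for an NMF update */
      }

      void process_large_matrix(const float *h_A, float *h_out,
                                size_t rows, size_t cols, size_t block_rows) {
          float *d_block;   /* block_rows chosen to fit free device memory */
          cudaMalloc(&d_block, block_rows * cols * sizeof(float));
          for (size_t r = 0; r < rows; r += block_rows) {
              size_t nr = (r + block_rows <= rows) ? block_rows : rows - r;
              size_t n = nr * cols;
              cudaMemcpy(d_block, h_A + r * cols, n * sizeof(float),
                         cudaMemcpyHostToDevice);
              scale_block<<<(unsigned)((n + 255) / 256), 256>>>(d_block, n, 0.5f);
              cudaMemcpy(h_out + r * cols, d_block, n * sizeof(float),
                         cudaMemcpyDeviceToHost);
          }
          cudaFree(d_block);
      }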

  17. Multi-GPU hybrid programming accelerated three-dimensional phase-field model in binary alloy

    NASA Astrophysics Data System (ADS)

    Zhu, Changsheng; Liu, Jieqiong; Zhu, Mingfang; Feng, Li

    2018-03-01

    In dendritic growth simulation, computational efficiency and problem scale have an extremely important influence on what a three-dimensional phase-field model can achieve. Seeking high-performance calculation methods to improve computational efficiency and expand problem scales is therefore of great significance for research on material microstructure. A high-performance calculation method based on an MPI+CUDA hybrid programming model is introduced. Multiple GPUs are used to implement quantitative numerical simulations of a three-dimensional phase-field model of a binary alloy under coupled multi-physics conditions. The acceleration effect of different GPU node counts on different calculation scales is explored. Building on this multi-GPU calculation model, two optimization schemes are proposed: non-blocking communication, and overlapping MPI communication with GPU computation. The results of the two optimization schemes are compared with the basic multi-GPU model. The calculation results show that the multi-GPU calculation model clearly improves the computational efficiency of the three-dimensional phase-field simulation, reaching 13 times the single-GPU performance, while the problem scale is expanded to 819³. Both optimization schemes are shown to be feasible, and the overlap of MPI communication with GPU computation performs better, reaching 1.7 times the performance of the basic multi-GPU model when 21 GPUs are used.
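    The overlap scheme the authors describe can be sketched in a few lines: a non-blocking MPI halo exchange is posted, the interior cells (which need no remote data) are updated on the GPU in the meantime, and the boundary cells are updated only once both have finished. The trivial relax kernel and all names are illustrative assumptions, not the paper's code.

      /* Overlapping non-blocking MPI communication with GPU computation. */
      #include <mpi.h>
      #include <cuda_runtime.h>

      __global__ void relax(float *v, long n, float dt) {  /* stand-in update */
          long i = blockIdx.x * (long)blockDim.x + threadIdx.x;
          if (i < n) v[i] += dt * v[i];
      }

      void step(float *d_phi, float *h_send, float *h_recv, long plane,
                long n_total, int up, int down, MPI_Comm comm, cudaStream_t s) {
          MPI_Request reqs[2];
          /* post the halo exchange for one boundary plane */
          MPI_Irecv(h_recv, (int)plane, MPI_FLOAT, down, 0, comm, &reqs[0]);
          MPI_Isend(h_send, (int)plane, MPI_FLOAT, up, 0, comm, &reqs[1]);
          /* overlap: interior cells need no remote data */
          long n_int = n_total - 2 * plane;
          relax<<<(int)((n_int + 255) / 256), 256, 0, s>>>(d_phi + plane,
                                                           n_int, 1e-3f);
          MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
          /* ghost plane has arrived: upload it, then update the boundary */
          cudaMemcpyAsync(d_phi, h_recv, plane * sizeof(float),
                          cudaMemcpyHostToDevice, s);
          relax<<<(int)((plane + 255) / 256), 256, 0, s>>>(d_phi, plane, 1e-3f);
          cudaStreamSynchronize(s);
      }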

  18. Profile of the Natural Sciences Teachers of the Recôncavo da Bahia - Students of the Terra e Universo Course in the Parfor Natural Sciences Degree Program

    NASA Astrophysics Data System (ADS)

    Lima, S. R.; Cerqueira Júnior, W.; Dutra, G.

    2011-12-01

    This work was developed by the Astronomia no Recôncavo da Bahia project at the Centro de Formação de Professores of the Universidade Federal do Recôncavo da Bahia. We profiled a group of teachers who teach Natural Sciences content in the Recôncavo, students of the Natural Sciences teaching degree offered within the Plano Nacional de Formação de Professores da Educação Básica. Our objective was to assess whether they are prepared to teach Astronomy content and to identify their difficulties. The results served to guide the teacher of the course "Terra e Universo", offered in the second semester of 2010. During the first class of the course, the students answered a questionnaire containing open and closed questions, divided into two parts: the first aimed at professionally characterizing the students as public school teachers of the Recôncavo Sul region of Bahia, and the second at identifying basic knowledge of Astronomy. The results indicate a predominance of female teachers, over 40 years of age, of mixed race, and without specific training for science teaching. Most have taught classes from the 1st to the 5th grade for more than 15 years; some teach classes from the 6th to the 9th grade. Almost none have ever taken a continuing-education course in Astronomy. Moreover, they are not in the habit of reading specialized magazines or books on this subject. Those who try to teach Astronomy-related topics rely on the textbook as their main source of information on the subject. The answers also indicate deficiencies in basic content, such as understanding the sphericity of the Earth, notions of verticality and gravity, and an inability to situate the Earth as a planet in the Solar System, in a galaxy, in the Universe. These results underline the importance of basic Astronomy courses in the training of the region's teachers.

  19. Contribution to the scintillation detection optimization in double phase detectors for direct detection of dark matter

    NASA Astrophysics Data System (ADS)

    Balan, Catalin

    Over the last decade, great progress has been made in the development of detectors for the direct detection of the particles that constitute dark matter. With strategies of gradually increasing the target volume while simultaneously reducing background levels, the XENON experiment has obtained very good results and promising prospects for the detection of dark matter. Tasks related to the analysis of experimental data acquired with the double-phase detector in use, electric field simulations, and the development, assembly and testing of the next detector, XENON1T, as well as regular participation in the general maintenance and monitoring of the current prototype XENON100 at LNGS, constituted the work plan for the research activities of this doctorate and my contribution to the optimization of scintillation detection in the detectors of the XENON experiment. The need to reach high levels of sensitivity requires innovation in all physical aspects of the detector, as well as the reduction of all sources of radioactivity that contribute to the background. The most suitable mode of operation for detectors filled with Xe in the liquid and gaseous states involves measuring the primary and secondary scintillation arising from particle interactions in the liquid Xe. The ratio between these two signals makes it possible to clearly differentiate most background events from WIMP events. The readout of the scintillation signals is therefore of utmost importance. The amplitude of the scintillation signal reaching the photosensors is maximized through the optimization of several parameters, such as the geometry of the detector target, the transparency of the electrode grids, the uniformity of the secondary scintillation gain, and the use of reflective material to cover the surfaces that are not photosensitive.

  20. Evaluation of the influence of cardiac alterations on peripheral vascular ultrasonography in elderly patients

    PubMed Central

    Ribeiro, Alcides José Araújo; Ribeiro, Andréa Campos de Oliveira; Rodrigues, Márcia Marisia Maciel; Negreiros, Sandra de Barros Cobra; Nogueira, Ana Cláudia Cavalcante; Almeida, Osório Luís Rangel; Silva, José Carlos Quináglia e; de Paula, Ana Patrícia

    2016-01-01

    Abstract Background Heart diseases can cause changes in the waveform patterns seen on vascular ultrasonography (VU) of peripheral vessels. These alterations, typically bilateral and systemic, are little known and little studied. Objective To evaluate the peripheral VU waveforms of elderly patients in order to identify alterations caused by heart disease. Methods We studied 183 elderly patients who underwent peripheral VU during 2014. Results We evaluated 102 women (55.7%) and 81 men (44.3%) aged 60 to 91 years (mean 70.4±7.2 years). VU alterations were found in 84 patients (45.9%). A total of 138 alterations were identified, covering eight of the 13 types described in the literature: arrhythmia, bisferiens systolic peak, low systolic peak velocity, pulsatility in the femoral veins, bradycardia, tachycardia, pulsus parvus et tardus, and pulsus alternans. There was low agreement between the presence or absence of alterations on VU and on cardiological evaluation. In the specific analysis of the alterations, agreement between the examinations was variable: good for tachycardia, moderate for arrhythmia, and low for bradycardia. There was no agreement between VU and the cardiological examinations for the remaining alterations. Conclusions It is possible to identify certain cardiac alterations in the elderly through analysis of the peripheral VU waveform pattern. It is important to recognize and report the presence of these alterations, since they may alert to a diagnosis not yet identified in these patients. However, further studies are needed to define the importance of peripheral Doppler waveform alterations in the recognition of heart disease. PMID:29930591

  1. Qualitative Description of Global Health Nursing Competencies by Nursing Faculty in Africa and the Americas.

    PubMed

    Wilson, Lynda; Moran, Laura; Zarate, Rosa; Warren, Nicole; Ventura, Carla Aparecida Arena; Tamí-Maury, Irene; Mendes, Isabel Amélia Costa

    2016-06-07

    to analyze qualitative comments from four surveys asking nursing faculty to rate the importance of 30 global health competencies for undergraduate nursing programs. qualitative descriptive study that included 591 individuals who responded to the survey in English (49 from Africa and 542 from the Americas), 163 who responded to the survey in Spanish (all from Latin America), and 222 Brazilian faculty who responded to the survey in Portuguese. Qualitative comments were recorded at the end of the surveys by 175 respondents to the English survey, 75 to the Spanish survey, and 70 to the Portuguese survey. Qualitative description and a committee approach guided data analysis. ten new categories of global health competencies emerged from the analysis. Faculty also demonstrated concern about how and when these competencies could be integrated into nursing curricula. the additional categories should be considered for addition to the previously identified global health competencies. These, in addition to the guidance about integration into existing curricula, can be used to guide refinement of the original list of global health competencies. Further research is needed to seek consensus about these competencies and to develop recommendations and standards to guide nursing curriculum development.

  2. A study of local and cosmological solutions in scalar-tensor theories

    NASA Astrophysics Data System (ADS)

    Silva E Costa, S.

    2003-08-01

    Scalar-tensor theories are the simplest possible extension of General Relativity. In these theories, whose standard model is Brans-Dicke theory, the curvature of spacetime, described by tensor components, appears coupled to a scalar field which, in a sense, represents a variation of the gravitational coupling constant. Such theories present local and cosmological solutions which, in certain limits, reduce to those of General Relativity, but which in other limits bring novelties, such as observational consequences of the evolution of primordial fluctuations distinct from those predicted by General Relativity (see, e.g., Nagata et al., PRD 66, p. 103510 (2002)). Thanks to this possibility of bringing to light novelties with respect to gravitation, scalar-tensor theories can be seen as an interesting alternative field of research for solutions of the missing (or dark) mass and/or dark energy problems. Following this line, this work, still in its initial phase, presents general solutions of scalar-tensor theories for several situations, examining how these solutions diverge from the traditional cases possible in General Relativity. Examples of the solutions presented here include a general expression for different cosmological solutions encompassing different types of matter (represented by different equations of state), and the expression for a local solution representing a rotating black hole, similar to the Kerr solution of General Relativity. Finally, it is important to stress that, although few new results are presented here, most of the solutions presented in the literature on the subject are limited to a few specific cases, such as cosmological solutions with null curvature only, and even the available solutions are, in general, little publicized and therefore little known; it is this situation that this work seeks, in part, to reverse.

  3. Parallel computing in experimental mechanics and optical measurement: A review (II)

    NASA Astrophysics Data System (ADS)

    Wang, Tianyi; Kemao, Qian

    2018-05-01

    With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have been successfully applied to the measurement of various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in the pursuit of higher image resolutions for higher accuracy, the computational burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composed of hardware such as CPUs and GPUs have been widely employed to accelerate these techniques, owing to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high-speed computation. Next, we review the effects of CPU and GPU parallel computing in EM & OM applications in a broad scope, including digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.

  4. The line-of-sight velocity distribution in face-on barred galaxies

    NASA Astrophysics Data System (ADS)

    Gadotti, D. A.; de Souza, R. E.

    2003-08-01

    With the aim of carrying out a kinematic study of the vertical component of bars in galaxies, we obtained high S/N long-slit spectra along the major and minor axes of 14 face-on barred galaxies, with the 1.52 m ESO telescope at La Silla, Chile, and the 2.3 m telescope of Steward Observatory at Kitt Peak, Arizona. These data allowed us to determine the velocity distribution of the stars along the vertical axis of the bars and disks of these systems, both at the centre and at points about 5 and 20 arcseconds from the nucleus, corresponding to distances of about 0.7 and 2.8 kpc, respectively. In this way, the radial variation of the velocity distribution could also be evaluated. This type of analysis has few examples in the literature because it is expensive in terms of telescope time. It is, however, easy to justify, considering that it brings new information that can be used to improve theoretical models of galaxy formation and evolution. An algorithm we developed was used to obtain the velocity distributions as generalized Gaussians (Gauss-Hermite polynomials), which adds an extra ingredient to this type of study, which traditionally uses pure Gaussians, an assumption that is not always reasonable. We present the results of this work, which include a diagnostic for the identification of recently formed bars, and tests of the isothermal disk model. We show that: (i) the choice of velocity standard stars, and of the Gaussian parameters, must be very well justified, since it significantly influences the results; (ii) many galaxies show a depression in the velocity dispersion in the central region, which may be associated with an inner disk; and (iii) the velocity dispersion is constant along the bar, on the major and minor axes, but falls substantially in the transition from bar to disk.

  5. GPU Accelerated Chemical Similarity Calculation for Compound Library Comparison

    PubMed Central

    Ma, Chao; Wang, Lirong; Xie, Xiang-Qun

    2012-01-01

    Chemical similarity calculation plays an important role in compound library design, virtual screening, and "lead" optimization. In this manuscript, we present a novel GPU-accelerated algorithm for all-vs-all Tanimoto matrix calculation and nearest neighbor search. By taking advantage of the multi-core GPU architecture and CUDA parallel programming technology, the algorithm is up to 39 times faster than existing commercial software running on CPUs. Because of the utilization of intrinsic GPU instructions, this approach is nearly 10 times faster than the existing GPU-accelerated sparse-vector algorithm when Unity fingerprints are used for the Tanimoto calculation. The GPU program that implements this new method takes about 20 minutes to complete the calculation of Tanimoto coefficients between 32M PubChem compounds and 10K Active Probes compounds, i.e., 324G Tanimoto coefficients, on a 128-CUDA-core GPU. PMID:21692447
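    The "intrinsic GPU instructions" the abstract credits are population counts over bit-string fingerprints. A minimal sketch of a Tanimoto kernel along those lines follows; the fingerprint layout and names are assumptions, not the paper's implementation.

      /* One thread per library molecule: Tanimoto similarity between a
       * query fingerprint and each library fingerprint, each stored as
       * `words` 32-bit blocks. __popc is the population-count intrinsic. */
      #include <cuda_runtime.h>

      __global__ void tanimoto(const unsigned *lib, const unsigned *query,
                               int n_mols, int words, float *out) {
          int m = blockIdx.x * blockDim.x + threadIdx.x;
          if (m >= n_mols) return;
          int both = 0, a = 0, b = 0;
          for (int w = 0; w < words; ++w) {
              unsigned x = lib[m * words + w], y = query[w];
              both += __popc(x & y);      /* bits set in both fingerprints */
              a += __popc(x);
              b += __popc(y);
          }
          int uni = a + b - both;         /* bits set in either fingerprint */
          out[m] = uni ? (float)both / (float)uni : 0.0f;
      }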

  6. Improving the performance of heterogeneous multi-core processors by modifying the cache coherence protocol

    NASA Astrophysics Data System (ADS)

    Fang, Juan; Hao, Xiaoting; Fan, Qingwen; Chang, Zeqing; Song, Shuying

    2017-05-01

    In heterogeneous multi-core architectures, the CPU and GPU are integrated on the same chip, which poses a new challenge for last-level cache management. In this architecture, CPU applications and GPU applications execute concurrently, both accessing the last-level cache. CPU and GPU have different memory access characteristics and thus differ in their sensitivity to last-level cache (LLC) capacity. For many CPU applications, a reduced share of the LLC can lead to significant performance degradation. On the contrary, GPU applications can tolerate an increase in memory access latency when there is sufficient thread-level parallelism. Taking into account the memory latency tolerance of GPU programs, this paper presents a method that lets GPU applications access memory directly, bypassing the LLC and leaving more LLC space for CPU applications, thereby improving the performance of CPU applications without affecting the performance of GPU applications. When the CPU application is cache sensitive and the GPU application is insensitive to the cache, the overall performance of the system is improved significantly.

  7. Parallelization and checkpointing of GPU applications through program transformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solano-Quinde, Lizandro Damian

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for running on GPUs tractable has consolidated GPUs as an alternative for accelerating general-purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running on multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with a large application memory footprint. Parallelizing single-GPU applications has been approached with libraries that distribute the workload at runtime; however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems presents new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.

  8. Near-infrared photometry of compact groups of galaxies

    NASA Astrophysics Data System (ADS)

    Brasileiro, F.; Mendes de Oliveira, C.

    2003-08-01

    We present J, H and K band measurements of about 90 galaxies in 34 compact groups. By combining the new data with B-band data from the literature, we investigate how the luminosities, colours, sizes, and masses of galaxies in compact groups have been affected by dynamical processes, and how they differ from those of galaxies in less dense environments. A comparison of the new values with those listed in the 2MASS catalogue shows that, for the 50 galaxies studied in common, the differences in the J, H and K magnitudes are within the photometric errors. From the colour-colour diagrams (J-H vs. H-K and B-H vs. J-K), we find that galaxies in compact groups occupy positions different from those of field or cluster galaxies, being more similar to the positions occupied by HII galaxies, or galaxies with a dust excess; we believe that this shift derives from an increase in the star formation rate.

  9. Exercise-Induced Changes in the Number, Function and Morphology of Rat Monocytes

    PubMed Central

    GUERESCHI, MARCIA G.; PRESTES, JONATO; DONATTO, FELIPE F.; DIAS, RODRIGO; FROLLINI, ANELENA B.; FERREIRA, CLÍLTON KO.; CAVAGLIERI, CLAUDIA R.; PALANCH, ADRIANNE C.

    2008-01-01

    The purpose of this study was to verify the histophysiological changes in monocytes and macrophages induced by short periods of exercise. Wistar rats (age = 2 months, body weight = 200 g) were divided into seven groups (n = 6 each): sedentary control (C), groups exercised (swimming) at light intensity for 5 (5L), 10 (10L) and 15 minutes (15L), and groups exercised at moderate intensity for 5 (5M), 10 (10M) and 15 minutes (15M). At the moderate intensity, the animals carried a load of 5% of their body weight on their backs. Blood monocytes were evaluated for number and morphology, and peritoneal macrophages were analyzed for number and phagocytic activity. Data were analyzed using ANOVA and Tukey's post hoc test (p ≤ 0.05). The light-intensity groups and the 5M group showed increased monocyte levels compared with the control. An increase in monocyte cell area was observed for the 5L, 10L, 5M and 10M groups; the nuclear area increased for the 10L, 5M and 10M groups in comparison with the control. There was an increase in peritoneal macrophages for the 15L, 10M and 15M groups and a decrease in the 5M group. The phagocytic capacity of macrophages increased in the light-intensity groups and in the 10M group. Exercise performed for short periods modulated macrophage number and function, as well as monocyte number and morphology, and these changes were intensity-dependent. The sum of the acute responses observed in this study may exert a protective effect against disease and could be used to improve health and quality of life.

  10. SU-E-T-493: Accelerated Monte Carlo Methods for Photon Dosimetry Using a Dual-GPU System and CUDA.

    PubMed

    Liu, T; Ding, A; Xu, X

    2012-06-01

    To develop a Graphics Processing Unit (GPU) based Monte Carlo (MC) code that accelerates dose calculations on a dual-GPU system. We simulated a clinical case of prostate cancer treatment. A voxelized abdomen phantom derived from 120 CT slices was used, containing 218×126×60 voxels, and a GE LightSpeed 16-MDCT scanner was modeled. A CPU version of the MC code was first developed in C++ and tested on an Intel Xeon X5660 2.8 GHz CPU; it was then translated into a GPU version using CUDA C 4.1 and run on a dual Tesla M2090 GPU system. The code featured automatic assignment of simulation tasks to multiple GPUs, as well as accurate calculation of energy- and material-dependent cross-sections. Double-precision floating-point format was used for accuracy. Doses to the rectum, prostate, bladder and femoral heads were calculated. When running on a single GPU, the MC GPU code was found to be 19 times faster than the CPU code and 42 times faster than MCNPX. These speedup factors were doubled on the dual-GPU system. The dose results were benchmarked against MCNPX, and a maximum difference of 1% was observed when the relative error was kept below 0.1%. A GPU-based MC code was developed for dose calculations using detailed patient and CT scanner models. Efficiency and accuracy were both guaranteed in this code. Scalability of the code was confirmed on the dual-GPU system. © 2012 American Association of Physicists in Medicine.
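    The automatic assignment of simulation tasks to multiple GPUs can be pictured as splitting the photon histories evenly across devices, each with its own dose grid, and summing the partial grids on the host. The sketch below assumes this simple split; the transport kernel body is a placeholder, not the paper's physics.

      /* Split Monte Carlo histories across the available GPUs (up to 2). */
      #include <cuda_runtime.h>
      #include <stdlib.h>

      __global__ void transport(unsigned long long seed0, long n_hist,
                                float *dose, int n_vox) {
          long i = blockIdx.x * (long)blockDim.x + threadIdx.x;
          if (i >= n_hist) return;
          /* placeholder: deposit one unit of dose per history */
          atomicAdd(&dose[(int)((seed0 + i) % n_vox)], 1.0f);
      }

      void run_dual_gpu(long total_hist, int n_vox, float *h_dose) {
          int n_gpu;
          cudaGetDeviceCount(&n_gpu);
          if (n_gpu > 2) n_gpu = 2;
          float *d_dose[2];
          long per = total_hist / n_gpu;
          for (int g = 0; g < n_gpu; ++g) {       /* async launch per GPU */
              cudaSetDevice(g);
              cudaMalloc(&d_dose[g], n_vox * sizeof(float));
              cudaMemset(d_dose[g], 0, n_vox * sizeof(float));
              transport<<<(int)((per + 255) / 256), 256>>>(
                  1234ULL + (unsigned long long)(g * per), per,
                  d_dose[g], n_vox);
          }
          float *tmp = (float *)malloc(n_vox * sizeof(float));
          for (int g = 0; g < n_gpu; ++g) {       /* gather partial doses */
              cudaSetDevice(g);
              cudaMemcpy(tmp, d_dose[g], n_vox * sizeof(float),
                         cudaMemcpyDeviceToHost);
              for (int v = 0; v < n_vox; ++v) h_dose[v] += tmp[v];
              cudaFree(d_dose[g]);
          }
          free(tmp);
      }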

  11. Exploration of priority actions for strengthening the role of nurses in achieving universal health coverage.

    PubMed

    Maaitah, Rowaida Al; AbuAlRub, Raeda Fawzi

    2017-01-30

    to explore priority actions for strengthening the role of Advanced Practice Nurses (APNs) towards the achievement of Universal Health Coverage (UHC) as perceived by health key informants in Jordan. an exploratory qualitative design, using a semi-structured survey, was utilized. A purposive sample of seventeen key informants from various nursing and health care sectors was recruited for the purpose of the study. Content analysis utilizing the five-stage framework approach was used for data analysis. the findings revealed that policy and regulation, nursing education, research, and workforce were identified as the main elements that influence the role of APNs in contributing to the achievement of UHC. Priority actions were identified by the participants for the main four elements. study findings confirm the need to strengthen the role of APNs to achieve UHC through a major transformation in nursing education, practice, research, leadership, and regulatory system. Nurses should unite to come up with solid nursing competencies related to APNs, PHC, UHC, leadership and policy making to strengthen their position as main actors in influencing the health care system and evidence creation.

  12. Accelerated rescaling of single Monte Carlo simulation runs with the Graphics Processing Unit (GPU).

    PubMed

    Yang, Owen; Choi, Bernard

    2013-01-01

    To interpret fiber-based and camera-based measurements of remitted light from biological tissues, researchers typically use analytical models, such as the diffusion approximation to light transport theory, or stochastic models, such as Monte Carlo modeling. To achieve rapid (ideally real-time) measurement of tissue optical properties, especially in clinical situations, there is a critical need to accelerate Monte Carlo simulation runs. In this manuscript, we report on our approach of using the Graphics Processing Unit (GPU) to accelerate the rescaling of single Monte Carlo runs so as to rapidly calculate diffuse reflectance values for different sets of tissue optical properties. We selected MATLAB to enable non-specialists in C and CUDA-based programming to use the generated open-source code. We developed a software package with four abstraction layers. To calculate a set of diffuse reflectance values for a simulated tissue with homogeneous optical properties, our rescaling GPU-based approach achieves a reduction in computation time of several orders of magnitude compared with other GPU-based approaches. Specifically, our GPU-based approach generated a diffuse reflectance value in 0.08 ms. The transfer time from CPU to GPU memory is currently a limiting factor for GPU-based calculations. However, for the calculation of multiple diffuse reflectance values, our GPU-based approach can still lead to processing that is ~3400 times faster than other GPU-based approaches.
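    The core rescaling idea is that a single baseline Monte Carlo run stores each detected photon's total path length, and the diffuse reflectance for any new absorption coefficient is then a Beer-Lambert reweighting of those stored paths. A minimal sketch of that reweighting kernel, under assumed variable names (illustrative, not the authors' package):

      /* Re-weight stored photon path lengths for a new absorption mua. */
      #include <cuda_runtime.h>
      #include <math.h>

      __global__ void rescale(const float *path_len, int n_photons,
                              float mua, float *weight) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n_photons)
              weight[i] = expf(-mua * path_len[i]);  /* Beer-Lambert factor */
      }
      /* Host side: sum `weight` with a reduction and divide by the number
       * of launched photons to obtain the diffuse reflectance at this mua. */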

  13. Acceleration for 2D time-domain elastic full waveform inversion using a single GPU card

    NASA Astrophysics Data System (ADS)

    Jiang, Jinpeng; Zhu, Peimin

    2018-05-01

    Full waveform inversion (FWI) is a challenging procedure due to the high computational cost related to the modeling, especially for the elastic case. The graphics processing unit (GPU) has become a popular device for high-performance computing (HPC). To reduce the long computation time, we design and implement GPU-based 2D elastic FWI (EFWI) in the time domain using a single GPU card. We parallelize the forward modeling and gradient calculations using the CUDA programming language. To overcome the limitation of the relatively small global memory on the GPU, a boundary-saving strategy is exploited to reconstruct the forward wavefield. Moreover, the L-BFGS optimization method used in the inversion speeds up the convergence of the misfit function. A multiscale inversion strategy is employed in the workflow to obtain accurate inversion results. In our tests, the GPU-based implementation using a single GPU device achieves >15 times speedup in forward modeling, and about 12 times speedup in gradient calculation, compared with eight-core CPU implementations optimized by OpenMP. The results from the GPU implementation are verified to be sufficiently accurate by comparison with the results obtained from the CPU implementation.
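    The boundary-saving strategy trades memory for recomputation: during forward modeling, only a few boundary layers of the wavefield are stored per time step, and the full forward wavefield is later reconstructed by time-stepping backwards while re-imposing those layers. A hedged sketch of the save/restore kernels for one edge, with illustrative shapes and names:

      /* Save and restore `nlayers` rows of a 2D wavefield per time step.
       * Launch with dim3 grid((nx + 255) / 256, nlayers), block(256). */
      #include <cuda_runtime.h>

      __global__ void save_boundary(const float *u, float *saved,
                                    int nx, int nlayers, int it) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;  /* x index */
          int l = blockIdx.y;                             /* layer index */
          if (i < nx)
              saved[((size_t)it * nlayers + l) * nx + i] = u[l * nx + i];
      }

      __global__ void restore_boundary(float *u, const float *saved,
                                       int nx, int nlayers, int it) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          int l = blockIdx.y;
          if (i < nx)
              u[l * nx + i] = saved[((size_t)it * nlayers + l) * nx + i];
      }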

  14. Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations

    PubMed Central

    Hallock, Michael J.; Stone, John E.; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida

    2014-01-01

    Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems. PMID:24882911

  15. Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations.

    PubMed

    Hallock, Michael J; Stone, John E; Roberts, Elijah; Fry, Corey; Luthey-Schulten, Zaida

    2014-05-01

    Simulation of in vivo cellular processes with the reaction-diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems.
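    The peer-to-peer transfers mentioned above move lattice boundary data directly between GPUs when the hardware allows it, falling back to host staging otherwise. A hedged sketch of that pattern (illustrative, not the authors' code):

      /* Copy a halo region from one GPU's memory to another's. */
      #include <cuda_runtime.h>
      #include <stdlib.h>

      void copy_halo(float *dst, int dst_dev,
                     const float *src, int src_dev, size_t bytes) {
          int can = 0;
          cudaDeviceCanAccessPeer(&can, dst_dev, src_dev);
          if (can) {
              cudaSetDevice(dst_dev);
              /* may return cudaErrorPeerAccessAlreadyEnabled on reuse */
              cudaDeviceEnablePeerAccess(src_dev, 0);
              cudaMemcpyPeer(dst, dst_dev, src, src_dev, bytes);
          } else {              /* fall back: stage through host memory */
              float *h = (float *)malloc(bytes);
              cudaSetDevice(src_dev);
              cudaMemcpy(h, src, bytes, cudaMemcpyDeviceToHost);
              cudaSetDevice(dst_dev);
              cudaMemcpy(dst, h, bytes, cudaMemcpyHostToDevice);
              free(h);
          }
      }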

  16. Impact of memory bottleneck on the performance of graphics processing units

    NASA Astrophysics Data System (ADS)

    Son, Dong Oh; Choi, Hong Jun; Kim, Jong Myon; Kim, Cheol Hong

    2015-12-01

    Recent graphics processing units (GPUs) can process general-purpose applications as well as graphics applications with the help of various user-friendly application programming interfaces (APIs) supported by GPU vendors. Unfortunately, utilizing the hardware resources of the GPU efficiently is a challenging problem, since the GPU architecture is totally different from the traditional CPU architecture. To solve this problem, many studies have focused on techniques for improving system performance using GPUs. In this work, we analyze GPU performance while varying GPU parameters such as the number of cores and the clock frequency. According to our simulations, GPU performance can be improved by 125.8% and 16.2% on average as the number of cores and the clock frequency increase, respectively. However, the performance saturates when memory bottleneck problems occur due to heavy data requests to the memory. The performance of GPUs can be improved further as the memory bottleneck is reduced by changing GPU parameters dynamically.

  17. GPU-Accelerated Forward and Back-Projections with Spatially Varying Kernels for 3D DIRECT TOF PET Reconstruction.

    PubMed

    Ha, S; Matej, S; Ispiryan, M; Mueller, K

    2013-02-01

    We describe a GPU-accelerated framework that efficiently models spatially (shift) variant system response kernels and performs forward- and back-projection operations with these kernels for the DIRECT (Direct Image Reconstruction for TOF) iterative reconstruction approach. Inherent challenges arise from the poor memory cache performance at non-axis-aligned TOF directions. Focusing on the GPU memory access patterns, we utilize different kinds of GPU memory according to these patterns in order to maximize the memory cache performance. We also exploit GPU instruction-level parallelism to efficiently hide the long latencies of the memory operations. Our experiments indicate that the time performance of our GPU implementation of the projection operators is slightly faster than, or approximately comparable to, that of FFT-based approaches using state-of-the-art FFTW routines. Most importantly, however, our GPU framework can also efficiently handle any generic system response kernel, whether spatially symmetric and shift-variant or spatially asymmetric and shift-variant, neither of which an FFT-based approach can cope with.
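    One concrete instance of matching memory type to access pattern is keeping small, read-only kernel coefficients in constant memory, which is optimized for broadcast reads, while per-voxel data stays in global memory. The 1D convolution below is only a hedged stand-in for the projection operators; the names and sizes are assumptions.

      /* Small read-only coefficients in constant memory; data in global. */
      #include <cuda_runtime.h>

      #define KLEN 32
      __constant__ float c_kernel[KLEN];      /* response-kernel samples */

      __global__ void convolve1d(const float *in, float *out, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          float acc = 0.0f;
          for (int k = 0; k < KLEN; ++k) {
              int j = i + k - KLEN / 2;
              if (j >= 0 && j < n) acc += c_kernel[k] * in[j];
          }
          out[i] = acc;
      }
      /* Host side: cudaMemcpyToSymbol(c_kernel, h_kernel,
       * KLEN * sizeof(float)); then launch convolve1d as usual. */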

  18. GPU-accelerated phase-field simulation of dendritic solidification in a binary alloy

    NASA Astrophysics Data System (ADS)

    Yamanaka, Akinori; Aoki, Takayuki; Ogawa, Satoi; Takaki, Tomohiro

    2011-03-01

    The phase-field simulation of dendritic solidification of a binary alloy has been accelerated by using a graphics processing unit (GPU). To perform the phase-field simulation of alloy solidification on the GPU, a program code was developed with the compute unified device architecture (CUDA). In this paper, the implementation technique of the phase-field model on the GPU is presented. We also evaluated the acceleration performance of the three-dimensional solidification simulation using a single NVIDIA Tesla C1060 GPU and the developed program code. The results showed that the GPU calculation for 576³ computational grid points achieved a performance of 170 GFLOPS by utilizing the shared memory as a software-managed cache. Furthermore, we demonstrated that the computation with the GPU is 100 times faster than that with a single CPU core. From the obtained results, we confirmed the feasibility of realizing a real-time full three-dimensional phase-field simulation of microstructure evolution on a personal desktop computer.
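    Using shared memory as a software-managed cache means each thread block stages its tile of the grid (plus a halo) into on-chip memory before computing, so neighbouring reads hit fast shared memory instead of DRAM. In this hedged sketch, a 2D 5-point Laplacian stands in for the actual phase-field equations; it assumes nx and ny are multiples of the tile size.

      /* Tiled stencil with a shared-memory software-managed cache.
       * Launch with dim3 grid(nx / TILE, ny / TILE), block(TILE, TILE). */
      #include <cuda_runtime.h>

      #define TILE 16

      __global__ void laplacian(const float *in, float *out, int nx, int ny) {
          __shared__ float t[TILE + 2][TILE + 2];    /* tile plus halo */
          int x = blockIdx.x * TILE + threadIdx.x;
          int y = blockIdx.y * TILE + threadIdx.y;
          int tx = threadIdx.x + 1, ty = threadIdx.y + 1;
          t[ty][tx] = in[y * nx + x];
          /* edge threads also stage the halo cells they border */
          if (threadIdx.x == 0 && x > 0)             t[ty][0] = in[y * nx + x - 1];
          if (threadIdx.x == TILE - 1 && x < nx - 1) t[ty][TILE + 1] = in[y * nx + x + 1];
          if (threadIdx.y == 0 && y > 0)             t[0][tx] = in[(y - 1) * nx + x];
          if (threadIdx.y == TILE - 1 && y < ny - 1) t[TILE + 1][tx] = in[(y + 1) * nx + x];
          __syncthreads();
          if (x > 0 && x < nx - 1 && y > 0 && y < ny - 1)
              out[y * nx + x] = t[ty][tx - 1] + t[ty][tx + 1]
                              + t[ty - 1][tx] + t[ty + 1][tx]
                              - 4.0f * t[ty][tx];
      }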

  19. GPU-Accelerated Forward and Back-Projections With Spatially Varying Kernels for 3D DIRECT TOF PET Reconstruction

    NASA Astrophysics Data System (ADS)

    Ha, S.; Matej, S.; Ispiryan, M.; Mueller, K.

    2013-02-01

    We describe a GPU-accelerated framework that efficiently models spatially (shift) variant system response kernels and performs forward- and back-projection operations with these kernels for the DIRECT (Direct Image Reconstruction for TOF) iterative reconstruction approach. Inherent challenges arise from the poor memory cache performance at non-axis-aligned TOF directions. Focusing on the GPU memory access patterns, we utilize different kinds of GPU memory according to these patterns in order to maximize the memory cache performance. We also exploit GPU instruction-level parallelism to efficiently hide the long latencies of the memory operations. Our experiments indicate that the time performance of our GPU implementation of the projection operators is slightly faster than, or approximately comparable to, that of FFT-based approaches using state-of-the-art FFTW routines. Most importantly, however, our GPU framework can also efficiently handle any generic system response kernel, whether spatially symmetric and shift-variant or spatially asymmetric and shift-variant, neither of which an FFT-based approach can cope with.

  20. On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Ed F; Nintcheu Fata, Sylvain

    2012-01-01

    A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://www.intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix were performed on the GPU. Out-of-core techniques were used to solve problems larger than the available GPU memory. The code achieved over eight times speedup in matrix assembly and about 56 Gflops/sec in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.

  1. Combating the Reliability Challenge of GPU Register File at Low Supply Voltage

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tan, Jingweijia; Song, Shuaiwen; Yan, Kaige

    Supply voltage reduction is an effective approach to significantly reduce GPU energy consumption. As the largest on-chip storage structure, the GPU register file becomes the reliability hotspot that prevents further supply voltage reduction below the safe limit (Vmin) due to process variation effects. This work addresses the reliability challenge of the GPU register file at low supply voltages, which is an essential first step towards aggressive supply voltage reduction of the entire GPU chip. We propose GR-Guard, an architectural solution that leverages long register dead time to enable reliable operations from an unreliable register file at low voltages.

  2. The peculiar velocity field in linear theory

    NASA Astrophysics Data System (ADS)

    Pires, N.

    2003-08-01

    Clusters and superclusters of galaxies are responsible for the so-called peculiar velocities of galaxies (motions relative to the pure expansion of the universe). The amplitude of these perturbations depends on the matter density of the universe and on the density contrast within the volume where the galaxy is located. In 1980, Peebles introduced the factor "f", which relates the amplitude of the velocity perturbations to the peculiar gravitational field, in the context of linear theory. In the present work we obtain a general analytical solution for the Peebles factor "f" of the peculiar velocity field, in terms of hypergeometric functions, valid for any geometry of the universe. As a test of our solution, the results originally found by Peebles in 1980 and the more general results found by O. Lahav and collaborators in 1991 are recovered.

  3. The Science Education with Virtual Observatories project: the participation of the MOPPE school in 2000-2003

    NASA Astrophysics Data System (ADS)

    Wuensche, C. A.; Gavioli, E.; Oliveira, A. L. P. R. S.; da Silva, C.; Cardoso, H. P.; Estácio, S.

    2003-08-01

    The Science Education with Virtual Observatories (Educação em Ciências com Observatórios Virtuais) project was conceived by the Instituto Astronômico e Geofísico of USP, bringing together several teaching and research institutions in the country to develop diverse competencies in elementary, secondary and higher education, using astronomy as a multidisciplinary tool. This work describes the participation of MOPPE, INPE's pilot school in the project, in the period 2000-2003. We present 1) the creation of a science club (1999 to 2001) whose theme was the colonization of Mars and 2) the astronomy syllabus used with the 7th and 8th grades of elementary school. The purpose of the Colonizing Mars project was to study several aspects of an interplanetary mission and to build experiments that would allow these aspects to be quantified. The results obtained included presentations at the SBPC Jovem meetings in 2000 and 2001. We also discuss the astronomy syllabi used since 2001 and the students' involvement with astronomy-related activities outside the science class. The inclusion of astronomy in the curriculum of the final grades motivated the participation of more students, culminating in two medals for 7th grade students at the Brazilian Astronomy Olympiad (OBA) in 2002. There was also an increase in the number of participants in OBA 2003 and in the more elaborate astronomy projects at the Science Fairs of 2001 and 2002. We highlight, in 2003, the inclusion of MOPPE in NASA's TIE (Telescopes in Education) project, which uses the telescope of the Mount Wilson Observatory (USA) for remote observations in pedagogical projects for astronomy teaching.

  4. Life-cycle optimization model for distributed generation in buildings

    NASA Astrophysics Data System (ADS)

    Safaei, Amir

    The building sector is responsible for a large share of energy consumption and emissions in the European Union. Distributed Generation (DG) of energy, namely through cogeneration systems and solar technologies, plays an important role in the energy future of this sector. Optimizing the operation of cogeneration systems is a complex task, owing to the many variables at play, namely: the different types of energy demand (electricity, heating and cooling), the dynamic prices of fuels (natural gas) and electricity, and the fixed and variable costs of the different DG systems. This becomes more complex considering the fluctuating nature of solar thermal and photovoltaic technologies. At the same time, the liberalization of the electricity market allows locally generated electricity to be exported to the grid. Additionally, the strategic operation of a DG system must comply with national policy frameworks if it is to benefit from such schemes. Moreover, considering the high environmental impacts of the building sector, any rigorous building energy assessment should also integrate environmental aspects, using a Life-Cycle (LC) approach. A complete Life-Cycle Assessment (LCA) of a DG system should include the operation and construction phases of the system, as well as the impacts associated with fuel production. The emissions from natural gas (NG) production were analyzed; these vary according to origin, type (conventional or unconventional), and state (as Liquefied NG (LNG) or gas). Likewise, the impact of solar systems is affected by weather and solar radiation, according to their geographic location. Therefore, a proper assessment of DG systems requires an LCA model suited to the geographic location (Portugal), also integrating fuel (NG) production and taking into account its different supply sources. The main objective of this doctoral thesis was to develop a model to optimize the design and operation of DG systems for the commercial building sector in Portugal, considering the respective Life-Cycle Impacts (LCI) and Life-Cycle Costs (LCC), so as to satisfy the building's energy demand. Three types of cogeneration technologies (micro-turbines, internal combustion engines, and solid oxide fuel cells) and two types of solar technologies, solar thermal and photovoltaic, constitute the DG systems that are coupled to the conventional systems. An LC model was developed, taking into account all impacts related to the construction and operation of the energy systems, as well as the upstream processes related to NG production. In particular, the NG mix consumed in Portugal in 2011 was identified (60% from Nigeria, 40% from Algeria) and the impacts of each supply route were assessed separately for four environmental impact categories: Primary Energy consumption (PE), Greenhouse Gas emissions (GHG), acidification, and eutrophication. Given the importance of GHG emissions in policy-making, an uncertainty analysis of the GHG emissions of the NG supplied to Portugal was also carried out. A mathematical model was developed in the General Algebraic Modeling System (GAMS) language, which uses the LCA results of the energy systems and their economic implications to minimize the LCC and LCI over a planning horizon defined by the decision-maker. Pareto optimal frontiers were derived, representing the trade-offs between the type of LCI (PE, GHG, acidification, eutrophication) and the LCC resulting from meeting the building's energy demand. To increase the robustness of the model, given the uncertainty of fuel prices (NG and electricity), a robust cost model was developed for the DG systems, one that is less affected by perturbations in fuel costs. The application of the proposed model was tested on a real case study, a commercial building located in the city of Coimbra, Portugal.

  5. The Performance Improvement of the Lagrangian Particle Dispersion Model (LPDM) Using Graphics Processing Unit (GPU) Computing

    DTIC Science & Technology

    2017-08-01

    access to the GPU for general purpose processing. CUDA is designed to work easily with multiple programming languages, including Fortran. CUDA is a ... The Performance Improvement of the Lagrangian Particle Dispersion Model (LPDM) Using Graphics Processing Unit (GPU) Computing, by Leelinda P Dawson. Approved for public release; distribution unlimited.

  6. SU-E-T-395: Multi-GPU-Based VMAT Treatment Plan Optimization Using a Column-Generation Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tian, Z; Shi, F; Jia, X

    Purpose: GPU has been employed to speed up VMAT optimizations from hours to minutes. However, its limited memory capacity makes it difficult to handle cases with a huge dose-deposition-coefficient (DDC) matrix, e.g. those with a large target size, multiple arcs, small beam angle intervals and/or small beamlet size. We propose a multi-GPU-based VMAT optimization to solve this memory issue and make GPU-based VMAT more practical for clinical use. Methods: Our column-generation-based method generates apertures sequentially by iteratively searching for an optimal feasible aperture (referred to as the pricing problem, PP) and optimizing aperture intensities (referred to as the master problem, MP). The PP requires access to the large DDC matrix, which is implemented on a multi-GPU system. Each GPU stores a DDC sub-matrix corresponding to one fraction of the beam angles and is only responsible for calculations related to those angles. Broadcast and parallel reduction schemes are adopted for inter-GPU data transfer. The MP is a relatively small-scale problem and is implemented on one GPU. One head-and-neck cancer case was used for testing. Three different strategies for VMAT optimization on a single GPU were also implemented for comparison: (S1) truncating the DDC matrix to ignore its small-value entries during optimization; (S2) transferring the DDC matrix part by part to the GPU during optimization whenever needed; (S3) moving DDC-matrix-related calculation onto the CPU. Results: Our multi-GPU-based implementation reaches a good plan within 1 minute. Although S1 was 10 seconds faster than our method, the obtained plan quality is worse. Both S2 and S3 handle the full DDC matrix and hence yield the same plan as our method. However, the computation time is longer, namely 4 minutes and 30 minutes, respectively. Conclusion: Our multi-GPU-based VMAT optimization can effectively solve the limited memory issue with good plan quality and high efficiency, making GPU-based ultra-fast VMAT planning practical for real clinical use.

  7. Validation of GPU based TomoTherapy dose calculation engine.

    PubMed

    Chen, Quan; Lu, Weiguo; Chen, Yu; Chen, Mingli; Henderson, Douglas; Sterpin, Edmond

    2012-04-01

    The graphics processing unit (GPU) based TomoTherapy convolution/superposition (C/S) dose engine (GPU dose engine) achieves a dramatic performance improvement over the traditional CPU-cluster based TomoTherapy dose engine (CPU dose engine). Besides the architecture difference between the GPU and CPU, there are several algorithm changes from the CPU dose engine to the GPU dose engine. These changes made the GPU dose slightly different from the CPU-cluster dose. Before the commercial release of the GPU dose engine, its accuracy had to be validated. Thirty-eight TomoTherapy phantom plans and 19 patient plans were calculated with both dose engines to evaluate the equivalency between the two. Gamma indices (Γ) were used for the equivalency evaluation. The GPU dose was further verified against absolute point dose measurements with an ion chamber and against film measurements for the phantom plans. Monte Carlo calculation was used as a reference for both dose engines in the accuracy evaluation in heterogeneous phantoms and actual patients. The GPU dose engine showed excellent agreement with the current CPU dose engine. The majority of cases had over 99.99% of voxels with Γ(1%, 1 mm) < 1. The worst case observed in the phantoms had 0.22% of voxels violating the criterion. In the patient cases, the worst percentage of voxels violating the criterion was 0.57%. For absolute point dose verification, all cases agreed with measurement to within ±3%, with an average error magnitude within 1%. All cases passed the acceptance criterion that more than 95% of the pixels have Γ(3%, 3 mm) < 1 in the film measurements, and the average passing pixel percentage was 98.5%-99%. The GPU dose engine also showed a similar degree of accuracy in heterogeneous media as the current TomoTherapy dose engine. It is verified and validated that the ultrafast TomoTherapy GPU dose engine can safely replace the existing TomoTherapy cluster based dose engine without degradation in dose accuracy.

  8. SU-E-J-91: FFT Based Medical Image Registration Using a Graphics Processing Unit (GPU).

    PubMed

    Luce, J; Hoggarth, M; Lin, J; Block, A; Roeske, J

    2012-06-01

    To evaluate the efficiency gains obtained from using a Graphics Processing Unit (GPU) to perform a Fourier Transform (FT) based image registration. Fourier-based image registration involves obtaining the FT of the component images, and analyzing them in Fourier space to determine the translations and rotations of one image set relative to another. An important property of FT registration is that by enlarging the images (adding additional pixels), one can obtain translations and rotations with sub-pixel resolution. The expense, however, is an increased computational time. GPUs may decrease the computational time associated with FT image registration by taking advantage of their parallel architecture to perform matrix computations much more efficiently than a Central Processing Unit (CPU). In order to evaluate the computational gains produced by a GPU, images with known translational shifts were utilized. A program was written in the Interactive Data Language (IDL; Exelis, Boulder, CO) to perform CPU-based calculations. Subsequently, the program was modified using GPU bindings (Tech-X, Boulder, CO) to perform GPU-based computation on the same system. Multiple image sizes were used, ranging from 256×256 to 2304×2304. The time required to complete the full algorithm on the CPU and GPU was benchmarked, and the speed increase was defined as the ratio of the CPU-to-GPU computational time. The ratio of the CPU-to-GPU time was greater than 1.0 for all images, which indicates that the GPU performs the algorithm faster than the CPU. The smallest improvement, a 1.21 ratio, was found with the smallest image size of 256×256, and the largest speedup, a 4.25 ratio, was observed with the largest image size of 2304×2304. GPU programming resulted in a significant decrease in the computational time associated with a FT image registration algorithm. The inclusion of the GPU may provide near real-time, sub-pixel registration capability. © 2012 American Association of Physicists in Medicine.
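
    To make the Fourier-space approach above concrete, the sketch below computes the cross-power spectrum of two images with cuFFT; the peak of its inverse transform gives the translational shift (phase correlation). This is a minimal sketch under stated assumptions, not the authors' IDL/GPU-binding implementation; the kernel name crossPower and the image sizes are illustrative.

    // phase_corr.cu -- minimal sketch of FT-based translation registration.
    #include <cufft.h>
    #include <cuda_runtime.h>

    // Cross-power spectrum: C = F1 * conj(F2) / |F1 * conj(F2)|
    __global__ void crossPower(cufftComplex* f1, const cufftComplex* f2, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        cufftComplex a = f1[i], b = f2[i];
        cufftComplex c = { a.x * b.x + a.y * b.y,   // a * conj(b), real part
                           a.y * b.x - a.x * b.y }; // a * conj(b), imaginary part
        float m = sqrtf(c.x * c.x + c.y * c.y) + 1e-12f;
        f1[i].x = c.x / m;  f1[i].y = c.y / m;      // normalize to unit magnitude
    }

    int main() {
        const int NX = 256, NY = 256, N = NX * NY;
        cufftComplex *img1, *img2;
        cudaMallocManaged(&img1, N * sizeof(cufftComplex));
        cudaMallocManaged(&img2, N * sizeof(cufftComplex));
        // ... fill img1/img2 with the two images to be registered ...
        cufftHandle plan;
        cufftPlan2d(&plan, NY, NX, CUFFT_C2C);
        cufftExecC2C(plan, img1, img1, CUFFT_FORWARD);
        cufftExecC2C(plan, img2, img2, CUFFT_FORWARD);
        crossPower<<<(N + 255) / 256, 256>>>(img1, img2, N);
        cufftExecC2C(plan, img1, img1, CUFFT_INVERSE);  // peak location = shift
        cudaDeviceSynchronize();
        // ... argmax over |img1| gives (dy, dx); enlarging the images first
        //     yields sub-pixel shifts, as described in the abstract ...
        cufftDestroy(plan);
        cudaFree(img1); cudaFree(img2);
        return 0;
    }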

  9. Falls in long-term care institutions for elderly people: protocol validation.

    PubMed

    Baixinho, Cristina Rosa Soares Lavareda; Dixe, Maria Dos Anjos Coelho Rodrigues; Henriques, Maria Adriana Pereira

    2017-01-01

    To validate the content of a fall risk management protocol for long-term care institutions for elderly people. Methodological, quantitative-qualitative study using the Delphi technique. The tool, based on the literature, was sent electronically to obtain consensus among the 14 experts who met the defined inclusion criteria. The 27 indicators of the protocol are organized in three dimensions: preparing for institutionalization (IRA=.88); managing the risk of falls throughout institutionalization (IRA=.9); and leading communication and training (IRA=1), with a CVI=.91. Two rounds were performed to reach a consensus above 80% on every item. The values obtained in the reliability test (>0.8) show that the protocol can be used to meet the intended goal. The next step is the clinical validation of the protocol with residents of long-term care institutions for elderly people.

  10. Next-generation acceleration and code optimization for light transport in turbid media using GPUs

    PubMed Central

    Alerstam, Erik; Lo, William Chun Yip; Han, Tianyi David; Rose, Jonathan; Andersson-Engels, Stefan; Lilge, Lothar

    2010-01-01

    A highly optimized Monte Carlo (MC) code package for simulating light transport is developed on the latest graphics processing unit (GPU) built for general-purpose computing from NVIDIA - the Fermi GPU. In biomedical optics, the MC method is the gold standard approach for simulating light transport in biological tissue, both due to its accuracy and its flexibility in modelling realistic, heterogeneous tissue geometry in 3-D. However, the widespread use of MC simulations in inverse problems, such as treatment planning for PDT, is limited by their long computation time. Despite its parallel nature, optimizing MC code on the GPU has been shown to be a challenge, particularly when the sharing of simulation result matrices among many parallel threads demands the frequent use of atomic instructions to access the slow GPU global memory. This paper proposes an optimization scheme that utilizes the fast shared memory to resolve the performance bottleneck caused by atomic access, and discusses numerous other optimization techniques needed to harness the full potential of the GPU. Using these techniques, a widely accepted MC code package in biophotonics, called MCML, was successfully accelerated on a Fermi GPU by approximately 600x compared to a state-of-the-art Intel Core i7 CPU. A skin model consisting of 7 layers was used as the standard simulation geometry. To demonstrate the possibility of GPU cluster computing, the same GPU code was executed on four GPUs, showing a linear improvement in performance with an increasing number of GPUs. The GPU-based MCML code package, named GPU-MCML, is compatible with a wide range of graphics cards and is released as open-source software in two versions: an optimized version tuned for high performance and a simplified version for beginners (http://code.google.com/p/gpumcml). PMID:21258498
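
    The shared-memory optimization described above can be illustrated along the following lines: threads accumulate photon-weight deposition into a per-block shared array with fast shared-memory atomics, and flush block totals to slow global memory only once. This is a minimal illustration of the technique, not the GPU-MCML code itself; the bin count and names are assumptions.

    // absorb_sm.cu -- per-block shared-memory accumulation before one global flush.
    #include <cuda_runtime.h>
    #include <cstdio>

    #define NBINS 256   // assumed number of radial absorption bins

    __global__ void depositWeights(const int* bin, const float* w, int n, float* globalAbs) {
        __shared__ float localAbs[NBINS];
        for (int j = threadIdx.x; j < NBINS; j += blockDim.x)
            localAbs[j] = 0.0f;                     // zero the block-local histogram
        __syncthreads();
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            atomicAdd(&localAbs[bin[i]], w[i]);     // fast shared-memory atomic
        __syncthreads();
        for (int j = threadIdx.x; j < NBINS; j += blockDim.x)
            atomicAdd(&globalAbs[j], localAbs[j]);  // one global atomic per bin per block
    }

    int main() {
        const int n = 1 << 20;
        int* bin; float *w, *absGlobal;
        cudaMallocManaged(&bin, n * sizeof(int));
        cudaMallocManaged(&w, n * sizeof(float));
        cudaMallocManaged(&absGlobal, NBINS * sizeof(float));
        cudaMemset(absGlobal, 0, NBINS * sizeof(float));
        for (int i = 0; i < n; ++i) { bin[i] = i % NBINS; w[i] = 1.0f; }
        depositWeights<<<(n + 255) / 256, 256>>>(bin, w, n, absGlobal);
        cudaDeviceSynchronize();
        printf("bin 0 absorbed weight: %f\n", absGlobal[0]);
        return 0;
    }

    The point of the two-level scheme is contention: many threads hit the same few shared-memory bins cheaply, and global memory sees only one atomic per bin per block instead of one per photon step.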

  11. Memory-Scalable GPU Spatial Hierarchy Construction.

    PubMed

    Qiming Hou; Xin Sun; Kun Zhou; Lauterbach, C; Manocha, D

    2011-04-01

    Recent GPU algorithms for constructing spatial hierarchies have achieved promising performance for moderately complex models by using the breadth-first search (BFS) construction order. While able to exploit the massive parallelism on the GPU, the BFS order also consumes excessive GPU memory, which becomes a serious issue for interactive applications involving very complex models with more than a few million triangles. In this paper, we propose to use the partial breadth-first search (PBFS) construction order to control memory consumption while maximizing performance. We apply the PBFS order to two hierarchy construction algorithms. The first algorithm, for kd-trees, automatically balances the level of parallelism against intermediate memory usage. With PBFS, peak memory consumption during construction can be efficiently controlled without costly CPU-GPU data transfer. We also develop memory allocation strategies to effectively limit memory fragmentation. The resulting algorithm scales well with GPU memory and constructs kd-trees of models with millions of triangles at interactive rates on GPUs with 1 GB memory. Compared with existing algorithms, our algorithm is an order of magnitude more scalable for a given GPU memory bound. The second algorithm is for out-of-core bounding volume hierarchy (BVH) construction for very large scenes, based on the PBFS construction order. At each iteration, all constructed nodes are dumped to the CPU memory, and the GPU memory is freed for the next iteration's use. In this way, the algorithm is able to build trees that are too large to be stored in the GPU memory. Experiments show that our algorithm can construct BVHs for scenes with up to 20M triangles, several times larger than previous GPU algorithms can handle.

  12. The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hall, Clifford; School of Physics, Astronomy, and Computational Sciences, George Mason University, 4400 University Dr., Fairfax, VA 22030; Ji, Weixiao

    2014-02-01

    We present a CPU-GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as a container for simulation data stored on the graphics card and as a floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables for performing the Metropolis Monte Carlo (MMC) algorithm, which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU-GPU algorithms exploiting memory concepts combined with threading to avoid bus bandwidth and communication limits. The testbed molecular system used here is a condensed phase system of oligopyrrole chains. A benchmark shows a size scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect several CPU-GPU duets in parallel. Highlights: We parallelize the Metropolis Monte Carlo (MMC) algorithm on one CPU-GPU duet. The Adaptive Tempering Monte Carlo employs MMC and profits from this CPU-GPU implementation. Our benchmark shows a size scaling-up speedup of 62 for systems with 225,000 particles. The testbed involves a polymeric system of oligopyrroles in the condensed phase. The CPU-GPU parallelization includes dipole-dipole and Mie-Jones classic potentials.
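
    For readers unfamiliar with the core loop being parallelized, here is a minimal, embarrassingly parallel Metropolis sketch: independent walkers sampling a harmonic potential, one per thread, with curand providing per-thread random streams. This is far simpler than the molecular engine described above (no inter-molecule interactions) and every name in it is illustrative; it only shows the accept/reject step as it typically appears in a CUDA kernel.

    // mmc.cu -- minimal per-thread Metropolis sampler (illustrative only).
    #include <curand_kernel.h>
    #include <cstdio>

    __global__ void metropolis(float* x, unsigned long long seed, int nsweeps, float beta) {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        curandState st;
        curand_init(seed, id, 0, &st);            // independent stream per walker
        float xi = x[id];
        for (int s = 0; s < nsweeps; ++s) {
            float xp = xi + (curand_uniform(&st) - 0.5f);          // trial move
            float dE = 0.5f * (xp * xp - xi * xi);                 // harmonic energy change
            if (dE <= 0.0f || curand_uniform(&st) < expf(-beta * dE))
                xi = xp;                                           // Metropolis accept
        }
        x[id] = xi;
    }

    int main() {
        const int N = 1 << 16;
        float* x;
        cudaMallocManaged(&x, N * sizeof(float));
        for (int i = 0; i < N; ++i) x[i] = 0.0f;
        metropolis<<<N / 256, 256>>>(x, 1234ULL, 10000, 1.0f);
        cudaDeviceSynchronize();
        printf("walker 0 after sampling: %f\n", x[0]);
        cudaFree(x);
        return 0;
    }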

  13. International experience with the application of clinical decision support systems in gastroenterology

    PubMed Central

    Tenório, Josceli Maria; Hummel, Anderson Diniz; Sdepanian, Vera Lucia; Pisa, Ivan Torres; de Fátima Marin, Heimar

    2015-01-01

    Objective: To describe recent experience with the application of clinical decision support systems in gastroenterology, in order to establish the level of development, testing and advantages brought to medical practice by the introduction of such software. Methods: A search was performed in the PubMed, LILACS and ISI Web of Knowledge databases using terms related to decision support systems and to gastroenterology, including original articles published between 2005 and 2010. The initial search retrieved 104 publications and, after applying the inclusion and exclusion criteria, nine studies were selected for full-text reading. Results: Clinical decision support systems address a wide variety of clinical problems and disease investigations. In 89% of the cases, experimental models for developing clinical decision support systems are described. Results obtained with artificial intelligence techniques are described in 78% of the publications. Two of the studies carried out comparisons with physicians, and in only one publication was a controlled study described, showing evidence of improvements in medical practice. Conclusion: The studies show potential benefits of clinical decision support systems for medical practice; however, controlled studies in real settings should be carried out to confirm this perspective. PMID:26491625

  14. In vivo retinal and choroidal hypoxia imaging using a novel activatable hypoxia-selective near-infrared fluorescent probe.

    PubMed

    Fukuda, Shinichi; Okuda, Kensuke; Kishino, Genichiro; Hoshi, Sujin; Kawano, Itsuki; Fukuda, Masahiro; Yamashita, Toshiharu; Beheregaray, Simone; Nagano, Masumi; Ohneda, Osamu; Nagasawa, Hideko; Oshika, Tetsuro

    2016-12-01

    Retinal hypoxia plays a crucial role in ocular neovascular diseases, such as diabetic retinopathy, retinopathy of prematurity, and retinal vascular occlusion. Fluorescein angiography is useful for identifying the extent of hypoxia by detecting non-perfusion areas or neovascularization, but its ability to detect early stages of hypoxia is limited. Recently, in vivo fluorescent probes for detecting hypoxia have been developed; however, these have not been extensively applied in ophthalmology. We evaluated whether a novel donor-excited photo-induced electron transfer (d-PeT) system based on an activatable hypoxia-selective near-infrared fluorescent (NIRF) probe (GPU-327) responds to both mild and severe hypoxia in animal models of various ocular ischemic diseases. The ocular fundus offers a unique opportunity for direct observation of the retina through the transparent cornea and lens. After injection of GPU-327 in mouse and rabbit models of various ocular hypoxic diseases, NIRF imaging of the ocular fundus can be performed noninvasively and easily using commercially available fundus cameras. To investigate the safety of GPU-327, electroretinograms were also recorded after GPU-327 and PBS injection. Fluorescence of GPU-327 increased under mild hypoxic conditions in vitro. GPU-327 also yielded an excellent signal-to-noise ratio without washout in in vivo experiments. By using the near-infrared region, GPU-327 enables imaging of deeper ischemia, such as in the choroidal circulation. Additionally, electroretinography showed that GPU-327 did not cause neurotoxicity. GPU-327 identified hypoxic areas both in vivo and in vitro.

  15. Large-scale neural circuit mapping data analysis accelerated with the graphical processing unit (GPU).

    PubMed

    Shi, Yulin; Veidenbaum, Alexander V; Nicolau, Alex; Xu, Xiangmin

    2015-01-15

    Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post hoc processing and analysis. Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22× speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. To the best of our knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Together, GPU enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Large scale neural circuit mapping data analysis accelerated with the graphical processing unit (GPU)

    PubMed Central

    Shi, Yulin; Veidenbaum, Alexander V.; Nicolau, Alex; Xu, Xiangmin

    2014-01-01

    Background Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post-hoc processing and analysis. New Method Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. Results We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22x speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. Comparison with Existing Method(s) To the best of our knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Conclusions Together, GPU enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. PMID:25277633

  17. Science factory: a pedagogical space for multiple learnings

    NASA Astrophysics Data System (ADS)

    Martin, V. A. F.; Poppe, P. C. R.; Orrico, A. C. P.; Pereira, M. G.

    2003-08-01

    We believe that astronomy teaching is especially suited to motivating students and deepening content in several areas of knowledge, since it involves topics related to physics, mathematics, chemistry, computing, image processing and high-precision instrumentation, in addition to those pertaining to geography, history and anthropology. However, despite the interdisciplinary character of this science, the current reality is that most classroom teachers were not properly trained, during their academic formation, to teach astronomy content in today's elementary and secondary schools. In this work we first discuss, broadly, the reality of science teaching as currently practiced in the State of Bahia, pointing out, by administrative dependency, the growth and reduction in the number of schools, the illiteracy rate by age group, schooling, attendance, approval, failure and dropout rates, equipment and laboratories, and the level of training of our current teachers in full teaching activity. We then discuss the role of the Observatório Astronômico Antares/UEFS within this context, that is, the actions it has implemented over recent years and, in particular, the recent outreach project Ensino e Difusão de Astronomia (Teaching and Dissemination of Astronomy), funded by the Fundação Vitae, which seeks to translate into play, into "playing science", a pedagogical space for multiple learnings. Here, the role of the multiplier teacher, together with a laboratory of didactic kits that are easy to build and handle (some of which will be shown), constitutes the main vehicle for developing the knowledge, attitudes, skills and values that prepare our students for technical-scientific careers and for critical and creative participation in society.

  18. GPU accelerated cell-based adaptive mesh refinement on unstructured quadrilateral grid

    NASA Astrophysics Data System (ADS)

    Luo, Xisheng; Wang, Luying; Ran, Wei; Qin, Fenghua

    2016-10-01

    A GPU accelerated inviscid flow solver is developed on an unstructured quadrilateral grid in the present work. For the first time, cell-based adaptive mesh refinement (AMR) is fully implemented on the GPU for an unstructured quadrilateral grid, which greatly reduces the frequency of data exchange between GPU and CPU. Specifically, the AMR is processed with atomic operations to parallelize list operations, and memory recycling is realized to improve the efficiency of memory utilization. It is found that results obtained on GPUs agree very well with the exact or experimental results in the literature. An acceleration ratio of 4 is obtained between the parallel code running on the old GPU GT9800 and the serial code running on an E3-1230 V2 CPU. With the optimization of configuring a larger L1 cache and adopting shared-memory-based atomic operations on the newer GPU C2050, an acceleration ratio of 20 is achieved. The parallelized cell-based AMR processes achieve a 2x speedup on the GT9800 and 18x on the Tesla C2050, which demonstrates that running the cell-based AMR method in parallel on the GPU is feasible and efficient. Our results also indicate that new developments in GPU architecture benefit fluid dynamics computing significantly.
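
    The "atomic operations to parallelize list operations" mentioned above usually amount to building worklists with an atomically incremented counter, so that threads can append to a shared list without locks. A minimal sketch follows; the refinement criterion and all names are placeholders, not the authors' solver.

    // amr_flag.cu -- building a refinement worklist with an atomic counter (sketch).
    #include <cuda_runtime.h>

    __global__ void flagCellsForRefinement(const float* gradient, int nCells,
                                           float threshold, int* list, int* count) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nCells) return;
        if (gradient[i] > threshold) {       // placeholder refinement criterion
            int slot = atomicAdd(count, 1);  // claim a unique slot in the worklist
            list[slot] = i;                  // append this cell for refinement
        }
    }
    // Launch with e.g. <<<(nCells + 255) / 256, 256>>> after zeroing *count;
    // afterwards *count holds the number of cells to refine this step.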

  19. The virtual observatories project: education through robotic telescopes

    NASA Astrophysics Data System (ADS)

    Santana, P. H. S.; Shida, R. Y.

    2003-08-01

    The main objective of the Observatórios Virtuais (Virtual Observatories) project is science teaching through practical activities developed in collaboration between astronomy research institutions and elementary and secondary schools. This year the pilot program of studies, research and direct astronomical observation should be fully deployed, with real-time use of robotic telescopes, which will thus work as "virtual observatories". The pedagogical objective of the practical activities based on astronomical images is to develop students' skills and competencies in the use of the scientific method. To this end, interdisciplinary projects will be carried out based on astronomical observations, since astronomy is an interdisciplinary area par excellence. These activities will have different levels of complexity, which can be adapted to the various school grades and regional realities. Emphasis will be given to development and application in São Paulo, where the IAG/USP team works. As results presented in this work, we have the creation of a software package in Portuguese for processing images obtained with CCDs, and the preparation of material for the related educational activities.

  20. Multi-GPU implementation of a VMAT treatment plan optimization algorithm.

    PubMed

    Tian, Zhen; Peng, Fei; Folkerts, Michael; Tan, Jun; Jia, Xun; Jiang, Steve B

    2015-06-01

    Volumetric modulated arc therapy (VMAT) optimization is a computationally challenging problem due to its large data size, high degrees of freedom, and many hardware constraints. High-performance graphics processing units (GPUs) have been used to speed up the computations. However, the GPU's relatively small memory size cannot handle cases with a large dose-deposition coefficient (DDC) matrix, e.g., those with a large target size, multiple targets, multiple arcs, and/or small beamlet size. The main purpose of this paper is to report an implementation of a column-generation-based VMAT algorithm, previously developed in the authors' group, on a multi-GPU platform to solve the memory limitation problem. While the column-generation-based VMAT algorithm has been previously developed, the GPU implementation details have not been reported. Hence, another purpose is to present detailed techniques employed for GPU implementation. The authors also would like to utilize this particular problem as an example to study the feasibility of using a multi-GPU platform to solve large-scale problems in medical physics. The column-generation approach generates VMAT apertures sequentially by solving a pricing problem (PP) and a master problem (MP) iteratively. In the authors' method, the sparse DDC matrix is first stored on a CPU in coordinate list format (COO). On the GPU side, this matrix is split into four submatrices according to beam angles, which are stored on four GPUs in compressed sparse row format. Computation of beamlet price, the first step in the PP, is accomplished using multiple GPUs. A fast inter-GPU data transfer scheme is accomplished using peer-to-peer access. The remaining steps of the PP and the MP are implemented on a CPU or a single GPU due to their modest problem scale and computational loads. The Barzilai-Borwein algorithm with a subspace step scheme is adopted here to solve the MP. A head and neck (H&N) cancer case is then used to validate the authors' method. The authors also compare their multi-GPU implementation with three different single-GPU implementation strategies, i.e., truncating the DDC matrix (S1), repeatedly transferring the DDC matrix between CPU and GPU (S2), and porting computations involving the DDC matrix to the CPU (S3), in terms of both plan quality and computational efficiency. Two more H&N patient cases and three prostate cases are used to demonstrate the advantages of the authors' method. The authors' multi-GPU implementation can finish the optimization process within ∼1 min for the H&N patient case. S1 leads to an inferior plan quality although its total time was 10 s shorter than the multi-GPU implementation due to the reduced matrix size. S2 and S3 yield the same plan quality as the multi-GPU implementation but take ∼4 and ∼6 min, respectively. High computational efficiency was consistently achieved for the other five patient cases tested, with VMAT plans of clinically acceptable quality obtained within 23-46 s. Conversely, to obtain clinically comparable or acceptable plans for all six of the VMAT cases tested in this paper, the optimization time needed in a commercial TPS system on a CPU was found to be on the order of several minutes. The results demonstrate that the multi-GPU implementation of the authors' column-generation-based VMAT optimization can handle the large-scale VMAT optimization problem efficiently without sacrificing plan quality. The authors' study may serve as an example to shed some light on other large-scale medical physics problems that require multi-GPU techniques.
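
    The peer-to-peer transfer scheme mentioned above, which lets one GPU read or copy another GPU's memory without staging through the host, can be sketched as follows. This is a generic sketch assuming two P2P-capable devices; the device IDs and buffer names are illustrative, not the authors' code.

    // p2p.cu -- enabling peer-to-peer access between two GPUs for direct transfers.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);   // can GPU 0 access GPU 1's memory?
        if (!canAccess) { printf("P2P not supported between devices 0 and 1\n"); return 1; }

        size_t bytes = 1 << 20;
        float *buf0, *buf1;
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);            // GPU 0 may now access GPU 1 directly
        cudaMalloc(&buf0, bytes);
        cudaSetDevice(1);
        cudaMalloc(&buf1, bytes);

        // Direct GPU-to-GPU copy over PCIe/NVLink, bypassing a host staging buffer.
        cudaMemcpyPeer(buf0, 0, buf1, 1, bytes);

        cudaSetDevice(0); cudaFree(buf0);
        cudaSetDevice(1); cudaFree(buf1);
        return 0;
    }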

  1. GPU Accelerated Clustering for Arbitrary Shapes in Geoscience Data

    NASA Astrophysics Data System (ADS)

    Pankratius, V.; Gowanlock, M.; Rude, C. M.; Li, J. D.

    2016-12-01

    Clustering algorithms have become a vital component of intelligent systems for geoscience, helping scientists discover and track phenomena of various kinds. Here, we outline advances in Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which detects clusters of arbitrary shape, as are common in geospatial data. In particular, we propose a hybrid CPU-GPU implementation of DBSCAN and highlight new optimization approaches on the GPU that allow cluster detection in parallel while optimizing data transport during CPU-GPU interactions. We employ an efficient batching scheme between the host and the GPU such that limited GPU memory is not prohibitive when processing large and/or dense datasets. To minimize data transfer overhead, we estimate the total workload size and employ an execution that generates optimized batches that will not overflow the GPU buffer. This work is demonstrated on space weather Total Electron Content (TEC) datasets containing over 5 million measurements from instruments worldwide, and allows scientists to spot spatially coherent phenomena with ease. Our approach is up to 30 times faster than a sequential implementation and therefore accelerates discoveries in large datasets. We acknowledge support from NSF ACI-1442997.
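
    A batching scheme of the kind described, where each batch is sized from a workload estimate so a fixed GPU result buffer never overflows, might look like the host-side sketch below. The kernel, the workload estimator, and all names are illustrative stand-ins, not the authors' code.

    // batching.cu -- sketch of a host-side batching scheme for a fixed GPU buffer.
    #include <cuda_runtime.h>
    #include <algorithm>

    // Hypothetical neighbor-search kernel: processes points [first, first+count).
    __global__ void rangeQueryBatch(const float2* pts, int first, int count,
                                    int* results, int cap) { /* ... */ }

    void processAll(const float2* d_pts, int nPoints, long long totalWorkEstimate) {
        const int cap = 1 << 24;                  // result buffer capacity (elements)
        int* d_results;
        cudaMalloc(&d_results, cap * sizeof(int));
        // Estimate average results per point, then size batches to fit the buffer.
        double perPoint = (double)totalWorkEstimate / nPoints;
        int batch = std::max(1, (int)(cap / (perPoint * 1.25)));  // 25% safety margin
        for (int first = 0; first < nPoints; first += batch) {
            int count = std::min(batch, nPoints - first);
            rangeQueryBatch<<<(count + 255) / 256, 256>>>(d_pts, first, count,
                                                          d_results, cap);
            cudaDeviceSynchronize();
            // ... copy d_results back and merge into cluster structures on the host ...
        }
        cudaFree(d_results);
    }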

  2. A real-time spike sorting method based on the embedded GPU.

    PubMed

    Zelan Yang; Kedi Xu; Xiang Tian; Shaomin Zhang; Xiaoxiang Zheng

    2017-07-01

    Microelectrode arrays with hundreds of channels have been widely used to acquire neuron population signals in neuroscience studies. Online spike sorting is becoming one of the most important challenges for high-throughput neural signal acquisition systems. The graphics processing unit (GPU), with its high parallel computing capability, might provide an alternative solution to the increasing real-time computational demands of spike sorting. This study reports a method of real-time spike sorting through the compute unified device architecture (CUDA), implemented on an embedded GPU (NVIDIA JETSON Tegra K1, TK1). The sorting approach is based on principal component analysis (PCA) and K-means. By analyzing the parallelism of each process, the method was further optimized within the thread and memory model of the GPU. Our results showed that the GPU-based classifier on the TK1 is 37.92 times faster than the MATLAB-based classifier on a PC, while their accuracies were identical. The high-performance computing features of the embedded GPU demonstrated in our studies suggest that embedded GPUs provide a promising platform for real-time neural signal processing.
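
    The assignment step of K-means, the clustering half of the PCA + K-means pipeline described above, parallelizes naturally with one thread per spike. Below is a minimal sketch; the feature dimension (number of principal components kept per spike) and all names are assumptions, not the authors' implementation.

    // kmeans_assign.cu -- one-thread-per-spike cluster assignment (illustrative).
    #include <cuda_runtime.h>
    #include <cfloat>

    #define DIMS 3   // assumed number of principal components kept per spike

    __global__ void assignClusters(const float* feat,      // nSpikes x DIMS, row-major
                                   const float* centroids, // k x DIMS
                                   int* label, int nSpikes, int k) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nSpikes) return;
        float best = FLT_MAX; int bestC = 0;
        for (int c = 0; c < k; ++c) {
            float d = 0.0f;
            for (int j = 0; j < DIMS; ++j) {
                float diff = feat[i * DIMS + j] - centroids[c * DIMS + j];
                d += diff * diff;                          // squared Euclidean distance
            }
            if (d < best) { best = d; bestC = c; }
        }
        label[i] = bestC;                                  // nearest centroid wins
    }
    // Launch with <<<(nSpikes + 255) / 256, 256>>>; the centroid-update step can
    // then be done with a segmented reduction or on the CPU for small k.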

  3. Comparison of models for computing orbital perturbations due to the Earth tide

    NASA Astrophysics Data System (ADS)

    Vieira Pinto, J.; Vilhena de Moraes, R.

    2003-08-01

    Recent applications of artificial satellites for geodynamic purposes require orbits determined with great precision. In particular, Earth tides influence the terrestrial potential, causing additional perturbations in the motion of artificial satellites, which have been measured by several procedures. The attraction exerted by the Moon and the Sun on the Earth produces elastic displacements in its interior and a bulge on its surface. The result is a small variation in the mass distribution of the Earth and, consequently, in the geopotential. The perturbations in the orbital elements of artificial Earth satellites due to the Earth tide can be studied from the Lagrange equations, considering a suitable potential. On the other hand, as has been done by the IERS, the changes induced by the Earth tide in the geopotential can be conveniently modeled as variations in the geopotential coefficients Cnm and Snm. The two theories had not yet been compared for the same satellite. In this work, the long-period and secular variations in the orbital perturbations due to the Earth tide, computed with a simple model, Kozai's, and with the IERS model, are presented and compared. Preliminary results show, for the SCD2 and CBERS1 satellites, and for the Moon in elliptic, precessing motion, the secular perturbations in the argument of perigee and in the longitude of the ascending node.

  4. Images of the sky yesterday and today - an interactive astronomy multimedia and a new exhibition at MAST

    NASA Astrophysics Data System (ADS)

    Caretta, C. A.; Lima, F. P.; Requeijo, F.; Vieira, G. G.; Alves, F.; Valente, M. E. A.; de Almeida, R.; de Garcia, G. C.; Quixadá, A. C.

    2003-08-01

    "Imagens do Céu Ontem e Hoje" é o título de uma nova exposição que está sendo inaugurada no Museu de Astronomia e Ciências Afins (MCT), que inclui experimentos interativos, maquetes, réplicas e 8 terminais de computador com um multimídia interativo sobre Astronomia para consulta dos visitantes. O multimídia apresenta um conteúdo bastante extenso, que engloba quase todos os temas em Astronomia, consistindo numa fonte de divulgação e pesquisa para um público que vai das crianças até estudantes universitários. O conteúdo está distribuído em mais de 500 páginas de texto divididas em 4 módulos: "O Universo", "Espectroscopia", "Telescópios" e "Observando o Céu". Cada módulo é subdividido em 5 seções, em média, cada uma iniciada por uma animação que ilustra os temas a serem abordados na seção. Ao final da animação, uma lista de temas é apresentada sob o título "Saiba Mais". Para exemplificar, o módulo "O Universo" contém as seguintes seções: "O Universo visto pelo homem", "Conhecendo o Sistema Solar", "Indo além do Sistema Solar", "Nossa Galáxia, a Via-Láctea" e "Indo mais além, a imensidão do Universo". A seção "Conhecendo o Sistema Solar", por sua vez, tem os seguintes temas: "A origem do Sistema Solar", "O Sol", "Os planetas", "Satélites, asteróides, cometas e outros bichos..." e "O Sistema Solar em números". Cada texto é repleto de imagens, quadros, desenhos, esquemas, etc, além de passatempos ao final de cada seção, incluindo jogos interativos, quadrinhos e curiosidades, que auxiliam o aprendizado de forma divertida. Apresentamos neste trabalho as idéias gerais que permearam a produção da exposição, e uma viagem pelo multimídia para exemplificar sua estrutura e conteúdo. O multimídia será posteriormente disponibilizado para o público externo pela página eletrônica do MAst e/ou por intermédio de uma publicação comercial.

  5. A sample implementation for parallelizing Divide-and-Conquer algorithms on the GPU.

    PubMed

    Mei, Gang; Zhang, Jiayin; Xu, Nengxiong; Zhao, Kunyang

    2018-01-01

    The strategy of Divide-and-Conquer (D&C) is one of the frequently used programming patterns for designing efficient algorithms in computer science, and it has been parallelized on shared memory systems and distributed memory systems. Tzeng and Owens specifically developed a generic paradigm for parallelizing D&C algorithms on modern Graphics Processing Units (GPUs). In this paper, following the generic paradigm proposed by Tzeng and Owens, we provide a new and publicly available GPU implementation of the famous D&C algorithm, QuickHull, to serve as a sample and guide for parallelizing D&C algorithms on the GPU. The experimental results demonstrate the practicality of our sample GPU implementation. Our research objective in this paper is to present a sample GPU implementation of a classical D&C algorithm to help interested readers develop their own efficient GPU implementations with less effort.

  6. GPU MrBayes V3.1: MrBayes on Graphics Processing Units for Protein Sequence Data.

    PubMed

    Pang, Shuai; Stones, Rebecca J; Ren, Ming-Ming; Liu, Xiao-Guang; Wang, Gang; Xia, Hong-ju; Wu, Hao-Yang; Liu, Yang; Xie, Qiang

    2015-09-01

    We present a modified GPU (graphics processing unit) version of MrBayes, called ta(MC)³ (GPU MrBayes V3.1), for Bayesian phylogenetic inference on protein data sets. Our main contributions are 1) utilizing 64-bit variables, thereby enabling ta(MC)³ to process larger data sets than MrBayes; and 2) using Kahan summation to improve accuracy, convergence rates, and consequently runtime. Versus the current fastest software, we achieve a speedup of up to around 2.5 (and up to around 90 vs. serial MrBayes), and more on multi-GPU hardware. GPU MrBayes V3.1 is available from http://sourceforge.net/projects/mrbayes-gpu/. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
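
    Kahan (compensated) summation, which the authors credit with improving accuracy and convergence, keeps a running compensation term that recovers the low-order bits lost when a small value is added to a large running sum. A generic sketch, not taken from the GPU MrBayes source:

    // kahan.cu -- compensated (Kahan) summation as a reusable helper (illustrative).
    #include <cstdio>

    // Adds x into (sum, c); c carries the compensation for lost low-order bits.
    __host__ __device__ inline void kahanAdd(float x, float& sum, float& c) {
        float y = x - c;         // subtract previously lost low-order bits
        float t = sum + y;       // big + small: low-order bits of y may be lost here
        c = (t - sum) - y;       // recover exactly what was just lost
        sum = t;
    }

    int main() {
        float sum = 0.0f, c = 0.0f;
        for (int i = 0; i < 10000000; ++i)
            kahanAdd(1e-4f, sum, c);   // naive float summation drifts badly here
        printf("compensated sum = %f (exact: 1000)\n", sum);
        return 0;
    }

    Because the helper is __host__ __device__, the same four-line pattern can be used inside a kernel's per-thread likelihood accumulation, which is the kind of long summation where single-precision drift would otherwise accumulate.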

  7. GPU-based High-Performance Computing for Radiation Therapy

    PubMed Central

    Jia, Xun; Ziegenhein, Peter; Jiang, Steve B.

    2014-01-01

    Recent developments in radiation therapy demand high computational power to solve challenging problems in a timely fashion in a clinical environment. The graphics processing unit (GPU), as an emerging high-performance computing platform, has been introduced to radiotherapy. It is particularly attractive due to its high computational power, small size, and low cost for facility deployment and maintenance. Over the past few years, GPU-based high-performance computing in radiotherapy has experienced rapid development. A tremendous number of studies have been conducted, in which large acceleration factors compared with the conventional CPU platform have been observed. In this article, we first give a brief introduction to the GPU hardware structure and programming model. We then review the current applications of the GPU in major imaging-related and therapy-related problems encountered in radiotherapy. A comparison of the GPU with other platforms is also presented. PMID:24486639

  8. GPU accelerated implementation of NCI calculations using promolecular density.

    PubMed

    Rubez, Gaëtan; Etancelin, Jean-Matthieu; Vigouroux, Xavier; Krajecki, Michael; Boisson, Jean-Charles; Hénon, Eric

    2017-05-30

    The NCI approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive for describing ligand-protein binding. A custom implementation of NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The performance of three versions of the code is examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which reduces the computational time drastically. On a single compute node, the dual-GPU version yields a 39-fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc.
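
    For orientation, the quantity at the heart of the NCI method is the reduced density gradient s = |∇ρ| / (2 (3π²)^(1/3) ρ^(4/3)), evaluated on a grid from a promolecular density built as a sum of spherical atomic terms. The sketch below uses a single-exponential atomic model as a simplifying assumption (published promolecular densities are typically multi-exponential fits), and every name in it is illustrative.

    // rdg.cu -- reduced density gradient on a grid from a promolecular density (sketch).
    #include <cuda_runtime.h>
    #include <math.h>

    struct Atom { float x, y, z, a, b; };  // rho_A(r) = a * exp(-r / b)  (assumed model)

    __global__ void reducedDensityGradient(const Atom* atoms, int nAtoms,
                                           const float3* grid, float* s, int nPts) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nPts) return;
        float rho = 0.f, gx = 0.f, gy = 0.f, gz = 0.f;
        float3 p = grid[i];
        for (int k = 0; k < nAtoms; ++k) {
            float dx = p.x - atoms[k].x, dy = p.y - atoms[k].y, dz = p.z - atoms[k].z;
            float r = sqrtf(dx * dx + dy * dy + dz * dz) + 1e-12f;
            float rhoA = atoms[k].a * expf(-r / atoms[k].b);
            rho += rhoA;                           // promolecular density: sum of atoms
            float drho = -rhoA / atoms[k].b;       // d(rho_A)/dr for the exponential model
            gx += drho * dx / r; gy += drho * dy / r; gz += drho * dz / r;
        }
        float grad = sqrtf(gx * gx + gy * gy + gz * gz);
        const float PI = 3.14159265358979f;
        const float C = 2.f * powf(3.f * PI * PI, 1.f / 3.f);  // 2 (3*pi^2)^(1/3)
        s[i] = grad / (C * powf(rho, 4.f / 3.f)); // s = |grad rho| / (C * rho^(4/3))
    }
    // Each grid point is independent, which is why the method maps so well to GPUs.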

  9. A GPU-based calculation using the three-dimensional FDTD method for electromagnetic field analysis.

    PubMed

    Nagaoka, Tomoaki; Watanabe, Soichi

    2010-01-01

    Numerical simulations with numerical human models using the finite-difference time domain (FDTD) method have recently been performed frequently in a number of fields in biomedical engineering. However, the FDTD calculation runs too slowly. We focus, therefore, on general-purpose programming on the graphics processing unit (GPGPU). The three-dimensional FDTD method was implemented on the GPU using the Compute Unified Device Architecture (CUDA). In this study, we used the NVIDIA Tesla C1060 as a GPGPU board. The performance of the GPU is evaluated in comparison with the performance of a conventional CPU and a vector supercomputer. The results indicate that three-dimensional FDTD calculations using a GPU can significantly reduce run time in comparison with a conventional CPU, even for a native GPU implementation of the three-dimensional FDTD method, while the GPU/CPU speed ratio varies with the calculation domain and thread block size.
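
    A single update step of the kind being timed here maps naturally onto one thread per Yee cell. Below is a minimal sketch of the Ez field update (the H-field updates, material coefficients, and boundary conditions are omitted); the grid dimensions are assumptions and the thread-block size is exactly the tuning knob the abstract mentions.

    // fdtd_ez.cu -- one-thread-per-cell Ez update of a 3D Yee grid (sketch).
    #include <cuda_runtime.h>

    #define NX 128
    #define NY 128
    #define NZ 128
    #define IDX(i, j, k) ((i) + NX * ((j) + NY * (k)))

    __global__ void updateEz(float* ez, const float* hx, const float* hy, float cb) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        int k = blockIdx.z;
        if (i < 1 || j < 1 || i >= NX || j >= NY || k >= NZ) return;  // skip boundary
        // Curl of H in the z direction: dHy/dx - dHx/dy (unit cell spacing assumed).
        ez[IDX(i, j, k)] += cb * ((hy[IDX(i, j, k)] - hy[IDX(i - 1, j, k)])
                                - (hx[IDX(i, j, k)] - hx[IDX(i, j - 1, k)]));
    }
    // Typical launch: dim3 block(16, 16); dim3 grid(NX / 16, NY / 16, NZ);
    // updateEz<<<grid, block>>>(d_ez, d_hx, d_hy, cb);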

  10. Study of heat and salt transport processes in the Espinheiro Channel (Ria de Aveiro)

    NASA Astrophysics Data System (ADS)

    Vaz, Nuno Alexandre Firmino

    The main objective of this work was the study of the thermohaline dynamics of the Espinheiro Channel as a function of two main forcings, tide and river discharge, using two distinct approaches: experimental work and numerical modeling. Tidal propagation and the freshwater discharge of the Vouga River are decisive in establishing the horizontal salinity structure along the channel. The horizontal thermal structure along the channel is largely determined by the seasonal variation of the water temperature of the Vouga River, as well as by the seasonal variation of meteorological conditions, owing to the shallow depth. The formation of strong salinity gradients (related to the formation of estuarine fronts) was observed in a region about 7-8 km from the mouth of the channel, migrating within a region of approximately 1 km, depending on the tidal regime. The balance between advective and diffusive salt transport was computed, revealing that near the mouth the physical processes that contribute most to salt transport are the residual circulation and the trapping of water in secondary channels. Near the mouth of the Vouga River, the terms due to river discharge and gravitational circulation dominate the salt transport. A numerical model (Mohid, in 2D and 3D mode) was calibrated and validated, and subsequently used to study the hydrology of the channel. Particular attention was given to the study of the hydrology under extreme conditions of river discharge and tide. The numerical modeling results allowed, in a first stage, an assessment of the good performance of Mohid in reproducing the barotropic flows in the Ria de Aveiro, as well as the temporal evolution of the thermohaline properties of the water. Under low river discharge conditions, the dynamics of the channel is essentially dominated by the tide. As the river discharge increases, the freshwater influence extends downstream, stratifying the water column. The 3D simulations of the Espinheiro Channel were carried out for markedly different periods of river discharge and tide. The model reproduced qualitatively and quantitatively the observations of water levels, velocity and longitudinal distributions of salinity and temperature under a weak to medium river discharge regime. Under high river discharge conditions, the results show that the model underestimates the stratification. This study contributed to increasing the knowledge of the dynamics of the Espinheiro Channel, as well as to the development of a numerical system capable of reproducing and predicting the heat and salt transport processes.

  11. Evaluation of thermochemical biomass conversion in fluidized bed

    NASA Astrophysics Data System (ADS)

    Neves, Daniel dos Santos Felix das

    Given the accelerating rise in fossil fuel prices and the uncertainties about their future availability, there has been renewed interest in biomass technologies applied to the production of heat, electricity or synthetic fuels. Nevertheless, the thermochemical conversion of a solid biomass particle involves quite complex phenomena that lead, first, to the drying of the fuel, then to pyrolysis, and finally to combustion or gasification proper. A relatively incomplete description of some of these conversion stages still constitutes an obstacle to the development of the technologies, one that needs to be overcome. In particular, the high volatile matter content of biomass highlights the practical interest of studying pyrolysis. The importance of pyrolysis during biomass combustion was shown in this work through tests carried out in a pilot bubbling fluidized bed reactor. It was found that the process occurs largely at the surface of the bed, with diffusion flames due to the release of volatiles, which makes it difficult to control the reactor temperature above the bed. In the case of biomass gasification, pyrolysis can even determine the chemical efficiency of the process. This was shown in this work during gasification tests in a 2 MWth fluidized bed reactor, where a new measurement method made it possible to close the mass balance of the gasifier and to monitor the degree of conversion of the biomass. From these results, the need to adequately describe biomass pyrolysis for process design and control became clear. In engineering applications there is particular interest in the stoichiometry and properties of the main pyrolysis products. This work sought to address this need, initially by structuring literature data on the yields of char, pyrolytic liquids and gases, as well as elemental compositions and heating values. The result was a set of empirical parameters of practical interest that elucidate the general behavior of biomass pyrolysis over a wide range of operating conditions. In addition, an empirical model was proposed for the composition of the volatiles, which can be integrated into comprehensive reactor models provided that the parameters used are suitable for the fuel tested. This approach prompted a set of pyrolysis tests with several biomasses, lignin and cellulose, at temperatures between 600 and 975 °C. High fuel heating rates were achieved in laboratory bubbling fluidized bed and fixed bed reactors, while a thermogravimetric system made it possible to study the effect of lower heating rates. The results show that, under conditions typical of combustion and gasification processes, the amount of volatiles released from the biomass is little influenced by the reactor temperature but varies considerably among fuels. A deeper analysis of this matter showed that the char yield is closely related to the O/C ratio of the parent fuel, and a simple model is proposed to describe this relationship. Although the total amount of volatiles released is established by the composition of the biomass, their chemical composition depends strongly on the reactor temperature. Yields of condensable species (water and organic species), CO2 and light hydrocarbons go through a maximum with respect to temperature, giving way to CO and H2 at the highest temperatures. Nevertheless, in certain temperature ranges, the yields of some of the main gaseous species (e.g. CO, H2, CH4) are well correlated with one another, which made it possible to develop empirical models that minimize the effect of the operating conditions and, at the same time, highlight the effect of the fuel on the gas composition. In summary, the pyrolysis tests carried out in this work showed that the stoichiometry of biomass pyrolysis is related in several ways to the elemental composition of the parent fuel, which opens up several possibilities for the evaluation and design of biomass combustion and gasification processes.

  12. Development of High-speed Visualization System of Hypocenter Data Using CUDA-based GPU computing

    NASA Astrophysics Data System (ADS)

    Kumagai, T.; Okubo, K.; Uchida, N.; Matsuzawa, T.; Kawada, N.; Takeuchi, N.

    2014-12-01

    After the Great East Japan Earthquake on March 11, 2011, intelligent visualization of seismic information has become important for understanding earthquake phenomena. At the same time, the quantity of seismic data has become enormous with the progress of high-accuracy observation networks; we need to treat many parameters (e.g., positional information, origin time, magnitude, etc.) to display the seismic information efficiently. Therefore, high-speed processing of data and image information is necessary to handle enormous amounts of seismic data. Recently, the GPU (Graphics Processing Unit) has been used as an acceleration tool for data processing and calculation in various study fields. This movement is called GPGPU (General-Purpose computing on GPUs). In the last few years the performance of GPUs has kept improving rapidly. GPU computing provides a high-performance computing environment at a lower cost than before. Moreover, the use of a GPU has the advantage of direct visualization of the processed data, because the GPU was originally designed for graphics processing. In GPU computing, the processed data are always stored in the video memory. Therefore, we can write drawing information directly to the VRAM on the video card by combining CUDA and the graphics API. In this study, we employ CUDA and OpenGL and/or DirectX to realize a full-GPU implementation. This method makes it possible to write drawing information to the VRAM on the video card without PCIe bus data transfer, enabling high-speed processing of seismic data. The present study examines GPU computing-based high-speed visualization and its feasibility for a high-speed visualization system for hypocenter data.
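
    The CUDA/OpenGL interoperability pattern described, writing vertex data straight into a GL buffer from a kernel with no PCIe round trip, typically follows the map/write/unmap sequence sketched below. GL context and buffer creation are omitted, and all names are illustrative; in practice the buffer would be registered once at startup rather than every frame.

    // gl_interop.cu -- writing hypocenter vertices into an OpenGL VBO from CUDA (sketch).
    // Assumes an OpenGL context exists and `vbo` is a GL buffer sized for nQuakes float4s.
    #include <cuda_gl_interop.h>

    __global__ void fillVertices(float4* verts, const float4* hypocenters, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) verts[i] = hypocenters[i];  // x, y, depth, magnitude -> vertex attrs
    }

    void renderFrame(GLuint vbo, const float4* d_hypo, int nQuakes) {
        cudaGraphicsResource_t res;
        cudaGraphicsGLRegisterBuffer(&res, vbo, cudaGraphicsRegisterFlagsWriteDiscard);
        cudaGraphicsMapResources(1, &res, 0);
        float4* d_verts; size_t bytes;
        cudaGraphicsResourceGetMappedPointer((void**)&d_verts, &bytes, res);
        fillVertices<<<(nQuakes + 255) / 256, 256>>>(d_verts, d_hypo, nQuakes);
        cudaGraphicsUnmapResources(1, &res, 0);   // VBO is now ready for glDrawArrays
        cudaGraphicsUnregisterResource(res);
    }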

  13. A fast three-dimensional gamma evaluation using a GPU utilizing texture memory for on-the-fly interpolations.

    PubMed

    Persoon, Lucas C G G; Podesta, Mark; van Elmpt, Wouter J C; Nijsten, Sebastiaan M J J G; Verhaegen, Frank

    2011-07-01

    A widely accepted method to quantify differences between dose distributions is the gamma (γ) evaluation. Currently, almost all gamma implementations run on the central processing unit (CPU). Recently, the graphics processing unit (GPU) has become a powerful platform for specific computing tasks. In this study, we describe the implementation of a 3D gamma evaluation on a GPU to improve calculation time. The gamma evaluation algorithm was implemented on an NVIDIA Tesla C2050 GPU using the compute unified device architecture (CUDA). First, several cubic virtual phantoms were simulated and tested with varying dose cube sizes and set-ups, introducing artificial dose differences. Second, to show applicability in clinical practice, five patient cases were evaluated, using the 3D dose distribution from a treatment planning system as the reference and the dose delivered during treatment as the comparison. Calculation times on the CPU and GPU were compared for varying thread-block sizes, with the option of using texture or global memory. A GPU-over-CPU speed-up of 66 ± 12 was achieved for the virtual phantoms; for the patient cases, the speed-up was 57 ± 15. A thread-block size of 16 × 16 performed best in all cases. The use of texture memory improved the total calculation time, especially when interpolation was applied. Differences between the CPU and GPU gammas were negligible. The GPU and its features, such as texture memory, decreased the calculation time for gamma evaluations considerably without loss of accuracy.
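
    The core trick the abstract credits, hardware trilinear interpolation through texture memory, can be sketched as follows. This is not the published implementation: the half-voxel search step, the parameter names (dd for the dose-difference criterion, dta_mm for distance-to-agreement), and the kernel layout are illustrative, and the evaluated dose cube is assumed to be bound to a 3D texture object created with cudaFilterModeLinear.

      // Gamma-evaluation kernel sketch: the evaluated dose cube is sampled
      // through a 3D texture object, so sub-voxel positions are trilinearly
      // interpolated in hardware.
      __global__ void gamma3d(cudaTextureObject_t evalTex, const float* ref,
                              int nx, int ny, int nz, float voxel_mm,
                              float dd, float dta_mm, float* gamma_out) {
          int x = blockIdx.x * blockDim.x + threadIdx.x;
          int y = blockIdx.y * blockDim.y + threadIdx.y;
          int z = blockIdx.z;
          if (x >= nx || y >= ny || z >= nz) return;
          int idx = (z * ny + y) * nx + x;
          float best = 1e30f;
          const float step = 0.5f;               // half-voxel search step
          const float r = dta_mm / voxel_mm;     // search radius in voxels
          for (float dz = -r; dz <= r; dz += step)
          for (float dy = -r; dy <= r; dy += step)
          for (float dx = -r; dx <= r; dx += step) {
              // The 0.5 offset samples voxel centers in unnormalized coordinates.
              float d = tex3D<float>(evalTex, x + dx + 0.5f, y + dy + 0.5f, z + dz + 0.5f);
              float ddiff = (d - ref[idx]) / dd;
              float dist = sqrtf(dx*dx + dy*dy + dz*dz) * voxel_mm / dta_mm;
              best = fminf(best, ddiff*ddiff + dist*dist);
          }
          gamma_out[idx] = sqrtf(best);
      }

    A 16 × 16 thread block over (x, y), with the grid's z dimension covering slices, matches the block size the authors found to perform best.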

  14. Profile of peripheral vascular changes in crack cocaine users followed at a Psychosocial Care Center for Alcohol and Drugs (CAPS-AD)

    PubMed Central

    da Costa, Antônio Fagundes; Baldaçara, Leonardo Rodrigo; da Silva, Sílvio Alves; Tavares, Ana Célia de Freitas Ramos; Orsolin, Ederson de Freitas; Prehl, Vinícius Barros; Gondo, Fernando Hirohito Beltran; Santana, Hernani Lopes

    2016-01-01

    Background: Crack cocaine use is one of the major public health challenges, and the use of this drug has direct effects on users' health. Objectives: To evaluate the profile of vascular changes in crack-dependent patients at a Psychosocial Care Center for Alcohol and Drugs (CAPS-AD) and to observe possible peripheral vascular effects. Methods: This was an observational, descriptive, cross-sectional study. Patients in the sample completed an objective questionnaire covering demographics, pattern of drug use, and coexistence of diabetes mellitus, arterial hypertension, or smoking, and underwent physical and ultrasound examination. Data were summarized and analyzed statistically with the chi-square test or Fisher's exact test. Results: The mean age of the sample was 33.29 (±7.15) years, and 74% were male. The mean age at onset of drug use was 23.4 (±7.78) years, with a mean duration of use of 9.58 (±5.64) years. Mean daily consumption was 21.45 (±8.32) crack rocks. Altered lower-limb pulses were more frequent in women. The prevalence of arterial wall thickening in the lower limbs was 94.8%. Duration of drug use was statistically associated (p = 0.0096) with changes in the spectral waveform pattern of the lower-limb arteries. Conclusions: Peripheral vascular changes are present in crack users. Duration of drug use had the greatest impact on this system, suggesting an association between crack use and reduced arterial flow.

  15. GPU computing in medical physics: a review.

    PubMed

    Pratx, Guillem; Xing, Lei

    2011-05-01

    The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization techniques, and survey existing applications in three areas of medical physics, namely image reconstruction, dose calculation and treatment plan optimization, and image processing.

  16. CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications.

    PubMed

    Lei, Guoqing; Dou, Yong; Wan, Wen; Xia, Fei; Li, Rongchun; Ma, Meng; Zou, Dan

    2012-01-01

    Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular free-energy-minimization methods for RNA secondary structure prediction. Thus far, few studies have been reported on accelerating the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs). To the best of our knowledge, no implementation combines both the CPU and extra accelerators, such as GPUs, to accelerate Zuker algorithm applications. In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperative execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for the CPU and GPU architectures. The experimental results show a speedup of 15.93× over an optimized multi-core SIMD CPU implementation and a 16% performance advantage over an optimized GPU implementation. More than 14% of the sequences are executed on the CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm proves promising and can be applied to other bioinformatics applications.
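
    The task-allocation idea can be pictured with a short host-side sketch. Everything here is hypothetical: the worker functions and the length threshold are stand-ins, not the paper's scheme, which balances load from measured CPU/GPU performance. The point is simply that short sequences are folded on the CPU while a batch of long ones is in flight on the GPU.

      #include <string>
      #include <vector>

      // Hypothetical workers -- stand-ins for the real Zuker kernels.
      void fold_on_cpu(const std::string& seq) { /* CPU dynamic programming */ }
      void fold_batch_on_gpu_async(const std::vector<std::string>& seqs,
                                   const std::vector<int>& ids) { /* batched GPU launch */ }
      void wait_for_gpu() { /* cudaDeviceSynchronize() in the real code */ }

      // Length-based task allocation: the GPU gets the long sequences, whose
      // O(n^3) recursions dominate; the CPU folds the short ones in parallel
      // with the in-flight GPU batch.
      void fold_all(const std::vector<std::string>& seqs) {
          const size_t LENGTH_THRESHOLD = 512;   // tuned to balance the workload
          std::vector<int> cpu_ids, gpu_ids;
          for (int i = 0; i < (int)seqs.size(); ++i)
              (seqs[i].size() < LENGTH_THRESHOLD ? cpu_ids : gpu_ids).push_back(i);
          fold_batch_on_gpu_async(seqs, gpu_ids);          // non-blocking
          for (int id : cpu_ids) fold_on_cpu(seqs[id]);    // overlapped CPU work
          wait_for_gpu();                                  // join both partitions
      }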

  17. Efficient Implementation of MrBayes on Multi-GPU

    PubMed Central

    Zhou, Jianfu; Liu, Xiaoguang; Wang, Gang

    2013-01-01

    MrBayes, using Metropolis-coupled Markov chain Monte Carlo (MCMCMC or (MC)3), is a popular program for Bayesian inference. As a leading method of using DNA data to infer phylogeny, the (MC)3 Bayesian algorithm and its improved and parallel versions are still not fast enough for biologists to analyze massive real-world DNA data. Recently, the graphics processing unit (GPU) has shown its power as a coprocessor (or rather, an accelerator) in many fields. This article describes an efficient implementation, a(MC)3 (aMCMCMC), of MrBayes (MC)3 on the compute unified device architecture. By dynamically adjusting the task granularity to adapt to input data size and hardware configuration, it makes full use of GPU cores with different data sets. An adaptive method is also developed to split and combine DNA sequences so as to make full use of a large number of GPU cards. Furthermore, a new “node-by-node” task scheduling strategy is developed to improve concurrency, and several optimization methods are used to reduce overhead. Experimental results show that a(MC)3 achieves up to 63× speedup over serial MrBayes on a single machine with one GPU card, up to 170× speedup with four GPU cards, and up to 478× speedup with a 32-node GPU cluster. a(MC)3 is dramatically faster than all previous (MC)3 algorithms and scales well to large GPU clusters. PMID:23493260

  18. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing

    PubMed Central

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-01-01

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. High-performance computing (HPC) methods have therefore been presented to accelerate SAR imaging, especially GPU-based methods. In the classical GPU-based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, while the CPU only performs auxiliary work such as data input/output (IO); the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated by deep collaborative multi-CPU/GPU computing. For the CPU parallel imaging part, the advanced vector extension (AVX) method is first introduced into the multi-core CPU parallel method for higher efficiency. For the GPU parallel imaging part, not only are the bottlenecks of limited memory and frequent data transfers removed, but several optimization strategies, such as streaming and parallel pipelining, are also applied. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method improves SAR imaging efficiency 270-fold over a single-core CPU and achieves real-time imaging, in that the imaging rate exceeds the raw data generation rate. PMID:27070606
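
    The streaming/pipelining strategy mentioned above follows a standard CUDA pattern, sketched below under the assumptions of pinned host buffers and a stand-in process_chunk kernel; the actual SAR kernels and the AVX-side code are, of course, more involved.

      #include <cuda_runtime.h>

      // Pipeline sketch: chunks cycle over NSTREAM CUDA streams so that
      // host-to-device copies, kernels, and device-to-host copies of
      // different chunks overlap in time.
      __global__ void process_chunk(const float2* in, float2* out, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) out[i] = in[i];   // placeholder for the imaging kernel
      }

      void pipeline(const float2* h_in, float2* h_out, float2* d_in, float2* d_out,
                    int nchunks, int chunk) {   // h_in/h_out must be pinned memory
          const int NSTREAM = 4;
          cudaStream_t s[NSTREAM];
          for (int i = 0; i < NSTREAM; ++i) cudaStreamCreate(&s[i]);
          for (int c = 0; c < nchunks; ++c) {
              int k = c % NSTREAM;
              size_t off = (size_t)c * chunk;
              cudaMemcpyAsync(d_in + off, h_in + off, chunk * sizeof(float2),
                              cudaMemcpyHostToDevice, s[k]);
              process_chunk<<<(chunk + 255) / 256, 256, 0, s[k]>>>(d_in + off, d_out + off, chunk);
              cudaMemcpyAsync(h_out + off, d_out + off, chunk * sizeof(float2),
                              cudaMemcpyDeviceToHost, s[k]);
          }
          cudaDeviceSynchronize();
          for (int i = 0; i < NSTREAM; ++i) cudaStreamDestroy(s[i]);
      }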

  19. Efficient implementation of MrBayes on multi-GPU.

    PubMed

    Bao, Jie; Xia, Hongju; Zhou, Jianfu; Liu, Xiaoguang; Wang, Gang

    2013-06-01

    MrBayes, using Metropolis-coupled Markov chain Monte Carlo (MCMCMC or (MC)(3)), is a popular program for Bayesian inference. As a leading method of using DNA data to infer phylogeny, the (MC)(3) Bayesian algorithm and its improved and parallel versions are still not fast enough for biologists to analyze massive real-world DNA data. Recently, the graphics processing unit (GPU) has shown its power as a coprocessor (or rather, an accelerator) in many fields. This article describes an efficient implementation, a(MC)(3) (aMCMCMC), of MrBayes (MC)(3) on the compute unified device architecture. By dynamically adjusting the task granularity to adapt to input data size and hardware configuration, it makes full use of GPU cores with different data sets. An adaptive method is also developed to split and combine DNA sequences so as to make full use of a large number of GPU cards. Furthermore, a new "node-by-node" task scheduling strategy is developed to improve concurrency, and several optimization methods are used to reduce overhead. Experimental results show that a(MC)(3) achieves up to 63× speedup over serial MrBayes on a single machine with one GPU card, up to 170× speedup with four GPU cards, and up to 478× speedup with a 32-node GPU cluster. a(MC)(3) is dramatically faster than all previous (MC)(3) algorithms and scales well to large GPU clusters.

  20. Architecting the Finite Element Method Pipeline for the GPU.

    PubMed

    Fu, Zhisong; Lewis, T James; Kirby, Robert M; Whitaker, Ross T

    2014-02-01

    The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as many-core streaming processors like the graphics processing unit (GPU). In this paper, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. First we propose an efficient GPU-based algorithm to generate local element information and to assemble the global linear system associated with the FEM discretization of an elliptic PDE. To solve the corresponding linear system efficiently on the GPU, we implement a conjugate gradient method with a geometry-informed algebraic multigrid (AMG) preconditioner. We propose a new fine-grained parallelism strategy, a corresponding multigrid cycling stage, and an efficient data mapping to the many-core architecture of the GPU. Our on-GPU assembly achieves up to an 87× speedup over a traditional serial implementation on the CPU. Focusing on the linear system solver alone, we achieve a speedup of up to 51× versus a comparable state-of-the-art serial CPU linear system solver. Furthermore, the method compares favorably with other GPU-based sparse linear solvers.
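
    The workhorse of a GPU conjugate-gradient solve of the assembled system is the sparse matrix-vector product. A minimal one-row-per-thread CSR kernel is sketched below; this is a generic baseline, not the paper's AMG-preconditioned solver or its fine-grained multigrid cycling.

      // CSR sparse matrix-vector product, one row per thread -- the kernel
      // at the heart of a GPU conjugate-gradient iteration (y = A * x).
      __global__ void spmv_csr(int nrows, const int* row_ptr, const int* cols,
                               const double* vals, const double* x, double* y) {
          int row = blockIdx.x * blockDim.x + threadIdx.x;
          if (row >= nrows) return;
          double sum = 0.0;
          for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
              sum += vals[j] * x[cols[j]];   // gather along the sparse row
          y[row] = sum;
      }

    In a full CG loop this kernel is combined with dot products and vector updates (e.g., via cuBLAS), and the paper's geometry-informed AMG takes the place of a simple preconditioner step.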

  1. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.

    PubMed

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-04-07

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. High-performance computing (HPC) methods have therefore been presented to accelerate SAR imaging, especially GPU-based methods. In the classical GPU-based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, while the CPU only performs auxiliary work such as data input/output (IO); the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated by deep collaborative multi-CPU/GPU computing. For the CPU parallel imaging part, the advanced vector extension (AVX) method is first introduced into the multi-core CPU parallel method for higher efficiency. For the GPU parallel imaging part, not only are the bottlenecks of limited memory and frequent data transfers removed, but several optimization strategies, such as streaming and parallel pipelining, are also applied. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method improves SAR imaging efficiency 270-fold over a single-core CPU and achieves real-time imaging, in that the imaging rate exceeds the raw data generation rate.

  2. SU-E-J-60: Efficient Monte Carlo Dose Calculation On CPU-GPU Heterogeneous Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xiao, K; Chen, D. Z; Hu, X. S

    Purpose: It is well known that the performance of GPU-based Monte Carlo dose calculation implementations is bounded by memory bandwidth. One major cause of this bottleneck is the random memory-write pattern of dose deposition, which leads to memory-efficiency issues on the GPU such as un-coalesced writes and atomic operations. We propose a new method to alleviate these issues on CPU-GPU heterogeneous systems, which achieves an overall performance improvement for Monte Carlo dose calculation. Methods: Dose deposition accumulates dose into the voxels of a dose volume along the trajectories of radiation rays. Our idea is to partition this procedure into the following three steps, each tuned for the CPU or the GPU: (1) each GPU thread writes dose results with location information to a buffer in GPU memory, which achieves fully coalesced, atomic-free memory transactions; (2) the dose results in the buffer are transferred to CPU memory; (3) the dose volume is constructed from the dose buffer on the CPU. We organize the processing of all radiation rays into streams. Since the steps within a stream use different hardware resources (i.e., GPU, DMA, CPU), we can overlap the execution of these steps for different streams by pipelining. Results: We evaluated our method using a Monte Carlo Convolution Superposition (MCCS) program and tested our implementation on various clinical cases on a heterogeneous system containing an Intel i7 quad-core CPU and an NVIDIA TITAN GPU. Compared with a straightforward MCCS implementation on the same system (using both CPU and GPU for radiation ray tracing), our method gained a 2-5X speedup without losing dose calculation accuracy. Conclusion: The results show that our new method improves the effective memory bandwidth and overall performance of MCCS on CPU-GPU systems. The proposed method can also be applied to accelerate other Monte Carlo dose calculation approaches. This research was supported in part by NSF under Grant CCF-1217906 and in part by a research contract from Sandia National Laboratories.
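
    Steps (1) and (3) of the proposed partitioning can be sketched as follows. The buffer layout and names are illustrative assumptions; the point is that the GPU performs only dense, coalesced, atomic-free writes, while the scatter into the dose volume happens serially on the CPU.

      // Step (1): instead of atomicAdd into the dose volume, each thread
      // writes a (voxel index, dose) record to a dense output buffer --
      // fully coalesced, no atomics. The CPU later replays the buffer into
      // the dose grid (step 3) while the GPU works on the next stream.
      struct DoseHit { int voxel; float dose; };

      __global__ void deposit_to_buffer(const int* voxel_of_step,
                                        const float* dose_of_step,
                                        DoseHit* buf, int nsteps) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= nsteps) return;
          buf[i].voxel = voxel_of_step[i];   // neighboring threads write
          buf[i].dose  = dose_of_step[i];    // neighboring buffer slots
      }

      // Step (3), on the CPU: serial accumulation, so no write conflicts.
      void accumulate(const DoseHit* buf, int n, float* dose_volume) {
          for (int i = 0; i < n; ++i) dose_volume[buf[i].voxel] += buf[i].dose;
      }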

  3. GPU-accelerated non-uniform fast Fourier transform-based compressive sensing spectral domain optical coherence tomography.

    PubMed

    Xu, Daguang; Huang, Yong; Kang, Jin U

    2014-06-16

    We implemented graphics processing unit (GPU) accelerated compressive sensing (CS) non-uniform-in-k-space spectral domain optical coherence tomography (SD OCT). The Kaiser-Bessel (KB) function and the Gaussian function are used independently as the convolution kernel in the gridding-based non-uniform fast Fourier transform (NUFFT) algorithm, with different oversampling ratios and kernel widths. Our implementation is compared with GPU-accelerated modified non-uniform discrete Fourier transform (MNUDFT) matrix-based CS SD OCT and with GPU-accelerated fast Fourier transform (FFT)-based CS SD OCT. Our implementation has image quality comparable to the GPU-accelerated MNUDFT-based CS SD OCT while providing more than a 5-fold speed enhancement. Compared to the GPU-accelerated FFT-based CS SD OCT, it shows lower background noise and smaller side lobes while eliminating the cumbersome k-space grid filling and the k-linear calibration procedure. Finally, we demonstrated that on a conventional desktop computer with three GPUs, real-time B-mode imaging in excess of 30 fps can be obtained with the GPU-accelerated NUFFT-based CS SD OCT at a frame size of 2048 (axial) × 1000 (lateral).
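
    The gridding step at the heart of the NUFFT can be sketched in simplified 1-D form (per spectral line). The parameters half_w and beta are assumptions, and a real implementation precomputes weights and avoids the atomics shown here; cyl_bessel_i0f is the modified Bessel function I0 from the CUDA math API.

      // Gridding sketch: each nonuniform k-space sample is spread onto
      // nearby uniform grid points, weighted by a Kaiser-Bessel kernel.
      __device__ float kb_weight(float dist, float half_w, float beta) {
          float r = dist / half_w;
          if (r >= 1.0f) return 0.0f;
          return cyl_bessel_i0f(beta * sqrtf(1.0f - r * r));
      }

      __global__ void grid_samples(const float* k_pos, const float2* sample,
                                   int nsamp, float2* grid, int ngrid,
                                   float half_w, float beta) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= nsamp) return;
          int lo = max(0, (int)ceilf(k_pos[i] - half_w));
          int hi = min(ngrid - 1, (int)floorf(k_pos[i] + half_w));
          for (int g = lo; g <= hi; ++g) {
              float w = kb_weight(fabsf(g - k_pos[i]), half_w, beta);
              atomicAdd(&grid[g].x, w * sample[i].x);   // scattered adds; a tuned
              atomicAdd(&grid[g].y, w * sample[i].y);   // version restructures this
          }
      }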

  4. GPU-Meta-Storms: computing the structure similarities among massive amount of microbial community samples using GPU.

    PubMed

    Su, Xiaoquan; Wang, Xuetao; Jing, Gongchao; Ning, Kang

    2014-04-01

    The number of microbial community samples is increasing at an exponential rate. Data mining across microbial community samples could reveal valuable biological information still hidden in this mass of data. However, current methods for comparing microbial communities are limited in their ability to process large numbers of samples, each with a complex community structure. We have developed an optimized GPU-based software package, GPU-Meta-Storms, to efficiently measure quantitative phylogenetic similarity among massive numbers of microbial community samples. Our results show that GPU-Meta-Storms can compute the pairwise similarity scores for 10,240 samples within 20 min, a speed-up of >17,000 times over a single-core CPU and >2,600 times over a 16-core CPU. The performance of GPU-Meta-Storms could therefore enable in-depth data mining across massive microbial community samples and make real-time analysis and monitoring of temporal or conditional changes in microbial communities possible. GPU-Meta-Storms is implemented in CUDA (Compute Unified Device Architecture) and C++. Source code is available at http://www.computationalbioenergy.org/meta-storms.html.
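
    The parallelization pattern, one thread per sample pair over the n(n-1)/2 comparisons, can be sketched as follows; the scoring function is a placeholder, not the Meta-Storms phylogeny-weighted measure.

      // Pairwise-similarity sketch: one thread per sample pair (i, j), with
      // the flattened pair index unpacked so all n*(n-1)/2 comparisons run
      // in parallel.
      __device__ float score(const float* a, const float* b, int dim) {
          float s = 0.0f;                       // placeholder: abundance overlap
          for (int d = 0; d < dim; ++d) s += fminf(a[d], b[d]);
          return s;
      }

      __global__ void all_pairs(const float* samples, int n, int dim, float* sim) {
          long long k = (long long)blockIdx.x * blockDim.x + threadIdx.x;
          long long total = (long long)n * (n - 1) / 2;
          if (k >= total) return;
          // Unpack k -> (i, j), i < j, walking row by row over the upper triangle.
          int i = 0;
          long long rem = k;
          while (rem >= n - 1 - i) { rem -= n - 1 - i; ++i; }
          int j = i + 1 + (int)rem;
          sim[k] = score(samples + (long long)i * dim,
                         samples + (long long)j * dim, dim);
      }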

  5. The National Geoelectromagnetic Facility - an open access resource for ultra wideband electromagnetic geophysics (Invited)

    NASA Astrophysics Data System (ADS)

    Schultz, A.; Urquhart, S.; Slater, M.

    2010-12-01

    At present, the US academic community has access to two national electromagnetic (EM) instrument pools that support long-period magnetotelluric (MT) equipment suitable for crust-mantle scale studies. The requirements of near surface geophysics, hydrology, glaciology, as well as the full range of crust and mantle investigations require development of new capabilities in data acquisition with broader frequency bandwidth than these existing units, increased instrument numbers, and concomitant developments in 3D/4D data interpretation. NSF Major Research Instrumentation support has been obtained to meet these requirements by developing an initial set of next-generation instruments as a National Geoelectromagnetic Facility (NGF), available to all PIs on a cost recovery basis, and operated by Oregon State University (OSU). In contrast to existing instruments with data acquisition systems specialized to operate within specific frequency bands and for specific electromagnetic methods, the NGF model "Zen/5" instruments being co-developed by OSU and Zonge Research and Engineering Organization are based on modular receivers with a flexible number of digital and analog input channels, designed to acquire EM data at DC and at frequencies ranging from micro-Hz to MHz. These systems can be deployed in a compact, low power configuration for extended deployments (e.g. for crust-mantle scale experiments), or in a high frequency sampling mode for near surface work. The NGF is also acquiring controlled source EM transmitters, so that investigators may carry out magnetotelluric, audio-MT, radiofrequency-MT, as well as time-domain/transient EM and DC resistivity studies. The instruments are designed to simultaneously accommodate multiple electric field dipole sensors, magnetic fluxgates and induction coil sensors. Sample rates as high as 2.5 MHz with resolution between 24 and 32 bits, depending on sample rate, are specified to allow for high fidelity recording of waveforms. The NGF is accepting instrument use requests from investigators planning electromagnetic surveys via webform submission on its web site ngf.coas.oregonstate.edu. The site is also a port of entry to request access to the 46 long period magnetotelluric instruments also operated by OSU as national instrument pools. Cyberinfrastructure support is available to investigators, including field computers, EM data processing software, and access to a hybrid CPU-GPU parallel computing environment, currently configured with dual Intel Westmere hexacore CPUs and 960 NVidia Tesla and 1792 Nvidia Fermi GPU cores. The capabilities of the Zen/5 receivers will be presented, with examples of data acquired from a recent shallow water marine controlled source experiment conducted in coastal Oregon as part of an effort to locate a buried submarine pipeline, using a 1.1 kW, 256 Hz signal source imposed on the pipeline from shore. A Zen/5 prototype instrument, modified for marine use through support by the Oregon Wave Energy Trust, demonstrated the marine capabilities of the NGF instrument design.

  6. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards

    PubMed Central

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G.

    2012-01-01

    In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses the summed absolute difference (SAD) error criterion and full grid search (FS) for finding the optimal block displacement. In this evaluation we compared the execution time of GPU and CPU implementations for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 for an integer and 1000 for a non-integer search grid. The additional speedup for the non-integer search grid comes from the fact that the GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with the number of cards is achievable. In addition we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU-based motion estimation methods, namely the implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and the Simplified Unsymmetrical multi-Hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed modest improvement even though its computational complexity is substantially higher than that of the non-FS CPU implementations. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames per second using two NVIDIA C1060 Tesla GPU cards. PMID:22347787
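
    A minimal full-search SAD kernel conveys the structure of such a GPU implementation: one thread per candidate displacement of one block. This sketch handles only an integer grid and assumes displacements stay inside the image; it is illustrative, not the authors' code.

      #define B 16  // block size

      // One thread per candidate displacement (dx, dy) for the block at
      // (bx, by); the launch must cover (2*range+1)^2 candidates, and the
      // host (or a reduction kernel) then takes the minimum over sad_out.
      __global__ void sad_full_search(const unsigned char* cur, const unsigned char* ref,
                                      int width, int bx, int by,
                                      int range, float* sad_out) {
          int dx = (int)(blockIdx.x * blockDim.x + threadIdx.x) - range;
          int dy = (int)(blockIdx.y * blockDim.y + threadIdx.y) - range;
          if (dx > range || dy > range) return;
          float sad = 0.0f;
          for (int y = 0; y < B; ++y)
              for (int x = 0; x < B; ++x) {
                  int c = cur[(by + y) * width + (bx + x)];
                  int r = ref[(by + dy + y) * width + (bx + dx + x)];
                  sad += fabsf((float)(c - r));
              }
          int side = 2 * range + 1;
          sad_out[(dy + range) * side + (dx + range)] = sad;
      }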

  7. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards.

    PubMed

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G

    2011-07-01

    In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses the summed absolute difference (SAD) error criterion and full grid search (FS) for finding the optimal block displacement. In this evaluation we compared the execution time of GPU and CPU implementations for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 for an integer and 1000 for a non-integer search grid. The additional speedup for the non-integer search grid comes from the fact that the GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with the number of cards is achievable. In addition we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU-based motion estimation methods, namely the implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and the Simplified Unsymmetrical multi-Hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed modest improvement even though its computational complexity is substantially higher than that of the non-FS CPU implementations. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames per second using two NVIDIA C1060 Tesla GPU cards.

  8. Study on efficiency of time computation in x-ray imaging simulation base on Monte Carlo algorithm using graphics processing unit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Setiani, Tia Dwi, E-mail: tiadwisetiani@gmail.com; Suprijadi; Nuclear Physics and Biophysics Research Division, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Jalan Ganesha 10, Bandung 40132

    Monte Carlo (MC) is one of the most powerful techniques for simulation in x-ray imaging. The MC method can simulate radiation transport within matter with high accuracy and provides a natural way to simulate radiation transport in complex systems. One of the MC-based codes widely used for radiographic image simulation is MC-GPU, a code developed by Andreu Badal. This study investigates the computation time of x-ray imaging simulation on a GPU (Graphics Processing Unit) compared to a standard CPU (Central Processing Unit). Furthermore, the effect of physical parameters on radiographic image quality and a comparison of the image quality produced by the GPU and CPU simulations are evaluated in this paper. The simulations were run on a CPU in serial, and on two GPUs with 384 and 2304 cores. In the GPU simulations each core computes one photon, so a large number of photons are calculated simultaneously. The results show that simulations on the GPU were significantly faster than on the CPU: about 64-114 times faster on the 2304-core GPU, and about 20-31 times faster on the 384-core GPU, relative to a single CPU core. The results also show that optimum image quality was obtained with histories starting from 10^8 and energies from 60 keV to 90 keV. By statistical analysis, the quality of the GPU and CPU images is essentially the same.
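
    The one-photon-per-thread organization follows the standard pattern below. The transport physics here is a deliberate placeholder (exponential free path, fixed survival probability), not MC-GPU's physics; only the thread/RNG structure is the point.

      #include <curand_kernel.h>

      // One photon history per thread: each thread gets its own RNG
      // substream and tracks a photon until "absorption".
      __global__ void photon_histories(unsigned long long seed, int nphotons,
                                       float mu,            // attenuation, 1/cm
                                       float* path_length) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= nphotons) return;
          curandState st;
          curand_init(seed, i, 0, &st);      // independent substream per thread
          float total = 0.0f;
          for (int k = 0; k < 64; ++k) {     // cap interactions per history
              total += -logf(curand_uniform(&st)) / mu;  // sampled free path
              if (curand_uniform(&st) < 0.7f) break;     // placeholder absorption
          }
          path_length[i] = total;            // scored quantity (illustrative)
      }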

  9. Hadoop-MCC: Efficient Multiple Compound Comparison Algorithm Using Hadoop.

    PubMed

    Hua, Guan-Jie; Hung, Che-Lun; Tang, Chuan Yi

    2018-01-01

    In the past decade, drug design technologies have improved enormously. Computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, making the procedure more economical and efficient. However, computation over big data, such as ZINC with more than 60 million compounds and GDB-13 with more than 930 million small molecules, poses a notable time-consumption problem. We therefore propose a novel heterogeneous high-performance computing method, named Hadoop-MCC, integrating Hadoop and GPU, to cope with big chemical structure data efficiently. Hadoop-MCC gains high availability and fault tolerance from Hadoop, which is used to scatter input data to GPU devices and gather the results from them. The Hadoop framework adopts a mapper/reducer computation model: in the proposed method, mappers are responsible for fetching SMILES data segments and performing the LINGO method on the GPU, and reducers then collect all comparison results produced by the mappers. Owing to the high availability of Hadoop, all of the LINGO computation jobs on the mappers can be completed even if some mappers encounter problems. The LINGO comparison is performed on each GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices achieves better computational performance than CUDA-MCC on a single GPU device. Hadoop-MCC attains the scalability, high availability, and fault tolerance granted by Hadoop, as well as high performance, by integrating the computational power of both Hadoop and GPU. A heterogeneous architecture such as Hadoop-MCC has thus been shown to deliver better computational performance than a single GPU device. Copyright© Bentham Science Publishers.

  10. GPU Computing in Bayesian Inference of Realized Stochastic Volatility Model

    NASA Astrophysics Data System (ADS)

    Takaishi, Tetsuya

    2015-01-01

    The realized stochastic volatility (RSV) model, which uses realized volatility as additional information, has been proposed for inferring the volatility of financial time series. We consider Bayesian inference of the RSV model by the Hybrid Monte Carlo (HMC) algorithm. The HMC algorithm can be parallelized and thus executed on a GPU for speedup. The GPU code is developed in CUDA Fortran. We compare the computation time of the HMC algorithm on a GPU (GTX 760) and a CPU (Intel i7-4770, 3.4 GHz) and find that the GPU can be up to 17 times faster than the CPU. We also code the program with OpenACC and find that appropriate coding can achieve a speedup similar to that of CUDA Fortran.
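
    Although the paper's code is CUDA Fortran/OpenACC, the data-parallel core of HMC can be sketched in CUDA C++ for consistency with the other examples in this collection. The gradient below is a placeholder (the true RSV gradient couples latent volatilities across time), and the closing half kick runs as a separate launch.

      // Placeholder gradient: a standard-normal potential, one term per
      // latent volatility; the real RSV gradient couples neighbors in time.
      __device__ float grad_U(const float* h, int i, int n) { return h[i]; }

      // First half of a leapfrog step over all latent volatilities, the
      // data-parallel core of HMC.
      __global__ void leapfrog_half_kick_drift(float* h, float* p, int n, float eps) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          p[i] -= 0.5f * eps * grad_U(h, i, n);  // half momentum kick
          h[i] += eps * p[i];                    // position drift
          // The closing half kick is issued as a second kernel launch,
          // since grad_U must see the fully updated h across all threads.
      }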

  11. HASEonGPU-An adaptive, load-balanced MPI/GPU-code for calculating the amplified spontaneous emission in high power laser media

    NASA Astrophysics Data System (ADS)

    Eckert, C. H. J.; Zenker, E.; Bussmann, M.; Albach, D.

    2016-10-01

    We present an adaptive Monte Carlo algorithm for computing the amplified spontaneous emission (ASE) flux in laser gain media pumped by pulsed lasers. With the design of high-power lasers in mind, which require large gain media, we have developed the open source code HASEonGPU, which is capable of utilizing multiple graphics processing units (GPUs). With HASEonGPU, time to solution is reduced to minutes on a medium-size GPU cluster of 64 NVIDIA Tesla K20m GPUs, and excellent speedup is achieved when scaling to multiple GPUs. Comparison of simulation results to measurements of ASE in Yb3+:YAG ceramics shows perfect agreement.
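
    For a multi-GPU MPI code of this kind, the basic rank-to-device mapping is a small amount of boilerplate, sketched below. The modulo assignment assumes MPI ranks are packed per node, and the adaptive load balancing itself (handing out batches of sample points on demand) is only indicated by a comment.

      #include <mpi.h>
      #include <cuda_runtime.h>

      // Minimal MPI+CUDA device assignment: each MPI rank picks one GPU,
      // then ranks exchange work items for load balancing.
      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, nranks, ndev;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nranks);
          cudaGetDeviceCount(&ndev);
          cudaSetDevice(rank % ndev);   // one GPU per rank (node-local index)
          // ... a master rank would hand out sample-point batches, with
          // workers requesting more as they finish -- the adaptive,
          // load-balanced pattern the paper describes.
          MPI_Finalize();
          return 0;
      }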

  12. Local Alignment Tool Based on Hadoop Framework and GPU Architecture

    PubMed Central

    Hung, Che-Lun; Hua, Guan-Jie

    2014-01-01

    With the rapid growth of next-generation sequencing technologies, such as Slex, more and more data have been generated and published. Computational performance is an important issue in analyzing such huge data sets. Recently, many tools, such as SOAP, have been implemented on Hadoop and GPU parallel computing architectures. BLASTP, implemented on GPU architectures, is an important tool for biologists to compare protein sequences, but for big biological data it is hard to rely on a single GPU. Therefore, we implement a distributed BLASTP by combining Hadoop and multiple GPUs. The experimental results show that the proposed method improves the performance of BLASTP on a single GPU and also achieves high availability and fault tolerance. PMID:24955362

  13. Local alignment tool based on Hadoop framework and GPU architecture.

    PubMed

    Hung, Che-Lun; Hua, Guan-Jie

    2014-01-01

    With the rapid growth of next-generation sequencing technologies, such as Slex, more and more data have been generated and published. Computational performance is an important issue in analyzing such huge data sets. Recently, many tools, such as SOAP, have been implemented on Hadoop and GPU parallel computing architectures. BLASTP, implemented on GPU architectures, is an important tool for biologists to compare protein sequences, but for big biological data it is hard to rely on a single GPU. Therefore, we implement a distributed BLASTP by combining Hadoop and multiple GPUs. The experimental results show that the proposed method improves the performance of BLASTP on a single GPU and also achieves high availability and fault tolerance.

  14. A survey of CPU-GPU heterogeneous computing techniques

    DOE PAGES

    Mittal, Sparsh; Vetter, Jeffrey S.

    2015-07-04

    As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths, and hence CPU-GPU collaboration is inevitable for achieving high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, that enable utilizing both CPUs and GPUs to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe this paper will provide researchers with insights into the workings and scope of applications of HCTs and motivate them to further harness the computational power of CPUs and GPUs to achieve the goal of exascale performance.

  15. GPU accelerated manifold correction method for spinning compact binaries

    NASA Astrophysics Data System (ADS)

    Ran, Chong-xi; Liu, Song; Zhong, Shuang-ying

    2018-04-01

    The graphics processing unit (GPU) acceleration of the manifold correction algorithm, based on compute unified device architecture (CUDA) technology, is designed to simulate the dynamic evolution of the post-Newtonian (PN) Hamiltonian formulation of spinning compact binaries. The feasibility and efficiency of parallel computation on the GPU have been confirmed by various numerical experiments. The numerical comparisons show that the accuracy of the manifold correction method executed on the GPU agrees well with that of the CPU-only execution. The acceleration achieved when the code runs on the GPU increases enormously through the use of shared-memory and register optimization techniques, without additional hardware costs: the speedup is nearly 13 times that of the CPU code for a phase-space scan (comprising 314 × 314 orbits). In addition, the GPU-accelerated manifold correction method is used to study numerically how the dynamics are affected by the spin-induced quadrupole-monopole interaction in black hole binary systems.
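
    As a representative example of the shared-memory optimizations credited above (not the authors' code), the classic shared-memory block reduction looks like this:

      // Classic shared-memory block reduction: partial sums are combined in
      // fast on-chip memory rather than global memory.
      __global__ void block_sum(const float* in, float* out, int n) {
          extern __shared__ float s[];
          int tid = threadIdx.x;
          int i = blockIdx.x * blockDim.x + tid;
          s[tid] = (i < n) ? in[i] : 0.0f;
          __syncthreads();
          for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
              if (tid < stride) s[tid] += s[tid + stride];
              __syncthreads();
          }
          if (tid == 0) out[blockIdx.x] = s[0];   // one partial sum per block
      }

    Launched as block_sum<<<blocks, threads, threads * sizeof(float)>>>(in, out, n), it keeps the intermediate sums on chip, the kind of memory-locality optimization the abstract credits for the speedup.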

  16. A survey of CPU-GPU heterogeneous computing techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Vetter, Jeffrey S.

    As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths, and hence CPU-GPU collaboration is inevitable for achieving high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, that enable utilizing both CPUs and GPUs to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe this paper will provide researchers with insights into the workings and scope of applications of HCTs and motivate them to further harness the computational power of CPUs and GPUs to achieve the goal of exascale performance.

  17. CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications

    PubMed Central

    2012-01-01

    Background: Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular free-energy-minimization methods for RNA secondary structure prediction. Thus far, few studies have been reported on accelerating the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs). To the best of our knowledge, no implementation combines both the CPU and extra accelerators, such as GPUs, to accelerate Zuker algorithm applications. Results: In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperative execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for the CPU and GPU architectures. Conclusions: The experimental results show a speedup of 15.93× over an optimized multi-core SIMD CPU implementation and a 16% performance advantage over an optimized GPU implementation. More than 14% of the sequences are executed on the CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm proves promising and can be applied to other bioinformatics applications. PMID:22369626

  18. Deep venous thrombosis prophylaxis in bariatric surgery: a comparative study of different doses of low-molecular-weight heparin

    PubMed Central

    Goslan, Carlos José; Baretta, Giórgio Alfredo Pedroso; de Souza, Hemuara Grasiela Pestana; Orsi, Bruna Zanin; Zanoni, Esdras Camargo A.; Lopes, Marco Antonio Gimenez; Engelhorn, Carlos Alberto

    2018-01-01

    Background: Bariatric surgery is considered the best option for treating obesity, and these patients are considered at high risk for thromboembolic events. Objectives: To compare different doses of low-molecular-weight heparin (LMWH) for deep venous thrombosis (DVT) prophylaxis in candidates for bariatric surgery with respect to DVT risk, changes in anti-factor Xa levels, and pre- or postoperative bleeding. Methods: Cross-sectional comparative study of patients undergoing bariatric surgery, divided into two groups receiving LMWH doses of 40 mg (control group, CG) or 80 mg (study group, SG). Patients were assessed by vascular ultrasound and by aPTT, PT, platelet count, and anti-factor Xa assays. Results: Sixty patients were evaluated, 34 in the CG and 26 in the SG. A significant difference between the SG and the CG was observed only for weight (p = 0.003) and body mass index (p = 0.018). There were no differences in aPTT, PT, platelet count, or anti-factor Xa levels between the groups. No DVT or significant bleeding was detected in either group. Conclusions: Higher doses of LMWH for DVT prophylaxis in candidates for bariatric surgery made no statistically significant difference with respect to DVT risk, anti-factor Xa levels, or pre- or postoperative bleeding.

  19. Lanthanide oxide and phosphate nanoparticles for thermometry and bimodal imaging

    NASA Astrophysics Data System (ADS)

    Debasu, Mengistie Leweyehu

    This thesis reports photoluminescence studies of lanthanide(III)-doped oxide and phosphate nanoparticles, namely (Gd,Eu)2O3 and (Gd,Yb,Er)2O3 nanorods and (Gd,Yb,Tb)PO4 nanocrystals, and demonstrates applications of these materials in smart coatings, temperature sensors, and bioimaging. Energy transfer between the Eu3+ C2 and S6 sites of the Gd2O3 nanorods is studied. The contribution of site-to-site energy transfer mechanisms to the 5D0(C2) rise time is ruled out in favor of direct 5D1(C2) → 5D0(C2) relaxation (i.e., level-to-level energy transfer). The longer 5D0(C2) decay time in the nanorods, relative to the value measured for the same material in microcrystalline form, is attributed both to the empty space between neighboring nanorods (filling factor or volume fraction) and to the change in the effective refractive index of the medium around the Eu3+ ions. Dispersing (Gd,Eu)2O3 nanorods in three commercial epoxy resins by UV curing yields epoxy-(Gd,Eu)2O3 nanocomposites. Kinetic studies and the thermal and photoluminescence properties of these nanocomposites are reported; they preserve the typical Eu3+ emission, showing the potential of the UV-curing method for smart, photoactive coatings. A significant advance is the realization of an optical nanoplatform, incorporating both heater and thermometer and capable of measuring a wide temperature range (300-2000 K) at the nanoscale, based on (Gd,Yb,Er)2O3 nanorods (the thermometers) whose surface is coated with gold nanoparticles. The local temperature is obtained using either the Boltzmann distribution (300-1050 K) of the upconversion intensity ratio of the 2H11/2 → 4I15/2 and 4S3/2 → 4I15/2 transitions, or Planck's law (1200-2000 K) for a white-light emission attributed to blackbody radiation. Finally, the upconversion and downshifting photoluminescence properties of hydrothermally synthesized (Gd,Yb,Tb)PO4 nanocrystals are studied. The 1H relaxivity (magnetic resonance) of these materials is investigated with a view to possible applications in bimodal (luminescence and nuclear magnetic resonance) imaging.
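
    For reference, the Boltzmann-regime intensity ratio underlying this kind of Er3+ upconversion thermometry has the standard form below (a textbook relation, not quoted from the thesis; the prefactor C and the energy gap ΔE between the two emitting levels are calibrated per sample):

      R(T) = \frac{I\left({}^{2}H_{11/2} \rightarrow {}^{4}I_{15/2}\right)}
                  {I\left({}^{4}S_{3/2} \rightarrow {}^{4}I_{15/2}\right)}
           = C \exp\left(-\frac{\Delta E}{k_{\mathrm{B}} T}\right)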

  20. On the mechanical coupling between the antenna and the transducer in the Mario Schenberg gravitational wave detector

    NASA Astrophysics Data System (ADS)

    Melo, J. L.; Aguiar, O. D.; Velloso, W. F., Jr.; Lucena, A. U.

    2003-08-01

    The Mario Schenberg gravitational wave detector will consist of a 1150 kg copper-aluminum spherical mass cooled to 4 K, on which 6 niobium transducers will be mounted. These transducers are intended to convert a possible detected gravitational wave signal into an electrical signal; for this, the mechanical coupling between the transducers and the resonant mass must be as strong as possible. This means that the transducer must resonate at the same frequency as the antenna (approximately 3200 Hz). In this work a geometry for the mechanical structure of the transducer was developed by building finite element models with the MSC/Nastran software. The models were analyzed statically (stress calculations) and dynamically (calculation of the resonance frequencies and their respective normal modes) so as to place the transducer's first normal mode at 3200 Hz. From these calculations the best geometry for the transducer was chosen. The next steps of this work are to machine this transducer from a niobium bar and to test it at room temperature and at low temperature; after that, we intend to test it on the cooled antenna itself.

  1. Social networks in nursing work processes: an integrative literature review.

    PubMed

    Mesquita, Ana Cláudia; Zamarioli, Cristina Mara; Fulquini, Francine Lima; Carvalho, Emilia Campos de; Angerami, Emilia Luigia Saporiti

    2017-03-20

    To identify and analyze the evidence available in the literature on the use of social networks in nursing work processes. An integrative review of the literature was conducted in the PubMed, CINAHL, EMBASE and LILACS databases in January 2016, using the descriptors social media, social networking, nursing, enfermagem, redes sociais, mídias sociais, and the keyword nursing practice, without restriction on year. The sample consisted of 27 international articles published between 2011 and 2016. The social networks used were Facebook (66.5%), Twitter (30%) and WhatsApp (3.5%). In 70.5% of the studies, social networks were used for research purposes; in 18.5% as a tool to assist students in academic activities; and in 11% for delivering interventions via the internet. In their work processes, nurses have used the social networks Facebook, Twitter and WhatsApp to conduct research, teach, and provide care. The articles show several benefits of using such tools in the nursing profession; however, the ethical considerations surrounding the use of social networks deserve further discussion.

  2. The work of the Observatório Nacional as recorded in the ministerial reports, 1889 to 1930

    NASA Astrophysics Data System (ADS)

    Rodrigues, T.

    2003-08-01

    The republican period up to 1930 was a defining one in the history of the Observatório Nacional. Successive reforms moved the institution through three different ministries and changed the emphasis of its work. The long-awaited move to a new headquarters in São Cristóvão, in 1920, was not enough for the institution to keep pace with astronomy worldwide and establish itself as a research environment. A simplistic analysis might characterize the period as one of insignificant scientific output, given the institution's distance from the new directions of astrophysics, the rapid innovation in instruments, and the small volume of publications. It was a time when formal mechanisms for supporting and evaluating scientific activity did not yet exist. This work seeks to identify the Observatory's actual activity in the contents of the Ministerial Reports, which at the end of each year presented the activities, successes, and problems faced by the institution. Matters such as the instruments and human resources required, bureaucratic and financial obstacles, and links with other observatories complemented one another over those years to define the institutional profile and some aspects fundamental to the building of astronomy in the country. It can be concluded that the emphasis on geographic and meteorological services, together with the inadequacy of the instruments and of the site, nearly extinguished astronomical research. Nevertheless, some lines of work survived, for example latitude variation and double-star observation, which maintained important exchanges with other research groups, demonstrating the constant effort of the astronomers and directors in defense of scientific activity.

  3. Dense GPU-enhanced surface reconstruction from stereo endoscopic images for intraoperative registration.

    PubMed

    Rohl, Sebastian; Bodenstedt, Sebastian; Suwelack, Stefan; Dillmann, Rudiger; Speidel, Stefanie; Kenngott, Hannes; Muller-Stich, Beat P

    2012-03-01

    In laparoscopic surgery, soft tissue deformations substantially change the surgical site, thus impeding the use of preoperative planning during intraoperative navigation. Extracting depth information from endoscopic images and building a surface model of the surgical field of view is one way to represent this constantly deforming environment. The information can then be used for intraoperative registration. Stereo reconstruction is a typical problem within computer vision. However, most of the available methods do not fulfill the specific requirements of a minimally invasive setting, such as real-time performance, view-dependent specular reflections, and large curved areas with partly homogeneous or periodic textures and occlusions. In this paper, the authors present an approach to intraoperative surface reconstruction based on stereo endoscopic images. The authors describe their solution through correspondence analysis, disparity correction and refinement, 3D reconstruction, point cloud smoothing, and meshing. Real-time performance is achieved by implementing the algorithms on the GPU. The authors also present a new hybrid CPU-GPU algorithm that unifies the advantages of the CPU and GPU versions. In a comprehensive evaluation using in vivo data, in silico data from the literature, and virtual data from a newly developed simulation environment, the CPU, GPU, and hybrid CPU-GPU versions of the surface reconstruction are compared to a CPU and a GPU algorithm from the literature. The presented approach to intraoperative surface reconstruction runs in real time depending on the image resolution (20 fps for the GPU and 14 fps for the hybrid CPU-GPU version at a resolution of 640 × 480). It is robust to homogeneous regions without texture, large image changes, noise, and errors from camera calibration, and it reconstructs the surface down to sub-millimeter accuracy. In all experiments within the simulation environment, the mean distance to ground truth data is between 0.05 and 0.6 mm for the hybrid CPU-GPU version. The hybrid CPU-GPU algorithm clearly outperforms its CPU and GPU counterparts (mean distance reductions of 26% and 45%, respectively, in the simulation environment experiments). The presented approach to surface reconstruction is fast, robust, and accurate. It can represent changes in the intraoperative environment and can be used to adapt a preoperative model to the surgical site by registration of the two models.

  4. Test-retest reliability of Brazilian version of Memorial Symptom Assessment Scale for assessing symptoms in cancer patients.

    PubMed

    Menezes, Josiane Roberta de; Luvisaro, Bianca Maria Oliveira; Rodrigues, Claudia Fernandes; Muzi, Camila Drumond; Guimarães, Raphael Mendonça

    2017-01-01

    To assess the test-retest reliability of the Memorial Symptom Assessment Scale translated and culturally adapted into Brazilian Portuguese. The scale was applied in an interview format to 190 patients with various cancer types hospitalized in the clinical and surgical wards of the Instituto Nacional de Câncer José de Alencar Gomes da Silva, and reapplied to 58 patients. Data from the test-retest were double-entered into a Microsoft Excel spreadsheet and analyzed with the weighted Kappa. The reliability of the scale was satisfactory in the test-retest. The weighted Kappa values obtained for each scale item were adequate; the highest was 0.96 and the lowest 0.69. Kappa was also evaluated for the subscales: 0.84 for high-frequency physical symptoms, 0.81 for low-frequency physical symptoms, 0.81 for psychological symptoms, and 0.78 for the Global Distress Index. The high level of estimated reliability suggests that the measurement process of the Memorial Symptom Assessment Scale items was adequate.

  5. A thermal background suppressor for the CamIV infrared camera

    NASA Astrophysics Data System (ADS)

    Jablonski, F.; Laporte, R.

    2003-08-01

    The solid angle subtended by the pixels of the NexGal infrared camera (CamIV), which we operate at the OPD/LNA, contains contributions from the flux-collecting optics proper (the part relevant for astronomical measurements) together with contributions from the central obstruction, the secondary mirror support structure, and the region outside the telescope's entrance pupil. The latter contributions are due to blackbody emission at ambient temperature and increase exponentially for wavelengths beyond 2 microns (K band, in the near infrared). Although the resulting background can be quantified and subtracted from the relevant signals, its variance adds to the signal variance and can easily be the dominant contribution to the final measurement uncertainty, making information extraction inefficient and degrading the camera's sensitivity. The classical way to solve this problem in optical systems operating in the infrared, where thermal emission from the environment is important, is to restrict the solid angle subtended by the individual pixels exclusively to rays coming from the optical system. To that end, a strongly reduced real image of the optical system's entrance pupil is projected onto a stop that transmits to the imaging system only what matters, blocking the contributions from outside the entrance pupil, from the telescope's central obstruction, and from the support structure. Since the projection takes place in a cryogenic environment, the spurious thermal contribution is effectively eliminated. We chose an Offner-type system to implement this function in practice: a very compact design based on spherical mirrors that is aligned by construction. Making the mirrors from the same material as the support structure (aluminum) minimizes differential thermal expansion, which is critical in this type of application. We present detailed solutions for the opto-mechanical design, as well as an analysis of flexure and of performance in terms of image quality.

  6. GPU real-time processing in NA62 trigger system

    NASA Astrophysics Data System (ADS)

    Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cretaro, P.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Piccini, M.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.

    2017-01-01

    A commercial Graphics Processing Unit (GPU) is used to build a fast Level 0 (L0) trigger system tested parasitically with the TDAQ (Trigger and Data Acquisition system) of the NA62 experiment at CERN. In particular, the parallel computing power of the GPU is exploited to perform real-time fitting in the Ring Imaging CHerenkov (RICH) detector. Direct GPU communication using an FPGA-based board has been used to reduce the data transmission latency. The performance of the system for multi-ring reconstruction obtained during the NA62 physics run is presented.
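
    The abstract does not detail the fitting algorithm; purely as an illustration of event-parallel ring fitting on a GPU, the sketch below assigns one thread per event and performs a crude single-ring fit (centroid plus mean hit radius). The Hit layout, MAX_HITS stride, and all names are hypothetical, not the NA62 code.

        #include <math.h>

        #define MAX_HITS 64                    // hypothetical fixed per-event stride

        struct Hit { float x, y; };            // one photodetector hit (assumed layout)

        // One thread per event: estimate ring centre as hit centroid, radius as mean distance.
        __global__ void fitRings(const Hit* hits, const int* nHits, int nEvents,
                                 float* cx, float* cy, float* radius)
        {
            int ev = blockIdx.x * blockDim.x + threadIdx.x;
            if (ev >= nEvents) return;

            const Hit* h = hits + ev * MAX_HITS;
            int n = nHits[ev];
            if (n < 3) { cx[ev] = cy[ev] = radius[ev] = 0.f; return; }

            float sx = 0.f, sy = 0.f;
            for (int i = 0; i < n; ++i) { sx += h[i].x; sy += h[i].y; }
            float mx = sx / n, my = sy / n;    // ring centre estimate

            float sr = 0.f;
            for (int i = 0; i < n; ++i)
                sr += sqrtf((h[i].x - mx) * (h[i].x - mx) + (h[i].y - my) * (h[i].y - my));

            cx[ev] = mx; cy[ev] = my; radius[ev] = sr / n;
        }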

  7. gWEGA: GPU-accelerated WEGA for molecular superposition and shape comparison.

    PubMed

    Yan, Xin; Li, Jiabo; Gu, Qiong; Xu, Jun

    2014-06-05

    Virtual screening of a large chemical library for drug lead identification requires searching/superimposing a large number of three-dimensional (3D) chemical structures. This article reports a graphics processing unit (GPU)-accelerated weighted Gaussian algorithm (gWEGA) that expedites shape or shape-feature similarity score-based virtual screening. With 86 GPU nodes (each node has one GPU card), gWEGA can screen 110 million conformations derived from an entire ZINC drug-like database with diverse antidiabetic agents as query structures within 2 s (i.e., screening more than 55 million conformations per second). The rapid screening speed was accomplished through massive parallelization on multiple GPU nodes and rapid prescreening of 3D structures (based on their shape descriptors and pharmacophore feature compositions). Copyright © 2014 Wiley Periodicals, Inc.
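
    Gaussian shape-comparison methods of this family represent molecular volume as a sum of atom-centered Gaussians and score similarity from overlap volumes; the generic form is sketched below (WEGA's exact weighting scheme is not reproduced here). In LaTeX notation:

        V_{AB} = \sum_{i \in A}\sum_{j \in B} \int g_i(\mathbf{r})\, g_j(\mathbf{r})\, d\mathbf{r},
        \qquad
        T = \frac{V_{AB}}{V_{AA} + V_{BB} - V_{AB}}

    where g_i are atom-centered Gaussians and the shape-Tanimoto score T is maximized over rigid superpositions; each query-library pair is independent, which is what maps naturally onto GPU threads.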

  8. Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

    NASA Astrophysics Data System (ADS)

    Hause, Benjamin; Parker, Scott

    2012-10-01

    We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the GPU accelerator compiler directives. We have implemented the GPU acceleration on a Core i7 gaming PC with an NVIDIA GTX 580 GPU. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. Optimization strategies and comparisons between DIRAC and the gaming PC are presented. We also discuss progress on optimizing the comprehensive three-dimensional general geometry GEM code.

  9. Understanding GPU Power. A Survey of Profiling, Modeling, and Simulation Methods

    DOE PAGES

    Bridges, Robert A.; Imam, Neena; Mintz, Tiffany M.

    2016-09-01

    Modern graphics processing units (GPUs) have complex architectures that admit exceptional performance and energy efficiency for high-throughput applications. Though GPUs consume large amounts of power, their use for high-throughput applications facilitates state-of-the-art energy efficiency and performance. Consequently, continued development relies on understanding their power consumption. Our work is a survey of GPU power modeling and profiling methods with increased detail on noteworthy efforts. Moreover, as direct measurement of GPU power is necessary for model evaluation and parameter initiation, internal and external power sensors are discussed. Hardware counters, which are low-level tallies of hardware events, correlate strongly with power use and performance. Statistical correlation between power and performance counters has yielded worthwhile GPU power models, yet the complexity inherent to GPU architectures presents new hurdles for power modeling. Developments and challenges of counter-based GPU power modeling are discussed. Often building on the counter-based models, research efforts in GPU power simulation, which make power predictions from input code and hardware knowledge, provide opportunities for optimization in programming or architectural design. Noteworthy strides in power simulation for GPUs are included along with their performance or functional simulator counterparts when appropriate. Lastly, possible directions for future research are discussed.
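
    The counter-based power models surveyed are typically linear regressions from hardware-counter rates to measured power; schematically, in LaTeX notation:

        \widehat{P} = \beta_0 + \sum_{i=1}^{n} \beta_i\, c_i

    where the c_i are per-interval counter rates (e.g., instructions issued, memory transactions) and the coefficients \beta_i are fitted against power measured by internal or external sensors.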

  10. Acoustic reverse-time migration using GPU card and POSIX thread based on the adaptive optimal finite-difference scheme and the hybrid absorbing boundary condition

    NASA Astrophysics Data System (ADS)

    Cai, Xiaohui; Liu, Yang; Ren, Zhiming

    2018-06-01

    Reverse-time migration (RTM) is a powerful tool for imaging geologically complex structures such as steep-dip and subsalt structures. However, its implementation is quite computationally expensive. Recently, as a low-cost solution, the graphics processing unit (GPU) was introduced to improve the efficiency of RTM. In this paper, we develop three ameliorative strategies to implement RTM on a GPU card. First, given the high accuracy and efficiency of the adaptive optimal finite-difference (FD) method based on least squares (LS) on the central processing unit (CPU), we study the optimal LS-based FD method on the GPU. Second, we extend the CPU-based hybrid absorbing boundary condition (ABC) to a GPU-based one by addressing two issues that arise when the former is ported to a GPU card: high time consumption and chaotic threads. Third, for large-scale data, a combinatorial strategy of optimal checkpointing and efficient boundary storage is introduced to trade off memory against recomputation. To save the time of communication between host and disk, a portable operating system interface (POSIX) thread is utilized to occupy another CPU core at the checkpoints. Applications of the three strategies with the compute unified device architecture (CUDA) programming language in RTM demonstrate their efficiency and validity.
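
    The paper's optimal FD coefficients are not reproduced here; as an illustration of the stencil work that dominates GPU RTM, below is a minimal 2-D acoustic time-step kernel (second order in time, 2M-th order in space). The grid layout, halo handling, and coefficient array c[] are assumptions, not the authors' code.

        // p_next = 2p - p_prev + (v*dt)^2 * Laplacian(p), symmetric FD stencil of half-width M.
        __global__ void fdStep(const float* p, const float* pPrev, float* pNext,
                               const float* vel, const float* c, int M,
                               int nx, int nz, float dt, float dx)
        {
            int ix = blockIdx.x * blockDim.x + threadIdx.x;
            int iz = blockIdx.y * blockDim.y + threadIdx.y;
            if (ix < M || ix >= nx - M || iz < M || iz >= nz - M) return;

            int id = iz * nx + ix;
            float lap = 2.0f * c[0] * p[id];        // centre coefficient, applied per dimension
            for (int k = 1; k <= M; ++k)            // symmetric taps in x and z
                lap += c[k] * (p[id + k] + p[id - k] + p[id + k * nx] + p[id - k * nx]);
            lap /= (dx * dx);

            float vdt = vel[id] * dt;
            pNext[id] = 2.0f * p[id] - pPrev[id] + vdt * vdt * lap;
        }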

  11. GPU-based prompt gamma ray imaging from boron neutron capture therapy.

    PubMed

    Yoon, Do-Kun; Jung, Joo-Young; Jo Hong, Key; Sil Lee, Keum; Suk Suh, Tae

    2015-01-01

    The purpose of this research is to perform fast reconstruction of a prompt gamma ray image using graphics processing unit (GPU) computation from boron neutron capture therapy (BNCT) simulations. To evaluate the accuracy of the reconstructed image, a phantom including four boron uptake regions (BURs) was used in the simulation. After the Monte Carlo simulation of the BNCT, a modified ordered-subset expectation maximization reconstruction algorithm using GPU computation was used to reconstruct the images with fewer projections. The computation times for image reconstruction were compared between the GPU and the central processing unit (CPU). Also, the accuracy of the reconstructed image was evaluated by receiver operating characteristic (ROC) curve analysis. The image reconstruction using the GPU was 196 times faster than the conventional reconstruction using the CPU. For the four BURs, the area under the curve values from the ROC analysis were 0.6726 (A-region), 0.6890 (B-region), 0.7384 (C-region), and 0.8009 (D-region). The tomographic image from the prompt gamma ray events of the BNCT simulation was acquired using GPU computation in order to perform fast reconstruction during treatment. The authors verified the feasibility of prompt gamma ray image reconstruction using GPU computation for BNCT simulations.
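
    The ordered-subset EM update that the authors modify has the standard form below for one subset S_b of projections (their specific modification is not reproduced). In LaTeX notation:

        \lambda_j^{(n,b+1)} = \frac{\lambda_j^{(n,b)}}{\sum_{i \in S_b} a_{ij}}
        \sum_{i \in S_b} a_{ij}\, \frac{y_i}{\sum_k a_{ik}\, \lambda_k^{(n,b)}}

    where y_i are the measured projection counts, a_{ij} is the system matrix, and \lambda_j the voxel activity estimate; the per-voxel and per-projection sums are what parallelize naturally on a GPU.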

  12. Fast distributed large-pixel-count hologram computation using a GPU cluster.

    PubMed

    Pan, Yuechao; Xu, Xuewu; Liang, Xinan

    2013-09-10

    Large-pixel-count holograms are an essential part of large-size holographic three-dimensional (3D) display, but the generation of such holograms is computationally demanding. In order to address this issue, we have built a graphics processing unit (GPU) cluster with 32.5 Tflop/s computing power and implemented distributed hologram computation on it with speed improvement techniques such as shared memory on GPU, GPU-level adaptive load balancing, and node-level load distribution. Using these speed improvement techniques on the GPU cluster, we have achieved a 71.4-fold computation speed increase for 186M-pixel holograms. Furthermore, we have used the approaches of diffraction limits and subdivision of holograms to overcome the GPU memory limit in computing large-pixel-count holograms. 745M-pixel and 1.80G-pixel holograms were computed in 343 and 3326 s, respectively, for more than 2 million object points with RGB colors. Color 3D objects with 1.02M points were successfully reconstructed from a 186M-pixel hologram computed in 8.82 s with all the above three speed improvement techniques. It is shown that distributed hologram computation using a GPU cluster is a promising approach to increase the computation speed of large-pixel-count holograms for large-size holographic display.
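
    Point-source hologram computation of this type typically evaluates, for every hologram pixel, a sum of spherical-wave contributions over all object points; schematically (this generic form, not the authors' exact kernel), in LaTeX notation:

        H(x_h, y_h) = \sum_{j=1}^{N} \frac{A_j}{r_{hj}} \cos\!\left(k\, r_{hj} + \phi_j\right),
        \qquad
        r_{hj} = \sqrt{(x_h - x_j)^2 + (y_h - y_j)^2 + z_j^2}

    Each pixel's sum is independent of every other pixel's, so pixels map onto GPU threads and pixel tiles onto cluster nodes, which is what makes the distributed GPU approach effective.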

  13. Understanding GPU Power. A Survey of Profiling, Modeling, and Simulation Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bridges, Robert A.; Imam, Neena; Mintz, Tiffany M.

    Modern graphics processing units (GPUs) have complex architectures that admit exceptional performance and energy efficiency for high-throughput applications. Though GPUs consume large amounts of power, their use for high-throughput applications facilitates state-of-the-art energy efficiency and performance. Consequently, continued development relies on understanding their power consumption. Our work is a survey of GPU power modeling and profiling methods with increased detail on noteworthy efforts. Moreover, as direct measurement of GPU power is necessary for model evaluation and parameter initiation, internal and external power sensors are discussed. Hardware counters, which are low-level tallies of hardware events, correlate strongly with power use and performance. Statistical correlation between power and performance counters has yielded worthwhile GPU power models, yet the complexity inherent to GPU architectures presents new hurdles for power modeling. Developments and challenges of counter-based GPU power modeling are discussed. Often building on the counter-based models, research efforts in GPU power simulation, which make power predictions from input code and hardware knowledge, provide opportunities for optimization in programming or architectural design. Noteworthy strides in power simulation for GPUs are included along with their performance or functional simulator counterparts when appropriate. Lastly, possible directions for future research are discussed.

  14. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    PubMed

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel Core (TM) 2 Quad Q6600 CPU and a GeForce 8800GT GPU, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus one core of the CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setup (c). In the simulation with 1600 time steps, the speedup of the parallel computation compared to the serial computation was 3.9 in setup (a), 16.8 in setup (b), and 20.0 in setup (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
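
    The paper's load-prediction scheduler is not specified in the abstract; a minimal host-side sketch of the general idea is to split each step's work between GPU and CPU in proportion to their throughputs measured on the previous step (all names hypothetical, not the authors' code).

        // Split N work units between GPU and CPU using last step's measured throughputs.
        void scheduleStep(int N, double gpuUnitsPerSec, double cpuUnitsPerSec,
                          int* nGpu, int* nCpu)
        {
            double f = gpuUnitsPerSec / (gpuUnitsPerSec + cpuUnitsPerSec);
            *nGpu = (int)(f * N);   // first chunk launched as the CUDA kernel
            *nCpu = N - *nGpu;      // remainder handled by OpenMP threads
        }

    Re-measuring both throughputs with timers each step lets the split adapt as the per-step workload changes, which is the essence of load-prediction dynamic scheduling.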

  15. Efficient methods for implementation of multi-level nonrigid mass-preserving image registration on GPUs and multi-threaded CPUs.

    PubMed

    Ellingwood, Nathan D; Yin, Youbing; Smith, Matthew; Lin, Ching-Long

    2016-04-01

    Faster and more accurate methods for registration of images are important for research involving population-based studies that utilize medical imaging, as well as for improvements in clinical applications. We present a novel computation- and memory-efficient multi-level method on graphics processing units (GPU) for performing registration of two computed tomography (CT) volumetric lung images. We developed a computation- and memory-efficient Diffeomorphic Multi-level B-Spline Transform Composite (DMTC) method to implement nonrigid mass-preserving registration of two CT lung images on GPU. The framework consists of a hierarchy of B-Spline control grids of increasing resolution. A similarity criterion known as the sum of squared tissue volume difference (SSTVD) was adopted to preserve lung tissue mass. The use of SSTVD requires calculation of the tissue volume, the Jacobian, and their derivatives, which makes its implementation on GPU challenging due to memory constraints. The DMTC method enabled reduced computation and memory storage of variables, with minimal communication between GPU and Central Processing Unit (CPU), owing to the ability to pre-compute values. The method was assessed on six healthy human subjects. Resultant GPU-generated displacement fields were compared against the previously validated CPU counterpart fields, showing good agreement with an average normalized root mean square error (nRMS) of 0.044±0.015. Runtime and performance speedup are compared between single-threaded CPU, multi-threaded CPU, and GPU algorithms. The best performance speedup occurs at the highest resolution in the GPU implementation of the SSTVD cost and cost-gradient computations, with a speedup of 112 times over the single-threaded CPU version and 11 times over the twelve-threaded version, considering average time per iteration on an Nvidia Tesla K20X GPU. The proposed GPU-based DMTC method outperforms its multi-threaded CPU version in terms of runtime: total registration time was reduced to 2.9 min with the GPU version, compared to 12.8 min with the twelve-threaded CPU version and 112.5 min with a single-threaded CPU. Furthermore, the GPU implementation discussed in this work can be adapted for other cost functions that require calculation of first derivatives. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
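
    In one common formulation (the exact variant used here is not reproduced), SSTVD compares CT-derived tissue volumes rather than raw intensities; in LaTeX notation:

        C_{\mathrm{SSTVD}} = \sum_{\mathbf{x} \in \Omega}
        \Big[ v_1(\mathbf{x})\,\beta\big(I_1(\mathbf{x})\big)
            - v_2\big(h(\mathbf{x})\big)\,\beta\big(I_2(h(\mathbf{x}))\big) \Big]^2,
        \qquad
        \beta(I) = \frac{I - HU_{\mathrm{air}}}{HU_{\mathrm{tissue}} - HU_{\mathrm{air}}}

    where h is the B-Spline transform, \beta(I) is the tissue fraction of a voxel with intensity I, and v_2 = v_1 \cdot J_h scales voxel volume by the Jacobian of h; this is why the Jacobian and its derivatives appear in the GPU implementation.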

  16. Elder-friendly emergency services in Brazil: necessary conditions for care.

    PubMed

    Santos, Mariana Timmers Dos; Lima, Maria Alice Dias da Silva; Zucatti, Paula Buchs

    2016-01-01

    To identify and analyze the aspects necessary to provide an elder-friendly emergency service (ES) from the perspective of nurses. This is a descriptive, quantitative study using the Delphi technique in three rounds. Nurses with professional experience in the ES and/or researchers with publications and/or conducting research in the study area were selected. The first round of the Delphi panel had 72 participants, the second 49, and the third 44. An online questionnaire was used based on a review of the scientific literature, with questions organized into the central dimensions of elder-friendly hospitals. A five-point Likert scale was used for each question and a 70% consensus level was established. There were 38 aspects identified as necessary for elderly care, which were organized into central dimensions. The study's results are consistent with the findings in the scientific literature and suggest indicators for quality of care and training for an elder-friendly ES.

  17. Association between symptoms, varicose veins, and great saphenous vein reflux on Doppler ultrasound

    PubMed Central

    Seidel, Amélia Cristina; Campos, Mariana Baldini; Campos, Raquel Baldini; Harada, Dérica Sayuri; Rossi, Robson Marcelo; Cavalari, Pedro; Miranda, Fausto

    2017-01-01

    Abstract Background Chronic venous disease requires clinical assessment, quantification of hemodynamic effects, and definition of the anatomical distribution for diagnostic decision-making and treatment. Methods Prospective study conducted in 2015 with a sample of 1,384 patients (2,669 limbs) aged 17 to 85 years, 1,227 of them female. In the responses to the questionnaire administered, the symptoms investigated were pain, tiredness, sensation of heaviness, burning, cramps, and tingling. Groups were formed considering the number of limbs, distributed by sex, body mass index, and age. After the groups were defined and Doppler ultrasound was performed to study the great saphenous vein (GSV), patients were divided into three groups (I: symptoms present and varicose veins absent; II: symptoms absent and varicose veins present; III: symptoms present and varicose veins present). Statistical analysis used the chi-square test or Fisher's exact test to verify homogeneity between groups. Where an association was significant at the 5% level, the odds ratio was calculated. Results For both sexes, the odds of GSV insufficiency were 11.2 times higher in group III. In turn, cases of morbid obesity occurred 9.1 times more often in the same group. Moreover, patients in this group aged between 30 and 50 years had 43.1 times higher odds of GSV insufficiency. Conclusions GSV insufficiency was significantly more frequent in group III, both overall and when considering only the cases of morbid obesity and the higher age range. PMID:29930616

  18. Search for large-scale structures at high redshifts

    NASA Astrophysics Data System (ADS)

    Boris, N. V.; Sodré, L., Jr.; Cypriano, E.

    2003-08-01

    The search for large-scale structures (clusters of galaxies, for example) is an active research topic today, since the detection of a single cluster at high redshift can place strong constraints on cosmological models. In this project we are searching for distant structures in fields containing pairs of quasars close to each other at z ≳ 0.9. The quasar pairs were extracted from the catalog of Véron-Cetty & Véron (2001) and are being observed with the following telescopes: the 2.2 m of the University of Hawaii (UH), the 2.5 m of the Las Campanas Observatory, and GEMINI. We present here the preliminary analysis of a quasar pair observed in the i' (7800 Å) and z' (9500 Å) filters with GEMINI. The (i'-z') color proved useful for detecting early-type objects at redshifts below 1.1. In the study of the pair 131046+0006/J131055+0008, at redshift ~ 0.9, this method enabled the detection of seven candidate early-type galaxies. In a map of the projected distribution of objects with 22 < i' < 25, these galaxies were seen to lie close to one of the quasars, and there are indications that they are clustered within an area of ~ 6 arcmin2. If that is the case, these objects would be members of a large-scale structure. Another argument in favor of this hypothesis is that they follow a Kormendy-type relation (equivalent radius vs. surface brightness within that radius), like that shown by elliptical galaxies at z = 0.

  19. Prevalence of incidental pulmonary embolism in cancer patients: a retrospective analysis at a large center

    PubMed Central

    Carneiro, Renata Mota; van Bellen, Bonno; Santana, Pablo Rydz Pinheiro; Gomes, Antônio Carlos Portugal

    2017-01-01

    Abstract Background Owing to the wider use of routine imaging examinations, especially for disease monitoring in patients with cancer, the diagnosis of incidental pulmonary embolism (PE), an important factor in associated morbidity and mortality, has been increasing. Objective To identify cases of incidental PE in cancer patients undergoing chest computed tomography (CT), correlating clinical aspects and associated risk factors. Methods Retrospective study of all PE episodes occurring from January 2013 to June 2016, selecting the cancer patients and dividing them into two groups: with clinical suspicion and without clinical suspicion (incidental) of pulmonary embolism. Results A total of 468 patients with PE were evaluated in the period. Of these, 23.1% were cancer patients, among whom 44.4% had an incidental finding of pulmonary embolism on chest CT. There was no statistical difference between the groups for sex, age, or smoking. Regarding origin, 58.3% of the patients without clinical suspicion came from outpatient care, while 41.7% of those with suspected PE came from the emergency department (p < 0.001). The most prevalent cancers were lung (17.6%), bowel (15.7%), and breast (13.0%). Patients with incidental findings had significantly more metastases, with no difference between groups for chemotherapy, radiotherapy, or recent surgery. As for symptoms, 41.9% of those without clinical suspicion had complaints suggestive of PE when they underwent the examination. Conclusion Incidental PE is frequent in cancer patients, especially those in outpatient follow-up and at advanced stages of the disease. Symptoms suggestive of PE were present in patients without clinical suspicion when they underwent chest CT. PMID:29930652

  20. Decline in homicides in São Paulo, Brazil: a descriptive analysis

    PubMed Central

    Peres, Maria Fernanda Tourinho; Vicentin, Diego; Nery, Marcelo Batista; de Lima, Renato Sérgio; de Souza, Edinilsa Ramos; Cerda, Magdalena; Cardia, Nancy; Adorno, e Sérgio

    2012-01-01

    Objective To describe the evolution of homicide mortality in the municipality of São Paulo by weapon type, sex, race or color, age, and areas of social exclusion/inclusion between 1996 and 2008. Methods Ecological time-series study. Data on deaths occurring in the municipality were collected from the database of the Program for the Improvement of Mortality Information, following the International Classification of Diseases, Tenth Revision (ICD-10). Homicide mortality rates (HMR) were calculated for the total population and by sex, race or color, age group, weapon type, and area of social exclusion/inclusion. HMR were age-standardized by the direct method. Percentage changes over the study period were calculated. Relative risks of death by homicide were calculated for the areas of social exclusion/inclusion. Results HMR fell by 73.7% between 2001 and 2008. A reduction in HMR was observed in all groups analyzed, most markedly in men (-74.5%), young people aged 15 to 24 years (-78.0%), and residents of areas of extreme social exclusion (-79.3%). The reduction occurred above all in homicides committed with firearms (-74.1%). The relative risk of death by homicide in areas of extreme exclusion (taking areas with some degree of social exclusion as reference) was 2.77 in 1996, 3.9 in 2001, and 2.13 in 2008. In areas of high social exclusion, the relative risk was 2.07 in 1996 and 1.96 in 2008. Conclusions To understand the decline in homicides in the municipality, it is important to consider macro-determinants that affect the entire municipality and all population subgroups, as well as micro-determinants that act locally, influencing in different ways the homicides committed with firearms and the homicides among young people, males, and residents of areas of high social exclusion. PMID:21390415

  1. PREVALENCE OF HELICOBACTER PYLORI TEN YEARS AGO COMPARED TO THE CURRENT PREVALENCE IN PATIENTS UNDERGOING UPPER ENDOSCOPY.

    PubMed

    Frugis, Sandra; Czeczko, Nicolau Gregori; Malafaia, Osvaldo; Parada, Artur Adolfo; Poletti, Paula Bechara; Secchi, Thiago Festa; Degiovani, Matheus; Rampanazzo-Neto, Alécio; D Agostino, Mariza D

    2016-01-01

    Helicobacter pylori has been extensively studied since 1982; it is estimated that 50% of the world population is affected. The literature lacks studies showing the change in its prevalence in the same population over time. To compare the prevalence of H. pylori over a 10-year interval in a population submitted to upper endoscopy in the same endoscopy service. Observational, retrospective and cross-sectional study comparing the prevalence of H. pylori in two samples 10 years apart (2004 and 2014) of patients who underwent endoscopy with biopsy and urease testing. Patients examined in three consecutive months of 2004 were compared to those of three consecutive months of 2014. The total number of patients was 2536: 1406 in 2004 and 1130 in 2014. H. pylori was positive in 17% of the sample as a whole. There was a significant decrease in prevalence, from 19.3% in 2004 to 14.1% in 2014 (p<0.005). There was a 5.2 percentage-point reduction in the prevalence of H. pylori comparing two periods of three consecutive months, 10 years apart, in two equivalent population samples.

  2. Focus and coverage of Bolsa Família Program in the Pelotas 2004 birth cohort.

    PubMed

    Schmidt, Kelen H; Labrecque, Jeremy; Santos, Iná S; Matijasevich, Alicia; Barros, Fernando C; Barros, Aluisio J D

    2017-03-30

    To describe the focus and coverage of the Bolsa Família Program among the families of children who are part of the 2004 Pelotas birth cohort (2004 cohort). The data used derive from the integration of information from the 2004 cohort and the Cadastro Único para Programas Sociais do Governo Federal (CadÚnico - Register for Social Programs of the Federal Government) for the 2004-2010 period. We estimated the program's coverage (percentage of eligible families who receive the benefit) and its focus (proportion of eligible families among the beneficiaries). We used two criteria to define eligibility: the per capita household income reported in the cohort follow-ups and belonging to the 20% poorest families according to the National Economic Indicator (IEN), an asset index. Between 2004 and 2010, the proportion of families in the cohort that received the benefit increased from 11% to 34%. We observed an increase in all wealth quintiles. In 2010, by income and wealth (IEN) quintiles, 62%-72% of the families were beneficiaries among the 20% poorest, 2%-5% among the 20% richest, and about 30% in the intermediate quintile. According to household income (minus the benefit), 29% of families were eligible in 2004 and 16% in 2010. By the same criterion, the coverage of the program increased from 43% in 2004 to 71% in 2010. In the same period, by the wealth criterion (IEN), coverage increased from 29% to 63%. The focus of the program decreased from 78% in 2004 to 32% in 2010 according to income, and remained constant (37%) according to the IEN. Among the families of the 2004 cohort, there was a significant increase in program coverage from its inception until 2010, when it was near 70%. The focus of the program was below 40% in 2010, indicating that more than half of the beneficiaries did not belong to the target population.
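
    The two indicators have simple set definitions; with B the set of beneficiary families and E the set of eligible families, in LaTeX notation:

        \text{coverage} = \frac{|B \cap E|}{|E|},
        \qquad
        \text{focus} = \frac{|B \cap E|}{|B|}

    so coverage near 70% with focus below 40% means most eligible families were reached, while most beneficiaries were outside the eligible set.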

  3. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    NASA Astrophysics Data System (ADS)

    Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; Fang, Jianbin; Wang, Guangxue; Jiang, Yi; Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua

    2014-12-01

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, while the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU-GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.
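
    One standard CUDA mechanism for the kind of computation/communication overlap described is asynchronous copies on independent streams; the sketch below is generic (kernel and buffer names are hypothetical, not HOSTA's code): interior cells run while ghost data moves across PCI-e for the MPI exchange.

        __global__ void interiorKernel(float* field);
        __global__ void boundaryKernel(float* field, const float* ghost);

        void stepWithOverlap(float* dField, float* dGhost, float* hGhost,
                             size_t ghostBytes, dim3 gridIn, dim3 gridBd, dim3 block)
        {
            cudaStream_t sCompute, sCopy;
            cudaStreamCreate(&sCompute);
            cudaStreamCreate(&sCopy);

            // Interior cells need no ghost data: start them immediately.
            interiorKernel<<<gridIn, block, 0, sCompute>>>(dField);

            // Meanwhile, stage ghost data through pinned host memory on the copy stream.
            cudaMemcpyAsync(hGhost, dGhost, ghostBytes, cudaMemcpyDeviceToHost, sCopy);
            cudaStreamSynchronize(sCopy);   // hGhost ready for MPI_Isend/Irecv (not shown)

            // After the MPI exchange, push ghosts back and finish the boundary cells.
            cudaMemcpyAsync(dGhost, hGhost, ghostBytes, cudaMemcpyHostToDevice, sCopy);
            cudaStreamSynchronize(sCopy);
            boundaryKernel<<<gridBd, block, 0, sCompute>>>(dField, dGhost);

            cudaStreamDestroy(sCompute);
            cudaStreamDestroy(sCopy);
        }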

  4. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Chuanfu, E-mail: xuchuanfu@nudt.edu.cn; Deng, Xiaogang; Zhang, Lilun

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, while the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU-GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.

  5. Zero Calcium Score as a Filter for Further Testing in Patients Admitted to the Coronary Care Unit with Chest Pain.

    PubMed

    Correia, Luis Cláudio Lemos; Esteves, Fábio P; Carvalhal, Manuela; Souza, Thiago Menezes Barbosa de; Sá, Nicole de; Correia, Vitor Calixto de Almeida; Alexandre, Felipe Kalil Beirão; Lopes, Fernanda; Ferreira, Felipe; Noya-Rabelo, Márcia

    2017-06-12

    The accuracy of zero coronary calcium score as a filter in patients with acute chest pain has been demonstrated in the emergency room and outpatient clinics, populations with a low prevalence of coronary artery disease (CAD). To test the gatekeeping role of zero calcium score in patients with chest pain admitted to the coronary care unit (CCU), where the pretest probability of CAD is higher than in other populations. Patients underwent computed tomography for calcium scoring, and obstructive CAD was defined by a minimum 70% stenosis on invasive angiography. A clinical score for estimating the pretest probability of obstructive CAD was derived in a sample of 370 patients and used to define subgroups for the negative predictive value of the zero score. In the 146 patients studied, the prevalence of CAD was 41%. A zero calcium score was present in 35% of the patients. The sensitivity and specificity of zero calcium score yielded a negative likelihood ratio of 0.16. After logistic regression adjustment for pretest probability, zero calcium score was independently associated with lower odds of CAD (OR = 0.12, 95%CI = 0.04-0.36), increasing the area under the ROC curve of the clinical model from 0.76 to 0.82 (p = 0.006). Zero calcium score provided a net reclassification improvement of 0.20 (p = 0.0018) over the clinical model when using a pretest probability threshold of 10% for discharge without further testing, increasing the proportion of patients eligible for early discharge from 8.2% to 25%. Overall, the negative predictive value of a zero score was 90%; in patients with pretest probability < 50%, it was 95% (95%CI = 83%-99%), with a number needed to test of 2.1 to obtain one additional discharge. Zero calcium score substantially reduces the pretest probability of obstructive CAD in patients admitted to the CCU with acute chest pain. (Arq Bras Cardiol. 2017; [online].ahead print, PP.0-0).
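
    The reported figure follows from the usual definition of the negative likelihood ratio, treating a zero score as the "negative" test result; in LaTeX notation:

        LR^- = \frac{1 - \text{sensitivity}}{\text{specificity}} = 0.16,
        \qquad
        \text{post-test odds} = \text{pretest odds} \times LR^-

    For example, at the cohort's 41% pretest probability (odds 0.41/0.59 ≈ 0.69), a zero score gives post-test odds ≈ 0.69 × 0.16 ≈ 0.11, i.e. a post-test probability of about 10%, consistent with the 10% discharge threshold discussed.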

  6. Development, health, and international policy: the research and innovation dimension.

    PubMed

    Buss, Paulo Marchiori; Chamas, Claudia; Faid, Miriam; Morel, Carlos

    2016-11-03

    The main objective of this text is to discuss development and health from the perspective of the influence of global health governance, using as a tracer the dimension of research, development, and innovation policies in health, which relate both to important inputs for the health system, like drugs and medicines, vaccines, diagnostic reagents, and equipment, and to innovative concepts and practices for the improvement of health systems and public health. The authors examine the two main macro-processes that influence development and health: the post-2015 Development Agenda and the process under way in the World Health Organization concerning research and development, intellectual property, and access to health inputs. The article concludes, first, that much remains to be done for the Agenda to truly represent a coherent and viable international political pact, and that the two macro-processes related to innovation in health need to be streamlined. This, however, requires democratization of participation by the main stakeholders - patients and, more generally, the population of the poorest countries - since this is the only way to overcome the "zero sum" standoff in the current debates among member State representatives.

  7. cellGPU: Massively parallel simulations of dynamic vertex models

    NASA Astrophysics Data System (ADS)

    Sussman, Daniel M.

    2017-10-01

    Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations. Program Files doi:http://dx.doi.org/10.17632/6j2cj29t3r.1 Licensing provisions: MIT Programming language: CUDA/C++ Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate. Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU. Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation
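
    The force laws in two-dimensional vertex models of this kind usually derive from the standard area-and-perimeter energy (parameter names generic; cellGPU's exact conventions are not reproduced here). In LaTeX notation:

        E = \sum_{i=1}^{N} \left[ K_A \left( A_i - A_0 \right)^2 + K_P \left( P_i - P_0 \right)^2 \right]

    where A_i and P_i are the area and perimeter of cell i and A_0, P_0 are preferred values; the forces F = -\nabla E and the topological T1 rearrangements are evaluated per cell, which is the per-cell parallel work that cellGPU distributes over GPU threads.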

  8. Incompressible SPH (ISPH) with fast Poisson solver on a GPU

    NASA Astrophysics Data System (ADS)

    Chow, Alex D.; Rogers, Benedict D.; Lind, Steven J.; Stansby, Peter K.

    2018-05-01

    This paper presents a fast incompressible SPH (ISPH) solver implemented to run entirely on a graphics processing unit (GPU) capable of simulating several millions of particles in three dimensions on a single GPU. The ISPH algorithm is implemented by converting the highly optimised open-source weakly-compressible SPH (WCSPH) code DualSPHysics to run ISPH on the GPU, combining it with the open-source linear algebra library ViennaCL for fast solutions of the pressure Poisson equation (PPE). Several challenges are addressed with this research: constructing a PPE matrix every timestep on the GPU for moving particles, optimising the limited GPU memory, and exploiting fast matrix solvers. The ISPH pressure projection algorithm is implemented as 4 separate stages, each with a particle sweep, including an algorithm for the population of the PPE matrix suitable for the GPU, and mixed precision storage methods. An accurate and robust ISPH boundary condition ideal for parallel processing is also established by adapting an existing WCSPH boundary condition for ISPH. A variety of validation cases are presented: an impulsively started plate, incompressible flow around a moving square in a box, and dambreaks (2-D and 3-D) which demonstrate the accuracy, flexibility, and speed of the methodology. Fragmentation of the free surface is shown to influence the performance of matrix preconditioners and therefore the PPE matrix solution time. The Jacobi preconditioner demonstrates robustness and reliability in the presence of fragmented flows. For a dambreak simulation, GPU speed ups demonstrate up to 10-18 times and 1.1-4.5 times compared to single-threaded and 16-threaded CPU run times respectively.
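
    In its simplest form, the pressure Poisson equation solved at each ISPH timestep comes from projecting an intermediate velocity onto a divergence-free field; in LaTeX notation:

        \nabla^2 p^{n+1} = \frac{\rho}{\Delta t}\, \nabla \cdot \mathbf{u}^*

    where u* is the intermediate velocity after applying viscous and body forces. Discretized over particle neighbourhoods, this yields one sparse row per particle; since particles move, that is the matrix which must be reassembled on the GPU every timestep and handed to ViennaCL.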

  9. Testing and Validating Gadget2 for GPUs

    NASA Astrophysics Data System (ADS)

    Wibking, Benjamin; Holley-Bockelmann, K.; Berlind, A. A.

    2013-01-01

    We are currently upgrading a version of Gadget2 (Springel et al., 2005) that is optimized for NVIDIA's CUDA GPU architecture (Frigaard, unpublished) to work with the latest libraries and graphics cards. Preliminary tests of its performance indicate a ~40x speedup in the particle force tree approximation calculation, with overall speedup of 5-10x for cosmological simulations run with GPUs compared to running on the same CPU cores without GPU acceleration. We believe this speedup can be reasonably increased by an additional factor of two with further optimization, including overlap of computation on CPU and GPU. Tests of single-precision GPU numerical fidelity currently indicate accuracy of the mass function and the spectral power density to within a few percent of extended-precision CPU results with the unmodified form of Gadget. Additionally, we plan to test and optimize the GPU code for Millennium-scale "grand challenge" simulations of >10^9 particles, a scale that has been previously untested with this code, with the aid of the NSF XSEDE flagship GPU-based supercomputing cluster codenamed "Keeneland." Current work involves additional validation of numerical results, extending the numerical precision of the GPU calculations to double precision, and evaluating performance/accuracy tradeoffs. We believe that this project, if successful, will yield substantial computational performance benefits to the N-body research community as the next generation of GPU supercomputing resources becomes available, both increasing the electrical power efficiency of ever-larger computations (making simulations possible a decade from now at scales and resolutions unavailable today) and accelerating the pace of research in the field.

  10. Lossless data compression for improving the performance of a GPU-based beamformer.

    PubMed

    Lok, U-Wai; Fan, Gang-Wei; Li, Pai-Chi

    2015-04-01

    The powerful parallel computation ability of a graphics processing unit (GPU) makes it feasible to perform dynamic receive beamforming. However, a real-time GPU-based beamformer requires a high data rate to transfer radio-frequency (RF) data from hardware to software memory, as well as from central processing unit (CPU) to GPU memory. There are data compression methods (e.g. Joint Photographic Experts Group (JPEG)) available for the hardware front end to reduce data size, alleviating the data transfer requirement of the hardware interface. Nevertheless, the required decoding time may even be larger than the transmission time of the original data, in turn degrading the overall performance of the GPU-based beamformer. This article proposes and implements a lossless compression-decompression algorithm, which enables compression and decompression of data in parallel. By this means, the data transfer requirement of the hardware interface and the transmission time of CPU to GPU data transfers are reduced, without sacrificing image quality. In simulation results, the compression ratio reached around 1.7. The encoder design of our lossless compression approach requires low hardware resources and reasonable latency in a field programmable gate array. In addition, the transmission time of transferring data from CPU to GPU with the parallel decoding process improved threefold, as compared with transferring original uncompressed data. These results show that our proposed lossless compression plus parallel decoder approach not only mitigates the transmission bandwidth requirement to transfer data from hardware front end to software system but also reduces the transmission time for CPU to GPU data transfer. © The Author(s) 2014.

  11. GPU-accelerated automatic identification of robust beam setups for proton and carbon-ion radiotherapy

    NASA Astrophysics Data System (ADS)

    Ammazzalorso, F.; Bednarz, T.; Jelen, U.

    2014-03-01

    We demonstrate acceleration on graphics processing units (GPU) of automatic identification of robust particle therapy beam setups, minimizing negative dosimetric effects of Bragg peak displacement caused by treatment-time patient positioning errors. Our particle therapy research toolkit, RobuR, was extended with OpenCL support and used to implement calculation on GPU of the Port Homogeneity Index, a metric scoring irradiation port robustness through analysis of tissue density patterns prior to dose optimization and computation. Results were benchmarked against an independent native CPU implementation. Numerical results were in agreement between the GPU implementation and the native CPU implementation. For 10 skull base cases, the GPU-accelerated implementation was employed to select beam setups for proton and carbon ion treatment plans, which proved to be dosimetrically robust when recomputed in the presence of various simulated positioning errors. From the point of view of performance, average running time on the GPU decreased by at least one order of magnitude compared to the CPU, rendering the GPU-accelerated analysis a feasible step in a clinical treatment planning interactive session. In conclusion, selection of robust particle therapy beam setups can be effectively accelerated on a GPU and become an unintrusive part of the particle therapy treatment planning workflow. Additionally, the speed gain opens new usage scenarios, like interactive analysis manipulation (e.g. constraining of some setup) and re-execution. Finally, through OpenCL portable parallelism, the new implementation is suitable also for CPU-only use, taking advantage of multiple cores, and can potentially exploit types of accelerators other than GPUs.

  12. GPU-Based Point Cloud Superpositioning for Structural Comparisons of Protein Binding Sites.

    PubMed

    Leinweber, Matthias; Fober, Thomas; Freisleben, Bernd

    2018-01-01

    In this paper, we present a novel approach to solve the labeled point cloud superpositioning problem for performing structural comparisons of protein binding sites. The solution is based on a parallel evolution strategy that operates on large populations and runs on GPU hardware. The proposed evolution strategy reduces the likelihood of getting stuck in a local optimum of the multimodal real-valued optimization problem represented by labeled point cloud superpositioning. The performance of the GPU-based parallel evolution strategy is compared to a previously proposed CPU-based sequential approach for labeled point cloud superpositioning, indicating that the GPU-based parallel evolution strategy leads to qualitatively better results and significantly shorter runtimes, with speed improvements of up to a factor of 1,500 for large populations. Binary classification tests based on the ATP, NADH, and FAD protein subsets of CavBase, a database containing putative binding sites, show average classification rate improvements from about 92 percent (CPU) to 96 percent (GPU). Further experiments indicate that the proposed GPU-based labeled point cloud superpositioning approach can be superior to traditional protein comparison approaches based on sequence alignments.

  13. irGPU.proton.Net: Irregular strong charge interaction networks of protonatable groups in protein molecules--a GPU solver using the fast multipole method and statistical thermodynamics.

    PubMed

    Kantardjiev, Alexander A

    2015-04-05

    A cluster of strongly interacting ionization groups in protein molecules with irregular ionization behavior is suggestive of a specific structure-function relationship. However, their computational treatment is unconventional (e.g., lack of convergence in the naive self-consistent iterative algorithm). A rigorous treatment requires evaluation of Boltzmann-averaged statistical mechanics sums and electrostatic energy estimation for each microstate. irGPU: Irregular strong interactions in proteins - a GPU solver is a novel solution to a versatile problem in protein biophysics: atypical protonation behavior of coupled groups. The computational severity of the problem is alleviated by parallelization (via GPU kernels), which is applied to the electrostatic interaction evaluation (including explicit electrostatics via the fast multipole method) as well as to estimation of the statistical mechanics sums (partition function). Special attention is given to ease of use and encapsulation of theoretical details without sacrificing the rigor of the computational procedures. irGPU is not just a solution-in-principle but a promising practical application with potential to entice the community into deeper understanding of the principles governing biomolecular mechanisms. © 2015 Wiley Periodicals, Inc.
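
    The Boltzmann-averaged quantities mentioned are sums over protonation microstates; for a cluster of n coupled sites, the probability that site i is protonated can be written, in LaTeX notation:

        \langle x_i \rangle =
        \frac{\sum_{\mathbf{s} \in \{0,1\}^n} s_i\, e^{-\beta E(\mathbf{s})}}
             {\sum_{\mathbf{s} \in \{0,1\}^n} e^{-\beta E(\mathbf{s})}}

    where E(s) includes the pairwise electrostatic interaction energies of the microstate s. The 2^n growth of these sums over coupled sites is what makes GPU parallelization of both the energy evaluations and the partition-function accumulation attractive.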

  14. Optimizing Tensor Contraction Expressions for Hybrid CPU-GPU Execution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Wenjing; Krishnamoorthy, Sriram; Villa, Oreste

    2013-03-01

    Tensor contractions are generalized multidimensional matrix multiplication operations that widely occur in quantum chemistry. Efficient execution of tensor contractions on Graphics Processing Units (GPUs) requires several challenges to be addressed, including index permutation and small dimension-sizes reducing thread block utilization. Moreover, to apply the same optimizations to various expressions, we need a code generation tool. In this paper, we present our approach to automatically generate CUDA code to execute tensor contractions on GPUs, including management of data movement between CPU and GPU. To evaluate our tool, GPU-enabled code is generated for the most expensive contractions in CCSD(T), a key coupled cluster method, and incorporated into NWChem, a popular computational chemistry suite. For this method, we demonstrate speedup over a factor of 8.4 using one GPU (instead of one core per node) and over 2.6 when utilizing the entire system using a hybrid CPU+GPU solution with 2 GPUs and 5 cores (instead of 7 cores per node). Finally, we analyze the implementation behavior on future GPU systems.
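
    To make the index-permutation issue concrete, a representative CCSD(T)-style contraction is, in LaTeX notation:

        C_{ijab} \mathrel{+}= \sum_{k,c} A_{ikac}\, B_{kjcb}

    Permuting A into A_{(ia),(kc)} and B into B_{(kc),(jb)} groups the contracted indices together, turning the contraction into a single matrix product C_{(ia),(jb)} = A' B', followed by a permutation back to C_{ijab}. These permutations, together with the small extents of individual indices, are exactly the GPU-utilization challenges the code generator must address.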

  15. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection

    PubMed Central

    Chen, Yaw-Chung

    2015-01-01

    The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms. PMID:26437335
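
    The division of labor is easy to picture with a small host-side sketch. This is illustrative only; the published HPMA pre-filter is more elaborate than the single-byte test assumed here, and all names are hypothetical.

        #include <cstdint>
        #include <vector>

        // Cheap CPU pass: does any byte of the payload start some pattern?
        // 'firstByteHot' is a hypothetical 256-entry table derived offline
        // from the signature set.
        bool cheapPrefilter(const uint8_t* p, int len, const bool* firstByteHot)
        {
            for (int i = 0; i < len; ++i)
                if (firstByteHot[p[i]]) return true;
            return false;
        }

        void inspect(const std::vector<std::vector<uint8_t>>& packets,
                     const bool* firstByteHot)
        {
            std::vector<int> suspicious;                 // indices for the GPU
            for (int i = 0; i < (int)packets.size(); ++i)
                if (cheapPrefilter(packets[i].data(),
                                   (int)packets[i].size(), firstByteHot))
                    suspicious.push_back(i);
            // batch the suspicious payloads to the GPU here and run the exact
            // multi-pattern (Aho-Corasick) matching kernel on them only
        }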

  16. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.

    PubMed

    Lee, Chun-Liang; Lin, Yi-Shan; Chen, Yaw-Chung

    2015-01-01

    The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.

  17. Work stealing for GPU-accelerated parallel programs in a global address space framework: WORK STEALING ON GPU-ACCELERATED SYSTEMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
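
    The paper's setting is distributed memory with a global address space, which is beyond a short excerpt, but the basic claiming discipline can be sketched in a single-node analogue (assumed structure, not the authors' code): workers grab chunks of task ids from a shared counter, and the GPU-feeding worker asks for larger chunks because small batches cannot amortize kernel-launch and transfer costs.

        #include <algorithm>
        #include <atomic>

        std::atomic<long> nextTask{0};          // shared claim counter
        constexpr long kTasks = 1L << 20;

        // Each worker claims [first, first+chunk) until the pool is drained;
        // 'run' is whatever executes that range on a CPU core or a GPU.
        void worker(long chunk, void (*run)(long, long))
        {
            for (;;) {
                long first = nextTask.fetch_add(chunk);
                if (first >= kTasks) return;
                run(first, std::min(first + chunk, kTasks));
            }
        }

    A CPU thread might call worker(64, runOnCpu) while the GPU host thread calls worker(8192, runOnGpu); in the paper's distributed design the same asymmetry is applied across nodes, with steal operations replacing the shared counter.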

  18. Fast 3D elastic micro-seismic source location using new GPU features

    NASA Astrophysics Data System (ADS)

    Xue, Qingfeng; Wang, Yibo; Chang, Xu

    2016-12-01

In this paper, we describe new GPU features and their application to passive-seismic (micro-seismic) event location. Locating micro-seismic events is important in seismic exploration, especially when searching for unconventional oil and gas resources. Unlike traditional ray-based methods, wave-equation methods such as the one used in this paper have a remarkable advantage in adapting to low signal-to-noise conditions and require no manual data selection. However, their conspicuous drawback is computational cost, so they are not widely used in industry. To make the method practical, we implement imaging-like wave-equation micro-seismic location in 3D elastic media and use GPUs to accelerate our algorithm. We also introduce some new GPU features into the implementation to solve the data-transfer and GPU-utilization problems. Numerical and field data experiments show that our method achieves a performance improvement of more than 30% in the GPU implementation just by using these new features.
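
    The abstract does not spell out which features were used, but the canonical fix for the data-transfer and utilization problems it names is overlapping copies with compute via CUDA streams. A generic sketch (hypothetical kernel and buffer names, not the authors' code; hPinned must be page-locked, e.g. from cudaHostAlloc, for the copies to be truly asynchronous):

        #include <cuda_runtime.h>

        // Stand-in for the elastic wavefield update applied to each data chunk.
        __global__ void propagateChunk(const float* traces, float* wavefield, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) wavefield[i] += traces[i];   // placeholder physics
        }

        void pipeline(const float* hPinned, float* dBuf[2], float* dWavefield,
                      int nChunks, int chunkElems)
        {
            size_t bytes = chunkElems * sizeof(float);
            cudaStream_t s[2];
            for (int i = 0; i < 2; ++i) cudaStreamCreate(&s[i]);
            for (int c = 0; c < nChunks; ++c) {
                cudaStream_t st = s[c % 2];
                // upload chunk c while the previous chunk is still computing
                cudaMemcpyAsync(dBuf[c % 2], hPinned + (size_t)c * chunkElems,
                                bytes, cudaMemcpyHostToDevice, st);
                propagateChunk<<<(chunkElems + 255) / 256, 256, 0, st>>>(
                    dBuf[c % 2], dWavefield, chunkElems);
            }
            cudaDeviceSynchronize();
            for (int i = 0; i < 2; ++i) cudaStreamDestroy(s[i]);
        }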

  19. GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations

    NASA Astrophysics Data System (ADS)

    Nguyen, Trung Dac

    2017-03-01

The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require many more arithmetic operations and introduce data dependencies. In this contribution, we have implemented a GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results, and its support for both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.

  20. Active corrosion protection of AA2024 by sol-gel coatings with corrosion inhibitors

    NASA Astrophysics Data System (ADS)

    Yasakau, Kiryl

The aeronautical industry uses high-strength aluminium alloys to manufacture the structural elements of aircraft. These alloys have excellent mechanical properties but are, at the same time, highly prone to corrosion, and therefore require effective anticorrosion protection before they can be used safely. To date, the most effective anticorrosion systems for aluminium alloys contain hexavalent chromium, whether as pretreatments, conversion layers or anticorrosion pigments. Recognition of the carcinogenic effects of hexavalent chromium led to legislation banning its industrial use, which created the need for environmentally benign yet equally effective alternatives. The main objective of the present work is the development of active anticorrosion pretreatments for aluminium alloy 2024 based on hybrid coatings produced by the sol-gel method. Such coatings must adhere well to the metallic substrate, provide good barrier properties, and offer active anticorrosion capability, which can be achieved by incorporating corrosion inhibitors into the pretreatment. The objective was pursued in a succession of stages. First, the localized (pitting) corrosion of aluminium alloy 2024 was investigated in detail, giving a better understanding of the alloy's susceptibility to localized corrosion processes; several candidate corrosion inhibitors were also studied using electrochemical and microstructural techniques. In a second stage, hybrid organic-inorganic anticorrosion coatings were developed by the sol-gel method. Titania- and zirconia-derived compounds were combined with organofunctional siloxanes to obtain good adhesion between the coating and the metallic substrate as well as good barrier properties. Industrial tests showed that these new coatings are compatible with the conventional paint schemes currently in use. The stability and shelf life of the formulations were optimized by adjusting the storage temperature and the amount of water used during synthesis. In a third stage, the sol-gel formulations were doped with the inhibitors selected in the first stage, and the passive and active anticorrosion properties of the resulting coatings were studied. The results confirm the influence of the inhibitors on the anticorrosion properties of the sol-gel coatings. In some cases the active effect of the inhibitor combined with the passive protection provided by the coating, but in other cases chemical interaction between the inhibitor and the sol-gel matrix occurred, resulting in a loss of the protective properties of the combined system. 
Given the problems caused by direct addition of inhibitors to the sol-gel formulation, a fourth stage explored alternative routes of incorporation. In the first, a nanoporous titania layer produced on the alloy surface served as a reservoir for the inhibitors, with the sol-gel coating applied on top; the inhibitors stored in the pores act when the substrate is exposed to an aggressive environment. In the second, the inhibitors were stored in silica nano-reservoirs or in nanoclays (halloysite) coated with layer-by-layer assembled polyelectrolytes. The third alternative used amorphous cerium molybdate nanowires as nanoparticulate corrosion inhibitors. The nano-reservoirs were incorporated during sol-gel synthesis. All of these approaches eliminated the negative effect of the inhibitor on the stability of the sol-gel matrix. The sol-gel coatings developed in this work exhibited active anticorrosion protection and self-healing ability, showing high potential for the anticorrosion protection of aluminium alloy 2024.

  1. Multi-GPU implementation of a VMAT treatment plan optimization algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tian, Zhen, E-mail: Zhen.Tian@UTSouthwestern.edu, E-mail: Xun.Jia@UTSouthwestern.edu, E-mail: Steve.Jiang@UTSouthwestern.edu; Folkerts, Michael; Tan, Jun

Purpose: Volumetric modulated arc therapy (VMAT) optimization is a computationally challenging problem due to its large data size, high degrees of freedom, and many hardware constraints. High-performance graphics processing units (GPUs) have been used to speed up the computations. However, a GPU’s relatively small memory cannot handle cases with a large dose-deposition coefficient (DDC) matrix, e.g., those with a large target size, multiple targets, multiple arcs, and/or a small beamlet size. The main purpose of this paper is to report an implementation of a column-generation-based VMAT algorithm, previously developed in the authors’ group, on a multi-GPU platform to solve the memory limitation problem. While the column-generation-based VMAT algorithm has been previously developed, the GPU implementation details have not been reported. Hence, another purpose is to present detailed techniques employed for GPU implementation. The authors also would like to utilize this particular problem as an example to study the feasibility of using a multi-GPU platform to solve large-scale problems in medical physics. Methods: The column-generation approach generates VMAT apertures sequentially by solving a pricing problem (PP) and a master problem (MP) iteratively. In the authors’ method, the sparse DDC matrix is first stored on a CPU in coordinate list (COO) format. On the GPU side, this matrix is split into four submatrices according to beam angles, which are stored on four GPUs in compressed sparse row format. Computation of beamlet price, the first step in PP, is accomplished using multiple GPUs. A fast inter-GPU data transfer scheme is accomplished using peer-to-peer access. The remaining steps of the PP and MP problems are implemented on a CPU or a single GPU due to their modest problem scale and computational loads. The Barzilai and Borwein algorithm with a subspace step scheme is adopted here to solve the MP problem. A head and neck (H and N) cancer case is then used to validate the authors’ method. The authors also compare their multi-GPU implementation with three different single-GPU implementation strategies, i.e., truncating the DDC matrix (S1), repeatedly transferring the DDC matrix between CPU and GPU (S2), and porting computations involving the DDC matrix to the CPU (S3), in terms of both plan quality and computational efficiency. Two more H and N patient cases and three prostate cases are used to demonstrate the advantages of the authors’ method. Results: The authors’ multi-GPU implementation can finish the optimization process within ∼1 min for the H and N patient case. S1 leads to an inferior plan quality although its total time was 10 s shorter than the multi-GPU implementation due to the reduced matrix size. S2 and S3 yield the same plan quality as the multi-GPU implementation but take ∼4 and ∼6 min, respectively. High computational efficiency was consistently achieved for the other five patient cases tested, with VMAT plans of clinically acceptable quality obtained within 23–46 s. Conversely, to obtain clinically comparable or acceptable plans for all six of these VMAT cases that the authors have tested in this paper, the optimization time needed in a commercial TPS system on CPU was found to be on the order of several minutes. Conclusions: The results demonstrate that the multi-GPU implementation of the authors’ column-generation-based VMAT optimization can handle the large-scale VMAT optimization problem efficiently without sacrificing plan quality. 
The authors’ study may serve as an example to shed some light on other large-scale medical physics problems that require multi-GPU techniques.
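
    The multi-GPU data layout described in the Methods section can be sketched compactly (hypothetical names; not the authors' code): peer access is enabled once, each device holds the CSR submatrix for its beam angles, and partial beamlet prices are gathered to device 0 with peer-to-peer copies.

        #include <cuda_runtime.h>

        // dPrice[g] holds the partial beamlet prices computed on GPU g;
        // offset/bytes say where each block lands in GPU 0's result buffer.
        void gatherPrices(float* dPrice[4], const size_t offset[4],
                          const size_t bytes[4])
        {
            for (int g = 0; g < 4; ++g) {       // one-time peer setup
                cudaSetDevice(g);
                for (int p = 0; p < 4; ++p)
                    if (p != g) cudaDeviceEnablePeerAccess(p, 0);
            }
            for (int g = 1; g < 4; ++g)         // direct GPU-to-GPU gather
                cudaMemcpyPeer((char*)dPrice[0] + offset[g], 0,
                               dPrice[g], g, bytes[g]);
        }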

  2. Implementation of Multipattern String Matching Accelerated with GPU for Intrusion Detection System

    NASA Astrophysics Data System (ADS)

    Nehemia, Rangga; Lim, Charles; Galinium, Maulahikmah; Rinaldi Widianto, Ahmad

    2017-04-01

As Internet-related security threats continue to increase in volume and sophistication, existing Intrusion Detection Systems are also being challenged to cope with current Internet development. Multi-pattern string matching accelerated with a Graphics Processing Unit is utilized to improve the packet-scanning performance of the IDS. This paper implements a multi-pattern string matching algorithm, Parallel Failureless Aho-Corasick, accelerated with a GPU to improve the performance of the IDS. The OpenCL library is used to allow the IDS to support various GPUs, including the popular NVIDIA and AMD GPUs used in our research. The experimental results show that multi-pattern string matching on a GPU-accelerated platform provides a speedup of up to 141% in terms of throughput compared to the previous research.
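
    The core of Parallel Failureless Aho-Corasick is compact enough to sketch. The kernel below is written in CUDA for concreteness, whereas the paper used OpenCL for portability; the flattened goto table and match array are assumed data structures. Each thread starts at one byte offset and walks the automaton without failure links, simply dying on a missing transition:

        // 'gotoTab' is a hypothetical flattened state x 256 transition table
        // with -1 for "no transition"; 'match[s]' holds a pattern id or -1.
        __global__ void pfac(const unsigned char* text, int n,
                             const int* gotoTab, const int* match, int* hits)
        {
            int start = blockIdx.x * blockDim.x + threadIdx.x;
            if (start >= n) return;
            int s = 0;                                  // root state
            for (int i = start; i < n; ++i) {
                s = gotoTab[s * 256 + text[i]];
                if (s < 0) break;                       // thread terminates early
                if (match[s] >= 0) hits[start] = match[s];
            }
        }

    Because most threads die within a few bytes, the average work per thread is small even though n threads are launched; this is what makes the failureless variant GPU-friendly.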

  3. An efficient spectral crystal plasticity solver for GPU architectures

    NASA Astrophysics Data System (ADS)

    Malahe, Michael

    2018-03-01

    We present a spectral crystal plasticity (CP) solver for graphics processing unit (GPU) architectures that achieves a tenfold increase in efficiency over prior GPU solvers. The approach makes use of a database containing a spectral decomposition of CP simulations performed using a conventional iterative solver over a parameter space of crystal orientations and applied velocity gradients. The key improvements in efficiency come from reducing global memory transactions, exposing more instruction-level parallelism, reducing integer instructions and performing fast range reductions on trigonometric arguments. The scheme also makes more efficient use of memory than prior work, allowing for larger problems to be solved on a single GPU. We illustrate these improvements with a simulation of 390 million crystal grains on a consumer-grade GPU, which executes at a rate of 2.72 s per strain step.
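
    One of the listed optimizations can be shown in isolation. The device function below is an illustrative fast range reduction (not the solver's code): an angle is folded into [-π, π) with one multiply and a round-to-nearest, keeping sinf/cosf arguments on their fast path.

        // Fold x into [-pi, pi) by subtracting the nearest multiple of 2*pi;
        // rintf maps to a single round instruction on the GPU.
        __device__ float reduceAngle(float x)
        {
            const float invTwoPi = 0.15915494309f;  // 1 / (2*pi)
            const float twoPi    = 6.28318530718f;
            return x - twoPi * rintf(x * invTwoPi);
        }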

  4. Multi-GPU accelerated three-dimensional FDTD method for electromagnetic simulation.

    PubMed

    Nagaoka, Tomoaki; Watanabe, Soichi

    2011-01-01

Numerical simulation with a numerical human model using the finite-difference time-domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the numerical human model, we adapt three-dimensional FDTD code to a multi-GPU environment using the Compute Unified Device Architecture (CUDA). In this study, we used NVIDIA Tesla C2070 GPGPU boards. The performance of multiple GPUs is evaluated in comparison with that of a single GPU and a vector supercomputer. The calculation with four GPUs was approximately 3.5 times faster than with a single GPU, and slightly (approx. 1.3 times) slower than with the supercomputer. The calculation speed of the three-dimensional FDTD method using GPUs improves significantly as the number of GPUs increases.
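
    For readers unfamiliar with the method, the inner loop has a textbook shape. The kernel below is a generic single-component Yee update, not the authors' code; in a multi-GPU run each device would own a z-slab of the volume and exchange slab faces after every half step.

        // Update of the Ez field component on the interior of a Yee grid;
        // 'cb' folds the material coefficients and time step per cell.
        __global__ void updateEz(float* Ez, const float* Hx, const float* Hy,
                                 const float* cb, int nx, int ny, int nz)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int j = blockIdx.y * blockDim.y + threadIdx.y;
            int k = blockIdx.z * blockDim.z + threadIdx.z;
            if (i < 1 || j < 1 || i >= nx || j >= ny || k >= nz) return;
            int id = (k * ny + j) * nx + i;
            Ez[id] += cb[id] * ((Hy[id] - Hy[id - 1])        // dHy/dx
                              - (Hx[id] - Hx[id - nx]));     // dHx/dy
        }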

  5. ScipionCloud: An integrative and interactive gateway for large scale cryo electron microscopy image processing on commercial and academic clouds.

    PubMed

    Cuenca-Alba, Jesús; Del Cano, Laura; Gómez Blanco, Josué; de la Rosa Trevín, José Miguel; Conesa Mingo, Pablo; Marabini, Roberto; S Sorzano, Carlos Oscar; Carazo, Jose María

    2017-10-01

New instrumentation for cryo electron microscopy (cryoEM) has significantly increased both data collection rates and data quality, creating bottlenecks at the image-processing level. The current image-processing model of moving acquired images from the data source (the electron microscope) to desktops or local clusters for processing is encountering many practical limitations. However, computing may also take place in distributed and decentralized environments: the cloud is a new way of accessing computing and storage resources on demand. Here, we evaluate how this new computational paradigm can be used effectively by extending our current integrative framework for image processing, creating ScipionCloud. This new development has resulted in a full installation of Scipion in both public and private clouds, accessible as public "images" with all the required cryoEM software preinstalled, requiring only a Web browser to access all graphical user interfaces. We have profiled the performance of different configurations on Amazon Web Services and the European Federated Cloud, always on architectures incorporating GPUs, and compared them with a local facility. We have also analyzed the economics of different scenarios, so that cryoEM scientists have a clearer picture of the setup best suited to their needs and budgets. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  6. Efficient implementation of the 3D-DDA ray traversal algorithm on GPU and its application in radiation dose calculation.

    PubMed

    Xiao, Kai; Chen, Danny Z; Hu, X Sharon; Zhou, Bo

    2012-12-01

The three-dimensional digital differential analyzer (3D-DDA) algorithm is a widely used ray traversal method, which is also at the core of many convolution/superposition (C/S) dose calculation approaches. However, porting existing C/S dose calculation methods onto graphics processing units (GPUs) has brought challenges to retaining the efficiency of this algorithm. In particular, a straightforward implementation of the original 3D-DDA algorithm introduces substantial branch divergence, which conflicts with the GPU programming model and leads to suboptimal performance. In this paper, an efficient GPU implementation of the 3D-DDA algorithm is proposed, which effectively reduces such branch divergence and improves the performance of C/S dose calculation programs running on the GPU. The main idea of the proposed method is to convert a number of conditional statements in the original 3D-DDA algorithm into a set of simple operations (e.g., arithmetic, comparison, and logic) that are better supported by the GPU architecture. To verify and demonstrate the performance improvement, this ray traversal method was integrated into a GPU-based collapsed cone convolution/superposition (CCCS) dose calculation program. The proposed method has been tested using a water phantom and various clinical cases on an NVIDIA GTX570 GPU. The CCCS dose calculation program based on the efficient 3D-DDA ray traversal implementation runs 1.42-2.67× faster than the one based on the original 3D-DDA implementation, without losing any accuracy. The results show that the proposed method can effectively reduce branch divergence in the original 3D-DDA ray traversal algorithm and improve the performance of the CCCS program running on the GPU. Considering the wide utilization of the 3D-DDA algorithm, various applications can benefit from this implementation method.
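
    The conditional-to-arithmetic conversion the authors describe can be illustrated on the DDA stepping decision itself (hypothetical code, not the paper's): instead of nested if/else picking the axis with the smallest tMax, the three comparisons are folded into 0/1 masks so every thread executes the same instruction sequence.

        // Advance the ray one cell along the axis with the smallest tMax.
        // Exactly one of mx, my, mz is 1; ties resolve in x-y-z order.
        __device__ void ddaStep(float3& tMax, const float3 tDelta,
                                int3& cell, const int3 step)
        {
            int mx = (tMax.x <= tMax.y) & (tMax.x <= tMax.z);
            int my = (tMax.y <  tMax.x) & (tMax.y <= tMax.z);
            int mz = 1 - mx - my;
            cell.x += mx * step.x;  tMax.x += mx * tDelta.x;
            cell.y += my * step.y;  tMax.y += my * tDelta.y;
            cell.z += mz * step.z;  tMax.z += mz * tDelta.z;
        }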

  7. GPU-accelerated Monte Carlo convolution/superposition implementation for dose calculation.

    PubMed

    Zhou, Bo; Yu, Cedric X; Chen, Danny Z; Hu, X Sharon

    2010-11-01

    Dose calculation is a key component in radiation treatment planning systems. Its performance and accuracy are crucial to the quality of treatment plans as emerging advanced radiation therapy technologies are exerting ever tighter constraints on dose calculation. A common practice is to choose either a deterministic method such as the convolution/superposition (CS) method for speed or a Monte Carlo (MC) method for accuracy. The goal of this work is to boost the performance of a hybrid Monte Carlo convolution/superposition (MCCS) method by devising a graphics processing unit (GPU) implementation so as to make the method practical for day-to-day usage. Although the MCCS algorithm combines the merits of MC fluence generation and CS fluence transport, it is still not fast enough to be used as a day-to-day planning tool. To alleviate the speed issue of MC algorithms, the authors adopted MCCS as their target method and implemented a GPU-based version. In order to fully utilize the GPU computing power, the MCCS algorithm is modified to match the GPU hardware architecture. The performance of the authors' GPU-based implementation on an Nvidia GTX260 card is compared to a multithreaded software implementation on a quad-core system. A speedup in the range of 6.7-11.4x is observed for the clinical cases used. The less than 2% statistical fluctuation also indicates that the accuracy of the authors' GPU-based implementation is in good agreement with the results from the quad-core CPU implementation. This work shows that GPU is a feasible and cost-efficient solution compared to other alternatives such as using cluster machines or field-programmable gate arrays for satisfying the increasing demands on computation speed and accuracy of dose calculation. But there are also inherent limitations of using GPU for accelerating MC-type applications, which are also analyzed in detail in this article.

  8. SU-E-T-423: Fast Photon Convolution Calculation with a 3D-Ideal Kernel On the GPU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moriya, S; Sato, M; Tachibana, H

Purpose: Calculation time is the trade-off for improving the accuracy of convolution dose calculation with fine calculation spacing of the KERMA kernel. We investigated accelerating the convolution calculation using an ideal kernel on Graphics Processing Units (GPUs). Methods: The calculation was performed on AMD Dual FirePro D700 graphics hardware, and our algorithm was implemented using Aparapi, which converts Java bytecode to OpenCL. The dose calculation process was separated into TERMA and KERMA steps, and the dose deposited at each coordinate (x, y, z) was determined in the process. In the dose calculation running on the central processing unit (CPU), an Intel Xeon E5, the calculation loops were performed over all calculation points. In the GPU computation, all of the calculation processes for the points were sent to the GPU and computed with multiple threads. In this study, the dose calculation was performed in a water-equivalent homogeneous phantom with 150³ voxels (2 mm calculation grid), and the calculation speed on the GPU was compared to that on the CPU, along with the accuracy of the PDD. Results: The calculation times for the GPU and the CPU were 3.3 s and 4.4 h, respectively; the GPU was 4800 times faster than the CPU. The PDD curve for the GPU matched that for the CPU perfectly. Conclusion: The convolution calculation with the ideal kernel on the GPU was clinically acceptable in time and may be more accurate in inhomogeneous regions. Intensity-modulated arc therapy needs dose calculations for different gantry angles at many control points. Thus, it would be more practical for the kernel to use a coarse-spacing technique if the calculation is faster while keeping accuracy similar to a current treatment planning system.

  9. Integrative multicellular biological modeling: a case study of 3D epidermal development using GPU algorithms

    PubMed Central

    2010-01-01

    Background Simulation of sophisticated biological models requires considerable computational power. These models typically integrate together numerous biological phenomena such as spatially-explicit heterogeneous cells, cell-cell interactions, cell-environment interactions and intracellular gene networks. The recent advent of programming for graphical processing units (GPU) opens up the possibility of developing more integrative, detailed and predictive biological models while at the same time decreasing the computational cost to simulate those models. Results We construct a 3D model of epidermal development and provide a set of GPU algorithms that executes significantly faster than sequential central processing unit (CPU) code. We provide a parallel implementation of the subcellular element method for individual cells residing in a lattice-free spatial environment. Each cell in our epidermal model includes an internal gene network, which integrates cellular interaction of Notch signaling together with environmental interaction of basement membrane adhesion, to specify cellular state and behaviors such as growth and division. We take a pedagogical approach to describing how modeling methods are efficiently implemented on the GPU including memory layout of data structures and functional decomposition. We discuss various programmatic issues and provide a set of design guidelines for GPU programming that are instructive to avoid common pitfalls as well as to extract performance from the GPU architecture. Conclusions We demonstrate that GPU algorithms represent a significant technological advance for the simulation of complex biological models. We further demonstrate with our epidermal model that the integration of multiple complex modeling methods for heterogeneous multicellular biological processes is both feasible and computationally tractable using this new technology. We hope that the provided algorithms and source code will be a starting point for modelers to develop their own GPU implementations, and encourage others to implement their modeling methods on the GPU and to make that code available to the wider community. PMID:20696053

  10. Study of the structural and optical properties of light-emitting epitaxial multilayers based on InGaN/GaN

    NASA Astrophysics Data System (ADS)

    Pereira, Sergio Manuel de Sousa

This thesis presents the results of an experimental investigation of light-emitting epitaxial films based on InxGa1-xN. InxGa1-xN is a ternary group III-N semiconductor alloy widely used as the active layer in a range of optoelectronic devices under development, including light-emitting diodes (LEDs) and laser diodes (LDs) operating in the visible and ultraviolet regions of the electromagnetic spectrum. This study characterizes the optical and structural properties of InxGa1-xN/GaN single layers and multiple quantum wells (MQWs), with emphasis on their fundamental physical properties. The central goal of the work is a deeper understanding of the physical processes behind their optical properties, bridging the gap between technological applications and scientific knowledge. In particular, the thesis addresses the problems of measuring the InN fraction (x) in strained ultrathin multilayers, and the influence of composition and microscopic strain on the optical and structural properties. The question of phase segregation in InxGa1-xN/GaN multilayers is also discussed in the light of the results obtained. The methodology rests on the integration of results obtained by complementary techniques through a systematic, multidisciplinary analysis. This approach combines: 1) growth of samples by metal-organic vapour-phase epitaxy (MOVPE) with specific characteristics designed to isolate structural parameters such as thickness and composition; 2) nanostructural characterization by atomic force microscopy (AFM), scanning electron microscopy (SEM), X-ray diffraction, and Rutherford backscattering spectrometry (RBS); 3) optical characterization at complementary scales by optical absorption (OA) spectroscopy, photoluminescence (PL), cathodoluminescence (CL), and confocal microscopy (CM) with spectral analysis. On the basis of the results obtained, the thesis proposes models for interpreting the structural and optical properties, emphasizing their correlations. In particular, it establishes the need to consider phenomena related to microscopic strain when interpreting the experimental results. This work makes clear that detailed knowledge of the nanostructural characteristics is required to interpret the optical properties of InxGa1-xN alloys.

  11. GPU-based optimal control for RWM feedback in tokamaks

    DOE PAGES

    Clement, Mitchell; Hanson, Jeremy; Bialek, Jim; ...

    2017-08-23

The design and implementation of a Graphics Processing Unit (GPU) based Resistive Wall Mode (RWM) controller that performs feedback control of the RWM using Linear Quadratic Gaussian (LQG) control is reported herein. The control algorithm is based on a simplified DIII-D VALEN model. By using NVIDIA’s GPUDirect RDMA framework, the digitizer and output module write and read directly to and from GPU memory, eliminating memory transfers between host and GPU. The system and algorithm were able to reduce the plasma response excited by externally applied fields by 32% during development experiments.

  12. GPU-based optimal control for RWM feedback in tokamaks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clement, Mitchell; Hanson, Jeremy; Bialek, Jim

The design and implementation of a Graphics Processing Unit (GPU) based Resistive Wall Mode (RWM) controller that performs feedback control of the RWM using Linear Quadratic Gaussian (LQG) control is reported herein. The control algorithm is based on a simplified DIII-D VALEN model. By using NVIDIA’s GPUDirect RDMA framework, the digitizer and output module write and read directly to and from GPU memory, eliminating memory transfers between host and GPU. The system and algorithm were able to reduce the plasma response excited by externally applied fields by 32% during development experiments.

  13. NLSEmagic: Nonlinear Schrödinger equation multi-dimensional Matlab-based GPU-accelerated integrators using compact high-order schemes

    NASA Astrophysics Data System (ADS)

    Caplan, R. M.

    2013-04-01

We present a simple to use, yet powerful code package called NLSEmagic to numerically integrate the nonlinear Schrödinger equation in one, two, and three dimensions. NLSEmagic is a high-order finite-difference code package which utilizes graphic processing unit (GPU) parallel architectures. The codes running on the GPU are many times faster than their serial counterparts, and are much cheaper to run than on standard parallel clusters. The codes are developed with usability and portability in mind, and therefore are written to interface with MATLAB utilizing custom GPU-enabled C codes with the MEX-compiler interface. The packages are freely distributed, including user manuals and set-up files. Catalogue identifier: AEOJ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOJ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 124453 No. of bytes in distributed program, including test data, etc.: 4728604 Distribution format: tar.gz Programming language: C, CUDA, MATLAB. Computer: PC, MAC. Operating system: Windows, MacOS, Linux. Has the code been vectorized or parallelized?: Yes. Number of processors used: Single CPU, number of GPU processors dependent on chosen GPU card (max is currently 3072 cores on GeForce GTX 690). Supplementary material: Setup guide, Installation guide. RAM: Highly dependent on dimensionality and grid size. For typical medium-large problem size in three dimensions, 4GB is sufficient. Keywords: Nonlinear Schrödinger Equation, GPU, high-order finite difference, Bose-Einstein condensates. Classification: 4.3, 7.7. Nature of problem: Integrate solutions of the time-dependent one-, two-, and three-dimensional cubic nonlinear Schrödinger equation. Solution method: The integrators utilize a fully-explicit fourth-order Runge-Kutta scheme in time and both second- and fourth-order differencing in space. The integrators are written to run on NVIDIA GPUs and are interfaced with MATLAB including built-in visualization and analysis tools. Restrictions: The main restriction for the GPU integrators is the amount of RAM on the GPU as the code is currently only designed for running on a single GPU. Unusual features: Ability to visualize real-time simulations through the interaction of MATLAB and the compiled GPU integrators. Additional comments: Setup guide and Installation guide provided. Program has a dedicated web site at www.nlsemagic.com. Running time: A three-dimensional run with a grid dimension of 87×87×203 for 3360 time steps (100 non-dimensional time units) takes about one and a half minutes on a GeForce GTX 580 GPU card.
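
    To make the solution method concrete, here is a minimal 1-D CUDA sketch of what one Runge-Kutta stage evaluates: the spatial right-hand side of i·ψt + ψxx + s|ψ|²ψ = 0 with second-order differencing. It is illustrative only; NLSEmagic itself is multi-dimensional, also supports fourth-order differencing, and is driven from MATLAB through MEX.

        #include <cuComplex.h>

        // out = i * (psi_xx + s*|psi|^2 * psi), second-order interior stencil;
        // multiplying (x + iy) by i gives (-y + ix).
        __global__ void nlseRhs(const cuFloatComplex* psi, cuFloatComplex* out,
                                float inv_h2, float s, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < 1 || i >= n - 1) return;
            float pr = cuCrealf(psi[i]), pi = cuCimagf(psi[i]);
            float lr = cuCrealf(psi[i-1]) + cuCrealf(psi[i+1]) - 2.f * pr;
            float li = cuCimagf(psi[i-1]) + cuCimagf(psi[i+1]) - 2.f * pi;
            float a2 = pr * pr + pi * pi;              // |psi|^2
            float rx = inv_h2 * lr + s * a2 * pr;
            float ry = inv_h2 * li + s * a2 * pi;
            out[i] = make_cuFloatComplex(-ry, rx);
        }

    A full RK4 step calls this four times and combines the stages on the device; boundary handling and the fourth-order stencil are omitted here.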

  14. Analysis of solar radius measurements in the ultraviolet

    NASA Astrophysics Data System (ADS)

    Saraiva, A. C. V.; Giménez de Castro, C. G.; Costa, J. E. R.; Selhorst, C. L.; Simões, P. J. A.

    2003-08-01

Accurate measurement of the solar radius in any band of the electromagnetic spectrum is relevant to the formulation and calibration of models of the solar structure and atmosphere. These models attribute the quiet-Sun continuum emission at microwaves to the same region as the He II line. We present measurements of the solar radius in the UV using images from EIT (Extreme-ultraviolet Imaging Telescope) taken between 1996 and 2002 at a wavelength of 30.9 nm (the He II line), which forms in the solar transition region/chromosphere. The technique used to compute the UV radius was based on the B3-spline wavelet transform. We built a database with one image per day over the cited period. We obtained a mean radius of about 975.61" and a decrease over the period averaging -0.45"/year. We compared these data with the values obtained by the ROI (Radio Observatório de Itapetinga) at 22/48 GHz and by the Nobeyama Radioheliograph at 17 GHz, showing that the mean radii are very close, which indicates that the formation region at these frequencies is the same, in agreement with the models. We also compared the results with other solar activity indices.

  15. GPU-based prompt gamma ray imaging from boron neutron capture therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoon, Do-Kun; Jung, Joo-Young; Suk Suh, Tae, E-mail: suhsanta@catholic.ac.kr

Purpose: The purpose of this research is to perform the fast reconstruction of a prompt gamma ray image using a graphics processing unit (GPU) computation from boron neutron capture therapy (BNCT) simulations. Methods: To evaluate the accuracy of the reconstructed image, a phantom including four boron uptake regions (BURs) was used in the simulation. After the Monte Carlo simulation of the BNCT, the modified ordered subset expectation maximization reconstruction algorithm using the GPU computation was used to reconstruct the images with fewer projections. The computation times for image reconstruction were compared between the GPU and the central processing unit (CPU). Also, the accuracy of the reconstructed image was evaluated by a receiver operating characteristic (ROC) curve analysis. Results: The image reconstruction time using the GPU was 196 times faster than the conventional reconstruction time using the CPU. For the four BURs, the area under curve values from the ROC curve were 0.6726 (A-region), 0.6890 (B-region), 0.7384 (C-region), and 0.8009 (D-region). Conclusions: The tomographic image using the prompt gamma ray event from the BNCT simulation was acquired using the GPU computation in order to perform a fast reconstruction during treatment. The authors verified the feasibility of the prompt gamma ray image reconstruction using the GPU computation for BNCT simulations.
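
    The heart of ordered-subset expectation maximization is a multiplicative per-voxel update, which maps naturally onto one GPU thread per voxel. The fragment below is schematic (hypothetical buffers; the authors' modified OSEM and the BNCT system matrix are not reproduced): the forward projection, the measured/estimated ratio, and its back projection are assumed to have been computed into backratio by other kernels.

        // x_new(v) = x(v) * backproject(y_meas / forwardproject(x))(v) / sens(v)
        __global__ void osemScale(float* img, const float* backratio,
                                  const float* sens, int nvox)
        {
            int v = blockIdx.x * blockDim.x + threadIdx.x;
            if (v >= nvox) return;
            img[v] *= backratio[v] / fmaxf(sens[v], 1e-20f);  // guard div by 0
        }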

  16. TU-FG-BRB-07: GPU-Based Prompt Gamma Ray Imaging From Boron Neutron Capture Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, S; Suh, T; Yoon, D

Purpose: The purpose of this research is to perform the fast reconstruction of a prompt gamma ray image using a graphics processing unit (GPU) computation from boron neutron capture therapy (BNCT) simulations. Methods: To evaluate the accuracy of the reconstructed image, a phantom including four boron uptake regions (BURs) was used in the simulation. After the Monte Carlo simulation of the BNCT, the modified ordered subset expectation maximization reconstruction algorithm using the GPU computation was used to reconstruct the images with fewer projections. The computation times for image reconstruction were compared between the GPU and the central processing unit (CPU). Also, the accuracy of the reconstructed image was evaluated by a receiver operating characteristic (ROC) curve analysis. Results: The image reconstruction time using the GPU was 196 times faster than the conventional reconstruction time using the CPU. For the four BURs, the area under curve values from the ROC curve were 0.6726 (A-region), 0.6890 (B-region), 0.7384 (C-region), and 0.8009 (D-region). Conclusion: The tomographic image using the prompt gamma ray event from the BNCT simulation was acquired using the GPU computation in order to perform a fast reconstruction during treatment. The authors verified the feasibility of the prompt gamma ray reconstruction using the GPU computation for BNCT simulations.

  17. The CUBLAS and CULA based GPU acceleration of adaptive finite element framework for bioluminescence tomography.

    PubMed

    Zhang, Bo; Yang, Xiang; Yang, Fei; Yang, Xin; Qin, Chenghu; Han, Dong; Ma, Xibo; Liu, Kai; Tian, Jie

    2010-09-13

In molecular imaging (MI), and especially optical molecular imaging, bioluminescence tomography (BLT) has emerged as an effective imaging modality for small animals. Finite element methods (FEMs), especially the adaptive finite element (AFE) framework, play an important role in BLT. The processing speed of the FEMs and the AFE framework still needs to be improved, even though multi-threaded CPU and multi-CPU technologies have already been applied. In this paper, we introduce for the first time a new kind of acceleration technology for the AFE framework for BLT, using the graphics processing unit (GPU). Besides raw processing speed, GPU technology offers a good balance between cost and performance. CUBLAS and CULA are two important and powerful libraries for programming NVIDIA GPUs. With their help it is easy to write code for NVIDIA GPUs, with no need to worry about the hardware details of a specific GPU. Numerical experiments are designed to show the necessity, effect, and application of the proposed CUBLAS- and CULA-based GPU acceleration. The results show that the proposed method greatly improves the processing speed of the AFE framework while maintaining a balance between cost and performance.
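
    The appeal of the library route is how little device code is left to write. A minimal sketch of the style (illustrative; the BLT system matrices and AFE assembly are not shown): a dense matrix-vector product on the GPU via cuBLAS, with no hand-written kernel at all.

        #include <cublas_v2.h>

        // y = A * x for a dense column-major n x n matrix already on the GPU.
        void gpuMatVec(cublasHandle_t h, const double* dA, const double* dx,
                       double* dy, int n)
        {
            const double one = 1.0, zero = 0.0;
            cublasDgemv(h, CUBLAS_OP_N, n, n, &one, dA, n, dx, 1, &zero, dy, 1);
        }

    The handle is created once with cublasCreate and reused across calls; factorizations and least-squares solvers come from CULA in the same call-and-forget style.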

  18. Instruments used in the assessment of expectation toward a spine surgery: an integrative review.

    PubMed

    Nepomuceno, Eliane; Silveira, Renata Cristina de Campos Pereira; Dessotte, Carina Aparecida Marosti; Furuya, Rejane Kiyomi; Arantes, Eliana De Cássia; Cunha, Débora Cristine Prévide Teixeira da; Dantas, Rosana Aparecida Spadoti

    2016-01-01

To identify and describe the instruments used to assess patients' expectations toward spine surgery. An integrative review was carried out in the databases PubMed, CINAHL, LILACS and PsycINFO. A total of 4,402 publications were identified, of which 25 met the selection criteria. Of the studies selected, only three used tools that had confirmed validity and reliability to be applied; in five studies, clinical scores were used, and were modified for the assessment of patients' expectations, and in 17 studies the researchers developed scales without an adequate description of the method used for their development and validation. The assessment of patients' expectations has been methodologically conducted in different ways. Until the completion of this integrative review, only two valid and reliable instruments had been used in three of the selected studies.

  19. Saphenous vein sclerotherapy combined with skin grafting in the treatment of venous ulcers

    PubMed Central

    de Oliveira, Alexandre Faraco; de Oliveira, Horácio

    2017-01-01

Background: Ulcers are the end result of varicose veins associated with saphenous vein reflux. Objective: To demonstrate the possibility of combining two procedures, foam sclerotherapy of the saphenous veins and split-thickness skin grafting, for the treatment of patients with venous ulcers related to saphenous vein reflux. Methods: Twenty limbs were treated in 20 patients, all with ulcerations related to saphenous vein reflux. We performed meshed skin grafting followed by ultrasound-guided polidocanol foam sclerotherapy of the veins associated with the ulcers, via puncture or dissection of the vein. Results: In all cases, ulcer-related symptoms improved and the lesion healed. In 11 cases the skin graft remained fully viable; in four cases about 50% of the lesion healed; and in the remaining five cases, about 75% healed. The first follow-up ultrasound showed complete sclerosis of the treated vessels in 19 of the 20 cases and partial sclerosis without detectable reflux in one case. On the second ultrasound, performed after 45 days, we observed complete sclerosis in 15 cases; in five cases there was partial sclerosis, three without detectable reflux and two with reflux in isolated segments associated with varicose veins. The most frequent complication was pigmentation along the venous paths, observed in 13 patients. One case presented asymptomatic thrombosis of calf muscle veins. Conclusion: This combination of procedures is a valid option with the potential to provide faster, lower-cost treatment. PMID:29930660

  20. GPULife

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kelly, Priscilla N.

    2016-08-12

The code runs the Game of Life across several processors. Each processor uses CUDA to set up the grid's buffer on the GPU, and that buffer is passed to other GPU languages to apply the rules of the Game of Life. Only the halo is copied off the buffer and exchanged using MPI. This code examines the interoperability of GPU languages on current platforms.
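
    The halo-only traffic pattern is the interesting part and fits in a few lines. A sketch under assumed names (one ghost row above and below a slab of 'rows' interior rows, width bytes per row; not the GPULife source):

        #include <cuda_runtime.h>
        #include <mpi.h>

        // Send my top interior row to the rank above; receive my bottom ghost
        // row from the rank below. Interior cells never cross PCIe or the wire.
        void exchangeHalo(unsigned char* dGrid, unsigned char* hSend,
                          unsigned char* hRecv, int width, int rows,
                          int up, int down, MPI_Comm comm)
        {
            cudaMemcpy(hSend, dGrid + width, width, cudaMemcpyDeviceToHost);
            MPI_Sendrecv(hSend, width, MPI_UNSIGNED_CHAR, up,   0,
                         hRecv, width, MPI_UNSIGNED_CHAR, down, 0,
                         comm, MPI_STATUS_IGNORE);
            cudaMemcpy(dGrid + (size_t)(rows + 1) * width, hRecv, width,
                       cudaMemcpyHostToDevice);
            // the symmetric bottom-to-top exchange mirrors the three calls above
        }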

  1. Observation of solar limb brightening and filamentary structures at 48 GHz using the adaptive regularization technique

    NASA Astrophysics Data System (ADS)

    Machado, W. R. S.; Mascarenhas, N.; Costa, J. E. R.; Silva, A. V. R.

    2003-08-01

The Itapetinga radio telescope has been used in solar flare observing campaigns, generating a large number of daily maps at 48 GHz as a by-product of these observations. The spatial resolution of the 14 m Itapetinga telescope at this frequency is approximately two arcminutes. Structures of interest for the analysis of the quiescent solar atmosphere, such as filaments and the limb-brightening ring, have moderate angular sizes, of the order of or slightly smaller than the telescope resolution. It is well known that convolution with the telescope's point spread function, or PSF (the beam gain pattern), blurs structures smaller than the HPBW (half-power beam width), so restoration techniques that remove at least part of this blurring are commonly sought. We studied the restoration of these radio images using the adaptive regularization technique, and the results bring out these low-contrast spatial structures. The adaptive regularization algorithm uses k images, called prototypes, obtained by varying the parameters of a regularization filter. To assess the restoration quality, we used a high-spatial-resolution image obtained in the H-alpha line, blurred with the Itapetinga PSF. Small deviations between the PSF used for blurring and the PSF used in restoration produced noticeable deviations in the restored image, but the addition of noise in the restoration simulations had a stronger influence on the computed image roughness and was therefore more limiting for the restoration. As our first result, we present a 48 GHz image with clear limb brightening that was not evident in the original image, together with traces of filamentary structures, although still without strong evidence.

  2. Solar patrol telescope at 12 GHz

    NASA Astrophysics Data System (ADS)

    Utsumi, F.; Costa, J. E. R.

    2003-08-01

The solar patrol telescope is an instrument dedicated to the observation of solar flares, in operation since January 2002, working near the peak of the gyrosynchrotron emission spectrum (12 GHz). It is an array of three antennas designed to detect flares and determine the location of the emitting region in real time. However, since its installation on an equatorial mount driven at a constant rate (15 degrees/hour), the tracking has shown small speed variations and backlash in the gearboxes, making the construction of an automatic pointing-correction system essential to the goals of the project. In the second half of 2002 we undertook a series of tasks to fully automate tracking, calibration, data acquisition, gain and offset control, and data transfer over the internet, in a project funded by FAPESP. Automatic tracking is performed by an inverter that controls the supply frequency of the tracking motor, applying micro-corrections in the east-west direction whenever the radiometers in that direction detect a relative signal variation. A motor was also added in declination for automatic correction of drifts in the north-south direction. After this system was implemented, the tracking precision improved to a maximum deviation of 30 arcseconds, which is very good for this project. The telescope has been operating automatically since March 2003 and has already recorded several flares since this automation phase was concluded. We present the most intense flares of the period, with their respective positions on the solar disk.

  3. Teamwork in nursing: restricted to nursing professionals or an interprofessional collaboration?

    PubMed

    Souza, Geisa Colebrusco de; Peduzzi, Marina; Silva, Jaqueline Alcântara Marcelino da; Carvalho, Brígida Gimenez

    2016-01-01

To understand the nursing professionals' conceptions of teamwork and their elements. A qualitative study conducted in an oncological hospital using a semi-structured interview with 21 nursing professionals. Two conceptions emerged from the accounts: teamwork restricted to nursing professionals and teamwork with interprofessional collaboration with particular importance for interactive dimensions: communication, trust and professional bonds, mutual respect and recognition of the other's work, collaboration, and conflict, with this last subcategory considered as an obstacle to teamwork. Nursing conceives teamwork as an interprofessional practice, which is a result of the quality of interaction among professionals from different areas and involves the recognition and handling of conflicts.

  4. Multi-phase SPH modelling of violent hydrodynamics on GPUs

    NASA Astrophysics Data System (ADS)

    Mokos, Athanasios; Rogers, Benedict D.; Stansby, Peter K.; Domínguez, José M.

    2015-11-01

    This paper presents the acceleration of multi-phase smoothed particle hydrodynamics (SPH) using a graphics processing unit (GPU) enabling large numbers of particles (10-20 million) to be simulated on just a single GPU card. With novel hardware architectures such as a GPU, the optimum approach to implement a multi-phase scheme presents some new challenges. Many more particles must be included in the calculation and there are very different speeds of sound in each phase with the largest speed of sound determining the time step. This requires efficient computation. To take full advantage of the hardware acceleration provided by a single GPU for a multi-phase simulation, four different algorithms are investigated: conditional statements, binary operators, separate particle lists and an intermediate global function. Runtime results show that the optimum approach needs to employ separate cell and neighbour lists for each phase. The profiler shows that this approach leads to a reduction in both memory transactions and arithmetic operations giving significant runtime gains. The four different algorithms are compared to the efficiency of the optimised single-phase GPU code, DualSPHysics, for 2-D and 3-D simulations which indicate that the multi-phase functionality has a significant computational overhead. A comparison with an optimised CPU code shows a speed up of an order of magnitude over an OpenMP simulation with 8 threads and two orders of magnitude over a single thread simulation. A demonstration of the multi-phase SPH GPU code is provided by a 3-D dam break case impacting an obstacle. This shows better agreement with experimental results than an equivalent single-phase code. The multi-phase GPU code enables a convergence study to be undertaken on a single GPU with a large number of particles that otherwise would have required large high performance computing resources.

  5. Democratic population decisions result in robust policy-gradient learning: a parametric study with GPU simulations.

    PubMed

    Richmond, Paul; Buesing, Lars; Giugliano, Michele; Vasilaki, Eleni

    2011-05-04

High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task, and moreover, architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which the said architecture and learning rule demonstrate best performance. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a "non-democratic" mechanism), achieve mediocre learning results at best. In absence of recurrent connections, where all neurons "vote" independently ("democratic") for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that a speed improvement of 5× up to 42× is achieved versus optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search of learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated.

  6. Adaptive multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model

    NASA Astrophysics Data System (ADS)

    Navarro, Cristóbal A.; Huang, Wei; Deng, Youjin

    2016-08-01

    This work presents an adaptive multi-GPU Exchange Monte Carlo approach for the simulation of the 3D Random Field Ising Model (RFIM). The design is based on a two-level parallelization. The first level, spin-level parallelism, maps the parallel computation onto optimal 3D thread blocks that simulate blocks of spins in shared memory with minimal halo surface, assuming a constant block volume. The second level, replica-level parallelism, uses multi-GPU computation to handle the simulation of an ensemble of replicas. CUDA's concurrent kernel execution feature is used to fill the occupancy of each GPU with many replicas, providing a performance boost that is most noticeable at the smallest values of L. In addition to the two-level parallel design, the work proposes an adaptive multi-GPU approach that dynamically builds a temperature set free of exchange bottlenecks. The strategy is based on mid-point insertions at the temperature gaps where the exchange rate is most compromised. The extra work generated by the insertions is balanced across the GPUs independently of where the mid-point insertions were performed. Performance results show that the spin-level parallelization is approximately two orders of magnitude faster than a single-core CPU version and one order of magnitude faster than a parallel multi-core CPU version running on 16 cores. Multi-GPU performance scales well in a weak-scaling setting, reaching up to 99% efficiency as long as the number of GPUs and L increase together. The combination of the adaptive approach with the parallel multi-GPU design has extended the range of simulation to sizes of L = 32, 64 on a workstation with two GPUs. Sizes beyond L = 64 can eventually be studied using larger multi-GPU systems.
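
    A minimal host-side sketch of the mid-point insertion step (plain C++; the measured exchange rates would come from statistics gathered on the GPUs, and all names here are assumptions, not the paper's implementation):

```cuda
// Sketch: insert a mid-point temperature wherever the replica-exchange
// rate between neighbouring temperatures falls below a threshold.
#include <vector>

std::vector<double> refineTemperatures(const std::vector<double>& T,
                                       const std::vector<double>& exchangeRate,
                                       double minRate)
{
    std::vector<double> out;
    for (size_t i = 0; i + 1 < T.size(); ++i) {
        out.push_back(T[i]);
        if (exchangeRate[i] < minRate)                // bottleneck gap found
            out.push_back(0.5 * (T[i] + T[i + 1]));   // mid-point insertion
    }
    out.push_back(T.back());
    return out;  // caller then rebalances replicas across the GPUs
}
```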

  7. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions.

    PubMed

    Liu, Yongchao; Wirawan, Adrianto; Schmidt, Bertil

    2013-04-04

    The maximal sensitivity of local alignment makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to its quadratic time complexity, and runtimes are further compounded by the rapid growth of sequence databases. We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, the algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU SIMD parallelization which employs CUDA PTX SIMD video instructions to gain more data parallelism beyond the SIMT execution model. Moreover, sequence alignment workloads are automatically distributed over CPUs and GPUs based on their respective compute capabilities. Evaluation on the Swiss-Prot database shows that CUDASW++ 3.0 gains a performance improvement over CUDASW++ 2.0 of up to 2.9x and 3.2x, with a maximum performance of 119.0 and 185.6 GCUPS, on a single-GPU GeForce GTX 680 and a dual-GPU GeForce GTX 690 graphics card, respectively. In addition, our algorithm demonstrates significant speedups over other top-performing tools: SWIPE and BLAST+. CUDASW++ 3.0 is written in CUDA C++ and PTX assembly, targeting GPUs based on the Kepler architecture. The algorithm obtains significant speedups over its predecessor, CUDASW++ 2.0, by benefiting from the use of CPU and GPU SIMD instructions as well as concurrent execution on CPUs and GPUs. The source code and the simulated data are available at http://cudasw.sourceforge.net.
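
    The PTX SIMD idea can be sketched with CUDA C++'s byte-wise video intrinsics (__vaddss4, __vsubss4 and __vmaxs4 do exist as CUDA device intrinsics): one 32-bit register holds four signed 8-bit alignment scores, so one thread advances four Smith-Waterman cells per instruction. The recurrence below is a simplified stand-in for the full kernel, not CUDASW++ source:

```cuda
// Sketch: one step of the local-alignment recurrence on four packed cells.
__device__ unsigned int swStep4(unsigned int diag, unsigned int subScores,
                                unsigned int up, unsigned int left,
                                unsigned int gapPenalty)
{
    unsigned int h = __vaddss4(diag, subScores);     // match/mismatch, saturating
    h = __vmaxs4(h, __vsubss4(up, gapPenalty));      // gap in the query
    h = __vmaxs4(h, __vsubss4(left, gapPenalty));    // gap in the subject
    return __vmaxs4(h, 0u);                          // local alignment: floor at 0
}
```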

  8. Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization.

    PubMed

    Ruymgaart, A Peter; Elber, Ron

    2012-11-13

    We report Graphics Processing Unit (GPU) and OpenMP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics simulations. We focus on a typical laboratory computing environment in which a CPU with a few cores is attached to a GPU. We discuss the design of the code in detail and illustrate performance comparable to highly optimized codes such as GROMACS. Besides speed, our code shows excellent energy conservation. Utilization of water-specific lists allows the efficient calculation of non-bonded interactions that include water molecules and results in a speed-up factor of more than 40 on the GPU compared with code optimized on a single CPU core for systems larger than 20,000 atoms. This is a four-fold gain over the factor of 10 reported in our initial GPU implementation, which did not include water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints on all bonds, runs in parallel on multiple OpenMP cores or entirely on the GPU. It is based on a Conjugate Gradient solution for the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows an increase of the time step to 2.0 fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond-relaxation algorithm even on a single core if high accuracy is required. The significant speedup of the optimized components shifts the computational bottleneck of the MD calculation to the reciprocal part of Particle Mesh Ewald (PME).
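
    At the heart of CG SHAKE is an ordinary conjugate-gradient solve for the Lagrange multipliers. The dense, host-side sketch below shows only that numerical skeleton; the paper's version is sparse, partially double-precision and GPU-resident, so everything here is an illustrative assumption:

```cuda
// Sketch: solve A x = b by conjugate gradient, where A is the symmetric
// positive definite constraint-coupling matrix and x the Lagrange multipliers.
#include <vector>
#include <cmath>

using Vec = std::vector<double>;

Vec conjugateGradient(const std::vector<Vec>& A, const Vec& b,
                      int maxIter, double tol)
{
    size_t n = b.size();
    Vec x(n, 0.0), r = b, p = b, Ap(n);
    double rs = 0.0;
    for (double v : r) rs += v * v;
    for (int it = 0; it < maxIter && std::sqrt(rs) > tol; ++it) {
        for (size_t i = 0; i < n; ++i) {            // Ap = A * p (dense matvec)
            Ap[i] = 0.0;
            for (size_t j = 0; j < n; ++j) Ap[i] += A[i][j] * p[j];
        }
        double pAp = 0.0;
        for (size_t i = 0; i < n; ++i) pAp += p[i] * Ap[i];
        double alpha = rs / pAp;                    // step length
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rsNew = 0.0;
        for (double v : r) rsNew += v * v;
        for (size_t i = 0; i < n; ++i)              // new search direction
            p[i] = r[i] + (rsNew / rs) * p[i];
        rs = rsNew;
    }
    return x;
}
```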

  9. GPU-accelerated track reconstruction in the ALICE High Level Trigger

    NASA Astrophysics Data System (ADS)

    Rohr, David; Gorbunov, Sergey; Lindenstruth, Volker; ALICE Collaboration

    2017-10-01

    ALICE (A Large Ion Collider Experiment) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN. The High Level Trigger (HLT) is an online compute farm which reconstructs events measured by the ALICE detector in real time. The most compute-intensive part is the reconstruction of particle trajectories, called tracking, and the most important detector for tracking is the Time Projection Chamber (TPC). The HLT uses a GPU-accelerated algorithm for TPC tracking based on the Cellular Automaton principle and the Kalman filter. The GPU tracking has been running in 24/7 operation since 2012, through LHC Runs 1 and 2. In order to better leverage the potential of the GPUs and speed up the overall HLT reconstruction, we plan to bring more reconstruction steps (e.g. the tracking for other detectors) onto the GPUs. Several tasks currently running on the CPU could benefit from cooperating with the tracking, which is hardly feasible at the moment due to the latency of the PCI Express transfers. Moving more steps onto the GPU, and processing them there at once, will reduce PCI Express transfers and free up CPU resources. On top of that, modern GPUs and GPU programming APIs provide new features which are not yet exploited by the TPC tracking. We present our new developments for GPU reconstruction, with a focus on the online reconstruction on GPUs for the online-offline computing upgrade in ALICE during LHC Run 3, and also consider how the current HLT in Run 2 can profit from these improvements.

  10. Observational constraints on the s-process in barium giant stars

    NASA Astrophysics Data System (ADS)

    Smiljanic, R. H. S.; Porto de Mello, G. F.; da Silva, L.

    2003-08-01

    Barium stars are GK-type red giants that show atmospheric excesses of the s-process elements. Such excesses are expected in stars in the thermally pulsing AGB phase (TP-AGB). Barium stars, however, are less massive and less luminous than AGB stars and therefore could not have enriched themselves. Their enrichment would originate in a companion star, initially more massive, which evolved through the TP-AGB, enriched itself with s-process elements, and transferred contaminated material to the atmosphere of the present-day barium star. The companion then evolved into a white dwarf and is no longer observed directly. Barium stars are therefore useful as observational tests of theories of s-process nucleosynthesis, convection and mass loss. Detailed abundance analyses with high-quality data for these objects are still scarce in the literature. In this work we constructed model atmospheres and, by means of a differential analysis, determined atmospheric and evolutionary parameters for a sample of ten barium giants and four normal giants. We determined their abundance patterns for Na, Mg, Al, Si, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Sr, Y, Zr, Ba, La, Ce, Nd, Sm, Eu and Gd, concluding that some stars classified in the literature as barium giants are in fact normal giants. We compared two mean abundance patterns, for stars with large excesses and stars with moderate excesses, with theoretical models of s-process enrichment. The two groups of stars are fitted by the same neutron-exposure parameters. This result suggests that the occurrence of the barium phenomenon with different intensities is not due to different neutron exposures. We also discuss nucleosynthetic effects, linked to the s-process, suggested in the literature for the elements Cu, Mn, V and Sc.

  11. Storage strategies of eddy-current FE-BI model for GPU implementation

    NASA Astrophysics Data System (ADS)

    Bardel, Charles; Lei, Naiguang; Udpa, Lalita

    2013-01-01

    In the past few years, graphics processing units (GPUs) have shown tremendous improvements in computational throughput over standard CPU architectures. However, this comes at the cost of restructuring algorithms to match the strengths and drawbacks of the GPU architecture. A major drawback is the limited memory, which makes the storage of FE stiffness matrices on the GPU an important concern. In contrast to storage on the CPU, the storage format on the GPU has a significant influence on overall performance. This paper presents an investigation of a storage strategy in the implementation of a two-dimensional finite element-boundary integral (FE-BI) model for eddy-current NDE applications on a GPU architecture. Specifically, the high-dimensional matrices are manipulated by examining the matrix structure and optimally splitting it into structurally independent component matrices for efficient storage and retrieval of each component. Results obtained using the proposed approach are compared with those of a conventional CPU implementation to validate the method.

  12. Semiempirical Quantum Chemical Calculations Accelerated on a Hybrid Multicore CPU-GPU Computing Platform.

    PubMed

    Wu, Xin; Koslowski, Axel; Thiel, Walter

    2012-07-10

    In this work, we demonstrate that semiempirical quantum chemical calculations can be accelerated significantly by leveraging the graphics processing unit (GPU) as a coprocessor on a hybrid multicore CPU-GPU computing platform. Semiempirical calculations using the MNDO, AM1, PM3, OM1, OM2, and OM3 model Hamiltonians were systematically profiled for three types of test systems (fullerenes, water clusters, and solvated crambin) to identify the most time-consuming sections of the code. The corresponding routines were ported to the GPU and optimized employing both existing library functions and a GPU kernel that carries out a sequence of noniterative Jacobi transformations during pseudodiagonalization. The overall computation times for single-point energy calculations and geometry optimizations of large molecules were reduced by one order of magnitude for all methods, as compared to runs on a single CPU core.
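
    One batched rotation of the pseudodiagonalization step might look like the hypothetical CUDA kernel below: a single noniterative Jacobi transformation applied to an occupied/virtual MO pair, one thread per basis-function row. The layout (row-major nBasis x nBasis coefficient matrix) and all names are assumptions, not the actual MNDO code:

```cuda
// Sketch: rotate MO columns (occ, vir) by precomputed coefficients (c, s).
__global__ void jacobiRotate(float* C, int nBasis, int occ, int vir,
                             float c, float s)
{
    int mu = blockIdx.x * blockDim.x + threadIdx.x;  // basis-function row
    if (mu >= nBasis) return;
    float co = C[mu * nBasis + occ];
    float cv = C[mu * nBasis + vir];
    C[mu * nBasis + occ] =  c * co + s * cv;         // rotated occupied column
    C[mu * nBasis + vir] = -s * co + c * cv;         // rotated virtual column
}
```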

  13. GPU-Powered Coherent Beamforming

    NASA Astrophysics Data System (ADS)

    Magro, A.; Adami, K. Zarb; Hickish, J.

    2015-03-01

    Graphics processing unit (GPU)-based beamforming is a relatively unexplored area in radio astronomy, possibly due to the assumption that any such system will be severely limited by the PCIe bandwidth required to transfer data to the GPU. We have developed a CUDA-based GPU implementation of a coherent beamformer, specifically designed and optimized for deployment at the BEST-2 array, which can generate an arbitrary number of synthesized beams for a wide range of parameters. It achieves ~1.3 TFLOPs on an NVIDIA Tesla K20, approximately 10x faster than an optimized, multithreaded CPU implementation. This kernel has been integrated into two real-time, GPU-based time-domain software pipelines deployed at the BEST-2 array in Medicina: a standalone beamforming pipeline and a transient detection pipeline. We present performance benchmarks for the beamforming kernel, for the transient detection pipeline with beamforming capabilities, and results of a test observation.
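
    The core of a coherent beamformer is a phase-weighted sum over antennas. A minimal CUDA sketch under an assumed data layout (one thread per time sample, one precomputed complex weight per antenna; this is not the BEST-2 pipeline code):

```cuda
// Sketch: synthesize one beam by accumulating weighted antenna voltages.
#include <cuComplex.h>

__global__ void beamform(const cuFloatComplex* samples,  // [nAnt][nTime]
                         const cuFloatComplex* weights,  // [nAnt], phase steering
                         cuFloatComplex* beam, int nAnt, int nTime)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per time sample
    if (t >= nTime) return;
    cuFloatComplex acc = make_cuFloatComplex(0.0f, 0.0f);
    for (int a = 0; a < nAnt; ++a)
        acc = cuCaddf(acc, cuCmulf(weights[a], samples[a * nTime + t]));
    beam[t] = acc;
}
```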

  14. Impact of the Use of Different Diagnostic Criteria in the Prevalence of Dyslipidemia in Pregnant Women.

    PubMed

    Feitosa, Alina Coutinho Rodrigues; Barreto, Luciana Tedgue; Silva, Isabela Matos da; Silva, Felipe Freire da; Feitosa, Gilson Soares

    2017-07-01

    There is a physiologic elevation of total cholesterol (TC) and triglycerides (TG) during pregnancy. Some authors define dyslipidemia (DLP) in pregnant women when TC, LDL and TG concentrations are above the 95th percentile (P95%) and HDL concentration is below the 5th percentile (P5%) for gestational age (GA). To compare the prevalence of DLP in pregnant women using the percentile criteria versus the V Brazilian Guidelines on Dyslipidemia, and to assess the association with maternal and fetal outcomes. Pregnant women aged 18-50 years with high-risk conditions and at least one lipid profile obtained during pregnancy were classified for the presence of DLP by the two diagnostic criteria. Clinical and laboratory data of mothers and newborns were evaluated. 433 pregnant women aged 32.9 ± 6.5 years were studied. Most (54.6%) had the lipid profile collected during the third trimester. The prevalence of any lipid abnormality according to the criteria of the National Guidelines was 83.8%: TC ≥ 200 mg/dL was found in 49.9%; LDL ≥ 160 mg/dL in 14.3%; HDL ≤ 50 mg/dL in 44.4%; and TG ≥ 150 mg/dL in 65.3%. Any lipid change according to the percentile criteria was found in 19.6%: elevation above the P95% was found in 0.7% for TC, 1.7% for LDL and 6.4% for TG, and HDL below the P5% in 13%. The frequency of comorbidities (hypertension, diabetes, smoking, obesity and preeclampsia) was similar among pregnant women when DLP was compared by both criteria. The prevalence of DLP during pregnancy varies significantly depending on the criteria used; however, neither demonstrated superiority in association with comorbidities.

  15. Particle-in-cell simulations on graphic processing units

    NASA Astrophysics Data System (ADS)

    Ren, C.; Zhou, X.; Li, J.; Huang, M. C.; Zhao, Y.

    2014-10-01

    We will show our recent progress in using GPUs to accelerate the PIC code OSIRIS [Fonseca et al., LNCS 2331, 342 (2002)]. The OSIRIS parallel structure is retained and the computation-intensive kernels are shipped to GPUs. Algorithms for the kernels are adapted for the GPU, including high-order charge-conserving current deposition schemes with little branching and parallel particle sorting [Kong et al., JCP 230, 1676 (2011)]. These algorithms make efficient use of the GPU shared memory. This work was supported by the U.S. Department of Energy under Grant No. DE-FC02-04ER54789 and by the NSF under Grant No. PHY-1314734.

  16. Improving GPU-accelerated adaptive IDW interpolation algorithm using fast kNN search.

    PubMed

    Mei, Gang; Xu, Nengxiong; Xu, Liangliang

    2016-01-01

    This paper presents an efficient parallel Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm on the modern Graphics Processing Unit (GPU). The presented algorithm improves on our previous GPU-accelerated AIDW algorithm by adopting fast k-nearest-neighbor (kNN) search. AIDW needs to find several nearest neighboring data points for each interpolated point in order to adaptively determine the power parameter; the desired prediction value of the interpolated point is then obtained by weighted interpolation using that power parameter. In this work, we develop a fast kNN search approach based on a space-partitioning data structure, the even (uniform) grid, to improve our previous GPU-accelerated AIDW algorithm. The improved algorithm is composed of a kNN search stage and a weighted interpolation stage. To evaluate the performance of the improved algorithm, we perform five groups of experimental tests. The experimental results indicate that: (1) the improved algorithm can achieve a speedup of up to 1017 over the corresponding serial algorithm; (2) the improved algorithm is at least two times faster than our previous GPU-accelerated AIDW algorithm; and (3) the use of fast kNN search can significantly improve the computational efficiency of the entire GPU-accelerated AIDW algorithm.
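
    Once the kNN stage has produced the k nearest data points, the weighted-interpolation stage reduces to a short kernel. A hypothetical sketch (knnIdx, knnDist and the adaptive exponent alpha are assumed outputs of earlier stages, not the authors' code):

```cuda
// Sketch: IDW estimate with a per-query adaptive power parameter.
__global__ void aidwInterpolate(const float* values, const int* knnIdx,
                                const float* knnDist, const float* alpha,
                                float* out, int nQuery, int k)
{
    int q = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per query point
    if (q >= nQuery) return;
    float num = 0.0f, den = 0.0f;
    for (int j = 0; j < k; ++j) {
        float d = fmaxf(knnDist[q * k + j], 1e-6f);  // avoid division by zero
        float w = 1.0f / powf(d, alpha[q]);          // adaptive inverse-distance weight
        num += w * values[knnIdx[q * k + j]];
        den += w;
    }
    out[q] = num / den;
}
```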

  17. Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

    DOE PAGES

    Basu, Protonu; Williams, Samuel; Van Straalen, Brian; ...

    2017-04-05

    GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required the use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near-Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures, as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via a hand-optimized MPI+CUDA implementation.

  18. Thread scheduling for GPU-based OPC simulation on multi-thread

    NASA Astrophysics Data System (ADS)

    Lee, Heejun; Kim, Sangwook; Hong, Jisuk; Lee, Sooryong; Han, Hwansoo

    2018-03-01

    As semiconductor product development based on shrinkage continues, the accuracy and difficulty required of model-based optical proximity correction (MBOPC) are increasing. OPC simulation time, the most time-consuming part of MBOPC, is rapidly increasing due to high pattern density in a layout and complex OPC models. To reduce OPC simulation time, we apply graphics processing units (GPUs) to MBOPC, because the OPC process lends itself well to parallel programming. We address some issues that typically arise during GPU-based OPC simulation in a multi-threaded system, such as out-of-memory conditions and GPU idle time. To overcome these problems, we propose a thread scheduling method which manages OPC jobs in multiple threads in such a way that simulation jobs from multiple threads are alternately executed on the GPU while correction jobs are executed at the same time on the CPU cores. It was observed that GPU peak memory usage decreases by up to 35% and MBOPC runtime by 4%. In cases where out-of-memory issues occur in a multi-threaded environment, the thread scheduler improved MBOPC runtime by up to 23%.

  19. GPU accelerated simulations of 3D deterministic particle transport using discrete ordinates method

    NASA Astrophysics Data System (ADS)

    Gong, Chunye; Liu, Jie; Chi, Lihua; Huang, Haowei; Fang, Jingyue; Gong, Zhenghu

    2011-07-01

    The Graphics Processing Unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now offers great capability for solving scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution of the Boltzmann equation involves the discrete ordinates (Sn) method and the procedure of source iteration. In this paper, we present a GPU-accelerated simulation of one-energy-group, time-independent deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The performance of the GPU simulations is reported for simulations with a vacuum boundary condition. The relative advantages and disadvantages of the GPU implementation, simulation on multiple GPUs, the programming effort and code portability are also discussed. The results show that the overall performance speedup of one NVIDIA Tesla M2050 GPU ranges from 2.56 compared with one Intel Xeon X5670 chip to 8.14 compared with one Intel Core Q6600 chip with no flux fixup. The simulation with flux fixup on one M2050 is 1.23 times faster than on one X5670.

  20. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU

    PubMed Central

    Xia, Yong; Zhang, Henggui

    2015-01-01

    Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This poses a major challenge to traditional CPU-based computing environments, which either cannot meet the full computational demand or are not easily available due to high costs. The GPU as a parallel computing environment therefore provides an alternative for solving the large-scale computational problems of whole-heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, the multicellular tissue model was split into two components: the single-cell model (a system of ordinary differential equations) and the diffusion term of the monodomain model (a partial differential equation). This decoupling enabled the realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup compared with a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economical and powerful platform for 3D whole-heart simulations. PMID:26581957
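
    The ODE/PDE decoupling can be illustrated with two tiny kernels: one advances each cell's membrane ODEs independently, the other applies the diffusion stencil of the monodomain model. The sketch below substitutes a 1D fibre and a FitzHugh-Nagumo-style toy cell model for the 3D sheep atrial model, so every name and constant is illustrative:

```cuda
// Sketch: operator splitting, one kernel per sub-problem.
__global__ void cellODEStep(float* v, float* w, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per cell
    if (i >= n) return;
    float V = v[i], W = w[i];
    v[i] = V + dt * (V - V * V * V / 3.0f - W);      // toy membrane kinetics
    w[i] = W + dt * 0.08f * (V + 0.7f - 0.8f * W);   // recovery variable
}

__global__ void diffusionStep(const float* v, float* vNew, int n,
                              float D, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= 0 || i >= n - 1) return;                // boundaries left untouched
    vNew[i] = v[i] + dt * D * (v[i - 1] - 2.0f * v[i] + v[i + 1]);
}
```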

  1. Fast MPEG-CDVS Encoder With GPU-CPU Hybrid Computing

    NASA Astrophysics Data System (ADS)

    Duan, Ling-Yu; Sun, Wei; Zhang, Xinfeng; Wang, Shiqi; Chen, Jie; Yin, Jianxiong; See, Simon; Huang, Tiejun; Kot, Alex C.; Gao, Wen

    2018-05-01

    The compact descriptors for visual search (CDVS) standard from the ISO/IEC Moving Picture Experts Group (MPEG) has succeeded in enabling interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of the CDVS encoder unfortunately hinders its wide deployment in industry for large-scale visual search. In this paper, we revisit the merits of the low-complexity design of the CDVS core techniques and present a very fast CDVS encoder by leveraging the massive parallel execution resources of the GPU. We shift the computation-intensive and parallel-friendly modules to state-of-the-art GPU platforms, on which thread-block allocation and memory access are jointly optimized to eliminate performance loss. In addition, operations with heavy data dependence are allocated to the CPU to relieve the GPU of extra, unnecessary computational burden. Furthermore, we demonstrate that the proposed fast CDVS encoder works well with convolutional neural network approaches, which have harmoniously leveraged the advantages of GPU platforms and yielded significant performance improvements. Comprehensive experimental results over benchmarks show that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search.

  2. High Performance GPU-Based Fourier Volume Rendering.

    PubMed

    Abdellah, Marwan; Eldeib, Ayman; Sharawi, Amr

    2015-01-01

    Fourier volume rendering (FVR) is a significant visualization technique that has been used widely in digital radiography. As a result of its O(N² log N) time complexity, it provides a faster alternative to spatial-domain volume rendering algorithms, which are O(N³) in computational complexity. Relying on the Fourier projection-slice theorem, this technique operates on the spectral representation of a 3D volume instead of processing its spatial representation to generate attenuation-only projections that look like X-ray radiographs. Due to the rapid evolution of its underlying architecture, the graphics processing unit (GPU) has become an attractive platform that can deliver enormous raw computational power compared with the central processing unit (CPU) on a per-dollar basis. The introduction of the compute unified device architecture (CUDA) enables embarrassingly parallel algorithms to run efficiently on CUDA-capable GPU architectures. In this work, a high-performance GPU-accelerated implementation of the FVR pipeline on CUDA-enabled GPUs is presented. By executing the rendering pipeline entirely on recent GPU architectures, the proposed implementation achieves a speed-up of 117x compared with a single-threaded hybrid implementation that uses the CPU and GPU together.

  3. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU.

    PubMed

    Xia, Yong; Wang, Kuanquan; Zhang, Henggui

    2015-01-01

    Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This poses a major challenge to traditional CPU-based computing environments, which either cannot meet the full computational demand or are not easily available due to high costs. The GPU as a parallel computing environment therefore provides an alternative for solving the large-scale computational problems of whole-heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, the multicellular tissue model was split into two components: the single-cell model (a system of ordinary differential equations) and the diffusion term of the monodomain model (a partial differential equation). This decoupling enabled the realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup compared with a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economical and powerful platform for 3D whole-heart simulations.

  4. Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Basu, Protonu; Williams, Samuel; Van Straalen, Brian

    GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required the use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near-Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures, as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via a hand-optimized MPI+CUDA implementation.

  5. The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kargaran, Hamed, E-mail: h-kargaran@sbu.ac.ir; Minuchehr, Abdolhamid; Zolfaghari, Ahmad

    The implementation of Monte Carlo simulation in CUDA Fortran requires fast random number generation with good statistical properties on the GPU. In this study, a GPU-based parallel pseudo-random number generator (GPPRNG) is proposed for use in high-performance computing systems. According to the type of GPU memory used, the GPU scheme is divided into two work modes, GLOBAL-MODE and SHARED-MODE. To generate parallel random numbers based on the independent sequence method, a combination of the middle-square method and a chaotic map along with the Xorshift PRNG has been employed. Implementation of the developed PPRNG on a single GPU showed a speedup of 150x and 470x (with respect to the speed of a PRNG on a single CPU core) for GLOBAL-MODE and SHARED-MODE, respectively. To evaluate the accuracy of the developed GPPRNG, its performance was compared to that of other commonly available PRNGs, such as those of MATLAB and FORTRAN and the Park-Miller algorithm, using standard statistical tests. The results of this comparison show that the GPPRNG developed in this study can be used as a fast and accurate tool for computational science applications.
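
    Of the generator's three ingredients, the Xorshift component is the easiest to show. A minimal CUDA sketch of a per-thread xorshift32 stream using the standard 13/17/5 shift triple (the middle-square and chaotic-map seeding stages of the paper's generator are omitted):

```cuda
// Sketch: independent xorshift32 sequence per thread.
__device__ unsigned int xorshift32(unsigned int& state)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

__global__ void fillUniform(float* out, int n, unsigned int seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int s = seed ^ (0x9E3779B9u * (i + 1)); // decorrelate thread streams
    if (s == 0u) s = 0xA341316Cu;                    // xorshift state must be nonzero
    out[i] = xorshift32(s) * (1.0f / 4294967296.0f); // uniform in [0, 1)
}
```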

  6. A novel heterogeneous algorithm to simulate multiphase flow in porous media on multicore CPU-GPU systems

    NASA Astrophysics Data System (ADS)

    McClure, J. E.; Prins, J. F.; Miller, C. T.

    2014-07-01

    Multiphase flow implementations of the lattice Boltzmann method (LBM) are widely applied to the study of porous medium systems. In this work, we construct a new variant of the popular "color" LBM for two-phase flow in which a three-dimensional, 19-velocity (D3Q19) lattice is used to compute the momentum transport solution while a three-dimensional, seven-velocity (D3Q7) lattice is used to compute the mass transport solution. Based on this formulation, we implement a novel heterogeneous GPU-accelerated algorithm in which the mass transport solution is computed by multiple shared-memory CPU cores programmed using OpenMP while a concurrent solution of the momentum transport is performed on a GPU. The heterogeneous solution is demonstrated to provide a speedup of 2.6x compared with the multi-core CPU solution and 1.8x compared with the GPU-only solution, owing to concurrent utilization of both CPU and GPU bandwidths. Furthermore, we verify that the proposed formulation provides an accurate physical representation of multiphase flow processes and demonstrate that the approach can be applied to perform heterogeneous simulations of two-phase flow in porous media using a typical GPU-accelerated workstation.
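
    Stripped to its skeleton, the CPU-GPU overlap described above looks roughly like the sketch below: the D3Q19 update runs asynchronously in a CUDA stream while OpenMP threads advance the D3Q7 lattice, and the host synchronizes once per time step. The kernels are stubs; only the concurrency structure is the point:

```cuda
// Sketch: concurrent GPU (momentum) and CPU (mass) lattice updates.
#include <cuda_runtime.h>
#include <omp.h>

__global__ void momentumStepD3Q19(float* f, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) f[i] *= 0.99f;                  // placeholder collision/streaming
}

static void massStepD3Q7(float* g, int i) { g[i] *= 0.98f; }  // placeholder

void timeStep(float* dMomentum, float* hMass, int nF, int nG, cudaStream_t s)
{
    momentumStepD3Q19<<<(nF + 255) / 256, 256, 0, s>>>(dMomentum, nF); // async GPU
    #pragma omp parallel for                                           // concurrent CPU
    for (int i = 0; i < nG; ++i)
        massStepD3Q7(hMass, i);
    cudaStreamSynchronize(s);                  // rendezvous before the next step
}
```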

  7. The Expected Cardiovascular Benefit of Plasma Cholesterol Lowering with or Without LDL-C Targets in Healthy Individuals at Higher Cardiovascular Risk.

    PubMed

    Cesena, Fernando Henpin Yue; Laurinavicius, Antonio Gabriele; Valente, Viviane A; Conceição, Raquel D; Santos, Raul D; Bittencourt, Marcio S

    2017-06-01

    There is controversy over whether management of blood cholesterol should be based on LDL-cholesterol (LDL-c) target concentrations. To compare the estimated impact of different lipid-lowering strategies, based or not on LDL-c targets, on the risk of major cardiovascular events in a population at higher cardiovascular risk. We included consecutive individuals undergoing routine health screening in a single center who had a 10-year risk of atherosclerotic cardiovascular disease (ASCVD) ≥ 7.5% (pooled cohort equations, ACC/AHA, 2013). For each individual, we simulated two strategies based on LDL-c targets (≤ 100 mg/dL [Starget-100] or ≤ 70 mg/dL [Starget-70]) and two strategies based on percent LDL-c reduction (30% [S30%] or 50% [S50%]). In 1,897 subjects (57 ± 7 years, 96% men, 10-year ASCVD risk 13.7 ± 7.1%), LDL-c would be lowered from 141 ± 33 mg/dL to 99 ± 23 mg/dL in S30%, 71 ± 16 mg/dL in S50%, 98 ± 9 mg/dL in Starget-100, and 70 ± 2 mg/dL in Starget-70. Ten-year ASCVD risk would be reduced to 8.8 ± 4.8% in S50% and 8.9 ± 5.2% in Starget-70. The number of major cardiovascular events prevented in 10 years per 1,000 individuals would be 32 in S30%, 31 in Starget-100, 49 in S50%, and 48 in Starget-70. Compared with Starget-70, S50% would prevent more events in the lower LDL-c tertile and fewer events in the higher LDL-c tertile. The more aggressive lipid-lowering approaches simulated in this study, based on an LDL-c target or a percent reduction, may potentially prevent approximately 50% more hard cardiovascular events in the population than the less intensive treatments. Baseline LDL-c determines which strategy (based or not on an LDL-c target) is more appropriate at the individual level.

  8. TRANEXAMIC ACID ACTION ON LIVER REGENERATION AFTER PARTIAL HEPATECTOMY: EXPERIMENTAL MODEL IN RATS.

    PubMed

    Sobral, Felipe Antonio; Daga, Henrique; Rasera, Henrique Nogueira; Pinheiro, Matheus da Rocha; Cella, Igor Furlan; Morais, Igor Henrique; Marques, Luciana de Oliveira; Collaço, Luiz Martins

    2016-01-01

    Different lesions may affect the liver, resulting in harmful stimuli. Some therapeutic procedures to treat those injuries depend on liver regeneration to increase the functional capacity of the organ. To evaluate the effects of tranexamic acid on liver regeneration after partial hepatectomy in rats. 40 rats (Rattus norvegicus albinus, Rodentia mammalia) of the Wistar-UP lineage were randomly divided into two groups named control (CT) and tranexamic acid (ATX), with 20 rats in each. Both groups were subdivided according to a liver regeneration time of 32 h or seven days after surgery. Organ regeneration was evaluated through weight and histology, stained with HE and PCNA. The average animal weights of the ATX and CT 7-day groups were 411.2 g and 432.7 g before surgery and 371.3 g and 392.9 g after the regeneration time, respectively. The average numbers of mitotic cells stained with HE were 33.7 and 32.6 mitoses for the ATX and CT 7-day groups, and 14.5 and 14.9 for the ATX and CT 32 h groups, respectively. When stained with proliferating cell nuclear antigen, the numbers of mitotic cells counted were 849.7 for the ATX 7-day group, 301.8 for the CT 7-day group, 814.2 for the ATX 32 h group and 848.1 for the CT 32 h group. Tranexamic acid was effective in liver regeneration, but only over the longer period after partial hepatectomy.

  9. Endovenous heat-induced thrombosis: report of two cases treated with rivaroxaban and review of the literature

    PubMed Central

    de Araujo, Walter Junior Boim; Timi, Jorge Rufino Ribas; Erzinger, Fabiano Luiz; Caron, Filipe Carlos

    2016-01-01

    Endovenous heat-induced thrombosis is defined as the propagation of thrombus from a superficial vein toward a deeper vein. It is generally considered clinically insignificant when the thrombus does not propagate into the deep venous system. The condition can be treated with anticoagulant therapy, although observation appears to be sufficient, particularly for lower grades. In this study, we report two cases of endovenous heat-induced thrombosis that would have had an indication for low-molecular-weight heparin until resolution. However, rivaroxaban (15 mg every 12 hours) was chosen instead, with complete resolution of the thrombus in 4 weeks (case 1) and in 7 days (case 2). Rivaroxaban may be a promising alternative for the treatment of advanced endovenous heat-induced thrombosis, given its simple dosing regimen, without compromising efficacy or safety. Prospective, randomized, controlled studies are needed to allow a better understanding of the condition and the development of more definitive recommendations on prevention and treatment options.

  10. GPU-based parallel algorithm for blind image restoration using midfrequency-based methods

    NASA Astrophysics Data System (ADS)

    Xie, Lang; Luo, Yi-han; Bao, Qi-liang

    2013-08-01

    GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms specially designed for the GPU hardware architecture is of great significance. In order to address the high computational complexity and poor real-time performance of blind image restoration, the midfrequency-based algorithm for blind image restoration is analyzed and improved in this paper. A midfrequency-based filtering method is used to restore the image with hardly any recursion or iteration. Combining the algorithm's data intensiveness and data-parallel structure with the GPU's single-instruction, multiple-thread execution model, a new parallel midfrequency-based algorithm for blind image restoration is proposed which is suitable for stream computing on the GPU. In this algorithm, the GPU is utilized to accelerate the estimation of class-G point spread functions and the midfrequency-based filtering. To better manage the GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in the frequency domain, after optimization of the data access and of the communication between the host and the device. The kernel parallelism structure is determined by the decomposition of the filtering data so that the transmission rate works around the memory bandwidth limitation. The results show that, with the new algorithm, the operational speed is significantly increased and the real-time performance of image restoration is effectively improved, especially for high-resolution images.

  11. Large Scale Document Inversion using a Multi-threaded Computing System

    PubMed Central

    Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

    2018-01-01

    Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. We can utilize the GPU as a massively parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays, vast amounts of information flood into the digital domain around the world: huge volumes of data, such as digital libraries, social networking services, e-commerce product data and reviews, are produced or collected every moment, with dramatic growth in size. Although the inverted index is a useful data structure for full-text search and document retrieval, a large number of documents require a tremendous amount of time to index. The performance of document inversion can be improved by multi-threaded, multi-core GPUs. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets, from PubMed abstracts and e-commerce product reviews. CCS Concepts: Information systems → Information retrieval; Computing methodologies → Massively parallel and high-performance simulations. PMID:29861701
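
    The SPMD inversion idea can be sketched as a postings-emission kernel: each thread scans one document's term IDs and appends (termId, docId) pairs through an atomic counter, after which a sort by term groups each term's postings. Tokenisation and hashing to term IDs are assumed to happen upstream, and all names are illustrative rather than the authors' implementation:

```cuda
// Sketch: emit postings in parallel; sort by termId afterwards (e.g. with
// thrust::sort and a comparator on the .x field) to build the inverted index.
__global__ void emitPostings(const int* termIds, const int* docStart,
                             int nDocs, int2* postings, int* counter)
{
    int d = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per document
    if (d >= nDocs) return;
    for (int k = docStart[d]; k < docStart[d + 1]; ++k) {
        int slot = atomicAdd(counter, 1);           // reserve an output slot
        postings[slot] = make_int2(termIds[k], d);  // (termId, docId) pair
    }
}
```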

  12. Novel hybrid GPU-CPU implementation of parallelized Monte Carlo parametric expectation maximization estimation method for population pharmacokinetic data analysis.

    PubMed

    Ng, C M

    2013-10-01

    The development of a population PK/PD model, an essential component of model-based drug development, is both time- and labor-intensive. Graphics processing unit (GPU) computing has been proposed and used to accelerate many scientific computations. The objective of this study was to develop a hybrid GPU-CPU implementation of a parallelized Monte Carlo parametric expectation maximization (MCPEM) estimation algorithm for population PK data analysis. A hybrid GPU-CPU implementation of the MCPEM algorithm (MCPEMGPU) and an identical algorithm designed for a single CPU (MCPEMCPU) were developed using MATLAB on a single computer equipped with a dual Xeon 6-core E5690 CPU and an NVIDIA Tesla C2070 GPU parallel computing card containing 448 stream processors. Two different PK models with rich/sparse sampling designs were used to simulate population data for assessing the performance of MCPEMCPU and MCPEMGPU. Results were analyzed by comparing the parameter estimates and model computation times. A speedup factor was used to assess the relative benefit of the parallelized MCPEMGPU over MCPEMCPU in shortening model computation time. MCPEMGPU consistently achieved shorter computation times than MCPEMCPU and can offer more than a 48-fold speedup using a single GPU card. The novel hybrid GPU-CPU implementation of the parallelized MCPEM algorithm developed in this study holds great promise as the core of next-generation modeling software for population PK/PD analysis.

  13. Large Scale Document Inversion using a Multi-threaded Computing System.

    PubMed

    Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

    2017-06-01

    Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. We can utilize the GPU as a massively parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays, vast amounts of information flood into the digital domain around the world: huge volumes of data, such as digital libraries, social networking services, e-commerce product data and reviews, are produced or collected every moment, with dramatic growth in size. Although the inverted index is a useful data structure for full-text search and document retrieval, a large number of documents require a tremendous amount of time to index. The performance of document inversion can be improved by multi-threaded, multi-core GPUs. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets, from PubMed abstracts and e-commerce product reviews. CCS Concepts: Information systems → Information retrieval; Computing methodologies → Massively parallel and high-performance simulations.

  14. Piezoresponse force microscopy of ferroelectric relaxors

    NASA Astrophysics Data System (ADS)

    Kiselev, Dmitry

    In this thesis, Pb-based relaxor ferroelectrics of the (Pb,La)(Zr,Ti)O3 (PLZT), Pb(Mg1/3,Nb2/3)O3-PbTiO3 (PMN-PT) and Pb(Zn1/3,Nb2/3)O3-PbTiO3 (PZN-PT) families were investigated and analysed. The ferroelectric and dielectric properties of the samples were studied macroscopically by conventional methods and locally by piezoresponse force microscopy (PFM). In PLZT 9.75/65/35 ceramics, the nanoscale PFM contrast was investigated as a function of grain size and orientation. The piezoelectric signal of the nanostructures was found to weaken with increasing temperature and to disappear at 490 K (8 mol% La) and 420 K (9.5 mol% La). Local hysteresis loops were acquired as a function of temperature. The evolution of the macroscopic and local parameters with surface temperature suggests a strong surface effect on the ferroelectric phase transitions of the investigated material. The domain-wall roughness was determined by PFM for the natural domain structure existing in this polycrystalline ferroelectric. In addition, artificial ferroelectric domains were created by applying electric pulses to the conducting PFM tip, and the in-plane domain size was measured as a function of pulse duration. These experiments led to the conclusion that the domain wall in PZT-type relaxors is nearly a one-dimensional interface. The contrast mechanism at the surface of PLZT-type relaxors is probed by PFM. Domain structures and their evolution with depth were studied in PZN-4.5%PT crystals of different orientations by PFM. Irregular domain patterns with typical sizes of 20-100 nm were observed on surfaces of one orientation in unpoled samples. In contrast, the other crystal cuts exhibit regular domains of normal micron size, with domain boundaries oriented along the allowed crystallographic planes. The existence of nanodomains in crystals of the former orientation is tentatively attributed to the relaxor nature of PZN-PT, in which small polar clusters can form under zero-field-cooling (ZFC) conditions. These nanodomains are regarded as nuclei of the opposite polarization state and may be responsible for the lower coercive field of this particular crystal cut. Nevertheless, local piezoelectric hysteresis measured by PFM at the nanoscale indicates similar switching behaviour of PZN-PT for both investigated crystallographic orientations. The evolution of the domain structures upon polishing below the crystal surface was investigated. Domain branching and polarization-screening effects after polishing and during temperature measurements were studied by PFM and SEM analysis. Furthermore, the piezoelectric signal from the nanodomain structures was found to weaken with increasing temperature, eventually vanishing at 430 K and 470 K for the two orientations. This difference in the local phase-transition temperature between crystals of different orientations is explained by the strong surface effect on the ferroelectric phase transition in relaxors. Polarization switching in the ergodic relaxor and ferroelectric phases of the PMN-PT system was studied by combining three methods: piezoresponse force microscopy, single-point electromechanical relaxation measurements and voltage-spectroscopy mapping.
The dependence of the relaxation behaviour on the amplitude and duration of the voltage pulse was found to follow a universal logarithmic behaviour with a nearly constant slope. This behaviour is indicative of a progressive population of slow relaxation states, as opposed to linear relaxation in the presence of a broad distribution of relaxation times. The roles of the relaxation behaviour, the ferroelectric nonlinearity and the spatial inhomogeneity of the field at the AFM probe tip in the hysteresis-loop behaviour are analysed in detail. The hysteresis loops for ergodic PMN-10%PT are shown to be kinetically limited, whereas for PMN with higher PT content true ferroelectric hysteresis loops with low nucleation bias are observed.

  15. A small telescope in support of teaching in cities with intense light pollution II

    NASA Astrophysics Data System (ADS)

    Pereira, P. C. R.; Santos-Júnior, J. M.; Cruz, W. S.

    2003-08-01

    For most students, formal elementary education consists of the transmission of facts to be memorised for an exam, the ability to recall formulas and, occasionally, the repetition of experiments that must produce the results demanded by the teacher. The outcome of this teaching model, over the years, is well known: ignorance of, and dissatisfaction with, topics related to the role and processes of science. We believe that astronomy, because of its observational character, is one of the fields of knowledge that can help in this scenario. The Fundação Planetário da Cidade do Rio de Janeiro owns a Meade LX-200 telescope (25 cm) which, together with ST-7E and ST-8E CCD cameras, has been used in projects aimed at secondary-school students since 2000. These projects involve carrying out an observational research project at an appropriate level and provide contact with techniques and new technologies: computers, software for data handling and plotting, data treatment and reduction programs, optical-electronic equipment (telescope and CCD), as well as the process of knowledge acquisition itself. Following the approach of previous years, we prioritised one-night projects, i.e., phenomena whose variability recurs over relatively short intervals. In all cases we opted for differential photometry, which has proven quite effective under a bright sky such as that of the city of Rio de Janeiro. In this poster we present some of the projects carried out over the past year with 25 students. We present the results of observations of the pulsating variable AI Vel (V = 6.6) and the cataclysmic variable FO Aqr (V = 13.5), and of the monitoring of the transit of Jupiter's moon Europa on 30 April 2003. The light curves produced for the first two agree with the literature, as do the derived periods (1 h 20 min and 4 h 48 min). For FO Aqr, the modulation due to the rotation of the accreting white dwarf (21 min) was also evident. The estimated error is 0.01 magnitude. We propose wider use of small telescopes to support secondary and higher education in light-polluted cities; schools and planetariums would be suitable sites for such telescopes. The criteria adopted for target selection and the observational method employed are also presented.

  16. GPU applications for data processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vladymyrov, Mykhailo, E-mail: mykhailo.vladymyrov@cern.ch; Aleksandrov, Andrey; INFN sezione di Napoli, I-80125 Napoli

    2015-12-31

    Modern experiments that use nuclear photoemulsion require fast and efficient data acquisition from the emulsion. New approaches to developing scanning systems require real-time processing of large amounts of data. Methods that use Graphics Processing Unit (GPU) computing power for emulsion data processing are presented here. It is shown how GPU-accelerated emulsion processing helped us raise the scanning speed by a factor of nine.

  17. TH-A-18C-09: Ultra-Fast Monte Carlo Simulation for Cone Beam CT Imaging of Brain Trauma

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sisniega, A; Zbijewski, W; Stayman, J

    Purpose: Application of cone-beam CT (CBCT) to low-contrast soft tissue imaging, such as in detection of traumatic brain injury, is challenged by high levels of scatter. A fast, accurate scatter correction method based on Monte Carlo (MC) estimation is developed for application in high-quality CBCT imaging of acute brain injury. Methods: The correction involves MC scatter estimation executed on an NVIDIA GTX 780 GPU (MC-GPU), with a baseline simulation speed of ~1e7 photons/sec. MC-GPU is accelerated by a novel, GPU-optimized implementation of variance reduction (VR) techniques (forced detection and photon splitting). The number of simulated tracks and projections is reduced for additional speed-up. Residual noise is removed and the missing scatter projections are estimated via kernel smoothing (KS) in the projection plane and across gantry angles. The method is assessed using CBCT images of a head phantom presenting a realistic simulation of fresh intracranial hemorrhage (100 kVp, 180 mAs, 720 projections, source-detector distance 700 mm, source-axis distance 480 mm). Results: For a fixed run-time of ~1 sec/projection, GPU-optimized VR reduces the noise in MC-GPU scatter estimates by a factor of 4. For scatter correction, MC-GPU with VR is executed with 4-fold angular downsampling and 1e5 photons/projection, yielding a 3.5-minute run-time per scan, and de-noised with optimized KS. Corrected CBCT images demonstrate a uniformity improvement of 18 HU and a contrast improvement of 26 HU compared with no correction, and a 52% increase in contrast-to-noise ratio in simulated hemorrhage compared with "oracle" constant-fraction correction. Conclusion: Acceleration of MC-GPU achieved through GPU-optimized variance reduction and kernel smoothing yields an efficient (<5 min/scan) and accurate scatter correction that does not rely on additional hardware or simplifying assumptions about the scatter distribution. The method is undergoing implementation in a novel CBCT system dedicated to brain trauma imaging at the point of care in sports and military applications. Research grant from Carestream Health. JY is an employee of Carestream Health.
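
    The kernel-smoothing step described above can be pictured with a short sketch: a wide Gaussian removes residual MC noise in the scatter estimates, and the angularly downsampled projections are interpolated back to the full set. This is a hedged illustration in Python/SciPy, not the authors' implementation; array shapes and smoothing widths are assumptions:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def denoise_scatter(scatter_stack, sigma_uv=8.0, sigma_angle=2.0):
        # scatter_stack: (n_angles, nu, nv) scatter projections simulated with
        # few photons, hence noisy. Scatter is spatially very smooth, so a wide
        # Gaussian removes MC noise with little bias.
        return gaussian_filter(scatter_stack,
                               sigma=(sigma_angle, sigma_uv, sigma_uv),
                               mode="nearest")

    def upsample_angles(scatter_coarse, n_full):
        # Linearly interpolate projections simulated on a coarse angular grid
        # (e.g. every 4th gantry angle) back to the full projection set.
        n_coarse = scatter_coarse.shape[0]
        coarse_idx = np.linspace(0, n_full - 1, n_coarse)
        full_idx = np.arange(n_full)
        out = np.empty((n_full,) + scatter_coarse.shape[1:])
        for i in range(scatter_coarse.shape[1]):
            for j in range(scatter_coarse.shape[2]):
                out[:, i, j] = np.interp(full_idx, coarse_idx, scatter_coarse[:, i, j])
        return out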

  18. Distributed GPU Computing in GIScience

    NASA Astrophysics Data System (ADS)

    Jiang, Y.; Yang, C.; Huang, Q.; Li, J.; Sun, M.

    2013-12-01

    Geoscientists strive to discover the principles and patterns hidden in ever-growing Big Data for scientific discovery. To achieve this objective, more capable computing resources are required to process, analyze and visualize Big Data (Ferreira et al., 2003; Li et al., 2013). Current CPU-based computing techniques cannot promptly meet the computing challenges posed by the increasing amount of datasets from different domains, such as social media, Earth observation and environmental sensing (Li et al., 2013). Meanwhile, CPU-based computing resources structured as clusters or supercomputers are costly. Over the past several years, as GPU-based technology has matured in both capability and performance, GPU-based computing has emerged as a new computing paradigm. Compared with the traditional microprocessor, the modern GPU, as a compelling alternative, offers outstanding parallel processing capability with cost-effectiveness and efficiency (Owens et al., 2008), although it was initially designed for graphics rendering in the visualization pipeline. This presentation reports a distributed GPU computing framework for integrating GPU-based computing within a distributed environment. Within this framework, 1) on each single computer, both GPU-based and CPU-based computing resources can be fully utilized to improve the performance of visualizing and processing Big Data; 2) within a network environment, a variety of computers can be used to build a virtual supercomputer supporting CPU-based and GPU-based computing in a distributed computing environment; 3) GPUs, as graphics-targeted devices, are used to greatly improve rendering efficiency in distributed geo-visualization, especially for 3D/4D visualization. Key words: Geovisualization, GIScience, Spatiotemporal Studies. References: 1. Ferreira de Oliveira, M. C., & Levkowitz, H. (2003). From visual data exploration to visual data mining: A survey. IEEE Transactions on Visualization and Computer Graphics, 9(3), 378-394. 2. Li, J., Jiang, Y., Yang, C., Huang, Q., & Rice, M. (2013). Visualizing 3D/4D Environmental Data Using Many-core Graphics Processing Units (GPUs) and Multi-core Central Processing Units (CPUs). Computers & Geosciences, 59(9), 78-89. 3. Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., & Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5), 879-899.

  19. Cosmoeducation: a proposal for teaching astronomy

    NASA Astrophysics Data System (ADS)

    Medeiros, L. A. L.; Jafelice, L. C.

    2003-08-01

    Cosmoeducation is understood here as the experiential development of the human-cosmos unity. The concept is guided by transpersonal psychology, which studies the human being as a whole, in which ecological and cosmic relations are of great importance. A latent need can be observed in modern human beings to recover a holistic relationship with the Universe. In this work we explore ways of cultivating the awareness that the human being is an integral part of the cosmos and relates to it, with the aim of promoting a broader environmental perception. Our initial working hypothesis was that teaching basic astronomy content through a holistic approach, one that incorporates experiential practices correlated with that content, can awaken the individual's cosmic identity. The method used is phenomenological, and the universe of this research is a group of students of the Astronomy course (Geography Teaching Degree/UFRN), with whom we carried out participant observation, interviews, testimonies and the aforementioned experiential practices. In this setting we are developing and adapting exercises from therapeutic techniques of transpersonal psychology, which one of us (LALM) has applied in the clinical context, to work on the cognitive aspects involved in that process of cosmic awareness. Partial results clearly support the initial hypothesis. One result worth highlighting came from a dynamic of internalized bodily representation of the lunar eclipse, involving a small group of those students, in which mythical content surfaced spontaneously and strikingly for everyone, suggesting resonance, or at least isomorphism, between the macrocosm and the microcosm. This and other results are discussed in detail in this work. (PPGECNM/UFRN; PRONEX/FINEP; NUPA/USP; Temáticos/FAPESP)

  20. Prevalence and factors associated with minor psychiatric disorders in hospital housekeeping workers.

    PubMed

    Marconato, Cintia da Silva; Magnago, Ana Carolina de Souza; Magnago, Tânia Solange Bosi de Souza; Dalmolin, Graziele de Lima; Andolhe, Rafaela; Tavares, Juliana Petri

    2017-06-12

    Investigating the prevalence and factors associated with minor psychiatric disorders (MPDs) in hospital housekeeping workers. A cross-sectional study carried out in 2013 with workers from the cleaning service of a public university hospital in Rio Grande do Sul, Brazil. Data were collected through a form containing sociodemographic, occupational, habits and health variables. The Self-Reporting Questionnaire-20 was used in order to evaluate MPDs. The study population consisted of 161 workers. The overall prevalence of suspected MPD was 29.3%. The chances of suspected MPDs were higher in workers with Effort-Reward Imbalance, those who did not have time or who only occasionally had time for leisure activities, and those taking medications. The prevalence of MPDs was similar to that found in the literature for health workers. Therefore, we consider it important to include these workers in institutional programs for continuing health education.

  1. Functional health literacy and adherence to the medication in older adults: integrative review.

    PubMed

    Martins, Nidia Farias Fernandes; Abreu, Daiane Porto Gautério; Silva, Bárbara Tarouco da; Semedo, Deisa Salyse Dos Reis Cabral; Pelzer, Marlene Teda; Ienczak, Fabiana Souza

    2017-01-01

    to characterize the national and international scientific production on the relationship between Functional Health Literacy and adherence to medication in older adults. integrative review of the literature, searching the following online databases: Scientific Electronic Library Online (SCIELO); Latin American and Caribbean Health Sciences Literature (LILACS); Medical Literature Analysis and Retrieval System Online (MEDLINE); and Cumulative Index to Nursing & Allied Health Literature (CINAHL), in June 2016. We selected 7 articles that met the inclusion criteria. all articles are from the USA. Inadequate Functional Health Literacy contributes to non-adherence to medication; however, there are several strategies and interventions that can be practiced to change this relationship. nursing needs to explore this theme further, since it can provide differentiated care for adherence to medication in older adults, taking literacy into account.

  2. Identification of point radio sources in the region observed by the BEAST telescope

    NASA Astrophysics Data System (ADS)

    Oliveira, M. S.; Wuensche, C. A.; Leonardi, R.; Tello, C.

    2003-08-01

    Extragalactic radio sources are one of the main contaminants in measurements of the Cosmic Background Radiation (CBR) at frequencies below 200 GHz. Studying their spectral behavior makes it possible to determine the contribution of these sources to the intrinsic anisotropies of the CBR. One of the recent experiments designed to study the CBR is BEAST (Background Emission Anisotropy Scanning Telescope), whose first results were published in February 2003. In recent months we have produced sky maps at 30 GHz and 41 GHz, for a total of 648 hours of observation between July and October 2002. We identified 4 extragalactic point sources in the region of the sky between 0h < RA < 24h and +32° < DEC < +42°, with signal-to-noise ratio S/N > 4.3 and located at least 25° above the Galactic plane. Their counterparts at 5 GHz, according to the GB6 catalog, are J1613+3412, J1635+3808, J0927+3902 and J1642+3948. These sources were also identified by the WMAP satellite; three coincide with those observed by BEAST within the uncertainty of the telescope beam, and the fourth (J1613+3412) lies very close, although not coincident. The preliminary flux estimates obtained for these objects are, respectively, 0.51, 0.97, 1.08 and 1.6 Jy at 41 GHz. Using these results and flux measurements at other frequencies available in the literature, we present an estimate of the spectral indices of these objects in the frequency range between 4.85 GHz and 41 GHz.
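
    For readers who want the arithmetic behind the last sentence: with fluxes at two frequencies, the spectral index in the convention S(nu) ~ nu**alpha follows from a ratio of logarithms. The 4.85 GHz fluxes below are placeholders (the abstract quotes only the 41 GHz values), so the printed indices are illustrative only:

    import numpy as np

    def spectral_index(s1_jy, nu1_ghz, s2_jy, nu2_ghz):
        # alpha in the convention S(nu) ~ nu**alpha
        return np.log(s2_jy / s1_jy) / np.log(nu2_ghz / nu1_ghz)

    # 41 GHz fluxes are from the abstract; the 4.85 GHz values are
    # placeholders, not the actual GB6 catalog fluxes.
    for name, s485, s41 in [("J1613+3412", 2.9, 0.51), ("J1635+3808", 2.7, 0.97)]:
        print(name, round(spectral_index(s485, 4.85, s41, 41.0), 2))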

  3. The Research and Test of Fast Radio Burst Real-time Search Algorithm Based on GPU Acceleration

    NASA Astrophysics Data System (ADS)

    Wang, J.; Chen, M. Z.; Pei, X.; Wang, Z. Q.

    2017-03-01

    In order to satisfy the research needs of the Nanshan 25 m radio telescope of Xinjiang Astronomical Observatory (XAO) and to study key technology for the planned QiTai radio Telescope (QTT), the receiver group of XAO developed a GPU (Graphics Processing Unit) based real-time FRB search algorithm from the original CPU (Central Processing Unit) based FRB search algorithm, and built an FRB real-time search system. Comparison of the GPU and CPU systems shows that, while preserving search accuracy, the GPU-accelerated algorithm is 35-45 times faster than the CPU algorithm.
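
    The core of any FRB search, on CPU or GPU, is incoherent dedispersion: each frequency channel is shifted back by the cold-plasma dispersion delay before summing. A minimal NumPy sketch of that kernel is given below; on a GPU each (channel, DM trial) pair typically maps to one thread. Function and variable names are illustrative:

    import numpy as np

    KDM = 4.148808  # dispersion constant in ms GHz^2 pc^-1 cm^3

    def dedisperse(dynspec, freqs_ghz, dm, dt_ms):
        # dynspec: (n_chan, n_time) filterbank data; freqs_ghz: channel centre
        # frequencies; dm: trial dispersion measure in pc cm^-3; dt_ms: sampling
        # time. Each channel is rolled earlier by its delay relative to the
        # highest frequency, then channels are summed for the S/N search.
        f_ref = freqs_ghz.max()
        delays_ms = KDM * dm * (freqs_ghz**-2 - f_ref**-2)
        shifts = np.round(delays_ms / dt_ms).astype(int)
        out = np.empty_like(dynspec)
        for i, s in enumerate(shifts):
            out[i] = np.roll(dynspec[i], -s)
        return out.sum(axis=0)  # frequency-collapsed time series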

  4. GraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sengupta, Dipanjan; Song, Shuaiwen; Agarwal, Kapil

    2015-11-15

    Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device’s internal memory capacity. GraphReduce adopts a combination of edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the host and device.
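
    The Gather-Apply-Scatter model mentioned above factors a graph algorithm into three flat, data-parallel phases, which is what lets GraphReduce stream edge batches through the GPU. Below is a toy PageRank sketch of the pattern (dangling vertices are ignored for brevity; this is not GraphReduce's actual code):

    import numpy as np

    def gas_pagerank(src, dst, n, d=0.85, iters=20):
        # src, dst: parallel edge-list arrays. Scatter computes per-edge
        # contributions, Gather sums them per destination vertex, Apply
        # updates the vertex state. Each phase is a flat loop over edges or
        # vertices, which maps naturally onto GPU threads and onto
        # out-of-core batches of edges.
        rank = np.full(n, 1.0 / n)
        out_deg = np.bincount(src, minlength=n).clip(min=1)
        for _ in range(iters):
            contrib = rank[src] / out_deg[src]                          # Scatter
            gathered = np.bincount(dst, weights=contrib, minlength=n)   # Gather
            rank = (1 - d) / n + d * gathered                           # Apply
        return rank

    # Tiny 3-vertex example: 0 -> 1, 1 -> 2, 2 -> 0
    print(gas_pagerank(np.array([0, 1, 2]), np.array([1, 2, 0]), 3))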

  5. GPU-accelerated computation of electron transfer.

    PubMed

    Höfinger, Siegfried; Acocella, Angela; Pop, Sergiu C; Narumi, Tetsu; Yasuoka, Kenji; Beu, Titus; Zerbetto, Francesco

    2012-11-05

    Electron transfer is a fundamental process that can be studied with the help of computer simulation. The underlying quantum mechanical description renders the problem a computationally intensive application. In this study, we probe the graphics processing unit (GPU) for suitability to this type of problem. Time-critical components are identified via profiling of an existing implementation and several different variants are tested involving the GPU at increasing levels of abstraction. A publicly available library supporting basic linear algebra operations on the GPU turns out to accelerate the computation approximately 50-fold with minor dependence on actual problem size. The performance gain does not compromise numerical accuracy and is of significant value for practical purposes. Copyright © 2012 Wiley Periodicals, Inc.
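
    The pattern the paper describes, routing the profiled hot spots through a GPU linear-algebra library, can be sketched in a few lines. CuPy is used here purely as one publicly available GPU BLAS wrapper (the abstract does not name the library), the matrices are random stand-ins, and running this requires a CUDA device with the cupy package installed:

    import numpy as np
    import cupy as cp

    def couplings_cpu(h, c):
        return h @ c                 # dense matrix products dominate the profile

    def couplings_gpu(h, c):
        h_d, c_d = cp.asarray(h), cp.asarray(c)   # host -> device copies
        return cp.asnumpy(h_d @ c_d)              # GPU GEMM, then copy back

    h = np.random.rand(2000, 2000)
    c = np.random.rand(2000, 2000)
    # The accelerated path must agree with the reference CPU result:
    assert np.allclose(couplings_cpu(h, c), couplings_gpu(h, c))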

  6. GPU Accelerated Prognostics

    NASA Technical Reports Server (NTRS)

    Gorospe, George E., Jr.; Daigle, Matthew J.; Sankararaman, Shankar; Kulkarni, Chetan S.; Ng, Eley

    2017-01-01

    Prognostic methods enable operators and maintainers to predict the future performance for critical systems. However, these methods can be computationally expensive and may need to be performed each time new information about the system becomes available. In light of these computational requirements, we have investigated the application of graphics processing units (GPUs) as a computational platform for real-time prognostics. Recent advances in GPU technology have reduced cost and increased the computational capability of these highly parallel processing units, making them more attractive for the deployment of prognostic software. We present a survey of model-based prognostic algorithms with considerations for leveraging the parallel architecture of the GPU and a case study of GPU-accelerated battery prognostics with computational performance results.

  7. Evaluation of the performance of actions and outcomes in primary health care.

    PubMed

    Miclos, Paula Vitali; Calvo, Maria Cristina Marino; Colussi, Claudia Flemming

    2017-01-01

    The objective of this study has been to evaluate the performance of the primary care of Brazilian municipalities in relation to health actions and outcomes. This is an evaluative, cross-sectional research, with a quantitative approach, aimed at the identification of the efficiency frontier of the primary care in health actions and outcomes in Brazilian municipalities. Secondary data have been collected from the Programa Nacional de Melhoria do Acesso e da Qualidade da Atenção Básica (National Program for Improving Access and Quality of Primary Care) and the Department of Informatics of the Brazilian Unified Health System, in 2012. The data envelopment analysis tool has been used for variable returns to scale with product orientation. Municipalities have been analyzed by population size, and small municipalities have presented a high percentage of inefficiency for both models. The analysis of efficiency has indicated the existence of a higher percentage of efficient municipalities in the model of health actions than in the model of health outcomes.

  8. Development of a virtual learning environment for cardiorespiratory arrest training.

    PubMed

    Silva, Anazilda Carvalho da; Bernardes, Andrea; Évora, Yolanda Dora Martinez; Dalri, Maria Célia Barcellos; Silva, Alexandre Ribeiro da; Sampaio, Camila Santana Justo Cintra

    2016-01-01

    To develop a Virtual Learning Environment (VLE) aimed at training nursing team workers and emergency vehicle drivers in Basic Life Support (BLS) for attending cardiorespiratory arrest, and to evaluate the quality of its contents among specialists in the area of Emergency and Urgent care. Applied research on technological development. The methodology used was based on the Instructional Design Model (ADDIE), which structures the teaching-learning planning in distinct stages (analysis, design, development, implementation and evaluation). The VLE was composed of texts elaborated from bibliographic research, links, a video edited from a simulation scenario in the laboratory, and questions to evaluate retention of the content, organized in modules. After its development, it was evaluated by eight expert judges as adequate to satisfy the needs of the target audience, and was made available for electronic access. The VLE has potential as a tool for training and qualification in BLS, as it can be easily integrated with other pedagogical approaches and strategies with active methodologies.

  9. A Brief Survey of Studies in Astronomy and Education in Brazil from 2010 to 2013

    NASA Astrophysics Data System (ADS)

    Goncalves, Erica de Oliveira; Kern, C.

    2014-10-01

    In Brazil, research on astronomy teaching in basic education has been gaining prominence. Established as an important area of knowledge for students and teachers, astronomy studies have been winning space in official education documents and in school curricula. Against this background, this work mapped the database of the Biblioteca Digital Brasileira de Teses e Dissertações (Brazilian Digital Library of Theses and Dissertations) using the keywords "astronomia" (astronomy) and "educação" (education) for the period from 2010 to 2013. To compose what we call here a survey of the field, works were selected and their titles, abstracts, final remarks and references were analyzed, and we identified the epistemological sources current in graduate research in that period. In most of the works surveyed, we identified theoretical frameworks related to physics, science and astronomy involving discussions of curriculum and pedagogical practices linked to astronomy teaching in the primary and secondary levels of basic education and in teacher training courses.

  10. Trends in theses and dissertations on astronomy teaching in Brazil

    NASA Astrophysics Data System (ADS)

    Bretones, P. S.; Megid Neto, J.

    2003-08-01

    This work presents the results of a state-of-the-art survey of theses and dissertations defended in Brazil on the teaching of astronomy. Its goal was to identify that production and determine the main research trends in the field. The initial procedure consisted of a bibliographic survey at the Centro de Documentação em Ensino de Ciências (CEDOC) of the Faculty of Education at UNICAMP and in the CAPES thesis database available on the Internet. Thirteen master's dissertations and 3 doctoral theses were located and studied with respect to the following aspects: institution, year of defense, school level covered, thematic focus, and genre of academic work. Of this set, 13 (81.3%) were defended from the second half of the 1990s onward, indicating a more recent concern with astronomy teaching topics within the academic output of graduate programs in Brazil. It was found that 43.7% of the works were produced at USP and 18.8% at UNICAMP. As for the school level covered, studies aimed at grades 5 to 8 of elementary education predominated (62.5%). Regarding thematic focus, the main trends were: 56.3% Content and Method; 43.8% Teacher Conceptions; 37.5% Curriculum and Programs; 37.5% Teaching Resources. As for the genre of academic work, 43.8% were Experimental Research and 31.3% Content Analysis Research. Bibliographic review studies such as this aim to help disseminate the academic production of a given area, outlining some of its trends. At the same time, through ensuing investigations, they make it possible to point out contributions to teaching and to flag needs to be met by future research.

  11. Optimizing a mobile robot control system using GPU acceleration

    NASA Astrophysics Data System (ADS)

    Tuck, Nat; McGuinness, Michael; Martin, Fred

    2012-01-01

    This paper describes our attempt to optimize a robot control program for the Intelligent Ground Vehicle Competition (IGVC) by running computationally intensive portions of the system on a commodity graphics processing unit (GPU). The IGVC Autonomous Challenge requires a control program that performs a number of different computationally intensive tasks ranging from computer vision to path planning. For the 2011 competition our Robot Operating System (ROS) based control system would not run comfortably on the multicore CPU on our custom robot platform. The process of profiling the ROS control program and selecting appropriate modules for porting to run on a GPU is described. A GPU-targeting compiler, Bacon, is used to speed up development and help optimize the ported modules. The impact of the ported modules on overall performance is discussed. We conclude that GPU optimization can free a significant amount of CPU resources with minimal effort for expensive user-written code, but that replacing heavily-optimized library functions is more difficult, and a much less efficient use of time.

  12. Transportable GPU (General Processor Units) chip set technology for standard computer architectures

    NASA Astrophysics Data System (ADS)

    Fosdick, R. E.; Denison, H. C.

    1982-11-01

    The USAF-developed GPU Chip Set has been utilized by Tracor to implement both USAF and Navy Standard 16-Bit Airborne Computer Architectures. Both configurations are currently being delivered into DOD full-scale development programs. Leadless hermetic chip carrier packaging has facilitated implementation of both architectures on single 4 1/2 x 5 substrates. The CMOS and CMOS/SOS implementations of the GPU Chip Set have allowed both CPU implementations to use less than 3 watts of power each. Recent efforts by Tracor for the USAF have included the definition of a next-generation GPU Chip Set that will retain the application-proven architecture of the current chip set while offering the added cost advantages of transportability across ISO-CMOS and CMOS/SOS processes and across numerous semiconductor manufacturers using a newly defined set of common design rules. The Enhanced GPU Chip Set will increase speed by a factor of approximately 3 while significantly reducing the chip counts and costs of standard CPU implementations.

  13. A survey of GPU-based medical image computing techniques

    PubMed Central

    Shi, Lin; Liu, Wen; Zhang, Heye; Xie, Yongming

    2012-01-01

    Medical imaging currently plays a crucial role throughout clinical practice, from medical research to diagnostics and treatment planning. However, medical imaging procedures are often computationally demanding due to the large three-dimensional (3D) medical datasets processed in practical clinical applications. With the rapidly improving performance of graphics processors, improved programming support, and an excellent price-to-performance ratio, the graphics processing unit (GPU) has emerged as a competitive parallel computing platform for computationally expensive and demanding tasks in a wide range of medical imaging applications. The major purpose of this survey is to provide a comprehensive reference source for newcomers and researchers involved in GPU-based medical image processing. Within this survey, the continuous advancement of GPU computing is reviewed and existing applications in three areas of medical image processing, namely segmentation, registration and visualization, are surveyed. The potential advantages and associated challenges of current GPU-based medical imaging are also discussed to inspire future applications in medicine. PMID:23256080

  14. Accelerating image reconstruction in dual-head PET system by GPU and symmetry properties.

    PubMed

    Chou, Cheng-Ying; Dong, Yun; Hung, Yukai; Kao, Yu-Jiun; Wang, Weichung; Kao, Chien-Min; Chen, Chin-Tu

    2012-01-01

    Positron emission tomography (PET) is an important imaging modality in both clinical use and research. We have developed a compact high-sensitivity PET system consisting of two large-area panel PET detector heads, which produce more than 224 million lines of response and thus impose dramatic computational demands. In this work, we employed a state-of-the-art graphics processing unit (GPU), the NVIDIA Tesla C2070, to obtain an efficient reconstruction process. Our approach integrates the distinctive features of the symmetry properties of the imaging system and of the GPU architecture, including block/warp/thread assignment and effective memory usage, to accelerate the computations for ordered-subset expectation-maximization (OSEM) image reconstruction. The OSEM reconstruction algorithm was implemented in both CPU-based and GPU-based codes, and their computational performance was quantitatively analyzed and compared. The results showed that the GPU-accelerated scheme can drastically reduce the reconstruction time and thus largely expand the applicability of the dual-head PET system.
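
    For context, the OSEM update that such systems accelerate multiplies the current image by a back-projected ratio of measured to predicted counts, one subset of lines of response (LORs) at a time. A dense-matrix Python sketch follows; a real dual-head system never stores the full system matrix, which is exactly where the symmetry exploitation described above comes in:

    import numpy as np

    def osem(y, A, subsets, n_iter=4):
        # y: measured counts per LOR; A: system matrix with A[i, j] the
        # probability that a decay in voxel j is detected in LOR i;
        # subsets: list of index arrays partitioning the LORs.
        x = np.ones(A.shape[1])
        for _ in range(n_iter):
            for s in subsets:
                As = A[s]
                sens = As.sum(axis=0)                       # subset sensitivity
                ratio = y[s] / np.maximum(As @ x, 1e-12)    # forward project, compare
                x *= (As.T @ ratio) / np.maximum(sens, 1e-12)  # back project, update
        return x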

  15. The GPU implementation of micro-Doppler period estimation

    NASA Astrophysics Data System (ADS)

    Yang, Liyuan; Wang, Junling; Bi, Ran

    2018-03-01

    To address the computational complexity of wideband radar echo signals and the lack of real-time processing, this paper designs a program based on a CPU-GPU heterogeneous parallel structure to improve the real-time extraction of micro-motion features. First, we discuss the principle of the micro-Doppler effect generated by the rolling of scattering points on an orbiting satellite, and analyze how a Kalman filter can be used to compensate the translational motion of a tumbling satellite and how joint time-frequency analysis and the inverse Radon transform can extract micro-motion features from the compensated echo. Second, the advantages of the GPU for real-time processing and the working principle of CPU-GPU heterogeneous parallelism are analyzed, and a GPU-based program flow is designed to extract micro-motion features from the radar echo signal of a rolling satellite. At the end of the article, extraction results are given to verify the correctness of the program and algorithm.
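
    The joint time-frequency step can be illustrated compactly: micro-Doppler from rolling scatterers appears as sinusoidal traces in a spectrogram, whose period the paper then estimates via the inverse Radon transform (omitted here). A hedged SciPy sketch with a synthetic two-scatterer echo, all parameters illustrative:

    import numpy as np
    from scipy.signal import stft

    def micro_doppler_map(echo, fs):
        # Short-time Fourier transform of the complex echo; the magnitude
        # spectrogram carries the sinusoidal micro-Doppler traces.
        f, t, Z = stft(echo, fs=fs, nperseg=256, noverlap=192,
                       return_onesided=False)
        return f, t, np.abs(Z)

    # Hypothetical echo: two scatterers with sinusoidal (micro-Doppler) phase.
    fs = 8192.0
    t = np.arange(0, 2.0, 1 / fs)
    echo = np.exp(1j * 40 * np.sin(2 * np.pi * 1.5 * t)) \
         + 0.7 * np.exp(1j * 25 * np.sin(2 * np.pi * 1.5 * t + 2.0))
    f, tt, S = micro_doppler_map(echo, fs)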

  16. GPU computing with Kaczmarz’s and other iterative algorithms for linear systems

    PubMed Central

    Elble, Joseph M.; Sahinidis, Nikolaos V.; Vouzis, Panagiotis

    2009-01-01

    The graphics processing unit (GPU) is used to solve large linear systems derived from partial differential equations. The differential equations studied are strongly convection-dominated, of various sizes, and common to many fields, including computational fluid dynamics, heat transfer, and structural mechanics. The paper presents comparisons between GPU and CPU implementations of several well-known iterative methods, including Kaczmarz’s, Cimmino’s, component averaging, conjugate gradient normal residual (CGNR), symmetric successive overrelaxation-preconditioned conjugate gradient, and conjugate-gradient-accelerated component-averaged row projections (CARP-CG). Computations are performed with dense as well as general banded systems. The results demonstrate that our GPU implementation outperforms CPU implementations of these algorithms, as well as previously studied parallel implementations on Linux clusters and shared memory systems. While the CGNR method had begun to fall out of favor for solving such problems, for the problems studied in this paper, the CGNR method implemented on the GPU performed better than the other methods, including a cluster implementation of the CARP-CG method. PMID:20526446
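
    As a reminder of the method in the title, the classical Kaczmarz iteration sweeps over the rows of A and projects the iterate onto one equation's hyperplane at a time; row projections within a block are independent, which is what CARP-CG and the GPU implementation exploit. A small self-contained sketch:

    import numpy as np

    def kaczmarz(A, b, sweeps=50):
        # Project the current iterate onto the hyperplane of one equation
        # at a time: x <- x + (b_i - a_i . x) / ||a_i||^2 * a_i
        x = np.zeros(A.shape[1])
        row_norms = (A * A).sum(axis=1)
        for _ in range(sweeps):
            for i in range(A.shape[0]):
                x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
        return x

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    b = np.array([3.0, 4.0])
    print(kaczmarz(A, b))   # converges toward the solution [1, 1]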

  17. Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

    NASA Technical Reports Server (NTRS)

    Duffy, Austen C.; Hammond, Dana P.; Nielsen, Eric J.

    2012-01-01

    In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and the next generation machines, codes will need to employ some type of GPU sharing model, as presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of the time.

  18. Learning of Unknown Environments in Goal-Directed Guidance and Navigation Tasks: Autonomous Systems and Humans

    NASA Astrophysics Data System (ADS)

    Vidal, Joao Vasco Silvestres

    This work presents a theoretical and experimental study of the anisotropic magnetoelectric (ME) properties of different composites containing piezoelectric (PE) single crystals, mostly lead-free, in view of several multifunctional applications. A linear description of the ME effect in terms of electric, magnetic and elastic fields and material constants is presented. A quasi-static phenomenological model is used to illustrate the relation between the material constants, their anisotropy, and the transverse ME voltage and charge coefficients. This model is subsequently employed to estimate the maximum direct ME voltage coefficient expected in a series of Metglas/piezocrystal/Metglas tri-layer composites as a function of the orientation of the PE crystal. It is thus shown that the ME effects depend strongly on crystal orientation, which supports the possibility of generating high ME voltage coefficients in composites containing lead-free PE single crystals such as lithium niobate (LiNbO3; LNO), lithium tantalate (LiTaO3), gallium orthophosphate (GaPO4; GPO), quartz (SiO2), langatate (La3Ga5.5Ta0.5O14) and langasite (La3Ga5SiO14) through optimization of the crystal orientation. A dynamic lock-in experimental technique for measuring the impedance and the direct ME effect is described. The formalism behind this technique, as well as an experimental setup developed for this purpose, are presented. Its layout and characteristics, along with different ways of reducing noise and undesirable mutual induction, are explored. A comparative study of the direct ME effect in simply bonded tri-layer composites of Metglas and LNO or PMN-PT single crystals is presented. Although PMN-PT has much higher piezoelectric charge coefficients than LNO, the direct ME voltage coefficient proved comparable between the two composites owing to the much lower dielectric permittivity of LNO. Theoretical calculations further indicate that the ME properties could be significantly improved (up to 500 V/(cm·Oe)) through optimization of the LNO cut angle, of the relative thickness of the ferroelectric/ferromagnetic layers, and of the bonding between the Metglas and the LNO. Advantages of using the ferroelectric material LNO in ME composites are discussed. In a subsequent study, the anisotropic dynamic impedance and ME properties of tri-layer composites of Metglas and lead-free PE single crystals of LNO and GPO are explored. Measurements were performed as a function of the crystal cut, the magnitude and orientation of the magnetic bias field, and the frequency of the modulation field. Highly intense ME coefficients in certain resonance modes are explored, and their relation to the material properties of the crystals and the geometry of the composites is investigated. An ME coefficient of up to 249 V/(cm·Oe) was observed here in a composite with a 41° Y-cut LNO crystal at 323.1 kHz. We thus show that multilayer composites containing lead-free LNO and GPO crystals can exhibit relatively large anisotropic ME effects. We also demonstrate that control of the orientation of the PE crystals can in principle be used to obtain the anisotropic ME properties desired for any application. Unique characteristics such as high chemical stability, linear piezoelectricity and thermal robustness open real prospects for the use of LNO- and GPO-based composites in various applications.
    Finally, bi-layer composites containing 127° Y-cut bidomain LNO PE plates were studied both theoretically and experimentally. These LNO plates possess a bidomain structure with opposite spontaneous polarization vectors along the thickness direction (i.e., a "head-to-head" or "tail-to-tail" ferroelectric macrodomain structure). Measurements of impedance, ME effect and equivalent magnetic noise density were performed on the composites operating under quasi-static and resonance conditions. ME coefficients of up to 578 V/(cm·Oe) were obtained at ca. 30 kHz under bending resonance using 0.5 mm thick PE crystals. Equivalent magnetic noise density measurements showed values down to 153 pT/Hz^1/2 at 1 kHz (quasi-static mode) and 524 fT/Hz^1/2 under resonance conditions. Further optimization of the fabrication techniques, composite geometry and detection circuits is expected to reduce these values to at least 10 pT/Hz^1/2 and 250 fT/Hz^1/2, respectively, and the resonance frequency by at least two orders of magnitude. These systems may thus in the future be used in simple, sensitive, passive and stable vector magnetic field sensors operable at high temperatures.

  19. Fast MPEG-CDVS Encoder With GPU-CPU Hybrid Computing.

    PubMed

    Duan, Ling-Yu; Sun, Wei; Zhang, Xinfeng; Wang, Shiqi; Chen, Jie; Yin, Jianxiong; See, Simon; Huang, Tiejun; Kot, Alex C; Gao, Wen

    2018-05-01

    The compact descriptors for visual search (CDVS) standard from the ISO/IEC Moving Picture Experts Group has succeeded in enabling interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of a CDVS encoder unfortunately hinders its wide deployment in industry for large-scale visual search. In this paper, we revisit the merits of the low-complexity design of the CDVS core techniques and present a very fast CDVS encoder by leveraging the massive parallel execution resources of the graphics processing unit (GPU). We shift the computation-intensive and parallel-friendly modules to state-of-the-art GPU platforms, on which thread-block allocation and the memory access mechanism are jointly optimized to eliminate performance loss. In addition, operations with heavy data dependence are allocated to the CPU to relieve the GPU of extra, unnecessary computational burden. Furthermore, we demonstrate that the proposed fast CDVS encoder works well with convolutional neural network approaches, which makes it possible to leverage the advantages of GPU platforms harmoniously and yields significant performance improvements. Comprehensive experimental results over benchmarks show that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search.

  20. A GPU-Accelerated Approach for Feature Tracking in Time-Varying Imagery Datasets.

    PubMed

    Peng, Chao; Sahani, Sandip; Rushing, John

    2017-10-01

    We propose a novel parallel connected component labeling (CCL) algorithm along with efficient out-of-core data management to detect and track feature regions of large time-varying imagery datasets. Our approach contributes to the big data field with parallel algorithms tailored for GPU architectures. We remove the data dependency between frames and achieve pixel-level parallelism. Due to the large size, the entire dataset cannot fit into cached memory. Frames have to be streamed through the memory hierarchy (disk to CPU main memory and then to GPU memory), partitioned, and processed as batches, where each batch is small enough to fit into the GPU. To reconnect the feature regions that are separated due to data partitioning, we present a novel batch merging algorithm that extracts the region connection information across multiple batches in a parallel fashion. The information is organized in a memory-efficient structure and supports fast indexing on the GPU. Our experiment uses a commodity workstation equipped with a single GPU. The results show that our approach can efficiently process a weather dataset composed of terabytes of time-varying radar images. The advantages of our approach are demonstrated by comparison with an efficient CPU cluster implementation used by weather scientists.
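
    A serial reference for the kernel being parallelized: two-pass connected component labeling with union-find. The same merging of label trees is what the batch-merging step applies across partition boundaries; the GPU version performs the unions with one thread per pixel. This is a sketch only, not the authors' code:

    import numpy as np

    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def label_components(mask):
        # 4-connected CCL on a binary image via union-find.
        h, w = mask.shape
        parent = np.arange(h * w)
        for i in range(h):
            for j in range(w):
                if not mask[i, j]:
                    continue
                p = i * w + j
                if i > 0 and mask[i - 1, j]:            # merge with pixel above
                    ra, rb = find(parent, p), find(parent, p - w)
                    parent[max(ra, rb)] = min(ra, rb)
                if j > 0 and mask[i, j - 1]:            # merge with pixel left
                    ra, rb = find(parent, p), find(parent, p - 1)
                    parent[max(ra, rb)] = min(ra, rb)
        labels = np.where(mask.ravel(),
                          [find(parent, p) for p in range(h * w)], -1)
        return labels.reshape(h, w)

    mask = np.array([[1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]], dtype=bool)
    print(label_components(mask))   # two regions, background labeled -1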

  1. A CFD Heterogeneous Parallel Solver Based on Collaborating CPU and GPU

    NASA Astrophysics Data System (ADS)

    Lai, Jianqi; Tian, Zhengyu; Li, Hua; Pan, Sha

    2018-03-01

    Since the graphics processing unit (GPU) offers strong floating-point computation and high memory bandwidth for data parallelism, it has been widely used in common computing areas such as molecular dynamics (MD), computational fluid dynamics (CFD) and so on. The emergence of the compute unified device architecture (CUDA), which reduces the complexity of program development, brings great opportunities to CFD. There are three different modes for the parallel solution of the NS equations: a parallel solver based on the CPU, a parallel solver based on the GPU, and a heterogeneous parallel solver based on collaborating CPU and GPU. GPUs are relatively rich in compute capacity but poor in memory capacity, and CPUs are the opposite; to make full use of both, a CFD heterogeneous parallel solver based on collaborating CPU and GPU has been established. Three cases are presented to analyse the solver’s computational accuracy and heterogeneous parallel efficiency. The numerical results agree well with experimental results, which demonstrates that the heterogeneous parallel solver has high computational precision. The speedup on a single GPU is more than 40 for laminar flow; it decreases for turbulent flow, but still reaches more than 20. Moreover, the speedup increases as the grid becomes larger.

  2. Accelerating large-scale simulation of seismic wave propagation by multi-GPUs and three-dimensional domain decomposition

    NASA Astrophysics Data System (ADS)

    Okamoto, Taro; Takenaka, Hiroshi; Nakamura, Takeshi; Aoki, Takayuki

    2010-12-01

    We adopted the GPU (graphics processing unit) to accelerate large-scale finite-difference simulation of seismic wave propagation. The simulation can benefit from the high memory bandwidth of the GPU because it is a "memory intensive" problem. In a single-GPU case we achieved a performance of about 56 GFlops, about 45-fold faster than that achieved by a single core of the host central processing unit (CPU). We confirmed that optimized use of fast shared memory and registers was essential for performance. In the multi-GPU case with three-dimensional domain decomposition, the non-contiguous memory alignment of the ghost zones was found to impose quite long times in data transfer between the GPU and the host node. This problem was solved by using contiguous memory buffers for the ghost zones. We achieved a performance of about 2.2 TFlops by using 120 GPUs and 330 GB of total memory: nearly (or more than) 2200 host CPU cores would be required to achieve the same performance. The weak scaling was nearly proportional to the number of GPUs. We therefore conclude that GPU computing for large-scale simulation of seismic wave propagation is a promising approach, as a faster simulation is possible with reduced computational resources compared to CPUs.
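
    The ghost-zone fix is simple to state in code: copy the strided halo faces into contiguous buffers before the device-host transfer, so each exchange is one large copy rather than many small strided ones. Below is a NumPy stand-in for the CUDA buffer copies (the axis choice and halo width are assumptions of this illustration):

    import numpy as np

    def pack_ghost_zones(field, width=2):
        # The x-direction faces of a 3D field are non-contiguous in memory;
        # np.ascontiguousarray stands in for the GPU kernel that packs them
        # into contiguous send buffers before the device-to-host transfer.
        lo = np.ascontiguousarray(field[:, :, :width])    # faces sent "left"
        hi = np.ascontiguousarray(field[:, :, -width:])   # faces sent "right"
        return lo, hi

    def unpack_ghost_zones(field, lo_recv, hi_recv, width=2):
        # Received halos are written back into the ghost layers in one copy each.
        field[:, :, :width] = lo_recv
        field[:, :, -width:] = hi_recv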

  3. Toward GPGPU accelerated human electromechanical cardiac simulations

    PubMed Central

    Vigueras, Guillermo; Roy, Ishani; Cookson, Andrew; Lee, Jack; Smith, Nicolas; Nordsletten, David

    2014-01-01

    In this paper, we look at the acceleration of weakly coupled electromechanics using the graphics processing unit (GPU). Specifically, we port to the GPU a number of components of Heart—a CPU-based finite element code developed for simulating multi-physics problems. On the basis of a criterion of computational cost, we implemented on the GPU the ODE and PDE solution steps for the electrophysiology problem and the Jacobian and residual evaluation for the mechanics problem. Performance of the GPU implementation is then compared with single core CPU (SC) execution as well as multi-core CPU (MC) computations with equivalent theoretical performance. Results show that for a human scale left ventricle mesh, GPU acceleration of the electrophysiology problem provided speedups of 164 × compared with SC and 5.5 times compared with MC for the solution of the ODE model. Speedup of up to 72 × compared with SC and 2.6 × compared with MC was also observed for the PDE solve. Using the same human geometry, the GPU implementation of mechanics residual/Jacobian computation provided speedups of up to 44 × compared with SC and 2.0 × compared with MC. © 2013 The Authors. International Journal for Numerical Methods in Biomedical Engineering published by John Wiley & Sons, Ltd. PMID:24115492

  4. Accelerating EPI distortion correction by utilizing a modern GPU-based parallel computation.

    PubMed

    Yang, Yao-Hao; Huang, Teng-Yi; Wang, Fu-Nien; Chuang, Tzu-Chao; Chen, Nan-Kuei

    2013-04-01

    The combination of phase demodulation and field mapping is a practical method for correcting echo planar imaging (EPI) geometric distortion. However, since phase dispersion accumulates in each phase-encoding step, the calculation complexity of phase demodulation is Ny-fold higher than that of conventional image reconstruction. Thus, correcting EPI images via phase demodulation is generally a time-consuming task. Parallel computing employing general-purpose calculations on graphics processing units (GPU) can accelerate scientific computing if the algorithm is parallelized. This study proposes a method that incorporates the GPU-based technique into the phase demodulation calculations to reduce computation time. The proposed parallel algorithm was applied to a PROPELLER-EPI diffusion tensor dataset. The GPU-based phase demodulation method correctly reduced the EPI distortion and accelerated the computation. The total reconstruction time of the 16-slice PROPELLER-EPI diffusion tensor images with a matrix size of 128 × 128 was reduced from 1,754 seconds to 101 seconds by the parallelized 4-GPU program. GPU computing is a promising method for accelerating EPI geometric correction. The resulting reduction in the computation time of phase demodulation should accelerate postprocessing for studies performed with EPI, and should make the PROPELLER-EPI technique practical for clinical use. Copyright © 2011 by the American Society of Neuroimaging.
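
    A sketch of why the work scales with Ny: in conjugate-phase (phase demodulation) correction, every phase-encode line is demodulated with the off-resonance phase accrued by its own acquisition time, so each image row requires a full pass over all ky lines. The sign convention, variable shapes, and names below are assumptions of this illustration, not the authors' implementation:

    import numpy as np

    def conjugate_phase_recon(ksp, fieldmap_hz, t_acq):
        # ksp: (Ny, Nx) EPI k-space; fieldmap_hz: (Ny, Nx) B0 map in Hz;
        # t_acq: (Ny,) acquisition time of each phase-encode line in seconds.
        # Assumes the off-resonance signal phase went as exp(-2j*pi*f*t).
        Ny, Nx = ksp.shape
        ky = np.fft.fftfreq(Ny) * Ny          # integer ky values in FFT order
        x_lines = np.fft.ifft(ksp, axis=1)    # undo the frequency encode first
        img = np.zeros((Ny, Nx), dtype=complex)
        for r in range(Ny):                   # one demodulation per image row
            enc = np.exp(2j * np.pi * ky * r / Ny)                        # DFT kernel
            demod = np.exp(2j * np.pi * fieldmap_hz[r] * t_acq[:, None])  # B0 phase
            img[r] = (enc[:, None] * demod * x_lines).sum(axis=0) / Ny
        return img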

  5. FastGCN: A GPU Accelerated Tool for Fast Gene Co-Expression Networks

    PubMed Central

    Liang, Meimei; Zhang, Futao; Jin, Gulei; Zhu, Jun

    2015-01-01

    Gene co-expression networks are one type of valuable biological network. Many methods and tools have been published to construct gene co-expression networks; however, most of these tools and methods are inconvenient and time consuming for large datasets. We have developed a user-friendly, accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (graphics processing unit) architectures. Genetic entropies were exploited to filter out genes with no or small expression changes in the raw data preprocessing step. Pearson correlation coefficients were then calculated. After that, we normalized these coefficients and employed the false discovery rate to control the multiple tests. Finally, module identification was conducted to construct the co-expression networks. All of these calculations were implemented on a GPU. We also compressed the coefficient matrix to save space. We compared the performance of the GPU implementation with those of multi-core CPU implementations with 16 CPU threads, a single-thread C/C++ implementation and a single-thread R implementation. Our results show that the GPU implementation largely outperforms the single-thread C/C++ and R implementations, and outperforms the multi-core CPU implementation when the number of genes increases. With a test dataset containing 16,000 genes and 590 individuals, the GPU implementation achieved more than 63 times the speed of the single-thread R implementation when 50 percent of the genes were filtered out, and about 80 times the speed when no genes were filtered out. PMID:25602758
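
    The pipeline in the abstract (entropy filter, Pearson correlations, FDR control) can be outlined in NumPy/SciPy. The thresholds and the Fisher z-approximation for p-values below are illustrative choices, not the paper's exact ones; the all-pairs correlation is the step that becomes a single large GEMM on the GPU:

    import numpy as np
    from scipy.stats import norm

    def coexpression_edges(expr, entropy_min=0.5, fdr_q=0.05):
        # expr: (n_genes, n_samples) expression matrix.
        # 1) Drop near-constant genes via a simple histogram entropy.
        def entropy(v):
            counts, _ = np.histogram(v, bins=16)
            p = counts[counts > 0] / counts.sum()
            return -(p * np.log2(p)).sum()
        keep = np.array([entropy(g) > entropy_min for g in expr])
        expr = expr[keep]
        # 2) All-pairs Pearson correlation (rows are genes).
        r = np.corrcoef(expr)
        iu = np.triu_indices_from(r, k=1)
        # 3) Fisher z-transform -> two-sided p -> Benjamini-Hochberg FDR.
        n = expr.shape[1]
        z = np.arctanh(np.clip(r[iu], -0.999999, 0.999999)) * np.sqrt(n - 3)
        p = 2 * norm.sf(np.abs(z))
        order = np.argsort(p)
        thresh = fdr_q * np.arange(1, p.size + 1) / p.size
        passed = p[order] <= thresh
        k = np.nonzero(passed)[0].max() + 1 if passed.any() else 0
        # Edge indices refer to the filtered gene set.
        return keep, (iu[0][order[:k]], iu[1][order[:k]])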

  7. Inauguration of the SOAR Telescope

    NASA Astrophysics Data System (ADS)

    Steiner, João

    2004-04-01

    The Brazilian astronomical community has long wished to have at its disposal a scientific instrument with which it could conduct cutting-edge research and remain scientifically competitive at the international level. Today this dream becomes a reality. Brazil has had a successful research and graduate education policy. We graduate 7,000 doctors per year and produce 1.5% of the world's science. Our challenge today is to couple this capacity to generate knowledge with the capacity to use knowledge for the benefit of society. Astronomy is no exception. We have 7 graduate programs at the doctoral level and 11 at the master's level. The SOAR telescope will be the main instrument sustaining these programs over the coming decades. The inauguration of the SOAR telescope symbolizes, in a concrete and decisive way, the support of MCT, CNPq and FAPESP for the funding of basic research in our country. The Laboratório Nacional de Astrofísica, created about 20 years ago by CNPq, alongside the Laboratório Nacional de Luz Síncrotron, are to this day the only national laboratories in Brazil, and both are devoted essentially to the advancement of knowledge. The twenty years of the LNA's existence have been decisive for the structuring of the astronomical community in Brazil and for building partnerships such as SOAR.

  8. Astronomy Teaching: Challenges for Implementation

    NASA Astrophysics Data System (ADS)

    Faria, R. Z.; Voelzke, M. R.

    2008-09-01

    In 2002, astronomy teaching was proposed as one of the structuring themes by the National Curriculum Parameters (Parâmetros Curriculares Nacionais) and suggested as a facilitator for students to understand physics as a human construction and as part of their everyday world, but its concepts have rarely been taught. This research discusses two aspects of the approach to astronomy: first, whether it is being addressed by high-school teachers, and second, how it is being taught. We chose to administer a questionnaire, from the second half of 2006 and throughout 2007, to physics teachers working in state schools in Rio Grande da Serra, Ribeirão Pires and Mauá, in the state of São Paulo. Of the 66.2% of teachers in these municipalities who answered the questionnaire, 57.4% had covered no astronomy topic at all, 70.2% had not used a laboratory, 89.4% had not used any kind of computer program, 83.0% had never taken students on visits to museums or planetariums, and 38.3% had not recommended any kind of astronomy book or magazine to their students. Even though astronomy is a potentially meaningful content, it was not part of school planning. Proposals aimed at strategies for the continuing education of teachers are therefore needed, such as specific courses on astronomy teaching.

  9. Advantages of GPU technology in DFT calculations of intercalated graphene

    NASA Astrophysics Data System (ADS)

    Pešić, J.; Gajić, R.

    2014-09-01

    Over the past few years, the expansion of general-purpose graphics-processing-unit (GPGPU) technology has had a great impact on computational science. GPGPU is the utilization of a graphics-processing unit (GPU) to perform calculations in applications usually handled by the central processing unit (CPU). The use of GPGPUs as a way to increase computational power in the materials sciences has significantly decreased computational costs in already highly demanding calculations. The level of acceleration and parallelization depends on the problem itself. Some problems can benefit from GPU acceleration and parallelization, such as the finite-difference time-domain (FDTD) algorithm and density-functional theory (DFT), while others cannot take advantage of these modern technologies. A number of GPU-supported applications have emerged over the past several years (www.nvidia.com/object/gpu-applications.html). Quantum ESPRESSO (QE) is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale, based on DFT, a plane-wave basis and a pseudopotential approach. Since version 5.0, a plug-in component for the standard QE packages has made it possible to exploit the capabilities of Nvidia GPU graphics cards (www.qe-forge.org/gf/proj). In this study, we have examined the impact of GPU acceleration and parallelization on the numerical performance of DFT calculations. Graphene has been attracting attention worldwide and has already shown some remarkable properties. We have studied intercalated graphene using the QE package PHonon, which employs the GPU. The term ‘intercalation’ refers to a process whereby foreign adatoms are inserted onto a graphene lattice. In addition, by intercalating different atoms between graphene layers, it is possible to tune their physical properties. Our experiments have shown that there are benefits from using GPUs, and we reached an acceleration of several times compared with standard CPU calculations.

  10. Evaluating the Usefulness of a Novel 10B-Carrier Conjugated With Cyclic RGD Peptide in Boron Neutron Capture Therapy

    PubMed Central

    Masunaga, Shin-ichiro; Kimura, Sadaaki; Harada, Tomohiro; Okuda, Kensuke; Sakurai, Yoshinori; Tanaka, Hiroki; Suzuki, Minoru; Kondo, Natsuko; Maruhashi, Akira; Nagasawa, Hideko; Ono, Koji

    2012-01-01

    Background To evaluate the usefulness of a novel 10B-carrier conjugated with an integrin-binding cyclic RGD peptide (GPU-201) in boron neutron capture therapy (BNCT). Methods GPU-201 was synthesized from the integrin-binding Arg-Gly-Asp (RGD) consensus sequence of matrix proteins and a 10B cluster, 1,2-dicarba-closo-dodecaborane-10B. Mercaptododecaborate-10B (BSH) dissolved in physiological saline, and BSH and GPU-201 dissolved with cyclodextrin (CD) as a solubilizing and dispersing agent, were intraperitoneally administered to SCC VII tumor-bearing mice. The 10B concentrations in the tumors and normal tissues were then measured by γ-ray spectrometry. Meanwhile, tumor-bearing mice were continuously given 5-bromo-2’-deoxyuridine (BrdU) to label all proliferating (P) cells in the tumors, then treated with GPU-201, BSH-CD, or BSH. Immediately after reactor neutron beam or γ-ray irradiation, during which intratumor 10B concentrations were kept at similar levels, cells from some tumors were isolated and incubated with a cytokinesis blocker. The responses of the quiescent (Q) and total (= P + Q) cell populations were assessed based on the frequency of micronuclei using immunofluorescence staining for BrdU. Results The 10B from BSH was washed away rapidly from all these tissues, and the retention of 10B from BSH-CD and GPU-201 was similar, except in blood, where the 10B concentration from GPU-201 remained higher for longer. GPU-201 showed a significantly stronger radio-sensitizing effect under neutron beam irradiation on both the total and Q cell populations than any other 10B-carrier. Conclusion A novel 10B-carrier conjugated with an integrin-binding RGD peptide (GPU-201) that sensitized tumor cells more markedly than conventional 10B-carriers may be a promising candidate for use in BNCT. However, its toxicity needs to be tested further. PMID:29147290

  11. High performance MRI simulations of motion on multi-GPU systems.

    PubMed

    Xanthis, Christos G; Venetis, Ioannis E; Aletras, Anthony H

    2014-07-04

    MRI physics simulators have been developed in the past for optimizing imaging protocols and for training purposes. However, these simulators have only addressed motion within a limited scope. The purpose of this study was the incorporation of realistic motion, such as cardiac motion, respiratory motion and flow, within MRI simulations in a high-performance multi-GPU environment. Three different motion models were introduced in the Magnetic Resonance Imaging SIMULator (MRISIMUL) of this study: cardiac motion, respiratory motion and flow. Simulation of a simple Gradient Echo pulse sequence and a CINE pulse sequence on the corresponding anatomical model was performed. Myocardial tagging was also investigated. In the pulse sequence design, software crushers were introduced to avoid the formation of spurious echoes while accommodating the long execution times. The displacement of the anatomical model isochromats was calculated within the Graphics Processing Unit (GPU) kernel for every timestep of the pulse sequence. Experiments that would allow simulation of custom anatomical and motion models were also performed. In addition, simulations of motion with MRISIMUL on single-node and multi-node multi-GPU systems were examined. Gradient Echo and CINE images of the three motion models were produced and motion-related artifacts were demonstrated. The temporal evolution of the contractility of the heart was presented through the application of myocardial tagging. Better simulation performance and image quality were obtained through the introduction of software crushers, without the need to further increase the computational load and GPU resources. Finally, MRISIMUL demonstrated an almost linearly scalable performance with the increasing number of available GPU cards, in both single-node and multi-node multi-GPU computer systems. MRISIMUL is the first MR physics simulator to have implemented motion with a large 3D computational load on a single-computer multi-GPU configuration. The incorporation of realistic motion models, such as cardiac motion, respiratory motion and flow, may benefit the design and optimization of existing or new MR pulse sequences, protocols and algorithms that examine motion-related MR applications.
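
    A minimal CUDA sketch of the per-timestep pattern described above (not MRISIMUL's actual code): one thread per isochromat, a displacement applied by the motion model, then the phase accrued under the applied gradient. The constant-velocity flow, the gradient values and all names are illustrative assumptions; relaxation and RF events are omitted.

        #include <cstdio>
        #include <cuda_runtime.h>

        // One thread per isochromat: apply this timestep's displacement, then
        // accumulate precession phase under the applied gradient (sketch only;
        // relaxation and RF events are omitted).
        __global__ void stepIsochromats(float3* pos, float* phase, int n,
                                        float3 vel, float3 grad,
                                        float dt, float gamma)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float3 p = pos[i];
            p.x += vel.x * dt; p.y += vel.y * dt; p.z += vel.z * dt;  // motion model
            pos[i] = p;
            float dB = grad.x * p.x + grad.y * p.y + grad.z * p.z;    // field offset
            phase[i] += gamma * dB * dt;                              // phase accrual
        }

        int main()
        {
            const int n = 1 << 20;                       // one million isochromats
            float3* pos; float* phase;
            cudaMallocManaged(&pos, n * sizeof(float3));
            cudaMallocManaged(&phase, n * sizeof(float));
            for (int i = 0; i < n; ++i) { pos[i] = {0.f, 0.f, 0.f}; phase[i] = 0.f; }
            float3 vel = {0.1f, 0.f, 0.f};               // toy constant-velocity flow
            float3 grad = {0.01f, 0.f, 0.f};             // toy x-gradient (T/m)
            for (int t = 0; t < 1000; ++t)               // one launch per timestep
                stepIsochromats<<<(n + 255) / 256, 256>>>(pos, phase, n, vel, grad,
                                                          1e-5f, 2.675e8f);
            cudaDeviceSynchronize();
            printf("accumulated phase of isochromat 0: %e rad\n", phase[0]);
            cudaFree(pos); cudaFree(phase);
            return 0;
        }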

  12. Fast GPU-based computation of spatial multigrid multiframe LMEM for PET.

    PubMed

    Nassiri, Moulay Ali; Carrier, Jean-François; Després, Philippe

    2015-09-01

    Significant efforts were invested during the last decade to accelerate PET list-mode reconstructions, notably with GPU devices. However, the computation time per event is still relatively long, and the list-mode efficiency on the GPU is well below the histogram-mode efficiency. Since list-mode data are not arranged in any regular pattern, costly accesses to the GPU global memory can hardly be optimized and geometrical symmetries cannot be used. To overcome the obstacles that limit the acceleration of reconstruction from list-mode data on the GPU, a multigrid and multiframe approach to an expectation-maximization algorithm was developed. The reconstruction process is started during data acquisition, and calculations are executed concurrently on the GPU and the CPU, while the system matrix is computed on the fly. A new convergence criterion was also introduced, which is computationally more efficient on the GPU. The implementation was tested on a Tesla C2050 GPU device for a Gemini GXL PET system geometry. The results show that the proposed algorithm (multigrid and multiframe list-mode expectation-maximization, MGMF-LMEM) converges to the same solution as the LMEM algorithm more than three times faster. The execution time of the MGMF-LMEM algorithm was 1.1 s per million events on the Tesla C2050 hardware used, for a reconstructed space of 188 x 188 x 57 voxels of 2 x 2 x 3.15 mm3. For 17- and 22-mm simulated hot lesions, the MGMF-LMEM algorithm led on the first iteration to contrast recovery coefficients (CRC) of more than 75% of the maximum CRC while achieving a minimum in the relative mean square error. Therefore, the MGMF-LMEM algorithm can be used as a one-pass method to perform real-time reconstructions for low-count acquisitions, as in list-mode gated studies. The computation time for one iteration and 60 million events was approximately 66 s.
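
    The following CUDA sketch illustrates a generic list-mode EM sub-iteration of the kind this work builds on; it is not the MGMF-LMEM code, and the CSR-style line-of-response layout, names and toy sizes are assumptions of ours. In the actual algorithm the system-matrix rows are computed on the fly during acquisition.

        #include <cstdio>
        #include <cstring>
        #include <cuda_runtime.h>

        // One thread per coincidence event: forward-project along the event's
        // line-of-response (LOR), then back-project the ratio with atomic adds.
        __global__ void lmemEvent(const int* lorStart, const int* lorVoxel,
                                  const float* lorWeight, const float* image,
                                  float* backproj, int nEvents)
        {
            int e = blockIdx.x * blockDim.x + threadIdx.x;
            if (e >= nEvents) return;
            float fwd = 0.f;                         // forward projection
            for (int k = lorStart[e]; k < lorStart[e + 1]; ++k)
                fwd += lorWeight[k] * image[lorVoxel[k]];
            if (fwd <= 0.f) return;
            float ratio = 1.f / fwd;                 // one measured count per event
            for (int k = lorStart[e]; k < lorStart[e + 1]; ++k)
                atomicAdd(&backproj[lorVoxel[k]], lorWeight[k] * ratio);
        }

        // Multiplicative EM update: image *= backprojection / sensitivity.
        __global__ void lmemUpdate(float* image, const float* backproj,
                                   const float* sens, int nVoxels)
        {
            int j = blockIdx.x * blockDim.x + threadIdx.x;
            if (j < nVoxels && sens[j] > 0.f)
                image[j] *= backproj[j] / sens[j];
        }

        int main()
        {
            // toy problem: 2 voxels, 3 events, every LOR crossing both voxels
            const int nEvents = 3, nVoxels = 2;
            int lorStartH[] = {0, 2, 4, 6};
            int lorVoxelH[] = {0, 1, 0, 1, 0, 1};
            float lorWeightH[] = {1.f, 1.f, 1.f, 1.f, 1.f, 1.f};
            int *lorStart, *lorVoxel; float *lorWeight, *image, *backproj, *sens;
            cudaMallocManaged(&lorStart, sizeof(lorStartH));
            cudaMallocManaged(&lorVoxel, sizeof(lorVoxelH));
            cudaMallocManaged(&lorWeight, sizeof(lorWeightH));
            cudaMallocManaged(&image, nVoxels * sizeof(float));
            cudaMallocManaged(&backproj, nVoxels * sizeof(float));
            cudaMallocManaged(&sens, nVoxels * sizeof(float));
            memcpy(lorStart, lorStartH, sizeof(lorStartH));
            memcpy(lorVoxel, lorVoxelH, sizeof(lorVoxelH));
            memcpy(lorWeight, lorWeightH, sizeof(lorWeightH));
            for (int j = 0; j < nVoxels; ++j) {
                image[j] = 1.f; backproj[j] = 0.f; sens[j] = 3.f;
            }
            lmemEvent<<<1, 64>>>(lorStart, lorVoxel, lorWeight, image, backproj, nEvents);
            lmemUpdate<<<1, 64>>>(image, backproj, sens, nVoxels);
            cudaDeviceSynchronize();
            printf("updated image: %f %f\n", image[0], image[1]);
            return 0;
        }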

  13. GPU-based Parallel Application Design for Emerging Mobile Devices

    NASA Astrophysics Data System (ADS)

    Gupta, Kshitij

    A revolution is underway in the computing world that is causing a fundamental paradigm shift in device capabilities and form-factor, with a move from well-established legacy desktop/laptop computers to mobile devices in varying sizes and shapes. Amongst all the tasks these devices must support, graphics has emerged as the 'killer app' for providing a fluid user interface and high-fidelity game rendering, effectively making the graphics processor (GPU) one of the key components in (present and future) mobile systems. By utilizing the GPU as a general-purpose parallel processor, this dissertation explores the GPU computing design space from an applications standpoint, in the mobile context, by focusing on key challenges presented by these devices---limited compute, memory bandwidth, and stringent power consumption requirements---while improving the overall application efficiency of the increasingly important speech recognition workload for mobile user interaction. We broadly partition trends in GPU computing into four major categories. We analyze hardware and programming model limitations in current-generation GPUs and detail an alternate programming style called Persistent Threads, identify four use case patterns, and propose minimal modifications that would be required for extending native support. We show how, by manually extracting data locality and altering the speech recognition pipeline, we are able to achieve significant savings in memory bandwidth while simultaneously reducing the compute burden on GPU-like parallel processors. As we foresee GPU computing evolving from its current 'co-processor' model into an independent 'applications processor' capable of executing complex work independently, we create an alternate application framework that enables the GPU to handle all control-flow dependencies autonomously at run-time, minimizing host involvement to just issuing commands and thereby facilitating an efficient application implementation. Finally, as the compute and communication capabilities of mobile devices improve, we analyze the energy implications of processing speech recognition locally (on-chip) and of offloading it to servers (in-cloud).
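
    A minimal CUDA sketch of the Persistent Threads style named above, under assumptions of ours: the grid is sized to just fill the device, and each block loops, grabbing tile-sized work items from a global queue counter until the queue is drained. The square-root "work" and all names are illustrative, not the dissertation's code.

        #include <cstdio>
        #include <cuda_runtime.h>

        // Persistent-threads worker: blocks stay resident and pull tile-sized work
        // items from a global queue counter until the queue is drained.
        __global__ void persistentWorker(float* out, const float* in,
                                         int nTiles, int nTotal, int* workCounter)
        {
            __shared__ int tile;
            for (;;) {
                if (threadIdx.x == 0) tile = atomicAdd(workCounter, 1);
                __syncthreads();                 // make the fetched tile visible
                int myTile = tile;
                __syncthreads();                 // protect 'tile' before next fetch
                if (myTile >= nTiles) return;    // queue drained: block retires
                int i = myTile * blockDim.x + threadIdx.x;
                if (i < nTotal) out[i] = sqrtf(in[i]);   // stand-in for real work
            }
        }

        int main()
        {
            const int nTotal = 1 << 20, block = 256;
            const int nTiles = (nTotal + block - 1) / block;
            float *in, *out; int* counter;
            cudaMallocManaged(&in, nTotal * sizeof(float));
            cudaMallocManaged(&out, nTotal * sizeof(float));
            cudaMallocManaged(&counter, sizeof(int));
            for (int i = 0; i < nTotal; ++i) in[i] = float(i);
            *counter = 0;
            cudaDeviceProp prop; cudaGetDeviceProperties(&prop, 0);
            int grid = 2 * prop.multiProcessorCount;   // just fill the GPU
            persistentWorker<<<grid, block>>>(out, in, nTiles, nTotal, counter);
            cudaDeviceSynchronize();
            printf("out[4] = %f (expect 2)\n", out[4]);
            return 0;
        }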

  14. A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dong, Tingzing Tim; Tomov, Stanimire Z; Luszczek, Piotr R

    As modern hardware keeps evolving, an increasingly effective approach to developing energy efficient and high-performance solvers is to design them to work on many small size and independent problems. Many applications already need this functionality, especially for GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. We describe the development of one-sided factorizations that work for a set of small dense matrices in parallel, and we illustrate our techniques on the QR factorization based on Householder transformations. We refer to this mode of operation as a batched factorization. Our approach is based on representing the algorithms as a sequence of batched BLAS routines for GPU-only execution. This is in contrast to the hybrid CPU-GPU algorithms that rely heavily on using the multicore CPU for specific parts of the workload. But for a system to benefit fully from the GPU's significantly higher energy efficiency, avoiding the use of the multicore CPU must be a primary design goal, so the system can rely more heavily on the more efficient GPU. Additionally, this will result in the removal of the costly CPU-to-GPU communication. Furthermore, we do not use a single symmetric multiprocessor (on the GPU) to factorize a single problem at a time. We illustrate how our performance analysis, and the use of profiling and tracing tools, guided the development and optimization of our batched factorization to achieve up to a 2-fold speedup and a 3-fold energy efficiency improvement compared to our highly optimized batched CPU implementations based on the MKL library (when using two sockets of Intel Sandy Bridge CPUs). Compared to a batched QR factorization featured in the CUBLAS library for GPUs, we achieved up to 5x speedup on the K40 GPU.
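
    The batched idiom can be sketched in a few lines of CUDA: blockIdx.x selects the small problem, and the block works only on that problem. As a drastically simplified stand-in for one step of a batched QR, each thread below computes the 2-norm of one column of its matrix, the quantity needed to form a Householder reflector; all names and sizes are illustrative assumptions. Production code would more likely call a batched BLAS routine such as cuBLAS's geqrfBatched.

        #include <cstdio>
        #include <cmath>
        #include <cuda_runtime.h>

        // Batched execution idiom: blockIdx.x selects the small matrix, and the
        // whole block works only on that matrix. Each thread computes the 2-norm
        // of one column (needed to form a Householder reflector).
        __global__ void batchedColumnNorms(const double* A, double* norms,
                                           int m, int n, int batch)
        {
            int b = blockIdx.x;                       // which small matrix
            int col = threadIdx.x;                    // which column of that matrix
            if (b >= batch || col >= n) return;
            const double* Ab = A + (size_t)b * m * n; // column-major m x n matrix b
            double s = 0.0;
            for (int row = 0; row < m; ++row) {
                double v = Ab[col * m + row];
                s += v * v;
            }
            norms[b * n + col] = sqrt(s);
        }

        int main()
        {
            const int m = 16, n = 8, batch = 1000;
            double *A, *norms;
            cudaMallocManaged(&A, (size_t)batch * m * n * sizeof(double));
            cudaMallocManaged(&norms, (size_t)batch * n * sizeof(double));
            for (size_t i = 0; i < (size_t)batch * m * n; ++i) A[i] = 1.0;
            batchedColumnNorms<<<batch, n>>>(A, norms, m, n, batch); // one block per matrix
            cudaDeviceSynchronize();
            printf("norm of column 0, matrix 0 = %f (expect 4)\n", norms[0]);
            return 0;
        }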

  15. Democratic Population Decisions Result in Robust Policy-Gradient Learning: A Parametric Study with GPU Simulations

    PubMed Central

    Richmond, Paul; Buesing, Lars; Giugliano, Michele; Vasilaki, Eleni

    2011-01-01

    High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task and, moreover, architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which the said architecture and learning rule demonstrate the best performance. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a “non-democratic” mechanism), achieve mediocre learning results at best. In the absence of recurrent connections, where all neurons “vote” independently (“democratically”) for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that a speed improvement of 5x up to 42x is provided versus optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search of learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated. PMID:21572529
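
    For readers unfamiliar with such simulations, here is a minimal CUDA sketch (not the authors' routines) of the inner loop: one thread per leaky integrate-and-fire neuron, with an Euler step, threshold test and reset. The recurrent and feed-forward synaptic propagation that the paper's network includes is omitted, and all parameters are toy assumptions.

        #include <cstdio>
        #include <cuda_runtime.h>

        // One thread per leaky integrate-and-fire neuron: Euler step of the
        // membrane equation, threshold test, reset.
        __global__ void lifStep(float* v, int* spiked, const float* current,
                                int n, float dt, float tau,
                                float vThresh, float vReset)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float vi = v[i] + dt * (-v[i] / tau + current[i]);  // leak + input
            if (vi >= vThresh) { spiked[i] = 1; vi = vReset; }  // fire and reset
            else               { spiked[i] = 0; }
            v[i] = vi;
        }

        int main()
        {
            const int n = 4096;
            float *v, *I; int* s;
            cudaMallocManaged(&v, n * sizeof(float));
            cudaMallocManaged(&I, n * sizeof(float));
            cudaMallocManaged(&s, n * sizeof(int));
            for (int i = 0; i < n; ++i) { v[i] = 0.f; I[i] = 1.5f; }
            for (int t = 0; t < 1000; ++t)      // 1000 steps of 0.1 (toy units)
                lifStep<<<(n + 255) / 256, 256>>>(v, s, I, n, 0.1f, 10.f, 1.f, 0.f);
            cudaDeviceSynchronize();
            printf("neuron 0: v = %f, spiked = %d\n", v[0], s[0]);
            return 0;
        }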

  16. GPU accelerated generation of digitally reconstructed radiographs for 2-D/3-D image registration.

    PubMed

    Dorgham, Osama M; Laycock, Stephen D; Fisher, Mark H

    2012-09-01

    Recent advances in programming languages for graphics processing units (GPUs) provide developers with a convenient way of implementing applications which can be executed on the CPU and GPU interchangeably. GPUs are becoming relatively cheap, powerful, and widely available hardware components, which can be used to perform intensive calculations. The last decade of hardware performance developments shows that GPU-based computation is progressing significantly faster than CPU-based computation, particularly if one considers the execution of highly parallelisable algorithms. Future predictions illustrate that this trend is likely to continue. In this paper, we introduce a way of accelerating 2-D/3-D image registration by developing a hybrid system which executes on the CPU and utilizes the GPU for parallelizing the generation of digitally reconstructed radiographs (DRRs). Based on the advancements of the GPU over the CPU, it is timely to exploit the benefits of many-core GPU technology by developing algorithms for DRR generation. Although some previous work has investigated the rendering of DRRs using the GPU, this paper investigates approximations which reduce the computational overhead while still maintaining a quality consistent with that needed for 2-D/3-D registration with sufficient accuracy to be clinically acceptable in certain applications of radiation oncology. Furthermore, by comparing implementations of 2-D/3-D registration on the CPU and GPU, we investigate current performance and propose an optimal framework for PC implementations addressing the rigid registration problem. Using this framework, we are able to render DRR images from a 256×256×133 CT volume in ~24 ms using an NVidia GeForce 8800 GTX and in ~2 ms using NVidia GeForce GTX 580. In addition to applications requiring fast automatic patient setup, these levels of performance suggest image-guided radiation therapy at video frame rates is technically feasible using relatively low cost PC architecture.
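
    A heavily simplified CUDA sketch of GPU DRR rendering in the spirit of this work (not its implementation): one thread per detector pixel marches a parallel ray through the CT volume, sums attenuation and applies Beer-Lambert decay. Clinical 2-D/3-D registration uses a perspective (cone-beam) geometry and interpolated sampling; the names and values here are assumptions.

        #include <cstdio>
        #include <cuda_runtime.h>

        // One thread per detector pixel: march a ray through the CT volume in
        // fixed steps, summing attenuation, then map the line integral to
        // intensity. Parallel-beam geometry along +z, nearest-neighbour sampling.
        __global__ void renderDRR(const float* ct, float* drr,
                                  int nx, int ny, int nz, float step)
        {
            int u = blockIdx.x * blockDim.x + threadIdx.x;   // detector column
            int v = blockIdx.y * blockDim.y + threadIdx.y;   // detector row
            if (u >= nx || v >= ny) return;
            float sum = 0.f;
            for (float z = 0.f; z < nz; z += step) {
                int k = (int)z;
                sum += ct[(k * ny + v) * nx + u] * step;     // accumulate attenuation
            }
            drr[v * nx + u] = expf(-sum);                    // Beer-Lambert decay
        }

        int main()
        {
            const int nx = 256, ny = 256, nz = 133;          // volume size as in the paper
            float *ct, *drr;
            cudaMallocManaged(&ct, (size_t)nx * ny * nz * sizeof(float));
            cudaMallocManaged(&drr, (size_t)nx * ny * sizeof(float));
            for (size_t i = 0; i < (size_t)nx * ny * nz; ++i) ct[i] = 0.01f;
            dim3 block(16, 16), grid((nx + 15) / 16, (ny + 15) / 16);
            renderDRR<<<grid, block>>>(ct, drr, nx, ny, nz, 1.0f);
            cudaDeviceSynchronize();
            printf("drr[0] = %f\n", drr[0]);                 // exp(-0.01*133) ~ 0.26
            return 0;
        }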

  17. High performance computing for deformable image registration: towards a new paradigm in adaptive radiotherapy.

    PubMed

    Samant, Sanjiv S; Xia, Junyi; Muyan-Ozcelik, Pinar; Owens, John D

    2008-08-01

    The advent of readily available temporal imaging or time-series volumetric (4D) imaging has become an indispensable component of treatment planning and adaptive radiotherapy (ART) at many radiotherapy centers. Deformable image registration (DIR) is also used in other areas of medical imaging, including motion-corrected image reconstruction. Due to long computation times, clinical applications of DIR in radiation therapy and elsewhere have been limited and consequently relegated to offline analysis. With the recent advances in hardware and software, graphics processing unit (GPU) based computing is an emerging technology for general-purpose computation, including DIR, and is suitable for highly parallelized computing. However, traditional general-purpose computation on the GPU is limited by the constraints of the available programming platforms. Moreover, compared to the CPU, the GPU currently has reduced dedicated processor memory, which can limit the useful working data set for parallelized processing. We present an implementation of the demons algorithm using the NVIDIA 8800 GTX GPU and the new CUDA programming language. The GPU performance is compared with single-threading and multithreading CPU implementations on an Intel dual-core 2.4 GHz CPU using the C programming language. CUDA provides a C-like language programming interface and allows for direct access to the highly parallel compute units in the GPU. Comparisons for volumetric clinical lung images acquired using 4DCT were carried out. Computation times for 100 iterations in the range of 1.8-13.5 s were observed for the GPU, with image sizes ranging from 2.0 x 10(6) to 14.2 x 10(6) pixels. The GPU registration was 55-61 times faster than the CPU for the single-threading implementation, and 34-39 times faster for the multithreading implementation. For CPU-based computing, the computational time generally has a linear dependence on image size for medical imaging data. Computational efficiency is characterized in terms of time per megapixel per iteration (TPMI), with units of seconds per megapixel per iteration (spmi). For the demons algorithm, our CPU implementation yielded largely invariant values of TPMI. The mean TPMIs were 0.527 spmi and 0.335 spmi for the single-threading and multithreading cases, respectively, with <2% variation over the considered image data range. For GPU computing, we achieved TPMI = 0.00916 spmi with 3.7% variation, indicating optimized memory handling under CUDA. The paradigm of GPU-based real-time DIR opens up a host of clinical applications for medical imaging.
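
    A minimal CUDA sketch of one demons iteration, kept 2-D for brevity (the paper registers 4DCT volumes) and omitting the field smoothing and the resampling of the moving image that a complete implementation performs; all names and test values are assumptions.

        #include <cstdio>
        #include <cuda_runtime.h>

        // One demons update per pixel: displacement driven by the intensity
        // mismatch projected onto the fixed-image gradient.
        __global__ void demonsStep(const float* fixed, const float* moving,
                                   float* ux, float* uy, int w, int h)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;
            int i = y * w + x;
            float gx = 0.5f * (fixed[i + 1] - fixed[i - 1]);   // central differences
            float gy = 0.5f * (fixed[i + w] - fixed[i - w]);
            float diff = moving[i] - fixed[i];                 // intensity mismatch
            float denom = gx * gx + gy * gy + diff * diff;
            if (denom > 1e-8f) {                               // classic demons force
                ux[i] += diff * gx / denom;
                uy[i] += diff * gy / denom;
            }
        }

        int main()
        {
            const int w = 512, h = 512;
            size_t bytes = (size_t)w * h * sizeof(float);
            float *f, *m, *ux, *uy;
            cudaMallocManaged(&f, bytes);  cudaMallocManaged(&m, bytes);
            cudaMallocManaged(&ux, bytes); cudaMallocManaged(&uy, bytes);
            for (int i = 0; i < w * h; ++i) {
                f[i] = (i % w) * 0.01f;    // ramp image
                m[i] = f[i] + 0.5f;        // shifted intensities to register
                ux[i] = uy[i] = 0.f;
            }
            dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
            for (int it = 0; it < 100; ++it)
                demonsStep<<<grid, block>>>(f, m, ux, uy, w, h);
            cudaDeviceSynchronize();
            printf("u at centre = (%f, %f)\n",
                   ux[h / 2 * w + w / 2], uy[h / 2 * w + w / 2]);
            return 0;
        }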

  18. Preparation and characterization of novel nanocomposites of inorganic/polysaccharide type

    NASA Astrophysics Data System (ADS)

    Oliveira, Fabiane Costa

    The use of natural polymers in the preparation of nanocomposites has not been as widely studied as that of synthetic polymers. This thesis therefore aims to study methodologies for the preparation of new nanocomposite materials, in the form of dispersions and films, using polysaccharides as the matrix. The thesis is divided into five chapters, the last of which is devoted to general conclusions and suggestions for future work. Initially, a brief literature review of the main topics is presented, putting this thesis in context. Considerations on the use of natural polymers and their combination with inorganic nanoparticles for the fabrication of new bionanocomposites are described, and the objectives and outline of the thesis are also presented. In the second chapter, the preparation of pure and modified silica particles, as well as their characterization by FTIR, SEM, TEM, TGA, DLS (size and zeta potential) and contact-angle measurements, is discussed. To improve the compatibility of the silica with the polysaccharides, the SiO2 particles were modified with two organosilane compounds: 3-methacryloxypropyltrimethoxysilane (MPS) and 3-aminopropyltrimethoxysilane (APS). The SiO2-MPS particles were subsequently encapsulated with poly(glycidyl methacrylate) using the emulsion polymerization technique. The use of the resulting nanocomposites in the preparation of bionanocomposite dispersions was unsuccessful, and these studies were therefore not pursued. The use of SiO2-APS in the preparation of bionanocomposite dispersions was effective. The third chapter presents a review of bionanocomposite dispersions and their characterization, highlighting fundamental aspects of rheology and microstructure. Next, the systematic study of the rheological behaviour of SiO2 dispersions is discussed, using three polysaccharides that differ in charge and gelling characteristics and whose rheological properties are widely known: locust bean gum (non-ionic), chitosan (cationic) and xanthan gum (anionic). Rheological studies carried out under different conditions showed that the formation of weak and/or well-structured gels depends on the SiO2 size, the concentration, the pH and the ionic strength. These studies were confirmed by microstructural analyses using low-temperature electron microscopy (cryo-SEM). In the fourth chapter, the studies on the preparation and characterization of bionanocomposite films using chitosan as the matrix are presented. First, a review of bionanocomposite films and of the fundamentals of the characterization techniques used is given. The choice of plasticizer and its concentration are discussed on the basis of the properties of the chitosan films prepared. Next, the effect of the silica concentration and of the methods used to disperse it in the polysaccharide matrix, as well as the effect of the surface modification of the silica, is evaluated. The surface characteristics and the barrier, mechanical and thermal properties are discussed for each set of films prepared, before and after their neutralization. The results obtained showed that dispersing the fillers in the plasticizer and subsequently adding them to the polysaccharide matrix resulted in only small improvements, since the problem of silica aggregation was not overcome. For this reason, films were prepared with SiO2-APS, which showed better properties even though particle aggregation was not completely prevented; this may be related to the film drying process. Finally, chapter 5 presents the main conclusions and some suggestions for future work.

  19. Dataflow-Based Implementation of Layered Sensing Applications on High-Performance Embedded Processors

    DTIC Science & Technology

    2013-03-01

    Performance values are reported for the GPU-targeted actor implementations in terms of Giga Floating Point Operations Per Second (GFLOPS). The actors were implemented on an NVIDIA GTX260 GPU, which provides a peak of 715 GFLOPS. Recoverable fragment of the results table: Cascade Gaussian Filtering ran in 13 ms at 45.19 GFLOPS (6.3% of GPU peak); Difference of Gaussian ran in 0.512 ms at 152… (the remainder of the record is truncated in the source).

  20. GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sengupta, Dipanjan; Agarwal, Kapil; Song, Shuaiwen

    2015-09-30

    Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device’s internal memory capacity. GraphReduce adopts a combination of both edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the host and the device.
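
    A minimal CUDA sketch of the vertex-centric Gather and Apply steps of the Gather-Apply-Scatter model mentioned above, using a PageRank-style toy computation over a CSR graph. GraphReduce's partition streaming, Scatter phase and edge-centric variant are not shown; the names and the toy graph are assumptions of ours.

        #include <cstdio>
        #include <cstring>
        #include <cuda_runtime.h>

        // Gather: each thread sums contributions from its vertex's in-neighbours.
        // Apply: the vertex value is updated from the gathered sum.
        __global__ void gatherApply(const int* rowStart, const int* inNbr,
                                    const float* valIn, float* valOut,
                                    const int* outDegree, int nVertices)
        {
            int v = blockIdx.x * blockDim.x + threadIdx.x;
            if (v >= nVertices) return;
            float acc = 0.f;
            for (int e = rowStart[v]; e < rowStart[v + 1]; ++e) {   // Gather
                int u = inNbr[e];
                if (outDegree[u] > 0) acc += valIn[u] / outDegree[u];
            }
            valOut[v] = 0.15f + 0.85f * acc;                        // Apply
        }

        int main()
        {
            // toy 4-vertex ring: vertex v receives from (v+3)%4
            const int n = 4;
            int rowStart[] = {0, 1, 2, 3, 4}, inNbr[] = {3, 0, 1, 2};
            int outDeg[] = {1, 1, 1, 1};
            int *dRow, *dNbr, *dDeg; float *dIn, *dOut;
            cudaMallocManaged(&dRow, sizeof(rowStart));
            cudaMallocManaged(&dNbr, sizeof(inNbr));
            cudaMallocManaged(&dDeg, sizeof(outDeg));
            cudaMallocManaged(&dIn, n * sizeof(float));
            cudaMallocManaged(&dOut, n * sizeof(float));
            memcpy(dRow, rowStart, sizeof(rowStart));
            memcpy(dNbr, inNbr, sizeof(inNbr));
            memcpy(dDeg, outDeg, sizeof(outDeg));
            for (int i = 0; i < n; ++i) dIn[i] = 1.0f;
            gatherApply<<<1, n>>>(dRow, dNbr, dIn, dOut, dDeg, n);
            cudaDeviceSynchronize();
            printf("rank[0] = %f (expect 1)\n", dOut[0]);
            return 0;
        }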

  1. Properties of self-assembled diluted magnetic semiconductor nanostructures

    NASA Astrophysics Data System (ADS)

    Ankiewicz, Amelia Olga Goncalves

    This work focuses on investigating the possibility of achieving a ZnO-based diluted magnetic semiconductor (DMS). A detailed study was carried out of the magnetic and structural properties of ZnO structures, namely nanowires (NWs), nanocrystals (NCs) and thin films, doped with transition metals (TMs). Several experimental techniques were used to characterize these structures, namely X-ray diffraction, scanning electron microscopy, magnetic resonance, SQUID, and transport measurements. Mn2+ and Co2+ ions were incorporated substitutionally on Zn sites in both ZnO NWs and NCs. For both dopant ions, the incorporation proved to be heterogeneous, since part of the electron paramagnetic resonance (EPR) signal comes from TM ions in distorted or TM-enriched environments. From the relative intensities of the EPR spectra and from surface modifications, it is further shown that the NCs exhibit a core-shell structure. The results show that, with increasing TM concentration, the NC size decreases and the lattice distortions increase. Finally, in the case of the Mn-doped NCs, the singular result was obtained that the shell thickness is of the order of 0.3 nm and that Mn accumulates in the shell. In order to clarify the role of charge carriers in mediating the ferromagnetic interactions, ZnO films were co-doped with Mn and Al or with Co and Al. The Mn-doped films proved to be simply paramagnetic, with the Mn ions substitutional on Zn sites. On the other hand, the Co-doped films exhibit weak non-intrinsic ferromagnetism, probably due to spinodal decomposition. Comparative studies were also carried out with Zn1-xFexO alloy films. As expected, spinel and iron-oxide second phases were detected in these alloys; all samples exhibited hysteresis curves at 300 K. These results support the hypothesis that second phases are responsible for the magnetic behaviour observed in many ZnO-based systems. No evidence of carrier-mediated ferromagnetism was observed. The experiments show that EPR analysis makes it possible to demonstrate directly whether and where the TM ions are incorporated, and they highlight the importance of surface effects for dimensions below 15 nm, for which core-shell structures form. The investigations carried out in the scope of this thesis demonstrate that none of the ZnO samples studied exhibited the properties of an intrinsic DMS and that, in the future, detailed theoretical and experimental studies of the exchange interactions between the TM ions and the ZnO atoms are needed to determine the origin of the observed magnetic properties.

  2. THE ROLE OF METABOLIC SURGERY FOR PATIENTS WITH OBESITY GRADE I AND CLINICALLY UNCONTROLLED TYPE 2 DIABETES.

    PubMed

    Campos, Josemberg; Ramos, Almino; Szego, Thomaz; Zilberstein, Bruno; Feitosa, Heládio; Cohen, Ricardo

    2016-07-07

    Even considering the advances in medical treatment over the last 20 years, with new and more effective drugs, the outcomes are still disappointing regarding the control of obesity and type 2 diabetes mellitus (T2DM), with a large number of patients under medical treatment still not reaching the desired outcomes. To present a Metabolic Risk Score to better guide the surgical indication for T2DM patients with a body mass index (BMI) for which surgery for obesity is still controversial. Research was conducted in PubMed, Medline, PubMed Central, Scielo and Lilacs between 2003-2015 correlating the headings: metabolic surgery, obesity and type 2 diabetes mellitus. In addition, representatives of the societies involved, as an expert panel, issued opinions. Forty-five related articles were analyzed by evidence-based medicine criteria. Grouped opinions sought to answer the following questions: Why metabolic and not bariatric surgery?; Mechanisms involved in glycemic control; BMI as a single criterion for surgical indication for uncontrolled T2DM; Results of metabolic surgery studies in BMI<35 kg/m2; Safety of metabolic surgery in patients with BMI<35 kg/m2; Long-term effects of surgery in patients with baseline BMI<35 kg/m2; and Proposal for a Metabolic Risk Score. Metabolic surgery has well-defined mechanisms of action in both experimental and human studies. Gastrointestinal interventions in T2DM patients with BMI≤35 kg/m2 have similar safety and efficacy when compared to groups with greater BMIs, leading to improvement of diabetes in a manner superior to clinical treatment and lifestyle changes, in part through weight-loss-independent mechanisms. There is no correlation between baseline BMI and long-term weight loss with the success rate after any surgical treatment. Gastrointestinal surgery may be an option for patients with T2DM without adequate clinical control, with a BMI between 30 and 35, after thorough evaluation following the parameters detailed in the Metabolic Risk Score defined by the surgical societies. Roux-en-Y gastric bypass (RYGB), because of its well-known safety and efficacy and longer follow-up studies, is the main surgical technique indicated for patients eligible for surgery through the Metabolic Risk Score. The vertical sleeve gastrectomy may be considered if there is an absolute contraindication for the RYGB. T2DM patients should be evaluated by a multiprofessional team that will assess surgical eligibility, preoperative work-up, follow-up and long-term monitoring for micro- and macrovascular complications.

  4. CoMD Implementation Suite in Emerging Programming Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haque, Riyaz; Reeve, Sam; Juallmes, Luc

    CoMD-Em is a software implementation suite of the CoMD [4] proxy app using different emerging programming models. It is intended to analyze the features and capabilities of novel programming models that could help ensure code and performance portability and scalability across heterogeneous platforms while improving programmer productivity. Another goal is to provide the authors and vendors with some meaningful feedback regarding the capabilities and limitations of their models. The actual application is a classical molecular dynamics (MD) simulation using either the Lennard-Jones (LJ) method or the embedded atom method (EAM) for primary particle interaction. The code can be extended to support alternate interaction models. The code is expected to run on a wide class of heterogeneous hardware configurations, such as shared/distributed/hybrid memory, GPUs, and any other platform supported by the underlying programming model.
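
    As an illustration of the Lennard-Jones interaction mentioned above, here is a brute-force O(N^2) CUDA kernel in reduced units (epsilon = sigma = 1). CoMD itself uses cell lists and neighbor structures, so this is a sketch of the physics, not of the CoMD-Em code; all names and the toy lattice are assumptions.

        #include <cstdio>
        #include <cuda_runtime.h>

        // All-pairs Lennard-Jones force accumulation, one thread per particle.
        // Force on i from j: F = 24 (2 r^-12 - r^-6) / r^2 * (ri - rj).
        __global__ void ljForces(const float3* pos, float3* force, int n, float rc2)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float3 pi = pos[i], f = {0.f, 0.f, 0.f};
            for (int j = 0; j < n; ++j) {
                if (j == i) continue;
                float dx = pi.x - pos[j].x, dy = pi.y - pos[j].y, dz = pi.z - pos[j].z;
                float r2 = dx * dx + dy * dy + dz * dz;
                if (r2 > rc2) continue;                            // cutoff radius
                float inv2 = 1.f / r2, inv6 = inv2 * inv2 * inv2;
                float fr = 24.f * inv2 * inv6 * (2.f * inv6 - 1.f); // (dU/dr)/r
                f.x += fr * dx; f.y += fr * dy; f.z += fr * dz;
            }
            force[i] = f;
        }

        int main()
        {
            const int n = 1024;
            float3 *pos, *frc;
            cudaMallocManaged(&pos, n * sizeof(float3));
            cudaMallocManaged(&frc, n * sizeof(float3));
            for (int i = 0; i < n; ++i)           // simple cubic toy lattice
                pos[i] = {float(i % 16), float((i / 16) % 16), float(i / 256)};
            ljForces<<<(n + 127) / 128, 128>>>(pos, frc, n, 6.25f);  // rc = 2.5
            cudaDeviceSynchronize();
            printf("force on particle 0: %f %f %f\n", frc[0].x, frc[0].y, frc[0].z);
            return 0;
        }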

  5. Do we know how to prescribe venous thromboembolism prophylaxis to hospitalized patients?

    PubMed Central

    Lopes, Bruno Abdala Candido; Teixeira, Isabela Pizzatto; de Souza, Taynara Dantas; Tafarel, Jean Rodrigo

    2017-01-01

    Abstract Background Although recommended, venous thromboembolism (VTE) prophylaxis is not systematically provided to hospitalized patients. Objective To determine whether hospitalized patients are correctly prescribed VTE prophylaxis, according to their risk category, by the physician responsible for their admission. Methods Cross-sectional study analyzing the medical records of patients admitted to the Hospital Santa Casa de Misericórdia de Curitiba, PR, Brazil, between March 20 and May 25, 2015. Patients on anticoagulants or with active bleeding were excluded. Gender, age, type of health coverage, the specialty responsible for the patient, and the patients' risk factors were analyzed to classify them as at high, moderate or low risk of VTE. The use or non-use of prophylaxis was compared between prescriptions from clinical and surgical specialties, between patients admitted through the Unified Health System (SUS) and through private health plans, and according to VTE risk. Results Of the 78 patients assessed, eight met the exclusion criteria. Of the 70 eligible patients (mean age 56.9 years; 41 men; 62 covered by SUS), 31 were treated by clinical and 39 by surgical specialties. Only 46 (65.71%) patients received VTE prophylaxis. Among clinical patients, 29 (93.5%) received prophylaxis, versus 17 (43.6%) in the surgical group (p < 0.001). Moderate- and high-risk clinical patients received prophylaxis more often than surgical patients (p < 0.001 and p = 0.002). There were no differences with respect to health coverage (SUS versus private health plans). Conclusions At the Hospital Santa Casa de Misericórdia de Curitiba, surgical patients are less well protected against thromboembolic events than clinical patients. PMID:29930647

  6. GPU based framework for geospatial analyses

    NASA Astrophysics Data System (ADS)

    Cosmin Sandric, Ionut; Ionita, Cristian; Dardala, Marian; Furtuna, Titus

    2017-04-01

    Parallel processing on multiple CPU cores is already used at large scale in geocomputing, but parallel processing on graphics cards is just beginning. Being able to use a simple laptop with a dedicated graphics card for advanced and very fast geocomputation is an advantage that every scientist wants to have. The need for high-speed computation in the geosciences has increased in the last 10 years, mostly due to the growth of the available datasets. These datasets are becoming more and more detailed and hence require more space to store and more time to process. Distributed computation on multi-core CPUs and GPUs plays an important role, processing small parts of these big datasets one by one. This way of computing speeds up the process, because instead of using just one process for each dataset, the user can use all the cores of a CPU or up to hundreds of cores of a GPU. The framework provides the end user with standalone tools for morphometric analyses at the multiscale level. An important part of the framework is dedicated to uncertainty propagation in geospatial analyses. The uncertainty may come from the data collection, may be induced by the model, or may have countless other sources. These uncertainties play an important role when a spatial delineation of a phenomenon is modelled. Uncertainty propagation is implemented inside the GPU framework using Monte Carlo simulations. The GPU framework with its standalone tools proved to be a reliable tool for modelling complex natural phenomena. The framework is based on NVidia CUDA technology and is written in the C++ programming language. The source code will be available on GitHub at https://github.com/sandricionut/GeoRsGPU Acknowledgement: GPU framework for geospatial analysis, Young Researchers Grant (ICUB-University of Bucharest) 2016, director Ionut Sandric
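
    A minimal CUDA sketch of Monte Carlo uncertainty propagation of the kind described above, under assumptions of ours: each thread perturbs its DEM cell and two neighbours with Gaussian noise via cuRAND and accumulates the mean slope over the realizations. The names, the tilted-plane test data and the noise level are illustrative; this is not the GeoRsGPU code.

        #include <cstdio>
        #include <cuda_runtime.h>
        #include <curand_kernel.h>

        // One thread per DEM cell: perturb the cell and its x/y neighbours with
        // Gaussian noise in each Monte Carlo realization and accumulate the mean
        // slope, a simple example of uncertainty propagation on a raster.
        __global__ void mcSlope(const float* dem, float* meanSlope,
                                int nx, int ny, float cell, float sigma,
                                int nRuns, unsigned long long seed)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= nx - 1 || y >= ny - 1) return;
            int i = y * nx + x;
            curandState st;
            curand_init(seed, i, 0, &st);            // independent stream per cell
            float acc = 0.f;
            for (int r = 0; r < nRuns; ++r) {
                float z  = dem[i]      + sigma * curand_normal(&st);
                float zx = dem[i + 1]  + sigma * curand_normal(&st);
                float zy = dem[i + nx] + sigma * curand_normal(&st);
                float gx = (zx - z) / cell, gy = (zy - z) / cell;
                acc += sqrtf(gx * gx + gy * gy);     // slope of this realization
            }
            meanSlope[i] = acc / nRuns;              // Monte Carlo mean
        }

        int main()
        {
            const int nx = 512, ny = 512;
            float *dem, *slope;
            cudaMallocManaged(&dem, nx * ny * sizeof(float));
            cudaMallocManaged(&slope, nx * ny * sizeof(float));
            for (int i = 0; i < nx * ny; ++i)
                dem[i] = 100.f + 0.5f * (i % nx);    // tilted-plane test surface
            dim3 block(16, 16), grid((nx + 15) / 16, (ny + 15) / 16);
            mcSlope<<<grid, block>>>(dem, slope, nx, ny, 30.f, 1.5f, 200, 42ULL);
            cudaDeviceSynchronize();
            printf("mean slope at (10,10): %f\n", slope[10 * nx + 10]);
            return 0;
        }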

  7. Implementation and optimization of ultrasound signal processing algorithms on mobile GPU

    NASA Astrophysics Data System (ADS)

    Kong, Woo Kyu; Lee, Wooyoul; Kim, Kyu Cheol; Yoo, Yangmo; Song, Tai-Kyong

    2014-03-01

    A general-purpose graphics processing unit (GPGPU) has been used for improving computing power in medical ultrasound imaging systems. Recently, mobile GPUs have become powerful enough to deal with 3D games and videos at high frame rates on Full HD or HD resolution displays. This paper proposes a method to implement ultrasound signal processing on a mobile GPU available in a high-end smartphone (Galaxy S4, Samsung Electronics, Seoul, Korea) with programmable shaders on the OpenGL ES 2.0 platform. To maximize the performance of the mobile GPU, the shader design was optimized and load sharing between the vertex and fragment shaders was performed. The beamformed data were captured from a tissue-mimicking phantom (Model 539 Multipurpose Phantom, ATS Laboratories, Inc., Bridgeport, CT, USA) using a commercial ultrasound imaging system equipped with a research package (Ultrasonix Touch, Ultrasonix, Richmond, BC, Canada). The real-time performance was evaluated by frame rates while varying the range of signal processing blocks. The implementation of ultrasound signal processing on OpenGL ES 2.0 was verified by analyzing the PSNR against a MATLAB gold standard with the same signal path. CNR was also analyzed to verify the method. From the evaluations, the proposed mobile GPU-based processing method showed no significant difference from the processing using MATLAB (i.e., PSNR<52.51 dB). Comparable results of CNR were obtained from both processing methods (i.e., 11.31). From the mobile GPU implementation, a frame rate of 57.6 Hz was achieved. The total execution time was 17.4 ms, which was faster than the acquisition time (i.e., 34.4 ms). These results indicate that the mobile GPU-based processing method can support real-time ultrasound B-mode processing on the smartphone.

  8. GPU-Accelerated Voxelwise Hepatic Perfusion Quantification

    PubMed Central

    Wang, H; Cao, Y

    2012-01-01

    Voxelwise quantification of hepatic perfusion parameters from dynamic contrast-enhanced (DCE) imaging greatly contributes to the assessment of liver function in response to radiation therapy. However, the efficiency of estimating hepatic perfusion parameters voxel-by-voxel in the whole liver using a dual-input single-compartment model requires substantial improvement for routine clinical applications. In this paper, we utilize the parallel computation power of a graphics processing unit (GPU) to accelerate the computation, while maintaining the same accuracy as the conventional method. Using CUDA-GPU, the hepatic perfusion computations over multiple voxels are run across the GPU blocks concurrently but independently. At each voxel, non-linear least-squares fitting of the time series of the liver DCE data to the compartmental model is distributed to multiple threads in a block, and the computations for different time points are performed simultaneously and synchronously. An efficient fast Fourier transform in a block is also developed for the convolution computation in the model. The GPU computations of the voxel-by-voxel hepatic perfusion images are compared with those by the CPU using the simulated DCE data and the experimental DCE MR images from patients. The computation speed is improved by 30 times using an NVIDIA Tesla C2050 GPU compared to a 2.67 GHz Intel Xeon CPU processor. To obtain liver perfusion maps with 626400 voxels in a patient’s liver, it takes 0.9 min with the GPU-accelerated voxelwise computation, compared to 110 min with the CPU, while both methods result in perfusion parameter differences of less than 10−6. The method will be useful for generating liver perfusion images in clinical settings. PMID:22892645

  9. Fast-GPU-PCC: A GPU-Based Technique to Compute Pairwise Pearson's Correlation Coefficients for Time Series Data-fMRI Study.

    PubMed

    Eslami, Taban; Saeed, Fahad

    2018-04-20

    Functional magnetic resonance imaging (fMRI) is a non-invasive brain imaging technique that has been regularly used for studying the brain's functional activities in the past few years. A very widely used measure for capturing functional associations in the brain is Pearson's correlation coefficient. Pearson's correlation is widely used for constructing functional networks and studying the dynamic functional connectivity of the brain. These are useful measures for understanding the effects of brain disorders on connectivities among brain regions. fMRI scanners produce a huge number of voxels, and using traditional central processing unit (CPU)-based techniques for computing pairwise correlations is very time consuming, especially when a large number of subjects are being studied. In this paper, we propose a graphics processing unit (GPU)-based algorithm called Fast-GPU-PCC for computing pairwise Pearson's correlation coefficients. Based on the symmetric property of Pearson's correlation, this approach returns the N(N−1)/2 correlation coefficients located in the strictly upper triangular part of the correlation matrix. Storing the correlations in a one-dimensional array, in the order proposed in this paper, is useful for further usage. Our experiments on real and synthetic fMRI data for different numbers of voxels and varying lengths of time series show that the proposed approach outperformed state-of-the-art GPU-based techniques as well as the sequential CPU-based versions. We show that Fast-GPU-PCC runs 62 times faster than the CPU-based version and about 2 to 3 times faster than two other state-of-the-art GPU-based methods.
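
    The index arithmetic at the heart of such an approach can be sketched in CUDA: each thread is assigned one linear index k into the strictly upper triangle, inverts it to a pair (i, j) with i < j, and computes the correlation, here assuming pre-standardized (zero-mean, unit-variance) time series so that Pearson's r reduces to a scaled dot product. This is an illustration of the storage order, not the Fast-GPU-PCC implementation; all names are assumptions.

        #include <cstdio>
        #include <cmath>
        #include <cuda_runtime.h>

        // Invert a row-major strictly-upper-triangle linear index k into (i, j),
        // i < j, for an n x n correlation matrix.
        __device__ void pairFromIndex(long long k, int n, int* pi, int* pj)
        {
            long long M = (long long)n * (n - 1) / 2;
            double t = sqrt(8.0 * (double)(M - 1 - k) + 1.0);
            int i = n - 2 - (int)floor((t - 1.0) / 2.0);
            long long offset = (long long)i * (n - 1) - (long long)i * (i - 1) / 2;
            *pi = i;
            *pj = (int)(k - offset + i + 1);
        }

        // One thread per pair: with pre-standardized series, Pearson's r is just
        // the dot product divided by the series length.
        __global__ void pairwisePCC(const float* z, float* corr,
                                    int n, int T, long long nPairs)
        {
            long long k = (long long)blockIdx.x * blockDim.x + threadIdx.x;
            if (k >= nPairs) return;
            int i, j;
            pairFromIndex(k, n, &i, &j);
            float dot = 0.f;
            for (int t = 0; t < T; ++t)
                dot += z[(long long)i * T + t] * z[(long long)j * T + t];
            corr[k] = dot / T;
        }

        int main()
        {
            const int n = 4, T = 8;                  // 4 voxels, 8 timepoints
            const long long nPairs = (long long)n * (n - 1) / 2;
            float *z, *corr;
            cudaMallocManaged(&z, n * T * sizeof(float));
            cudaMallocManaged(&corr, nPairs * sizeof(float));
            for (int v = 0; v < n; ++v)              // identical standardized series
                for (int t = 0; t < T; ++t)
                    z[v * T + t] = (t % 2 ? 1.f : -1.f);
            pairwisePCC<<<1, 64>>>(z, corr, n, T, nPairs);
            cudaDeviceSynchronize();
            printf("r(0,1) = %f (expect 1)\n", corr[0]);
            return 0;
        }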

  10. SU-D-206-01: Employing a Novel Consensus Optimization Strategy to Achieve Iterative Cone Beam CT Reconstruction On a Multi-GPU Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, B; Southern Medical University, Guangzhou, Guangdong; Tian, Z

    Purpose: While compressed sensing-based cone-beam CT (CBCT) iterative reconstruction techniques have demonstrated tremendous capability of reconstructing high-quality images from undersampled noisy data, their long computation time still hinders wide application in routine clinics. The purpose of this study is to develop a reconstruction framework that employs modern consensus optimization techniques to achieve CBCT reconstruction on a multi-GPU platform for improved computational efficiency. Methods: Total projection data were evenly distributed to multiple GPUs. Each GPU performed reconstruction using its own projection data with a conventional total variation regularization approach to ensure image quality. In addition, the solutions from the GPUs were subject to a consistency constraint that they should be identical. We solved the optimization problem, with all the constraints considered rigorously, using an alternating direction method of multipliers (ADMM) algorithm. The reconstruction framework was implemented using OpenCL on a platform with two Nvidia GTX590 GPU cards, each with two GPUs. We studied the performance of our method and demonstrated its advantages through a simulation case with an NCAT phantom and an experimental case with a Catphan phantom. Results: Compared with the CBCT images reconstructed using the conventional FDK method with full projection datasets, our proposed method achieved comparable image quality with about one third of the projections. The computation time on the multi-GPU platform was ∼55 s and ∼35 s in the two cases, respectively, achieving a speedup factor of ∼3.0 compared with single-GPU reconstruction. Conclusion: We have developed a consensus ADMM-based CBCT reconstruction method that enables reconstruction on a multi-GPU platform. The achieved efficiency makes this method clinically attractive.

  11. Compilation of atomic and molecular data from the UV to the near-IR for use in spectral synthesis

    NASA Astrophysics Data System (ADS)

    Coelho, P.; Barbuy, B.; Melendez, J.; Allen, D. M.; Castilho, B.

    2003-08-01

    Synthetic spectra are useful in a wide variety of applications, from abundance analysis in high-resolution stellar spectra to the study of stellar populations in integrated spectra. The reliability of a synthetic spectrum depends on the adopted model atmosphere, on the line-formation code, and on the quality of the atomic and molecular data, which are decisive in computing the photospheric opacities. Our group in the Astronomy department at IAG has used synthetic spectra for more than 15 years, in applications aimed mainly at the abundance analysis of G, K and M stars and at old stellar populations. Over that time, the line lists have been built and updated continuously, and some recent additions can be cited: Castilho (1999, atoms and molecules in the UV), Schiavon (1998, TiO molecular bands) and Melendez (2001, atoms and molecules in the near-IR). In order to compute a grid of spectra from the UV to the near-IR for use in the study of old stellar populations, it was necessary to compile and homogenize the various lists into a single atomic list and a single molecular list. In this process, the newly compiled list was cross-matched with other databases (NIST, the Kurucz Database, O'Brian et al. 1991) to update the parameters that characterize each atomic transition (wavelength, log gf and excitation potential). In addition, the C6 interaction constants were computed according to the theory of Anstee & O'Mara (1995) and subsequent papers. The CH and CN molecular bands were recomputed with the LIFBASE program (Luque & Crosley 1999). This poster details the procedures cited above, presents comparisons between spectra computed with the new lists and high-resolution observed spectra of the Sun and of Arcturus, and analyzes the impact of using different model atmospheres on the synthetic spectrum. In the end, we have an atomic line list with more than 24,000 lines and a molecular list with the molecules CN, CH, OH, NH, MgH, C2, TiO gamma, CO and FeH, suitable for the study of G, K and M stars and of old stellar populations.

  12. Efficient Parallel Levenberg-Marquardt Model Fitting towards Real-Time Automated Parametric Imaging Microscopy

    PubMed Central

    Zhu, Xiang; Zhang, Dianwen

    2013-01-01

    We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on graphics processing unit for high performance scalable parallel model fitting processing. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for the applications in superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785

  13. Problems Related to Parallelization of CFD Algorithms on GPU, Multi-GPU and Hybrid Architectures

    NASA Astrophysics Data System (ADS)

    Błażewicz, Marek; Kurowski, Krzysztof; Ludwiczak, Bogdan; Napierała, Krystyna

    2010-09-01

    Computational Fluid Dynamics (CFD) is one of the branches of fluid mechanics which uses numerical methods and algorithms to solve and analyze fluid flows. CFD is used in various domains, such as oil and gas reservoir uncertainty analysis, aerodynamic body shape optimization (e.g. planes, cars, ships, sport helmets, skis), natural phenomena analysis, numerical simulation for weather forecasting, or realistic visualizations. The CFD problem is very complex and needs a lot of computational power to obtain results in a reasonable time. We have implemented a parallel application for two-dimensional CFD simulation with a free-surface approximation (MAC method) using new hardware architectures, in particular multi-GPU and hybrid computing environments. For this purpose we decided to use NVIDIA graphics cards with the CUDA environment, due to its simplicity of programming and good computational performance. We used a finite-difference discretization of the Navier-Stokes equations, where the fluid is propagated over an Eulerian grid. In this model, the behavior of the fluid inside a cell depends only on the properties of the local, surrounding cells; therefore it is well suited for the GPU-based architecture. In this paper we demonstrate how to use the computing power of GPUs for CFD efficiently. Additionally, we present some best practices to help users analyze and improve the performance of CFD applications executed on the GPU. Finally, we discuss various challenges around the multi-GPU implementation, on the example of matrix multiplication.
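
    The locality argument above is easy to see in code. In the CUDA sketch below (our simplification, not the authors' solver), a bare explicit 2-D diffusion step stands in for the full MAC velocity update: each cell is updated from its four neighbours only, so one thread per cell maps cleanly onto the GPU. Names and parameters are assumptions.

        #include <cstdio>
        #include <cuda_runtime.h>

        // Eulerian-grid locality: each cell reads only its four neighbours, so the
        // update is embarrassingly parallel across cells.
        __global__ void diffuseStep(const float* u, float* uNew,
                                    int w, int h, float nu, float dt)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;
            int i = y * w + x;
            float lap = u[i - 1] + u[i + 1] + u[i - w] + u[i + w] - 4.f * u[i];
            uNew[i] = u[i] + nu * dt * lap;   // explicit diffusion, unit grid spacing
        }

        int main()
        {
            const int w = 256, h = 256;
            float *a, *b;
            cudaMallocManaged(&a, w * h * sizeof(float));
            cudaMallocManaged(&b, w * h * sizeof(float));
            for (int i = 0; i < w * h; ++i) a[i] = b[i] = 0.f;
            a[h / 2 * w + w / 2] = 1.f;                     // point disturbance
            dim3 blk(16, 16), grd((w + 15) / 16, (h + 15) / 16);
            for (int t = 0; t < 100; ++t) {                 // ping-pong the buffers
                diffuseStep<<<grd, blk>>>(a, b, w, h, 0.1f, 1.f);
                float* tmp = a; a = b; b = tmp;
            }
            cudaDeviceSynchronize();
            printf("centre after 100 steps: %f\n", a[h / 2 * w + w / 2]);
            return 0;
        }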

  14. Pyglidein - A Simple HTCondor Glidein Service

    NASA Astrophysics Data System (ADS)

    Schultz, D.; Riedel, B.; Merino, G.

    2017-10-01

    A major challenge for data processing and analysis at the IceCube Neutrino Observatory presents itself in connecting a large set of individual clusters together to form a computing grid. Most of these clusters do not provide a “standard” grid interface. Using a local account on each submit machine, HTCondor glideins can be submitted to virtually any type of scheduler. The glideins then connect back to a main HTCondor pool, where jobs can run normally with no special syntax. To respond to dynamic load, a simple server advertises the number of idle jobs in the queue and the resources they request. The submit script can query this server to optimize glideins to what is needed, or not submit if there is no demand. Configuring HTCondor dynamic slots in the glideins allows us to efficiently handle varying memory requirements as well as whole-node jobs. One step of the IceCube simulation chain, photon propagation in the ice, heavily relies on GPUs for faster execution. Therefore, one important requirement for any workload management system in IceCube is to handle GPU resources properly. Within the pyglidein system, we have successfully configured HTCondor glideins to use any GPU allocated to it, with jobs using the standard HTCondor GPU syntax to request and use a GPU. This mechanism allows us to seamlessly integrate our local GPU cluster with remote non-Grid GPU clusters, including specially allocated resources at XSEDE supercomputers.

  15. CPU/GPU Computing for an Implicit Multi-Block Compressible Navier-Stokes Solver on a Heterogeneous Platform

    NASA Astrophysics Data System (ADS)

    Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

    2016-06-01

    CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software onto a heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both the multi-core CPUs and the many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap computation with communication using the advanced features of CUDA and MPI programming. We obtain a speedup of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platforms.
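
    A minimal sketch of the “one-thread-one-line” idea (assuming constant tridiagonal coefficients and lines of length at most MAXN; not the authors' production kernel): each CUDA thread applies the Thomas algorithm to one full grid line of the implicit sweep.

```cuda
// Sketch of "one-thread-one-line": one thread solves the tridiagonal system
// of one grid line. Assumptions (not from the paper): constant coefficients
// a (lower), b (diagonal), c (upper); lines stored contiguously; n <= MAXN.
#define MAXN 512

__global__ void thomasPerLine(float* x, int n, int nLines,
                              float a, float b, float c)
{
    int line = blockIdx.x * blockDim.x + threadIdx.x;
    if (line >= nLines) return;

    float cp[MAXN];              // modified upper-diagonal coefficients
    float* d = x + line * n;     // right-hand side, overwritten with solution

    cp[0] = c / b;
    d[0]  = d[0] / b;
    for (int i = 1; i < n; ++i) {          // forward elimination
        float m = 1.0f / (b - a * cp[i - 1]);
        cp[i] = c * m;
        d[i]  = (d[i] - a * d[i - 1]) * m;
    }
    for (int i = n - 2; i >= 0; --i)       // back substitution
        d[i] -= cp[i] * d[i + 1];
}
```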

  16. OpenACC acceleration of an unstructured CFD solver based on a reconstructed discontinuous Galerkin method for compressible flows

    DOE PAGES

    Xia, Yidong; Lou, Jialin; Luo, Hong; ...

    2015-02-09

    Here, an OpenACC directive-based graphics processing unit (GPU) parallel scheme is presented for solving the compressible Navier–Stokes equations on 3D hybrid unstructured grids with a third-order reconstructed discontinuous Galerkin method. The developed scheme requires minimal code intrusion and algorithm alteration for upgrading a legacy solver with GPU computing capability, at very little extra programming effort, which leads to a unified and portable code development strategy. A face coloring algorithm is adopted to eliminate the memory contention caused by the threading of internal and boundary face integrals. A number of flow problems are presented to verify the implementation of the developed scheme. Timing measurements were obtained by running the resulting GPU code on one Nvidia Tesla K20c GPU card (Nvidia Corporation, Santa Clara, CA, USA) and compared with those obtained by running the equivalent Message Passing Interface (MPI) parallel CPU code on a compute node (consisting of two AMD Opteron 6128 eight-core CPUs (Advanced Micro Devices, Inc., Sunnyvale, CA, USA)). Speedup factors of up to 24× and 1.6× for the GPU code were achieved with respect to one and 16 CPU cores, respectively. The numerical results indicate that this OpenACC-based parallel scheme is an effective and extensible approach to port unstructured high-order CFD solvers to GPU computing.
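
    A minimal sketch of the two ingredients described above, directive-based offload plus face coloring (hypothetical flat arrays, not the paper's data structures): faces are grouped by color so that no two faces processed concurrently update the same cell.

```cpp
// Sketch only. faceId, faceCell, flux, res and colorStart are hypothetical
// arrays; faces of color c occupy indices [colorStart[c], colorStart[c+1]).
for (int c = 0; c < nColors; ++c) {
    #pragma acc parallel loop present(faceId, faceCell, flux, res, colorStart)
    for (int k = colorStart[c]; k < colorStart[c + 1]; ++k) {
        int f     = faceId[k];
        int left  = faceCell[2 * f];     // cells on either side of face f
        int right = faceCell[2 * f + 1];
        res[left]  += flux[f];           // race-free: no other face of this
        res[right] -= flux[f];           // color touches these two cells
    }
}
```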

  17. Improving metabolic stability with deuterium: The discovery of GPU-028, a potent free fatty acid receptor 4 agonist.

    PubMed

    Li, Zheng; Xu, Xue; Li, Gang; Fu, Xiaoting; Liu, Yanzhi; Feng, Yufeng; Wang, Mingyan; Ouyang, Yunting; Han, Jing

    2017-12-15

    The free fatty acid receptor 4 (FFA4) has emerged as a promising anti-diabetic target due to its role in improving insulin secretion and insulin resistance. The FFA4 agonist TUG-891 has shown great potential as a widely used pharmacological tool, but it suffers from high plasma clearance, probably because its phenylpropanoic acid moiety is vulnerable to β-oxidation. To identify a metabolically stable analog without influencing the physiological mechanism of TUG-891, we incorporated deuterium at the α-position of the phenylpropanoic acid to afford compound 4 (GPU-028). As expected, GPU-028 showed a longer half-life (T1/2 = 1.66 h), lower clearance (CL = 0.97 L/h/kg) and higher maximum plasma concentration (Cmax = 2035.23 μg/L), resulting in a 4-fold higher exposure than TUG-891. Although GPU-028 exhibited agonistic activity similar to that of TUG-891, the hypoglycemic effect of GPU-028 was better than that of TUG-891 after four weeks of treatment in diet-induced obese mice. These positive results indicate that GPU-028 might be a better pharmacological tool than TUG-891 for exploring the physiological function of FFA4, especially in vivo. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Higher-order ice-sheet modelling accelerated by multigrid on graphics cards

    NASA Astrophysics Data System (ADS)

    Brædstrup, Christian; Egholm, David

    2013-04-01

    Higher-order ice flow modelling is a very computationally intensive process, owing primarily to the nonlinear influence of horizontal stress coupling. When applied to simulating long-term glacial landscape evolution, ice-sheet models must consider very long time series, while both high temporal and spatial resolution are needed to resolve small effects. Higher-order and full-Stokes models have therefore seen very limited use in this field. However, recent advances in graphics card (GPU) technology for high performance computing have proven extremely efficient in accelerating many large-scale scientific computations. General purpose GPU (GPGPU) technology is cheap, has low power consumption and fits into a normal desktop computer. It can therefore provide a powerful tool for many glaciologists working on ice flow models. Our current research focuses on utilising the GPU as a tool in ice-sheet and glacier modelling. To this end we have implemented the Integrated Second-Order Shallow Ice Approximation (iSOSIA) equations on the device using the finite difference method. To accelerate the computations, the GPU solver uses a non-linear Red-Black Gauss-Seidel iterator coupled with a Full Approximation Scheme (FAS) multigrid setup to further aid convergence. The GPU finite difference implementation provides inherent parallelization that scales from hundreds to several thousands of cores on newer cards. We demonstrate the efficiency of the GPU multigrid solver using benchmark experiments.
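
    A minimal CUDA sketch of the red-black idea (shown here for a linear 2D Poisson smoother, not the nonlinear iSOSIA operator): cells of one parity can be updated concurrently because their stencils only read cells of the opposite parity, so the kernel is launched twice per sweep.

```cuda
// One red-black Gauss-Seidel half-sweep on a 2D grid. Launch with parity 0,
// then parity 1, to complete a full sweep. h2 is the squared grid spacing.
__global__ void rbGaussSeidel(float* u, const float* f,
                              int nx, int ny, float h2, int parity)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i <= 0 || i >= nx - 1 || j <= 0 || j >= ny - 1) return;
    if (((i + j) & 1) != parity) return;   // only touch one "color"

    int c = j * nx + i;
    u[c] = 0.25f * (u[c - 1] + u[c + 1] + u[c - nx] + u[c + nx] - h2 * f[c]);
}
```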

  19. The novel implicit LU-SGS parallel iterative method based on the diffusion equation of a nuclear reactor on a GPU cluster

    NASA Astrophysics Data System (ADS)

    Zhang, Jilin; Sha, Chaoqun; Wu, Yusen; Wan, Jian; Zhou, Li; Ren, Yongjian; Si, Huayou; Yin, Yuyu; Jing, Ya

    2017-02-01

    The GPU is used not only in the field of graphics but also in areas requiring large numbers of numerical calculations. In the energy industry, because of its low carbon emissions, high energy density, long operating duration and other characteristics, nuclear energy cannot easily be replaced by other energy sources. Management of core fuel is one of the major areas of concern in a nuclear power plant, and it is directly related to the economic benefits and cost of nuclear power. The diffusion equation for a large-scale reactor core is large and complicated, so its calculation is crucial in the core fuel management process. In this paper, we use CUDA programming on a GPU cluster to run the LU-SGS parallel iterative calculation for the diffusion equation of the reactor. We divide the one-dimensional and two-dimensional meshes into a number of domains, with the domains evenly distributed over the GPU blocks. A parallel collision scheme is put forward in which the grids exchange information and transmit data across a defined virtual boundary through repeated collisions. Compared with the serial program, the experiments show that the GPU greatly improves the efficiency of program execution and verify that the GPU is playing an increasingly important role in the field of numerical calculation.

  20. permGPU: Using graphics processing units in RNA microarray association studies.

    PubMed

    Shterev, Ivo D; Jung, Sin-Ho; George, Stephen L; Owzar, Kouros

    2010-06-16

    Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, which are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. We have developed a CUDA-based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C/C++ solution running on a conventional Linux server. permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.
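
    The embarrassingly parallel structure is one thread per permutation; a minimal CUDA sketch (assuming a two-group mean-difference statistic, at most 256 samples, and both groups non-empty; permGPU's actual statistics and kernels will differ):

```cuda
#include <curand_kernel.h>

// One thread per permutation: shuffle a local copy of the n binary group
// labels (Fisher-Yates) and recompute the mean-difference statistic.
__global__ void permuteStat(const float* y, const int* label, int n,
                            int nPerm, unsigned long long seed, float* stat)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPerm) return;

    curandState rng;
    curand_init(seed, p, 0, &rng);          // independent stream per thread

    int lab[256];                           // assumption: n <= 256
    for (int i = 0; i < n; ++i) lab[i] = label[i];
    for (int i = n - 1; i > 0; --i) {       // Fisher-Yates shuffle
        int j = curand(&rng) % (i + 1);
        int t = lab[i]; lab[i] = lab[j]; lab[j] = t;
    }

    float s0 = 0.f, s1 = 0.f; int n0 = 0, n1 = 0;
    for (int i = 0; i < n; ++i) {
        if (lab[i]) { s1 += y[i]; ++n1; } else { s0 += y[i]; ++n0; }
    }
    stat[p] = s1 / n1 - s0 / n0;            // permuted test statistic
}
```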

  1. Parallel Computer System for 3D Stereo Visualization on GPU

    NASA Astrophysics Data System (ADS)

    Al-Oraiqat, Anas M.; Zori, Sergii A.

    2018-03-01

    This paper proposes the organization of a parallel computer system based on Graphics Processing Units (GPUs) for 3D stereo image synthesis. The development is based on the authors' modified ray tracing method for fast search of ray intersections with scene objects. The system allows a significant increase in productivity for 3D stereo synthesis of photorealistic quality. A generalized procedure for 3D stereo image synthesis on Graphics Processing Units/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The average acceleration achieved over the single-threaded and multithreaded CPU implementations on the test hardware is about 7.5 and 1.6 times, respectively. A study of the influence of the size and configuration of the Compute Unified Device Architecture (CUDA) computational grid on computational speed shows the importance of their correct selection. The experimental estimates obtained can be significantly improved by newer GPUs with larger numbers of processing cores and multiprocessors, as well as by an optimized configuration of the CUDA computational grid.
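
    The innermost operation of any such renderer is the ray-object intersection test; a minimal CUDA device-function sketch for a sphere (not the authors' modified traversal scheme) is shown below. For a stereo pair the same kernel is simply run twice, with the camera origin shifted by plus or minus half the interocular distance.

```cuda
// Device-side fragment: nearest ray-sphere intersection. Assumes the ray
// direction d is normalized, so the quadratic reduces to t^2 + 2bt + q = 0.
struct Ray { float3 o, d; };                  // origin, unit direction

__device__ bool hitSphere(const Ray& r, float3 c, float rad, float& t)
{
    float3 oc = make_float3(r.o.x - c.x, r.o.y - c.y, r.o.z - c.z);
    float b = oc.x * r.d.x + oc.y * r.d.y + oc.z * r.d.z;
    float q = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - rad * rad;
    float disc = b * b - q;                   // quadratic discriminant
    if (disc < 0.f) return false;             // ray misses the sphere
    t = -b - sqrtf(disc);                     // nearest of the two roots
    return t > 0.f;                           // hit must lie in front
}
```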

  2. Multi-GPU parallel algorithm design and analysis for improved inversion of probability tomography with gravity gradiometry data

    NASA Astrophysics Data System (ADS)

    Hou, Zhenlong; Huang, Danian

    2017-09-01

    In this paper, we first study the inversion of probability tomography (IPT) with gravity gradiometry data. The spatial resolution of the results is improved by multi-tensor joint inversion, a depth-weighting matrix and other methods. To address the problems posed by big data in exploration, we present a parallel algorithm and its performance analysis, combining Compute Unified Device Architecture (CUDA) with Open Multi-Processing (OpenMP) based on Graphics Processing Unit (GPU) acceleration. In tests on a synthetic model and on real data from Vinton Dome, we obtain improved results. It is also shown that the improved inversion algorithm is effective and feasible. The performance of the parallel algorithm we designed is better than that of other CUDA-based ones; the maximum speedup is more than 200. In the performance analysis, multi-GPU speedup and multi-GPU efficiency are applied to analyze the scalability of the multi-GPU programs. The designed parallel algorithm is demonstrated to be able to process larger-scale data, and the new analysis method is practical.
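
    A common skeleton for the CUDA+OpenMP combination described here is one OpenMP host thread per device; the sketch below (with a hypothetical forwardKernel and the device-buffer bookkeeping elided) shows only the binding pattern, not the paper's algorithm.

```cpp
#include <algorithm>
#include <cuda_runtime.h>
#include <omp.h>

// One OpenMP host thread drives each GPU; each GPU forward-models
// its own static slice of the data rows.
void multiGpuForward(const float* model, float* data, int nData)
{
    int nGpu = 0;
    cudaGetDeviceCount(&nGpu);

    #pragma omp parallel num_threads(nGpu)
    {
        int dev = omp_get_thread_num();
        cudaSetDevice(dev);                     // bind this thread to one GPU

        int chunk = (nData + nGpu - 1) / nGpu;  // even static partition
        int lo = dev * chunk;
        int hi = std::min(lo + chunk, nData);
        // ... allocate device buffers, copy the model in, launch a
        //     (hypothetical) forwardKernel on rows [lo, hi), and copy
        //     the partial result back into data ...
    }
}
```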

  3. Adaptation of a Multi-Block Structured Solver for Effective Use in a Hybrid CPU/GPU Massively Parallel Environment

    NASA Astrophysics Data System (ADS)

    Gutzwiller, David; Gontier, Mathieu; Demeulenaere, Alain

    2014-11-01

    Multi-block structured solvers hold many advantages over their unstructured counterparts, such as a smaller memory footprint and efficient serial performance. Historically, multi-block structured solvers have not been easily adapted for use in a High Performance Computing (HPC) environment, and the recent trend towards hybrid CPU/GPU architectures has further complicated the situation. This paper elaborates on developments and innovations applied to the NUMECA FINE/Turbo solver that have allowed near-linear scalability with real-world problems on over 250 hybrid CPU/GPU cluster nodes. Discussion focuses on the implementation of virtual partitioning and load balancing algorithms using a novel meta-block concept. This implementation is transparent to the user, allowing all pre- and post-processing steps to be performed using a simple, unpartitioned grid topology. Additional discussion elaborates on developments that have improved parallel performance, including fully parallel I/O with the ADIOS API and the GPU porting of the computationally heavy CPUBooster convergence acceleration module.

  4. Work stealing for GPU-accelerated parallel programs in a global address space framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.

  5. Implementation of EAM and FS potentials in HOOMD-blue

    NASA Astrophysics Data System (ADS)

    Yang, Lin; Zhang, Feng; Travesset, Alex; Wang, Caizhuang; Ho, Kaiming

    HOOMD-blue is general-purpose software for performing classical molecular dynamics simulations entirely on GPUs. We provide full support for EAM and FS type potentials in HOOMD-blue and report accuracy and efficiency benchmarks, including comparisons with the LAMMPS GPU package. Two problems were selected to test the accuracy: the determination of the glass transition temperature of Cu64.5Zr35.5 alloy using an FS potential, and the calculation of pair distribution functions of Ni3Al using an EAM potential. In both cases, the results using HOOMD-blue are indistinguishable from those obtained with the GPU package in LAMMPS within statistical uncertainties. As tests of time efficiency, we benchmark time-steps per second using LAMMPS GPU and HOOMD-blue on one NVIDIA Tesla GPU. Compared to our typical LAMMPS simulations on one CPU cluster node with 16 CPU cores, LAMMPS GPU can be 3-3.5 times faster, and HOOMD-blue can be 4-5.5 times faster. We acknowledge support from the Laboratory Directed Research and Development (LDRD) program of Ames Laboratory.

  6. GPU-accelerated FDTD modeling of radio-frequency field-tissue interactions in high-field MRI.

    PubMed

    Chi, Jieru; Liu, Feng; Weber, Ewald; Li, Yu; Crozier, Stuart

    2011-06-01

    The analysis of high-field RF field-tissue interactions requires high-performance finite-difference time-domain (FDTD) computing. Conventional CPU-based FDTD calculations offer limited computing performance in a PC environment. This study presents a graphics processing unit (GPU)-based parallel-computing framework producing substantially boosted computing efficiency (a two-order-of-magnitude speedup) at PC-level cost. Specific details of implementing the FDTD method on a GPU architecture are presented, and the new computational strategy has been successfully applied to the design of a novel 8-element transceive RF coil system at 9.4 T. Facilitated by the powerful GPU-FDTD computing, the new RF coil array offers optimized fields (averaging 25% improvement in sensitivity and 20% reduction in loop coupling compared with conventional array structures of the same size) for small-animal imaging with a robust RF configuration. The GPU-enabled acceleration paves the way for FDTD to be applied to both detailed forward modeling and inverse design of MRI coils, which were previously impractical.
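
    As an illustration of why FDTD maps so well to GPUs, a minimal 1D Yee-update sketch (normalized units, vacuum; not the coil-modeling code itself) advances E and H on staggered grids with one thread per grid point, two kernel launches per time step:

```cuda
// Minimal 1D Yee scheme: Hy lives between Ez points. Each thread updates
// one point from its immediate neighbour, so the kernels are memory-local.
__global__ void updateH(float* Hy, const float* Ez, int n, float coef)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n - 1)
        Hy[i] += coef * (Ez[i + 1] - Ez[i]);
}

__global__ void updateE(float* Ez, const float* Hy, int n, float coef)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n)
        Ez[i] += coef * (Hy[i] - Hy[i - 1]);
}
```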

  7. Ramses-GPU: Second order MUSCL-Hancock finite volume fluid solver

    NASA Astrophysics Data System (ADS)

    Kestener, Pierre

    2017-10-01

    RamsesGPU is a reimplementation of RAMSES (ascl:1011.007) that drops the adaptive mesh refinement (AMR) features in order to optimize 3D uniform-grid algorithms for modern graphics processing units (GPUs), providing an efficient software package for astrophysics applications that do not need AMR features but do require a very large number of integration time steps. RamsesGPU provides a very efficient C++/CUDA/MPI software implementation of a second-order MUSCL-Hancock finite volume fluid solver for compressible hydrodynamics, as well as a magnetohydrodynamics solver based on the constrained transport technique. Other useful modules include static gravity, dissipative terms (viscosity, resistivity) and a forcing source term for turbulence studies, and special care was taken to enhance parallel input/output performance by using state-of-the-art libraries such as HDF5 and parallel-netcdf.

  8. Simulation of ring polymer melts with GPU acceleration

    NASA Astrophysics Data System (ADS)

    Schram, R. D.; Barkema, G. T.

    2018-06-01

    We implemented the elastic lattice polymer model on the GPU (Graphics Processing Unit), and show that the GPU is very efficient for polymer simulations of dense polymer melts. The implementation is able to perform up to 4.1 × 10^9 Monte Carlo moves per second. Compared to our standard CPU implementation, we find an effective speed-up of a factor of 92. Using this GPU implementation we studied the equilibrium properties and the dynamics of non-concatenated ring polymers in a melt of such polymers, using Rouse modes. With increasing polymer length, we found a very slow transition to compactness, with a growth exponent ν ≈ 1/3. Numerically we find that the longest internal time scale of the polymer scales as N^3.1, with N the molecular weight of the ring polymer.

  9. Tensor Algebra Library for NVidia Graphics Processing Units

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liakh, Dmitry

    This is a general purpose math library implementing basic tensor algebra operations on NVidia GPU accelerators. This software is a tensor algebra library that can perform basic tensor algebra operations, including tensor contractions, tensor products, tensor additions, etc., on NVidia GPU accelerators, asynchronously with respect to the CPU host. It supports a simultaneous use of multiple NVidia GPUs. Each asynchronous API function returns a handle which can later be used for querying the completion of the corresponding tensor algebra operation on a specific GPU. The tensors participating in a particular tensor operation are assumed to be stored in local RAM of a node or GPU RAM. The main research area where this library can be utilized is the quantum many-body theory (e.g., in electronic structure theory).

  10. Overview of implementation of DARPA GPU program in SAIC

    NASA Astrophysics Data System (ADS)

    Braunreiter, Dennis; Furtek, Jeremy; Chen, Hai-Wen; Healy, Dennis

    2008-04-01

    This paper reviews the implementation of the DARPA MTO STAP-BOY program, Phases I and II, conducted at Science Applications International Corporation (SAIC). The STAP-BOY program develops fast covariance factorization and tuning techniques for space-time adaptive processing (STAP) algorithm implementation on graphics processing unit (GPU) architectures for embedded systems. The first part of our presentation on the DARPA STAP-BOY program focuses on GPU implementation and algorithm innovations for a prototype radar STAP algorithm. The STAP algorithm is implemented on the GPU using stream programming (from companies such as PeakStream, ATI Technologies' CTM, and NVIDIA) and traditional graphics APIs. This algorithm includes fast range-adaptive STAP weight updates and beamforming applications, each of which has been modified to exploit the parallel nature of graphics architectures.

  11. Multi-core and GPU accelerated simulation of a radial star target imaged with equivalent t-number circular and Gaussian pupils

    NASA Astrophysics Data System (ADS)

    Greynolds, Alan W.

    2013-09-01

    Results from the GelOE optical engineering software are presented for the through-focus, monochromatic coherent and polychromatic incoherent imaging of a radial "star" target for equivalent t-number circular and Gaussian pupils. The FFT-based simulations are carried out using OpenMP threading on a multi-core desktop computer, with and without the aid of a many-core NVIDIA GPU accessing its cuFFT library. It is found that a custom FFT optimized for the 12-core host has similar performance to a simply implemented 256-core GPU FFT. A more sophisticated version of the latter but tuned to reduce overhead on a 448-core GPU is 20 to 28 times faster than a basic FFT implementation running on one CPU core.
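
    A minimal sketch of the kind of GPU FFT call involved (standard cuFFT API for a double-precision 2D in-place transform; not GelOE's tuned implementation, and error checking omitted):

```cpp
#include <cufft.h>
#include <cuda_runtime.h>

// Forward 2D double-complex FFT, in place, on a device buffer that the
// caller has already filled (e.g. with the pupil function samples).
void fft2dInPlace(cufftDoubleComplex* d_field, int nx, int ny)
{
    cufftHandle plan;
    cufftPlan2d(&plan, nx, ny, CUFFT_Z2Z);            // double-complex plan
    cufftExecZ2Z(plan, d_field, d_field, CUFFT_FORWARD);
    cufftDestroy(plan);
}
```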

  12. A GPU-paralleled implementation of an enhanced face recognition algorithm

    NASA Astrophysics Data System (ADS)

    Chen, Hao; Liu, Xiyang; Shao, Shuai; Zan, Jiguo

    2013-03-01

    Face recognition algorithms based on compressed sensing and sparse representation have been hotly debated in recent years. This scheme increases the recognition rate as well as the robustness to noise. However, the computational cost is high and has become a main limiting factor for real-world applications. In this paper, we introduce a GPU-accelerated hybrid variant of a face recognition algorithm, named parallel face recognition algorithm (pFRA). We describe how to carry out the parallel optimization design to take full advantage of the many-core structure of a GPU. The pFRA is tested and compared with several other implementations under different data sample sizes. Finally, our pFRA, implemented with an NVIDIA GPU and the Compute Unified Device Architecture (CUDA) programming model, achieves a significant speedup over traditional CPU implementations.

  13. The ionizing source of the accretion disk in the nucleus of NGC1097 (Portuguese Title: A fonte ionizante do disco de acreção no núcleo de NGC1097)

    NASA Astrophysics Data System (ADS)

    Silva, R. N.; Storchi-Bergmann, T.

    2003-08-01

    X-ray observations reveal the "heart" of active galactic nuclei, since this type of radiation comes from their innermost regions, close to the central black hole. In this work we present X-ray observations of the central region of the galaxy NGC1097, which hosts a supermassive black hole and an accretion disk whose emission has been observed for ten years through the broad (10000 km/s), double-peaked Hα emission line. The X-ray observations - obtained with the Chandra Telescope - were combined with ultraviolet observations obtained with the Hubble Space Telescope and are used to study the characteristics of the central source that ionizes the accretion disk. The spectral energy distribution is compared with model predictions, in particular that of an "ADAF" ("advection dominated accretion flow") structure in the inner part of the disk. Such a structure produces a LINER-type narrow emission-line spectrum, as observed in NGC1097 and in radio galaxies that show broad double-peaked Balmer lines. We also present a comparison with other LINERs with broad double-peaked emission lines, available in the literature or in the Chandra and Hubble Space Telescope archives, and discuss the corresponding implications for models of the central source.

  14. Relationship between Resting Heart Rate, Blood Pressure and Pulse Pressure in Adolescents.

    PubMed

    Christofaro, Diego Giulliano Destro; Casonatto, Juliano; Vanderlei, Luiz Carlos Marques; Cucato, Gabriel Grizzo; Dias, Raphael Mendes Ritti

    2017-05-01

    High resting heart rate is considered an important risk factor for mortality in adults. However, it remains unclear whether the observed associations remain after adjustment for confounders in adolescents. To analyze the relationship between resting heart rate, blood pressure and pulse pressure in adolescents of both sexes. A cross-sectional study with 1231 adolescents (716 girls and 515 boys) aged 14-17 years. Heart rate, blood pressure and pulse pressure were evaluated using an oscillometric blood pressure device validated for this population. Weight and height were measured with an electronic scale and a stadiometer, respectively, and waist circumference with a non-elastic tape. Multivariate analysis using linear regression investigated the relationship between resting heart rate and blood pressure and pulse pressure in boys and girls, controlling for general and abdominal obesity. Higher resting heart rate values were observed in girls (80.1 ± 11.0 beats/min) compared to boys (75.9 ± 12.7 beats/min) (p ≤ 0.001). Resting heart rate was associated with systolic blood pressure in boys (Beta = 0.15 [0.04; 0.26]) and girls (Beta = 0.24 [0.16; 0.33]), with diastolic blood pressure in boys (Beta = 0.50 [0.37; 0.64]) and girls (Beta = 0.41 [0.30; 0.53]), and with pulse pressure in boys only (Beta = -0.16 [-0.27; -0.04]). This study demonstrated a relationship between elevated resting heart rate and increased systolic and diastolic blood pressure in both sexes, and pulse pressure in boys, even after controlling for potential confounders such as general and abdominal obesity.

  15. An anthropological approach to the teaching of astronomy at the high school level (Portuguese Title: Um enfoque antropológico para o ensino de astronomia no nível médio)

    NASA Astrophysics Data System (ADS)

    Costa, G. B.; Jafelice, L. C.

    2003-08-01

    There is an enormous shortage of didactic-pedagogical materials in astronomy for high school teachers, especially materials that also explore humanistic aspects. The origin of the Universe is a good example of this central observation. Although this origin has received diverse cultural explanations, teachers have no information about them, much less material that works with different worldviews, or training that enables them to address these views properly. Consequently, the teaching of astronomy tends to be technicist and dissociated from the human dimension that feeds the great interest and curiosity these themes arouse. Here we present proposals aimed at helping to reverse this situation, working with distinct visions of the Universe: spontaneous, autochthonous and scientific. We developed practices, instructional materials and texts to enable the adoption of an anthropological approach to the teaching of astronomy at the high school level, in which the humanistic and scientific cultures are integrated in a contextualized and effective manner for that teaching. These proposals were applied in a training course for public school teachers of different disciplines. The teachers' receptiveness to the proposed approach and the results achieved were very encouraging. Among these results, we highlight: the production of activity scripts; the development of specific didactic-pedagogical practices (e.g., staging of myths; the primordial Guarani dance; the "creation" of constellations and pluricultural interpretations; etc.); and concrete suggestions for the effective realization of contextualized interdisciplinary teaching, in which cosmogonic questions serve as the starting point. We discuss these results and how the adopted approach can equip teachers for readings of the world that naturally include the cultural, social and historical aspects associated with the themes studied. (PPGECNM/UFRN; PRONEX/FINEP; NUPA/USP; Temáticos/FAPESP)

  16. Difficulties of Early-Years Elementary School Teachers in Relation to the Teaching of Astronomy (Portuguese Title: Dificuldades de Professores dos Anos Iniciais do Ensino Fundamental em Relação ao Ensino da Astronomia; Spanish Title: Dificultades de los Profesores de los Primeros Años de la Escuela Primaria en Relación a la Enseñanza de la Astronomía)

    NASA Astrophysics Data System (ADS)

    Langhi, Rodolfo; Nardi, Roberto

    2005-12-01

    This paper reports an analysis of primary school teachers' discourse about their difficulties related to the teaching of astronomy. It reports partial data from a master's-level research project carried out over the last two years, named "An exploratory study for inserting Astronomy in primary school teachers' education" (LANGHI, 2004). The study took into consideration students' and teachers' common-sense conceptions of astronomical phenomena, conceptual mistakes in textbooks, and the suggestions on astronomy given by the PCN (Parâmetros Curriculares Nacionais - the Brazilian National Curriculum Standards). The paper aims to characterize teachers' difficulties in order to provide support for the implementation of an initial or continuing education program. The study is justified by the fact that course plans like these will only be suited to the reality of teachers (and students) if there is a prior investigation of what teachers really need to know, and to know how to do, regarding astronomy. This was accomplished here by interpreting the statements of a sample of teachers, collected through semi-structured interviews, according to the principles and methods of discourse analysis in the French tradition. The research outcomes show difficulties related to factors such as personal issues, methodology, teacher education, educational infrastructure, and the information sources available to educators.

  17. Trends in corrected lung cancer mortality rates in Brazil and regions.

    PubMed

    Malta, Deborah Carvalho; Abreu, Daisy Maria Xavier de; Moura, Lenildo de; Lana, Gustavo C; Azevedo, Gulnar; França, Elisabeth

    2016-06-27

    To describe the trend in lung cancer mortality rates in Brazil and its regions before and after correction for underreporting of deaths and redistribution of ill-defined and nonspecific causes. The study used data on deaths from lung cancer among the population aged 30 to 69 years, notified to the Mortality Information System between 1996 and 2011, corrected for underreporting of deaths, unrecorded sex and age, and causes with ill-defined or garbage codes, according to sex, age, and region. Age-standardized rates were calculated for raw and corrected data. An analysis of the time trend in lung cancer mortality was carried out using a regression model with autoregressive errors. Lung cancer in Brazil presented higher rates among men compared to women, and the South region showed the highest death risk in 1996 and 2011. Mortality showed a decreasing trend for males and an increasing one for females. Lung cancer in Brazil presented different distribution patterns according to sex, with higher rates among men, a decreasing mortality trend for men, and increasing rates for women.

  18. Graphics processing unit (GPU) real-time infrared scene generation

    NASA Astrophysics Data System (ADS)

    Christie, Chad L.; Gouthas, Efthimios (Themie); Williams, Owen M.

    2007-04-01

    VIRSuite, the GPU-based suite of software tools developed at DSTO for real-time infrared scene generation, is described. The tools include the painting of scene objects with radiometrically-associated colours, translucent object generation, polar plot validation and versatile scene generation. Special features include radiometric scaling within the GPU and the presence of zoom anti-aliasing at the core of VIRSuite. Extension of the zoom anti-aliasing construct to cover target embedding and the treatment of translucent objects is described.

  19. Cardiovascular Risk Stratification and Statin Eligibility Based on the Brazilian vs. North American Guidelines on Blood Cholesterol Management.

    PubMed

    Cesena, Fernando Henpin Yue; Laurinavicius, Antonio Gabriele; Valente, Viviane A; Conceição, Raquel D; Santos, Raul D; Bittencourt, Marcio S

    2017-06-01

    The best way to select individuals for lipid-lowering treatment in the population is controversial. In healthy individuals in primary prevention: to assess the relationship between cardiovascular risk categorized according to the V Brazilian Guideline on Dyslipidemia and the risk calculated by the pooled cohort equations (PCE); and to compare the proportion of individuals eligible for statins according to different criteria. In individuals aged 40-75 years consecutively submitted to routine health assessment at a single center, four criteria of eligibility for statins were defined: BR-1, BR-2 (LDL-c above, or at least 30 mg/dL above, the goal recommended by the Brazilian Guideline, respectively), USA-1 and USA-2 (10-year risk estimated by the PCE ≥ 5.0% or ≥ 7.5%, respectively). The final sample consisted of 13,947 individuals (48 ± 6 years, 71% men). Most individuals at intermediate or high risk based on the V Brazilian Guideline had a low risk calculated by the PCE, and more than 70% of those considered at high risk were categorized as such because of the presence of aggravating factors. Among women, 24%, 17%, 4% and 2% were eligible for statin use according to the BR-1, BR-2, USA-1 and USA-2 criteria, respectively (p < 0.01). The respective figures for men were 75%, 58%, 31% and 17% (p < 0.01). Eighty-five percent of women and 60% of men who were eligible for statins based on the BR-1 criterion would not be candidates based on the USA-1 criterion. Compared to the North American guideline, the V Brazilian Guideline considers a substantially higher proportion of the population as eligible for statin use in primary prevention. This results from discrepancies between the risk stratified by the Brazilian Guideline and that calculated by the PCE, particularly because of the risk reclassification based on aggravating factors.

  20. High performance MRI simulations of motion on multi-GPU systems

    PubMed Central

    2014-01-01

    Background MRI physics simulators have been developed in the past for optimizing imaging protocols and for training purposes. However, these simulators have only addressed motion within a limited scope. The purpose of this study was the incorporation of realistic motion, such as cardiac motion, respiratory motion and flow, within MRI simulations in a high performance multi-GPU environment. Methods Three different motion models were introduced in the Magnetic Resonance Imaging SIMULator (MRISIMUL) of this study: cardiac motion, respiratory motion and flow. Simulation of a simple Gradient Echo pulse sequence and a CINE pulse sequence on the corresponding anatomical model was performed. Myocardial tagging was also investigated. In pulse sequence design, software crushers were introduced to accommodate the long execution times in order to avoid spurious echoes formation. The displacement of the anatomical model isochromats was calculated within the Graphics Processing Unit (GPU) kernel for every timestep of the pulse sequence. Experiments that would allow simulation of custom anatomical and motion models were also performed. Last, simulations of motion with MRISIMUL on single-node and multi-node multi-GPU systems were examined. Results Gradient Echo and CINE images of the three motion models were produced and motion-related artifacts were demonstrated. The temporal evolution of the contractility of the heart was presented through the application of myocardial tagging. Better simulation performance and image quality were presented through the introduction of software crushers without the need to further increase the computational load and GPU resources. Last, MRISIMUL demonstrated an almost linear scalable performance with the increasing number of available GPU cards, in both single-node and multi-node multi-GPU computer systems. Conclusions MRISIMUL is the first MR physics simulator to have implemented motion with a 3D large computational load on a single computer multi-GPU configuration. The incorporation of realistic motion models, such as cardiac motion, respiratory motion and flow may benefit the design and optimization of existing or new MR pulse sequences, protocols and algorithms, which examine motion related MR applications. PMID:24996972

  1. Use of mechanical models in an informal astronomy course for visually impaired people: recovering an experience (Portuguese Title: Uso de modelos mecânicos em curso informal de astronomia para deficientes visuais. Resgate de uma experiência)

    NASA Astrophysics Data System (ADS)

    Tavares, E. T., Jr.; Klafke, J. C.

    2003-08-01

    This work aims to recover an experience that took place at the São Paulo Planetarium in the 1960s. In 1962, Mr. Acácio, then 37 years old and visually impaired since the age of 27, began attending the classes given by Prof. Aristóteles Orsini to the Planetarium staff. Mr. Acácio was the only visually impaired member of the class and, although he had basic and relatively advanced knowledge of mathematics, he faced difficulties in understanding and following the lectures, as well as in later studies. In order to help him overcome these problems, Prof. Orsini requested the construction of mechanical models that, through the sense of touch, would allow him to follow the classes and transpose the model into a mental construct. This practice proved so effective that it greatly facilitated his learning of the subject. Mr. Acácio joined the teaching staff of the Planetarium/Municipal School of Astrophysics, and was responsible for the "Introduction to Astronomy" course for several years. Moreover, the experience was so successful that some of the models had their constituent elements painted in different colors for use in the Planetarium's regular courses, becoming an integral part of the institution's set of didactic resources. It is with this effectiveness in mind, both in its original purpose of enabling a visually impaired person to learn and in its subsidiary role as a systematic didactic resource of the institution, that we decided to recover this experience. Based on it, we believe it would be extremely productive, in educational terms, to improve the original models, now recovered and restored, and to create others that could be used to teach this science to visually impaired people.

  2. Medium- and long-term results of endovenous treatment of varicose veins with a 1940 nm diode laser: critical analysis and technical considerations (Portuguese Title: Resultados de médio e longo prazo do tratamento endovenoso de varizes com laser de diodo em 1940 nm: análise crítica e considerações técnicas)

    PubMed Central

    Viarengo, Luiz Marcelo Aiello; Viarengo, Gabriel; Martins, Aline Meira; Mancini, Marília Wechellian; Lopes, Luciana Almeida

    2017-01-01

    Background Since the introduction of the endovenous laser for the treatment of varicose veins, there has been a search for the ideal wavelength, capable of producing the greatest possible selective damage with greater safety and a lower incidence of adverse effects. Objectives To evaluate the medium- and long-term results of the 1940 nm diode laser in the treatment of varicose veins, correlating the parameters used with the durability of the anatomical outcome. Methods Retrospective review of patients diagnosed with chronic venous insufficiency at clinical stages C2 to C6 of the clinical, etiological, anatomical and pathophysiological (CEAP) classification, who underwent endovenous thermal ablation of truncal varicose veins with a 1940 nm laser and a radial-emission optical fiber, from April 2012 to July 2015. A systematic review of electronic medical records was performed to obtain demographic and clinical data, including duplex ultrasound data, during the postoperative follow-up period. Results The mean age of the patients was 53.3 years; 37 were women (90.2%). Mean follow-up was 803 days. The mean caliber of the treated veins was 7.8 mm. The immediate success rate was 100%, with a mean linear endovenous energy density (LEED) of 45.3 J/cm. The late success rate was 95.1%, with two recanalizations at around 12 months post-ablation. There were no recanalizations in veins treated with a LEED above 30 J/cm. Conclusions The 1940 nm laser proved safe and effective, in the medium and long term, for the proposed parameters, in venous segments of up to 10 mm in diameter. PMID:29930619

  3. A survey of techniques for architecting and managing GPU register file

    DOE PAGES

    Mittal, Sparsh

    2016-04-07

    To support their massively multithreaded architecture, GPUs use a very large register file (RF), whose capacity is higher than even that of the L1 and L2 caches. In complete contrast, traditional CPUs use a tiny RF and much larger caches to optimize latency. Due to these differences, along with the crucial impact of the RF in determining GPU performance, novel and intelligent techniques are required for managing the GPU RF. In this paper, we survey the techniques for designing and managing the GPU RF. We discuss techniques related to the performance, energy and reliability aspects of the RF. To emphasize the similarities and differences between the techniques, we classify them along several parameters. The aim of this paper is to synthesize the state-of-the-art developments in RF management and also to stimulate further research in this area.
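
    One way to see why the RF matters so much in practice: per-thread register usage directly caps how many blocks can be resident per streaming multiprocessor. The CUDA runtime exposes this through an occupancy query; a minimal sketch (someKernel is a placeholder):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void someKernel(float* x) { x[threadIdx.x] *= 2.0f; }

int main()
{
    // Reports how many blocks of someKernel fit on one SM at this block
    // size; the answer shrinks as the kernel's register usage grows.
    int blocksPerSM = 0, blockSize = 256;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, someKernel, blockSize, /*dynamicSmemBytes=*/0);
    printf("resident blocks per SM at block size %d: %d\n",
           blockSize, blocksPerSM);
    return 0;
}
```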

  4. A GPU-Based Implementation of the Firefly Algorithm for Variable Selection in Multivariate Calibration Problems

    PubMed Central

    de Paula, Lauro C. M.; Soares, Anderson S.; de Lima, Telma W.; Delbem, Alexandre C. B.; Coelho, Clarimar J.; Filho, Arlindo R. G.

    2014-01-01

    Several variable selection algorithms in multivariate calibration can be accelerated using Graphics Processing Units (GPU). Among these algorithms, the Firefly Algorithm (FA) is a recently proposed metaheuristic that may be used for variable selection. This paper presents a GPU-based FA (FA-MLR) with a multiobjective formulation for variable selection in multivariate calibration problems and compares it with some traditional sequential algorithms in the literature. The advantage of the proposed implementation is demonstrated in an example involving a relatively large number of variables. The results showed that the FA-MLR, in comparison with the traditional algorithms, is a more suitable choice and a relevant contribution to the variable selection problem. Additionally, the results also demonstrated that the FA-MLR executed on a GPU can be five times faster than its sequential implementation. PMID:25493625
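
    The firefly move that makes the algorithm parallelizable is, in its standard textbook form (Yang's formulation, which may differ in detail from the variant used in FA-MLR):

```latex
% Standard firefly move: firefly i is attracted to a brighter firefly j.
% beta_0: attractiveness at distance zero, gamma: absorption coefficient,
% r_ij: distance between i and j, alpha * eps_i: random perturbation.
x_i \leftarrow x_i + \beta_0\, e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha\,\epsilon_i
```

    Because each firefly's move depends only on the previous swarm state, one GPU thread (or block) per firefly can evaluate the objective and apply this update independently.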

  5. A GPU-Based Implementation of the Firefly Algorithm for Variable Selection in Multivariate Calibration Problems.

    PubMed

    de Paula, Lauro C M; Soares, Anderson S; de Lima, Telma W; Delbem, Alexandre C B; Coelho, Clarimar J; Filho, Arlindo R G

    2014-01-01

    Several variable selection algorithms in multivariate calibration can be accelerated using Graphics Processing Units (GPU). Among these algorithms, the Firefly Algorithm (FA) is a recently proposed metaheuristic that may be used for variable selection. This paper presents a GPU-based FA (FA-MLR) with a multiobjective formulation for variable selection in multivariate calibration problems and compares it with some traditional sequential algorithms in the literature. The advantage of the proposed implementation is demonstrated in an example involving a relatively large number of variables. The results showed that the FA-MLR, in comparison with the traditional algorithms, is a more suitable choice and a relevant contribution to the variable selection problem. Additionally, the results also demonstrated that the FA-MLR executed on a GPU can be five times faster than its sequential implementation.

  6. Accelerating electron tomography reconstruction algorithm ICON with GPU.

    PubMed

    Chen, Yu; Wang, Zihao; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Sun, Fei; Zhang, Fa

    2017-01-01

    Electron tomography (ET) plays an important role in studying in situ cell ultrastructure in three-dimensional space. Due to limited tilt angles, ET reconstruction always suffers from the "missing wedge" problem. With a validation procedure, iterative compressed-sensing optimized NUFFT reconstruction (ICON) demonstrates its power in the restoration of validated missing information for low-SNR biological ET datasets. However, the huge computational demand has become a major obstacle to the application of ICON. In this work, we analyzed the framework of ICON and classified the operations of the major steps of ICON reconstruction into three types. Accordingly, we designed parallel strategies and implemented them on graphics processing units (GPU) to produce a parallel program, ICON-GPU. With high accuracy, ICON-GPU achieves a large speedup over its CPU version, up to 83.7×, greatly relieving ICON's dependence on computing resources.

  7. Protein-protein docking on hardware accelerators: comparison of GPU and MIC architectures

    PubMed Central

    2015-01-01

    Background Hardware accelerators can provide solutions to computationally complex problems in bioinformatics. However, the effect of acceleration depends on the nature of the application, so selection of an appropriate accelerator requires some consideration. Results In the present study, we compared the effects of acceleration using a graphics processing unit (GPU) and a many integrated core (MIC) coprocessor on the speed of fast Fourier transform (FFT)-based protein-protein docking calculations. The GPU implementation performed the protein-protein docking calculations approximately five times faster than the MIC offload-mode implementation. The MIC native-mode implementation has an advantage in implementation cost; however, its performance was worse with larger protein pairs because of memory limitations. Conclusion The results suggest that the GPU is more suitable than the MIC for accelerating FFT-based protein-protein docking applications. PMID:25707855

  8. Rapid earthquake detection through GPU-Based template matching

    NASA Astrophysics Data System (ADS)

    Mu, Dawei; Lee, En-Jui; Chen, Po

    2017-12-01

    The template-matching algorithm (TMA) has been widely adopted for improving the reliability of earthquake detection. The TMA is based on calculating the normalized cross-correlation coefficient (NCC) between a collection of selected template waveforms and the continuous waveform recordings of seismic instruments. In realistic applications, the computational cost of the TMA is much higher than that of traditional techniques. In this study, we provide an analysis of the TMA and show how the GPU architecture provides an almost ideal environment for accelerating the TMA and NCC-based pattern recognition algorithms in general. So far, our best-performing GPU code has achieved a speedup factor of more than 800 with respect to a common sequential CPU code. We demonstrate the performance of our GPU code using seismic waveform recordings from the ML 6.6 Meinong earthquake sequence in Taiwan.
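
    The quantity each GPU thread evaluates is the NCC at one candidate lag; a minimal sketch (assuming the template has been pre-normalized to zero mean and unit norm on the host, which reduces the NCC to a dot product over the locally de-meaned, normalized data window; not the authors' optimized kernel):

```cuda
// One thread per candidate alignment of the template against the recording.
// tpl: zero-mean, unit-norm template of length nTpl (host-precomputed).
// ncc: output array of length nData - nTpl + 1.
__global__ void nccAtLag(const float* data, int nData,
                         const float* tpl, int nTpl, float* ncc)
{
    int lag = blockIdx.x * blockDim.x + threadIdx.x;
    if (lag > nData - nTpl) return;

    float mean = 0.f, dot = 0.f, energy = 0.f;
    for (int i = 0; i < nTpl; ++i) mean += data[lag + i];
    mean /= nTpl;
    for (int i = 0; i < nTpl; ++i) {
        float x = data[lag + i] - mean;   // de-mean the data window
        dot    += x * tpl[i];
        energy += x * x;
    }
    ncc[lag] = (energy > 0.f) ? dot / sqrtf(energy) : 0.f;
}
```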

  9. A survey of techniques for architecting and managing GPU register file

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh

    To support their massively multithreaded architecture, GPUs use a very large register file (RF), whose capacity is higher than even that of the L1 and L2 caches. In complete contrast, traditional CPUs use a tiny RF and much larger caches to optimize latency. Due to these differences, along with the crucial impact of the RF in determining GPU performance, novel and intelligent techniques are required for managing the GPU RF. In this paper, we survey the techniques for designing and managing the GPU RF. We discuss techniques related to the performance, energy and reliability aspects of the RF. To emphasize the similarities and differences between the techniques, we classify them along several parameters. The aim of this paper is to synthesize the state-of-the-art developments in RF management and also to stimulate further research in this area.

  10. A fully parallel in time and space algorithm for simulating the electrical activity of a neural tissue.

    PubMed

    Bedez, Mathieu; Belhachmi, Zakaria; Haeberlé, Olivier; Greget, Renaud; Moussaoui, Saliha; Bouteiller, Jean-Marie; Bischoff, Serge

    2016-01-15

    Solving a model that describes the electrical activity of neural tissue and its propagation within that tissue is highly demanding in terms of computing time and requires substantial computing power to achieve good results. In this study, we present a method for solving a model of electrical propagation in neural tissue using the parareal algorithm coupled with spatial parallelization using CUDA on graphics processing units (GPU). We applied this resolution method to geometries of different dimensions (1-D, 2-D and 3-D). The GPU results are compared with simulations on a multi-core processor cluster using the message-passing interface (MPI), where the spatial scale was parallelized so as to reach a computation time comparable to that of the presented GPU method. In the 3-D case, a gain of a factor of 100 in computational time was obtained between the sequential results and those obtained using the GPU. Given the structure of the GPU, this factor increases with the fineness of the geometry used in the computation. To the best of our knowledge, this is the first time such a method has been used, even in neuroscience. Time parallelization coupled with GPU spatial parallelization drastically reduces computational time while keeping a fine resolution of the model describing the propagation of the electrical signal in neural tissue. Copyright © 2015 Elsevier B.V. All rights reserved.
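
    Parareal decomposes the time axis: a cheap coarse propagator G and an accurate fine propagator F are iterated over time slices as U_{k+1}^{n+1} = G(U_{k+1}^{n}) + F(U_k^{n}) - G(U_k^{n}), so the expensive fine solves of all slices can run concurrently, while CUDA parallelizes the spatial work inside each fine solve. As a purely illustrative sketch of such a spatial step (a 1-D cable-type equation with toy cubic kinetics; hypothetical names, not the authors' model):

        // Forward-Euler step of dV/dt = D * d2V/dx2 - I_ion(V),
        // one thread per grid node, fixed (Dirichlet) end points.
        __global__ void cable_step(const float *V, float *Vnew, int n,
                                   float D, float dt, float dx)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i <= 0 || i >= n - 1) return;
            float lap  = (V[i-1] - 2.0f * V[i] + V[i+1]) / (dx * dx);
            float iion = V[i] * (V[i] - 1.0f) * (V[i] - 0.1f);  // toy kinetics
            Vnew[i] = V[i] + dt * (D * lap - iion);
        }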

  11. Risk management in providing specialized care for people living with AIDS.

    PubMed

    Leadebal, Oriana Deyze Correia Paiva; Medeiros, Leidyanny Barbosa de; Morais, Kalline Silva de; Nascimento, João Agnaldo do; Monroe, Aline Aparecida; Nogueira, Jordana de Almeida

    2016-01-01

    To analyze the provision of actions related to clinical risk management in specialized care for people living with AIDS. A cross-sectional study carried out in a reference outpatient clinic in Paraíba, with a sample of 150 adults with AIDS. Data were collected from primary and secondary sources using a structured questionnaire, and analyzed using descriptive statistics, multiple correspondence analysis and a logistic regression model to determine the association between "care provision" and "clinical risk". Actions with satisfactory provision reflect a biologically focused model of care; the dimensions that contributed most to a satisfactory assessment of care provision were "clinical and laboratory evaluation" and "prevention and encouragement of self-care"; 45.3% of participants were categorized as high clinical risk, 34% as medium clinical risk, and 20.7% as low clinical risk; a positive association between care provision and clinical risk was found. The need to use risk-classification technologies to direct the planning of local care provision, taking patients' needs into account and thus improving the care delivered in these settings, became evident.

  12. The culture of patient safety from the perspective of the pediatric emergency nursing team.

    PubMed

    Macedo, Taise Rocha; Rocha, Patricia Kuerten; Tomazoni, Andreia; Souza, Sabrina de; Anders, Jane Cristina; Davis, Karri

    2016-01-01

    To identify the patient safety culture in pediatric emergencies from the perspective of the nursing team. A quantitative, cross-sectional survey study with a sample of 75 nursing team professionals. Data were collected between September and November 2014 in three pediatric emergency units by applying the Hospital Survey on Patient Safety Culture instrument, and were submitted to descriptive analysis. No strong areas for patient safety were found; areas identified as having the potential to become so were: Expectations and actions of supervisors/management to promote patient safety, and Teamwork. Areas identified as critical were: Non-punitive response to error, and Hospital management support for patient safety. The study found a gap between the safety culture and pediatric emergencies, but it also found possibilities of transformation that will contribute to the safety of pediatric patients. Nursing professionals need to become protagonists in the process of replacing the current paradigm with a culture focused on safety. Replication of this study in other institutions is suggested in order to improve the current health care scenario.

  13. Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications

    NASA Astrophysics Data System (ADS)

    Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Márquez, A.; Beléndez, A.

    2015-06-01

    The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to the study of the propagation of longitudinal and transversal waves in stratified media. This work demonstrates the potential of the scheme and the relevance of each acceleration strategy for massive FDTD computations. In this paper, we propose two new specific implementations of the two-dimensional FDTD scheme using multi-CPU and multi-GPU, respectively. In the first implementation, Open MPI has been included in order to fully exploit the resources of a biprocessor station with two Intel Xeon processors. Moreover, in the CPU code version, the streaming SIMD extensions (SSE) and the advanced vector extensions (AVX) have been combined with shared-memory approaches that take advantage of multi-core platforms. The second implementation, the multi-GPU code version, is based on the Peer-to-Peer communications available in CUDA on two GPUs (NVIDIA GTX 670). Subsequently, this paper presents an accurate analysis of the influence of the different code versions, including shared-memory approaches, vector instructions and multiple processors (both CPU and GPU), and compares them in order to delimit the degree of improvement obtained by distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed, and it is demonstrated that adding shared-memory schemes to CPU computing substantially improves the performance of vector instructions, enlarging the simulation sizes that use the CPU cache memory efficiently. In that regime, GPU computing is roughly twice as fast as the fine-tuned CPU version for both one and two nodes. However, for massive computations explicit vector instructions are not worthwhile, since memory bandwidth is the limiting factor and performance tends to be the same as that of the sequential version with auto-vectorisation and a shared-memory approach. In this scenario GPU computing is the best option, since it provides homogeneous behaviour. More specifically, the speedup of GPU computing reaches an upper limit of 12 for both one and two GPUs, with peak performance of 80 GFlops for one GPU and 146 GFlops for two GPUs. Finally, the method is applied to an earth-crust profile in order to demonstrate the potential of our approach and the necessity of applying acceleration strategies in this type of application.
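
    FDTD time stepping suits GPUs because every cell is updated from a fixed local stencil. As a sketch of the kind of kernel being tuned (a simplified 2-D acoustic pressure update on a staggered grid, not the paper's full vibroacoustic scheme; names are illustrative):

        // Pressure update of a 2-D staggered-grid acoustic FDTD scheme,
        // one thread per cell: p -= kappa * dt * div(v).
        __global__ void update_pressure(float *p, const float *vx, const float *vy,
                                        int nx, int ny, float kappa,
                                        float dt, float dx, float dy)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int j = blockIdx.y * blockDim.y + threadIdx.y;
            if (i >= nx - 1 || j >= ny - 1) return;
            int id = j * nx + i;
            float dvx = (vx[id + 1]  - vx[id]) / dx;   // staggered x-derivative
            float dvy = (vy[id + nx] - vy[id]) / dy;   // staggered y-derivative
            p[id] -= kappa * dt * (dvx + dvy);
        }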

  14. Mass loss in dusty winds of supergiant stars

    NASA Astrophysics Data System (ADS)

    Vidotto, A. A.; Jatenco-Pereira, V.

    2003-08-01

    In practically all regions of the HR diagram, stars show observational evidence of mass loss. The literature includes work both on diagnosing mass loss and on constructing models to explain it. Alfvén wave damping has been used as an acceleration mechanism for homogeneous winds. However, the envelopes of cool stars are known to contain solid grains and molecules. To study the interaction between Alfvén waves and dust, and its consequence for the acceleration of the stellar wind, Falceta-Gonçalves & Jatenco-Pereira (2002) developed a mass-loss model for supergiant stars. In this work, we present a study of that model, evaluating the dependence of the mass-loss rate on several initial parameters, such as the density r0, the magnetic field B0, the wave damping length L0 and the wave flux f0, among others. Increasing f0 by 10% from the reference values, we found that the mass-loss rate increases considerably, whereas an increase of the same size in r0, B0 or L0 leads to a decrease in the mass-loss rate.

  15. BSSDATA - an optimized program for filtering solar radio astronomy data

    NASA Astrophysics Data System (ADS)

    Martinon, A. R. F.; Sawant, H. S.; Fernandes, F. C. R.; Stephany, S.; Preto, A. J.; Dobrowolski, K. M.

    2003-08-01

    The Brazilian Solar Spectroscope (BSS) has been in regular operation at INPE, in São José dos Campos, since 1998. The BSS is dedicated to observations of decimetric solar bursts with high temporal and spectral resolution, with the main purpose of investigating phenomena associated with the energy release of solar flares. Between 1999 and 2002, approximately 340 solar bursts were catalogued and classified into 8 distinct types according to their morphological characteristics. A detailed analysis of each type, or group, of solar bursts must take into account the variation of the quiet-sun background flux as a function of frequency and time, in addition to the complexity of the bursts and of the fine structures recorded superposed on the varying background. The BSSData program was developed to carry out such analysis. Written in C++, it comprises several tools that assist in the treatment and analysis of the data recorded by the BSS. In this work we address the tools for filtering the background noise. The BSSData noise-filtering routines were tested on the various groups of solar bursts ("dots", "fiber", "lace", "patch", "spikes", "type III" and "zebra"), achieving good background-noise reduction and producing data in which the signal becomes more homogeneous, highlighting the areas containing solar bursts and making the determination of the observational parameters of each burst more precise. These results will be presented and discussed.

  16. Improvements to the Wilson-Devinney code for eclipsing binaries

    NASA Astrophysics Data System (ADS)

    Vieira, L. A.; Vaz, L. P. R.

    2003-08-01

    The analysis of light curves and radial velocities of eclipsing binary systems can be carried out with several models, one of which is the Wilson-Devinney (WD) model. Over the years this model has undergone several changes in its main codes, with the aim of making it more consistent both physically and numerically. The WD model has been improved in several ways in its two codes: one for predicting the theoretical light and radial-velocity curves, and another for solving these curves. On the physical side, we introduced the possibility of taking the effects of apsidal motion into account. On the numerical side, we introduced the possibility of using the SIMPLEX method in the solution procedure, as an alternative to the already implemented Least Squares method. These modifications, together with others previously introduced by our group, make the code more efficient in solving the light and radial-velocity curves of eclipsing binaries. Since the model has been used to analyze systems with pre-main-sequence components (TY CrA, Casey et al. 1998, Vaz et al. 1998; SM 790, Stassun et al. 2003), these improvements will benefit such cases as well. We present results obtained with the modified WD code using data for the star GL Carinae, showing (1) that the orbital parameters we compute are consistent with those previously obtained in the literature (Giménez & Clausen, 1986) and with those obtained by Faria (1987), and (2) that the SIMPLEX implementation makes the code slower but fully internally consistent, avoiding the problems generated by the Least Squares method, such as imprecision in the calculation of the partial derivatives and convergence to local minima.

  17. Supermassive Black Hole Binaries in High Performance Massively Parallel Direct N-body Simulations on Large GPU Clusters

    NASA Astrophysics Data System (ADS)

    Spurzem, R.; Berczik, P.; Zhong, S.; Nitadori, K.; Hamada, T.; Berentzen, I.; Veles, A.

    2012-07-01

    We present astrophysical computer simulations of dense star clusters in galactic nuclei with supermassive black holes, using new cost-efficient supercomputers in China accelerated by graphics processing units (GPUs). We use large high-accuracy direct N-body simulations with a Hermite scheme and block time steps, parallelised across a large number of nodes at the large scale and across many GPU thread processors on each node at the small scale. A sustained performance of more than 350 Tflop/s is reached for a science run using 1600 Fermi C2050 GPUs simultaneously; a detailed performance model is presented, together with studies for the largest GPU clusters in China, with up to Petaflop/s performance and 7000 Fermi GPU cards. In our case study we look at two supermassive black holes with equal and unequal masses embedded in a dense stellar cluster in a galactic nucleus. The hardening processes due to interactions between black holes and stars, the effects of rotation in the stellar system, and the relativistic forces between the black holes are simultaneously taken into account. The simulation stops at the complete relativistic merger of the black holes.
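
    The core of such direct N-body codes is the O(N^2) pairwise force summation, which maps naturally onto one GPU thread per particle. A minimal double-precision sketch with Plummer softening (illustrative only, not the authors' Hermite production kernel):

        // Direct-summation gravitational acceleration, one thread per body.
        __global__ void accel_kernel(const double4 *pos_m,  // xyz = position, w = mass
                                     double3 *acc, int n, double eps2)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            double4 pi = pos_m[i];
            double3 a = make_double3(0.0, 0.0, 0.0);
            for (int j = 0; j < n; ++j) {            // j == i adds zero (softened)
                double4 pj = pos_m[j];
                double dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
                double r2 = dx * dx + dy * dy + dz * dz + eps2;
                double inv_r3 = rsqrt(r2) / r2;       // 1 / r^3
                a.x += pj.w * dx * inv_r3;
                a.y += pj.w * dy * inv_r3;
                a.z += pj.w * dz * inv_r3;
            }
            acc[i] = a;
        }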

  18. Exploiting graphics processing units for computational biology and bioinformatics.

    PubMed

    Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H

    2010-09-01

    Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformatician. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation, and show our final GPU implementation to outperform the CPU implementation by a factor of 1700.
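
    To make the coalescing advice concrete, the sketch below computes the all-pairs Euclidean distance with a tiled kernel: threads in a block cooperatively stage rows of the instance matrix through shared memory so that every global read is contiguous. This is a generic illustration of the pattern discussed in the article, not its published code:

        #define TILE 16
        // X is n x d row-major; D receives the n x n distance matrix.
        __global__ void all_pairs(const float *X, float *D, int n, int d)
        {
            __shared__ float A[TILE][TILE], B[TILE][TILE];
            int i = blockIdx.y * TILE + threadIdx.y;   // first instance of the pair
            int j = blockIdx.x * TILE + threadIdx.x;   // second instance of the pair
            float acc = 0.0f;
            for (int t = 0; t < d; t += TILE) {
                int f = t + threadIdx.x;               // feature column to stage
                int rowA = blockIdx.y * TILE + threadIdx.y;
                int rowB = blockIdx.x * TILE + threadIdx.y;
                A[threadIdx.y][threadIdx.x] = (rowA < n && f < d) ? X[rowA * d + f] : 0.0f;
                B[threadIdx.y][threadIdx.x] = (rowB < n && f < d) ? X[rowB * d + f] : 0.0f;
                __syncthreads();
                for (int k = 0; k < TILE; ++k) {       // partial squared distance
                    float diff = A[threadIdx.y][k] - B[threadIdx.x][k];
                    acc += diff * diff;
                }
                __syncthreads();
            }
            if (i < n && j < n) D[i * n + j] = sqrtf(acc);
        }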

  19. GPU-based stochastic-gradient optimization for non-rigid medical image registration in time-critical applications

    NASA Astrophysics Data System (ADS)

    Bhosale, Parag; Staring, Marius; Al-Ars, Zaid; Berendsen, Floris F.

    2018-03-01

    Currently, non-rigid image registration algorithms are too computationally intensive to use in time-critical applications. Existing implementations that focus on speed typically address this either by parallelization on GPU hardware, or by introducing methodically novel techniques into CPU-oriented algorithms. Stochastic gradient descent (SGD) optimization and variations thereof have proven to drastically reduce the computational burden for CPU-based image registration, but have not been successfully applied on GPU hardware due to their stochastic nature. This paper proposes 1) NiftyRegSGD, an SGD optimization for the GPU-based image registration tool NiftyReg, and 2) the random chunk sampler, a new random sampling strategy that better utilizes the memory bandwidth of GPU hardware. Experiments have been performed on 3D lung CT data of 19 patients, comparing NiftyRegSGD (with and without the random chunk sampler) with the CPU-based elastix Fast Adaptive SGD (FASGD) and NiftyReg. The registration runtime was 21.5 s, 4.4 s and 2.8 s for elastix-FASGD, NiftyRegSGD without, and NiftyRegSGD with random chunk sampling, respectively, while similar accuracy was obtained. Our method is publicly available at https://github.com/SuperElastix/NiftyRegSGD.
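
    Both FASGD and NiftyRegSGD rest on the standard stochastic gradient descent iteration (written here in generic form for orientation, not quoted from the paper):

        \mu_{k+1} = \mu_k - \gamma_k \, \tilde{g}(\mu_k),

    where \mu are the transform parameters, \gamma_k a decaying step size, and \tilde{g} an approximation of the cost gradient computed from a random subset of voxels. The random chunk sampler draws that subset as contiguous chunks rather than scattered voxels, so the associated image reads coalesce in GPU memory.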

  20. Implementation of GPU accelerated SPECT reconstruction with Monte Carlo-based scatter correction.

    PubMed

    Bexelius, Tobias; Sohlberg, Antti

    2018-06-01

    Statistical SPECT reconstruction can be very time-consuming, especially when compensations for collimator and detector response, attenuation, and scatter are included in the reconstruction. This work proposes an accelerated SPECT reconstruction algorithm based on graphics processing unit (GPU) processing. An ordered subset expectation maximization (OSEM) algorithm with CT-based attenuation modelling, depth-dependent Gaussian convolution-based collimator-detector response modelling, and Monte Carlo-based scatter compensation was implemented using OpenCL. The OpenCL implementation was compared against the existing multi-threaded OSEM implementation running on a central processing unit (CPU) in terms of scatter-to-primary ratios, standardized uptake values (SUVs), and processing speed, using mathematical phantoms and clinical multi-bed bone SPECT/CT studies. The difference in scatter-to-primary ratios, visual appearance, and SUVs between the GPU and CPU implementations was minor. On the other hand, at its best, the GPU implementation was 24 times faster than the multi-threaded CPU version on a normal 128 × 128 matrix-size, 3-bed bone SPECT/CT data set with compensations for collimator and detector response, attenuation, and scatter included. GPU SPECT reconstruction shows great promise as an everyday clinical reconstruction tool.
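
    For orientation, the OSEM update that such an implementation evaluates for each subset S_b of projections has the standard form (notation generic, not quoted from the paper):

        x_j^{(k+1)} = \frac{x_j^{(k)}}{\sum_{i \in S_b} a_{ij}} \sum_{i \in S_b} a_{ij} \, \frac{y_i}{\sum_{j'} a_{ij'} x_{j'}^{(k)} + s_i},

    where y_i are the measured counts, a_{ij} is the system matrix (folding in attenuation and the collimator-detector response), and s_i is the Monte Carlo scatter estimate added to the forward projection. Every voxel j updates independently, which is what makes the step GPU-friendly.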

  1. Accelerated event-by-event Monte Carlo microdosimetric calculations of electrons and protons tracks on a multi-core CPU and a CUDA-enabled GPU.

    PubMed

    Kalantzis, Georgios; Tachibana, Hidenobu

    2014-01-01

    For microdosimetric calculations, event-by-event Monte Carlo (MC) methods are considered the most accurate. The main shortcoming of these methods is their extensive computational time requirement. In this work we present an event-by-event MC code for low-projectile-energy electron and proton tracks, for accelerated microdosimetric MC simulations on a graphics processing unit (GPU). Additionally, a hybrid implementation scheme was realized by employing OpenMP and CUDA in such a way that both the GPU and the multi-core CPU are utilized simultaneously. The two implementation schemes have been tested and compared with the sequential single-threaded MC code on the CPU. The performance comparison was based on the speedup for a set of benchmark cases of electron and proton tracks. A maximum speedup of 67.2 was achieved for the GPU-based MC code, while the hybrid approach improved the speedup by a further 20%. The results indicate the capability of our CPU-GPU implementation to accelerate MC microdosimetric calculations of both electron and proton tracks without loss of accuracy. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  2. A GPU-Accelerated 3-D Coupled Subsample Estimation Algorithm for Volumetric Breast Strain Elastography.

    PubMed

    Peng, Bo; Wang, Yuqi; Hall, Timothy J; Jiang, Jingfeng

    2017-04-01

    The primary objective of this paper was to extend a previously published 2-D coupled subsample tracking algorithm to 3-D speckle tracking in the framework of ultrasound breast strain elastography. To overcome the heavy computational cost, we investigated the use of a graphics processing unit (GPU) to accelerate the 3-D coupled subsample speckle tracking method. The performance of the proposed GPU implementation was tested using a tissue-mimicking phantom and in vivo breast ultrasound data, and compared with the conventional 3-D quadratic subsample estimation algorithm. On the basis of these evaluations, we concluded that the GPU implementation of this 3-D subsample estimation algorithm can provide high-quality strain data (i.e., high correlation between the pre-deformation and the motion-compensated post-deformation radio-frequency echo data, and high contrast-to-noise-ratio strain images), as compared with the conventional 3-D quadratic subsample algorithm. Using the GPU implementation of the 3-D speckle tracking algorithm, volumetric strain data can be obtained relatively quickly (approximately 20 s per 2.5 cm × 2.5 cm × 2.5 cm volume).

  3. PuReMD-GPU: A reactive molecular dynamics simulation package for GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kylasa, S.B., E-mail: skylasa@purdue.edu; Aktulga, H.M., E-mail: hmaktulga@lbl.gov; Grama, A.Y., E-mail: ayg@cs.purdue.edu

    2014-09-01

    We present an efficient and highly accurate GP-GPU implementation of our community code, PuReMD, for reactive molecular dynamics simulations using the ReaxFF force field. PuReMD and its incorporation into LAMMPS (Reax/C) are used by a large number of research groups worldwide for simulating diverse systems, ranging from biomembranes to explosives (RDX), at an atomistic level of detail. The sub-femtosecond time steps associated with ReaxFF strongly motivate significant improvements to per-timestep simulation time through effective use of GPUs. This paper presents, in detail, the design and implementation of PuReMD-GPU, which enables ReaxFF simulations on GPUs, as well as various performance optimization techniques we developed to obtain high performance on state-of-the-art hardware. Comprehensive experiments on model systems (bulk water and amorphous silica) are presented to quantify the performance improvements achieved by PuReMD-GPU and to verify its accuracy. In particular, our experiments show up to a 16× improvement in runtime compared to our highly optimized CPU-only single-core ReaxFF implementation. PuReMD-GPU is a unique production code, and is currently available on request from the authors.

  4. Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.

    PubMed

    Slażyński, Leszek; Bohte, Sander

    2012-01-01

    The arrival of graphics processing unit (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms that fully exploit the particulars of GPUs. Computation in a neural network is inherently parallel and thus a natural match for GPU architectures: given the inputs, the internal state of each neuron can be updated in parallel. We show that for filter-based spiking neurons, like the Spike Response Model, the additive nature of membrane potential dynamics enables additional update parallelism. This also reduces the accumulation of numerical errors when using single-precision computation, the native precision of GPUs. We further show that optimizing simulation algorithms and data structures for the GPU's architecture has a large pay-off: for example, matching iterative neural updating to the memory architecture of the GPU speeds up this simulation step by a factor of three to five. With such optimizations, we can simulate plausible spiking neural networks of up to 50,000 neurons in better than real time, processing over 35 million spiking events per second.
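
    The additive dynamics the authors exploit can be made concrete: with exponential response kernels, each neuron carries a synaptic filter state that is decayed by a constant factor and incremented by new input every step, so all neurons advance independently. A minimal single-precision sketch (hypothetical names, illustrative dynamics only):

        // One thread per neuron: decay-and-accumulate update of the
        // synaptic filter and membrane potential of a filter-based neuron.
        __global__ void srm_step(float *u, float *syn, const float *input,
                                 int n, float decay_u, float decay_s)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            syn[i] = syn[i] * decay_s + input[i];  // filtered presynaptic input
            u[i]   = u[i]   * decay_u + syn[i];    // membrane potential
        }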

  5. Multi-GPU maximum entropy image synthesis for radio astronomy

    NASA Astrophysics Data System (ADS)

    Cárcamo, M.; Román, P. E.; Casassus, S.; Moral, V.; Rannou, F. R.

    2018-01-01

    The maximum entropy method (MEM) is a well-known deconvolution technique in radio interferometry. This method solves a non-linear optimization problem with an entropy regularization term. Other heuristics such as CLEAN are faster but highly user-dependent. Nevertheless, MEM has the following advantages: it is unsupervised, it has a statistical basis, and under certain conditions it offers better resolution and better image quality. This work presents a high-performance GPU version of non-gridding MEM, which is tested using real and simulated data. We propose a single-GPU and a multi-GPU implementation for single-spectral and multi-spectral data, respectively. We also make use of the Peer-to-Peer and Unified Virtual Addressing features of newer GPUs, which allow multiple GPUs to be exploited transparently and efficiently. Several ALMA data sets are used to demonstrate the effectiveness in imaging and to evaluate GPU performance. The results show that a speedup of 1000 to 5000 times over a sequential version can be achieved, depending on data and image size. This allows the HD142527 CO(6-5) short-baseline data set to be reconstructed in 2.1 min, instead of the 2.5 days taken by a sequential version on the CPU.
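
    For orientation, one common form of the MEM objective in interferometric imaging combines a visibility chi-squared with an entropy penalty (a generic statement, not quoted from the paper):

        \Phi(I) = \frac{1}{2} \sum_k \frac{|V_k^{\mathrm{obs}} - V_k(I)|^2}{\sigma_k^2} + \lambda \sum_i I_i \ln \frac{I_i}{M},

    where V_k(I) is the model visibility of image I, \sigma_k the noise on visibility k, M a default image level, and \lambda the regularization weight. Both sums decompose over visibilities and pixels, which is precisely the structure the single- and multi-GPU implementations exploit.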

  6. Real-time time-division color electroholography using a single GPU and a USB module for synchronizing reference light.

    PubMed

    Araki, Hiromitsu; Takada, Naoki; Niwase, Hiroaki; Ikawa, Shohei; Fujiwara, Masato; Nakayama, Hirotaka; Kakue, Takashi; Shimobaba, Tomoyoshi; Ito, Tomoyoshi

    2015-12-01

    We propose real-time time-division color electroholography using a single graphics processing unit (GPU) and a simple synchronization system of reference light. To facilitate real-time time-division color electroholography, we developed a light emitting diode (LED) controller with a universal serial bus (USB) module and the drive circuit for reference light. A one-chip RGB LED connected to a personal computer via an LED controller was used as the reference light. A single GPU calculates three computer-generated holograms (CGHs) suitable for red, green, and blue colors in each frame of a three-dimensional (3D) movie. After CGH calculation using a single GPU, the CPU can synchronize the CGH display with the color switching of the one-chip RGB LED via the LED controller. Consequently, we succeeded in real-time time-division color electroholography for a 3D object consisting of around 1000 points per color when an NVIDIA GeForce GTX TITAN was used as the GPU. Furthermore, we implemented the proposed method in various GPUs. The experimental results showed that the proposed method was effective for various GPUs.

  7. Benchmarking GPU and CPU codes for Heisenberg spin glass over-relaxation

    NASA Astrophysics Data System (ADS)

    Bernaschi, M.; Parisi, G.; Parisi, L.

    2011-06-01

    We present a set of possible implementations for Graphics Processing Units (GPU) of the over-relaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFlops of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits the GPU shared memory further reduces this time. These results are compared with those obtained by means of a highly tuned vector-parallel code on latest-generation multi-core CPUs.
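
    The over-relaxation move itself is the standard microcanonical reflection of each spin about its local molecular field (stated here for context):

        \vec{s}_i^{\,\prime} = 2 \, \frac{\vec{s}_i \cdot \vec{h}_i}{|\vec{h}_i|^2} \, \vec{h}_i - \vec{s}_i,

    where \vec{h}_i is the sum of the nearest-neighbor couplings acting on spin i. The move conserves energy exactly and touches only nearest neighbors, so spins on non-interacting sublattices can be updated concurrently, which is what a GPU implementation exploits.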

  8. Simultaneous Range-Velocity Processing and SNR Analysis of AFIT’s Random Noise Radar

    DTIC Science & Technology

    2012-03-22

    Two computers, equipped with NVIDIA® GPUs, were used to process the collected data, reducing the overall processing time. The specifications for each … gather the results back to the CPU. Another company, AccelerEyes®, has developed a product called Jacket® that claims to be better than the parallel … Specifications of the two processing computers:

        Number of Processing Cores: 4 / 8
        Processor Speed: 3.33 GHz / 3.07 GHz
        Installed Memory: 48 GB / 48 GB
        GPU Make: NVIDIA / NVIDIA
        GPU Model: Tesla 1060 / Tesla C2070

  9. GPU Acceleration of DSP for Communication Receivers.

    PubMed

    Gunther, Jake; Gunther, Hyrum; Moon, Todd

    2017-09-01

    Graphics processing unit (GPU) implementations of signal processing algorithms can outperform CPU-based implementations. This paper describes the GPU implementation of several algorithms encountered in a wide range of high-data-rate communication receivers, including filters, multirate filters, numerically controlled oscillators, and multi-stage digital down converters. These structures are tested by processing the 20 MHz-wide FM radio band (88-108 MHz). Two receiver structures are explored: a single-channel receiver and a filter bank channelizer. Both run in real time on an NVIDIA GeForce GTX 1080 graphics card.
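
    A sequential NCO keeps a running phase accumulator, which on a GPU would serialize the computation; instead, the phase at sample i can be computed directly as phase0 + i * dphase, removing the dependency. A minimal sketch of such a mixer stage (hypothetical names, not the paper's code):

        // One thread per input sample: generate the oscillator sample from
        // the closed-form phase and mix the input down to complex baseband.
        __global__ void nco_mix(const float *x, float2 *baseband, int n,
                                float phase0, float dphase)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float ph = phase0 + dphase * i;   // for long blocks, wrap or use double
            float s, c;
            sincosf(ph, &s, &c);
            baseband[i] = make_float2(x[i] * c, -x[i] * s);
        }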

  10. A molecular line database for stellar spectral synthesis

    NASA Astrophysics Data System (ADS)

    Milone, A.; Sanzovo, G.

    2003-08-01

    The analysis of photospheric chemical abundances in solar-type and late-type stars through the theoretical computation of their spectra employs high-resolution spectroscopy and requires a representative database of atomic and molecular lines with well-determined constants. In this work, we used as a starting point the extensive lists of spectral lines of electronic systems of some diatomic molecules compiled by Kurucz to build a molecular line database for stellar spectral synthesis. We revised the determination of the Hönl-London rotational factors of the oscillator strengths of the molecular lines, for each vibrational band of some electronic systems, following the usual normalization rule. We used electronic oscillator strengths from the literature. The Franck-Condon vibrational factors of each band were specially recalculated using new molecular constants. We successfully reproduced the spectral absorptions of selected electronic-vibrational bands of the molecular species C12C12, C12N14 and Mg24H in spectra of reference stars such as the Sun and Arcturus.

  11. Thin-film silicon solar cells and sensors deposited on flexible substrates

    NASA Astrophysics Data System (ADS)

    Pinto, Emilio Sergio Marins Vieira

    Flexible thin-film silicon solar cells are generally fabricated at low temperature on plastic substrates, or at higher temperatures on steel foils. This thesis reports a study of the deposition of thin films on different plastic substrates, transparent and colored, for superstrate- and substrate-type solar cells, respectively. As a secondary objective, the doped films deposited on plastic were used as strain sensors, exploiting their piezoresistive properties. High deposition rates of the silicon films deposited on plastic were obtained at low substrate temperature (150 °C) by rf-PECVD. The influence of different deposition parameters on the properties and deposition rate of the resulting films was studied and correlated. Amorphous and microcrystalline thin-film silicon solar cells were developed at low temperatures on plastics. Efficiencies of 5-6.5% were achieved for the amorphous cells and 7.5% for the microcrystalline cells. Light-trapping effects were studied through laser-ablation texturing of plastic substrates and wet etching of TCO on plastic. Microcrystalline silicon thin films deposited by HW-CVD, with a piezoresistive gauge factor of -32.2, were used to fabricate strain sensors on a very thin (15 μm) plastic membrane. Test structures on textile and the miniaturization of the piezoresistive sensors deposited on flexible polyimide substrates were also addressed.

  12. Dobutamine Stress Echocardiography Safety in Chagas Disease Patients.

    PubMed

    Rassi, Daniela do Carmo; Vieira, Marcelo Luiz Campos; Furtado, Rogerio Gomes; Turco, Fabio de Paula; Melato, Luciano Henrique; Hotta, Viviane Tiemi; Nunes, Colandy Godoy de Oliveira; Rassi, Luiz; Rassi, Salvador

    2017-02-01

    A few decades ago, patients with Chagas disease were predominantly rural workers with a low risk profile for obstructive coronary artery disease (CAD). With increasing urbanization, they became exposed to the same risk factors for CAD as uninfected individuals. Dobutamine stress echocardiography (DSE) has proven to be an important tool in CAD diagnosis. Despite being a potentially arrhythmogenic method, it is safe for coronary patients without Chagas disease. For Chagas disease patients, however, the indication of DSE in clinical practice is uncertain, because of the intrinsic arrhythmogenic potential of that heart disease. To assess DSE safety in Chagas disease patients with clinical suspicion of CAD, as well as the incidence of arrhythmias and adverse events during the exam, we retrospectively analyzed a database of patients referred for DSE from May 2012 to February 2015. This study assessed 205 consecutive patients with Chagas disease suspected of having CAD; all had their serology for Chagas disease confirmed. Their mean age was 64 ± 10 years and most patients were female (65.4%). No patient had significant adverse events, such as acute myocardial infarction, ventricular fibrillation, asystole, stroke, cardiac rupture or death. Regarding arrhythmias, frequent ventricular extrasystoles occurred in 48% of patients, non-sustained ventricular tachycardia in 7.3%, bigeminy in 4.4%, supraventricular tachycardia and sustained ventricular tachycardia in 1%, and atrial fibrillation in 0.5%. DSE proved to be safe in this population of Chagas disease patients, in which no potentially life-threatening outcome was found.

  13. High performance transcription factor-DNA docking with GPU computing

    PubMed Central

    2012-01-01

    Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription-factor binding sites and rational drug design. Protein-DNA docking is computationally demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computational efficiency and scalability on GPU-based high-performance computing systems. Results The effectiveness of our approach was tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improve the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research will require efforts on two integral fronts: computational efficiency and energy-function design. Conclusions We present a high-performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first dedicated effort to apply GPUs or GPU clusters to the protein-DNA docking problem. PMID:22759575
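
    The conformational search follows the classic Metropolis simulated-annealing pattern, with the GPU making each energy evaluation cheap. A self-contained host-side sketch in which a toy quadratic energy stands in for the GPU-evaluated potential (all names hypothetical, not the paper's API):

        #include <cmath>
        #include <cstdlib>

        struct Pose { double x, y, z; };                    // toy rigid-body state

        static double energy(const Pose &p)                 // stand-in for GPU scoring
        { return p.x * p.x + p.y * p.y + p.z * p.z; }

        static double urand() { return rand() / (double)RAND_MAX; }

        Pose anneal(Pose cur, double T, double cooling, int steps)
        {
            double e = energy(cur);
            for (int k = 0; k < steps; ++k, T *= cooling) {
                Pose trial = { cur.x + 0.1 * (urand() - 0.5),
                               cur.y + 0.1 * (urand() - 0.5),
                               cur.z + 0.1 * (urand() - 0.5) };
                double et = energy(trial);
                // Metropolis: accept downhill always, uphill with Boltzmann probability.
                if (et < e || std::exp((e - et) / T) > urand()) { cur = trial; e = et; }
            }
            return cur;
        }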

  14. SU-E-T-29: A Web Application for GPU-Based Monte Carlo IMRT/VMAT QA with Delivered Dose Verification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Folkerts, M; University of California, San Diego, La Jolla, CA; Graves, Y

    Purpose: To enable an existing web application for GPU-based Monte Carlo (MC) 3D dosimetry quality assurance (QA) to compute "delivered dose" from linac logfile data. Methods: We added significant features to an IMRT/VMAT QA web application which is based on existing technologies (HTML5, Python, and Django). This tool interfaces with Python, C-code libraries, and command-line-based GPU applications to perform MC-based IMRT/VMAT QA. The web app automates many complicated aspects of interfacing clinical DICOM and logfile data with cutting-edge GPU software to run an MC dose calculation. The resulting web app is powerful, easy to use, and is able to re-compute both plan dose (from DICOM data) and delivered dose (from logfile data). Both dynalog and trajectorylog file formats are supported. Users upload zipped DICOM RP, CT, and RD data and set the expected statistical uncertainty for the MC dose calculation. A 3D gamma index map, 3D dose distribution, gamma histogram, dosimetric statistics, and DVH curves are displayed to the user. Additionally, the user may upload the delivery logfile data from the linac to compute a "delivered dose" calculation and corresponding gamma tests. A comprehensive PDF QA report summarizing the results can also be downloaded. Results: We successfully improved a web app for a GPU-based QA tool that consists of logfile parsing, fluence map generation, CT image processing, GPU-based MC dose calculation, gamma index calculation, and DVH calculation. The result is an IMRT and VMAT QA tool that conducts an independent dose calculation for a given treatment plan and delivery log file. The system takes both DICOM data and logfile data to compute plan dose and delivered dose, respectively. Conclusion: We successfully improved a GPU-based MC QA tool to allow for logfile dose calculation. The high efficiency and accessibility will greatly facilitate IMRT and VMAT QA.

  15. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    NASA Astrophysics Data System (ADS)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in the hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction (SFD) algorithm. However, a parallel GPU implementation of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculation has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of the MFD algorithm (MFD-md), two different GPU parallelization strategies are explored. The first strategy, which has been used in the existing parallel SFD algorithm on a GPU, suffers from redundant computation. We therefore designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculating flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU-based algorithms based on existing parallelization strategies.

  16. Ice-sheet modelling accelerated by graphics cards

    NASA Astrophysics Data System (ADS)

    Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek

    2014-11-01

    Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.

  17. GPU Implementation of a Viscous Flow Solver on Unstructured Grids

    NASA Astrophysics Data System (ADS)

    Xu, Tianhao; Chen, Long

    2016-06-01

    Graphics processing units have gained popularity in scientific computing over the past several years due to their outstanding parallel computing capability. Computational fluid dynamics applications involve large amounts of calculation, so a recent GPU card, whose peak computing performance and memory bandwidth are much better than those of a contemporary high-end CPU, is preferable. We herein focus on the detailed implementation of our GPU-targeted Reynolds-averaged Navier-Stokes solver based on the finite-volume method. The solver employs a vertex-centered scheme on unstructured grids so as to be capable of handling complex topologies. Multiple optimizations are carried out to improve the memory-access performance and kernel utilization. Both steady and unsteady flow simulation cases are carried out using an explicit Runge-Kutta scheme. The GPU-accelerated solver presented in this paper is demonstrated to have competitive advantages over the CPU-targeted one.

  18. Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems

    PubMed Central

    Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.

    2014-01-01

    The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545

  19. Medical image processing on the GPU - past, present and future.

    PubMed

    Eklund, Anders; Dufort, Paul; Forsberg, Daniel; LaConte, Stephen M

    2013-12-01

    Graphics processing units (GPUs) are used today in a wide range of applications, mainly because they can dramatically accelerate parallel computing, are affordable and energy efficient. In the field of medical imaging, GPUs are in some cases crucial for enabling practical use of computationally demanding algorithms. This review presents the past and present work on GPU accelerated medical image processing, and is meant to serve as an overview and introduction to existing GPU implementations. The review covers GPU acceleration of basic image processing operations (filtering, interpolation, histogram estimation and distance transforms), the most commonly used algorithms in medical imaging (image registration, image segmentation and image denoising) and algorithms that are specific to individual modalities (CT, PET, SPECT, MRI, fMRI, DTI, ultrasound, optical imaging and microscopy). The review ends by highlighting some future possibilities and challenges. Copyright © 2013 Elsevier B.V. All rights reserved.

  20. 3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

    PubMed Central

    Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco

    2014-01-01

    The Nonlocal Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity has led researchers to develop parallel programming approaches and to use massively parallel architectures such as GPUs. In recent years, GPU devices have made it possible to achieve reasonable running times by filtering 3D datasets slice-by-slice with a 2D NLM algorithm. In our approach we design and implement a fully 3D Nonlocal Means parallel approach, adopting different algorithm-mapping strategies on a GPU architecture and a multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained encourage the use of our approach in a wide spectrum of application scenarios such as magnetic resonance imaging (MRI) or video sequence denoising. PMID:25045397
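
    For orientation, the fully 3D NLM estimate of voxel x_i is a patch-similarity weighted average over a search volume \Omega_i (standard formulation, not quoted from the paper):

        NL[u](x_i) = \frac{1}{C(x_i)} \sum_{x_j \in \Omega_i} e^{-\|u(N_i) - u(N_j)\|_2^2 / h^2} \, u(x_j),

    where N_i and N_j are 3D patches around the voxels, h controls the filtering strength, and C(x_i) is the normalizing sum of the weights. Each output voxel is independent, which is why a one-thread-per-voxel GPU mapping scales so well.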

  1. Real-time generation of infrared ocean scene based on GPU

    NASA Astrophysics Data System (ADS)

    Jiang, Zhaoyi; Wang, Xun; Lin, Yun; Jin, Jianqiu

    2007-12-01

    Infrared (IR) image synthesis for ocean scenes has become more and more important nowadays, especially for remote sensing and military applications. Although a number of works present ready-to-use simulations, those techniques cover only a few of the possible ways in which water interacts with the environment, and the detailed calculation of ocean temperature has rarely been considered by previous investigators. With the advance of the programmable features of graphics cards, many algorithms previously limited to offline processing have become feasible for real-time use. In this paper, we propose an efficient algorithm for real-time rendering of infrared ocean scenes using the newest features of programmable graphics processors (GPUs). It differs from previous work in three aspects: adaptive GPU-based ocean surface tessellation, a sophisticated thermal-balance equation for the ocean surface, and GPU-based rendering of the infrared ocean scene. Finally, some resulting infrared images are shown, which are in good accordance with real images.

  2. A GPU accelerated and error-controlled solver for the unbounded Poisson equation in three dimensions

    NASA Astrophysics Data System (ADS)

    Exl, Lukas

    2017-12-01

    An efficient solver for the three dimensional free-space Poisson equation is presented. The underlying numerical method is based on finite Fourier series approximation. While the error of all involved approximations can be fully controlled, the overall computation error is driven by the convergence of the finite Fourier series of the density. For smooth and fast-decaying densities the proposed method will be spectrally accurate. The method scales with O(N log N) operations, where N is the total number of discretization points in the Cartesian grid. The majority of the computational costs come from fast Fourier transforms (FFT), which makes it ideal for GPU computation. Several numerical computations on CPU and GPU validate the method and show efficiency and convergence behavior. Tests are performed using the Vienna Scientific Cluster 3 (VSC3). A free MATLAB implementation for CPU and GPU is provided to the interested community.
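
    To illustrate why the cost is FFT-dominated and thus GPU-friendly, the sketch below solves the simpler periodic-box problem with cuFFT: forward transform the density, divide each mode by -|k|^2 (and by N^3, since the cuFFT inverse is unnormalized), transform back. The paper's free-space boundary treatment is more involved; error handling is omitted and the driver names are ours:

        #include <cufft.h>

        // Scale each Fourier mode of rho_hat by -1 / (|k|^2 * n^3); k = 0 -> 0.
        __global__ void scale_by_k2(cufftDoubleComplex *rho_hat, int n, double L)
        {
            long total = (long)n * n * n;
            long idx = blockIdx.x * (long)blockDim.x + threadIdx.x;
            if (idx >= total) return;
            int k = idx % n, j = (idx / n) % n, i = (int)(idx / ((long)n * n));
            int si = (i <= n / 2) ? i : i - n;   // signed wave numbers
            int sj = (j <= n / 2) ? j : j - n;
            int sk = (k <= n / 2) ? k : k - n;
            double c = 2.0 * 3.141592653589793 / L;
            double k2 = c * c * (si * si + sj * sj + sk * sk);
            double f = (k2 > 0.0) ? -1.0 / (k2 * total) : 0.0;
            rho_hat[idx].x *= f;
            rho_hat[idx].y *= f;
        }

        // Solve lap(phi) = rho on a periodic n^3 box of side L, in place.
        void poisson_solve_periodic(cufftDoubleComplex *d_rho, int n, double L)
        {
            cufftHandle plan;
            cufftPlan3d(&plan, n, n, n, CUFFT_Z2Z);
            cufftExecZ2Z(plan, d_rho, d_rho, CUFFT_FORWARD);
            long total = (long)n * n * n;
            int threads = 256, blocks = (int)((total + threads - 1) / threads);
            scale_by_k2<<<blocks, threads>>>(d_rho, n, L);
            cufftExecZ2Z(plan, d_rho, d_rho, CUFFT_INVERSE);
            cufftDestroy(plan);
        }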

  3. Strong scaling of general-purpose molecular dynamics simulations on GPUs

    NASA Astrophysics Data System (ADS)

    Glaser, Jens; Nguyen, Trung Dac; Anderson, Joshua A.; Lui, Pak; Spiga, Filippo; Millan, Jaime A.; Morse, David C.; Glotzer, Sharon C.

    2015-07-01

    We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, 2013). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., 2008). The software supports short-ranged pair force and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We are able to demonstrate equivalent or superior scaling on up to 3375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5 ×.

  4. Massage and Reiki used to reduce stress and anxiety: Randomized Clinical Trial.

    PubMed

    Kurebayashi, Leonice Fumiko Sato; Turrini, Ruth Natalia Teresa; Souza, Talita Pavarini Borges de; Takiguchi, Raymond Sehiji; Kuba, Gisele; Nagumo, Marisa Toshi

    2016-11-28

    To evaluate the effectiveness of massage and Reiki in reducing stress and anxiety in clients of the Institute for Integrated and Oriental Therapy in São Paulo, Brazil. A randomized controlled parallel clinical trial with an initial sample of 122 people divided into three groups: Massage + Rest (G1), Massage + Reiki (G2) and a control group without intervention (G3). The groups were evaluated with the Stress Symptoms List and the State-Trait Anxiety Inventory at baseline and after 8 sessions (1 month), during 2015. According to ANOVA, there were statistically significant differences in stress between groups 2 and 3 (p = 0.000; 33% reduction, Cohen's d = 0.98) and between groups 1 and 3 (p = 0.014; 24% reduction, Cohen's d = 0.78). For state anxiety, there was a reduction in the intervention groups compared with the control group (p < 0.01), with a 21% reduction in group 2 (Cohen's d = 1.18) and a 16% reduction in group 1 (Cohen's d = 1.14). Massage + Reiki produced the best results among the groups, and a further study using a placebo group for Reiki is suggested, to evaluate the impact of that technique on its own. Trial registration: RBR-42c8wp.

  5. Nursing diagnoses in patients with immune-bullous dermatosis.

    PubMed

    Brandão, Euzeli da Silva; Santos, Iraci Dos; Lanzillotti, Regina Serrão; Ferreira, Adriano Menis; Gamba, Mônica Antar; Azulay-Abulafia, Luna

    2016-08-15

    identify nursing diagnoses in patients with immune-bullous dermatoses. A quantitative and descriptive study, carried out in three institutions located in Rio de Janeiro and Mato Grosso do Sul, Brazil, using the Client Assessment Protocol in Dermatology during nursing consultations. Simple descriptive statistics were used for data analysis. Fourteen subjects participated in the study: nine with a diagnosis of pemphigus vulgaris, two of pemphigus foliaceus and three of bullous pemphigoid. Ages ranged between 27 and 82 years, with a female predominance (11). Fourteen nursing diagnoses, identified through clinical reasoning in all study participants, were discussed, representing the most frequent human responses in this sample. The application of the Assessment Protocol in Dermatology facilitated comprehensive assessment and supported the identification of diagnoses according to the North American Nursing Diagnosis Association International. The nursing diagnoses presented confirm the need for interdisciplinary work in the care of this clientele. For a better description of the phenomena related to these clients, the inclusion of two risk/related factors in three diagnoses of this taxonomy is suggested. The findings contribute to care, education and research in dermatology nursing.

  6. Low completion rate of hepatitis B vaccination in female sex workers.

    PubMed

    Magalhães, Rosilane de Lima Brito; Teles, Sheila Araújo; Reis, Renata Karina; Galvão, Marli Teresinha Gimeniz; Gir, Elucir

    2017-01-01

    to assess predictive factors for noncompletion of the hepatitis B vaccination schedule in female sex workers in the city of Teresina, Northeastern Brazil. 402 women were interviewed and, for those who did not wish to visit specialized sites, or did not know their hepatitis B vaccination status, the vaccine was offered at their workplaces. Bi- and multivariate analyses were performed to identify potential predictors of noncompletion of the vaccination schedule. Of the 284 women eligible for vaccination, 258 (90.8%) received the first dose; 157/258 (60.8%) and 68/258 (26.3%) received the second and third doses, respectively. Working at clubs and consuming illicit drugs were predictors of noncompletion of the vaccination schedule (p < 0.05). The high acceptability of the vaccine's first dose, combined with the low completion rate of the vaccination schedule in sex workers, shows the need for more persuasive strategies that go beyond offering the vaccine at their workplaces.

  7. Minimum Map of Social Institutional Network: a multidimensional strategy for research in Nursing.

    PubMed

    Carlos, Diene Monique; Pádua, Elisabete Matallo Marchesini de; Nakano, Ana Márcia Spanó; Ferriani, Maria das Graças Carvalho

    2016-06-01

    To analyze the use of a methodological strategy for qualitative research - the Minimum Map of the Social Institutional Network - as a proposal for understanding phenomena from a multidimensional perspective. A methodological theoretical essay reflecting on the use of innovative methodological strategies in nursing research, grounded in the fundamentals of Complex Thinking. The minimum map of the External Social Institutional Network aims to identify institutional linkages and the gaps in the intervention work of the institutions surveyed. The use of these maps has provided important advances in the know-how of qualitative research in Health and Nursing. From this perspective, the use of Minimum Maps of the Social Institutional Network can be stimulated and enhanced to meet the demands of the contemporary world, particularly for its flexibility in adapting to various research subjects, its breadth and depth of discussion, and its possibilities for articulation with health service practice.

  8. GPU Based N-Gram String Matching Algorithm with Score Table Approach for String Searching in Many Documents

    NASA Astrophysics Data System (ADS)

    Srinivasa, K. G.; Shree Devi, B. N.

    2017-10-01

    String searching in documents has become a demanding task with the evolution of Big Data. The generation of large data sets calls for high-performance search algorithms in areas such as text mining and information retrieval. The popularity of GPUs for general-purpose computing has been increasing across applications, so it is of great interest to exploit the threading capability of a GPU to build a high-performance search algorithm. This paper proposes an optimized new approach to the N-gram model for string search over a number of lengthy documents, together with its GPU implementation. The algorithm exploits GPGPUs for searching strings in many documents, employing character-level N-gram matching with a parallel Score Table approach implemented with the CUDA API. The Score Table stores the frequency of each N-gram in a document, which makes the search independent of the document's length and allows fast access to the frequency values, thus decreasing the search complexity. The extensive threading capability of the GPU is exploited to enable parallel pre-processing of trigrams in a document for Score Table creation and parallel search over a large number of documents, thus speeding up the whole search process even for large pattern sizes. Experiments were carried out on many documents of varied lengths, with search strings drawn from the standard Lorem Ipsum text, on NVIDIA's GeForce GT 540M GPU with 96 cores. The results show that the parallel approach to Score Table creation and searching gives a good speedup over the same approach executed serially.
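
    A minimal CUDA sketch may make the Score Table idea concrete: each GPU thread hashes one trigram of the document and increments its frequency with an atomic add, after which a pattern is scored by table lookups whose cost is independent of document length. The 6-bit hash, the table size and all names below are our own illustrative choices, not the paper's code.

      // Illustrative sketch of a parallel trigram Score Table (not the paper's code).
      #include <cstdio>
      #include <cstring>
      #include <cuda_runtime.h>

      #define TABLE_SIZE (64 * 64 * 64)   // 6 bits per character, 3 characters

      __device__ __host__ inline int trigramHash(const char *s) {
          return ((s[0] & 63) << 12) | ((s[1] & 63) << 6) | (s[2] & 63);
      }

      // One thread per trigram position in the document.
      __global__ void buildScoreTable(const char *doc, int len, unsigned int *table) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i + 2 < len)
              atomicAdd(&table[trigramHash(doc + i)], 1u);
      }

      int main() {
          const char *doc = "lorem ipsum dolor sit amet lorem ipsum";
          const char *pattern = "lorem";
          int len = strlen(doc), plen = strlen(pattern);

          char *dDoc; unsigned int *dTable;
          cudaMalloc(&dDoc, len);
          cudaMalloc(&dTable, TABLE_SIZE * sizeof(unsigned int));
          cudaMemcpy(dDoc, doc, len, cudaMemcpyHostToDevice);
          cudaMemset(dTable, 0, TABLE_SIZE * sizeof(unsigned int));

          buildScoreTable<<<(len + 255) / 256, 256>>>(dDoc, len, dTable);

          unsigned int *table = new unsigned int[TABLE_SIZE];
          cudaMemcpy(table, dTable, TABLE_SIZE * sizeof(unsigned int),
                     cudaMemcpyDeviceToHost);

          // Score the pattern: sum the document frequencies of its trigrams.
          unsigned int score = 0;
          for (int i = 0; i + 2 < plen; ++i)
              score += table[trigramHash(pattern + i)];
          printf("score of \"%s\": %u\n", pattern, score);

          cudaFree(dDoc); cudaFree(dTable); delete[] table;
          return 0;
      }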

  9. Cost-effective GPU-grid for genome-wide epistasis calculations.

    PubMed

    Pütz, B; Kam-Thong, T; Karbalai, N; Altmann, A; Müller-Myhsok, B

    2013-01-01

    Until recently, genotype studies were limited to the investigation of single SNP effects due to the computational burden incurred when studying pairwise interactions of SNPs. However, some genetic effects as simple as coloring (in plants and animals) cannot be ascribed to a single locus but only understood when epistasis is taken into account [1]. It is expected that such effects are also found in complex diseases where many genes contribute to the clinical outcome of affected individuals. Only recently have such problems become computationally feasible. The inherently parallel structure of the problem makes it a perfect candidate for massive parallelization on either grid or cloud architectures. Since we are also dealing with confidential patient data, we were not able to consider a cloud-based solution but had to find a way to process the data in-house, and aimed to build a local GPU-based grid structure. Sequential epistasis calculations were ported to the GPU using CUDA at various levels. Parallelization on the CPU was compared to corresponding GPU counterparts with regard to performance and cost. A cost-effective solution was created by combining custom-built nodes equipped with relatively inexpensive consumer-level graphics cards with highly parallel GPUs in a local grid. The GPU method outperforms current cluster-based systems on a price/performance criterion, as a single GPU shows performance comparable to up to 200 CPU cores. The outlined approach will work for problems that easily lend themselves to massive parallelization. Code for various tasks has been made available, and ongoing development of tools will further ease the transition from sequential to parallel algorithms.
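
    The thread-per-SNP-pair mapping that makes epistasis screening embarrassingly parallel can be sketched as follows; this is our illustration, not the published code, and the toy interaction score (difference of mean genotype products between cases and controls) stands in for the likelihood-based statistics a real pipeline would use.

      // Illustrative sketch (not the authors' code) of the thread-per-SNP-pair mapping.
      #include <cstdio>
      #include <cuda_runtime.h>

      __global__ void pairScores(const unsigned char *G, const unsigned char *y,
                                 int nSnp, int nInd, float *score) {
          int i = blockIdx.y * blockDim.y + threadIdx.y;   // first SNP of the pair
          int j = blockIdx.x * blockDim.x + threadIdx.x;   // second SNP of the pair
          if (i >= nSnp || j >= nSnp || i >= j) return;

          // Toy score: difference of the mean genotype product (interaction term)
          // between cases (y = 1) and controls (y = 0).
          float sumCase = 0.f, sumCtrl = 0.f;
          int nCase = 0, nCtrl = 0;
          for (int s = 0; s < nInd; ++s) {
              float g = G[i * nInd + s] * G[j * nInd + s];
              if (y[s]) { sumCase += g; ++nCase; } else { sumCtrl += g; ++nCtrl; }
          }
          score[i * nSnp + j] =
              fabsf(sumCase / max(nCase, 1) - sumCtrl / max(nCtrl, 1));
      }

      int main() {
          const int nSnp = 4, nInd = 6;
          unsigned char hG[nSnp * nInd] = { 0,1,2,0,1,2,  2,2,1,0,0,1,
                                            0,0,1,2,2,1,  1,0,2,1,0,2 };
          unsigned char hY[nInd] = { 1,1,1,0,0,0 };        // 3 cases, 3 controls

          unsigned char *dG, *dY; float *dS;
          cudaMalloc(&dG, sizeof hG); cudaMalloc(&dY, sizeof hY);
          cudaMalloc(&dS, nSnp * nSnp * sizeof(float));
          cudaMemcpy(dG, hG, sizeof hG, cudaMemcpyHostToDevice);
          cudaMemcpy(dY, hY, sizeof hY, cudaMemcpyHostToDevice);
          cudaMemset(dS, 0, nSnp * nSnp * sizeof(float));

          dim3 block(16, 16), grid((nSnp + 15) / 16, (nSnp + 15) / 16);
          pairScores<<<grid, block>>>(dG, dY, nSnp, nInd, dS);

          float hS[nSnp * nSnp];
          cudaMemcpy(hS, dS, sizeof hS, cudaMemcpyDeviceToHost);
          for (int i = 0; i < nSnp; ++i)
              for (int j = i + 1; j < nSnp; ++j)
                  printf("SNP pair (%d,%d): score %.3f\n", i, j, hS[i * nSnp + j]);
          return 0;
      }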

  10. Fast 3D dosimetric verifications based on an electronic portal imaging device using a GPU calculation engine.

    PubMed

    Zhu, Jinhan; Chen, Lixin; Chen, Along; Luo, Guangwen; Deng, Xiaowu; Liu, Xiaowei

    2015-04-11

    To use a graphic processing unit (GPU) calculation engine to implement a fast 3D pre-treatment dosimetric verification procedure based on an electronic portal imaging device (EPID). The GPU algorithm includes the deconvolution and convolution method for the fluence-map calculations, the collapsed-cone convolution/superposition (CCCS) algorithm for the 3D dose calculations and the 3D gamma evaluation calculations. The results of the GPU-based CCCS algorithm were compared to those of Monte Carlo simulations. The planned and EPID-based reconstructed dose distributions in overridden-to-water phantoms and the original patients were compared for 6 MV and 10 MV photon beams in intensity-modulated radiation therapy (IMRT) treatment plans based on dose differences and gamma analysis. The total single-field dose computation time was less than 8 s, and the gamma evaluation for a 0.1-cm grid resolution was completed in approximately 1 s. The results of the GPU-based CCCS algorithm exhibited good agreement with those of the Monte Carlo simulations. The gamma analysis indicated good agreement between the planned and reconstructed dose distributions for the treatment plans. For the target volume, the differences in the mean dose were less than 1.8%, and the differences in the maximum dose were less than 2.5%. For the critical organs, minor differences were observed between the reconstructed and planned doses. The GPU calculation engine was used to boost the speed of 3D dose and gamma evaluation calculations, thus offering the possibility of true real-time 3D dosimetric verification.

  11. DeF-GPU: Efficient and effective deletions finding in hepatitis B viral genomic DNA using a GPU architecture.

    PubMed

    Cheng, Chun-Pei; Lan, Kuo-Lun; Liu, Wen-Chun; Chang, Ting-Tsung; Tseng, Vincent S

    2016-12-01

    Hepatitis B virus (HBV) infection is strongly associated with an increased risk of liver diseases such as cirrhosis or hepatocellular carcinoma (HCC). Many lines of evidence suggest that deletions occurring in HBV genomic DNA are highly associated with the activity of HBV via the interplay between aberrant viral protein release and the human immune system. Finding deletions in HBV whole-genome sequences is thus a very important problem, although there are underlying challenges in mining such large and complex biological data. Although some next-generation sequencing (NGS) tools have recently been designed for identifying structural variations such as insertions or deletions, their validity has generally been established on human sequence data, and this design may not be suitable for viruses, which are different species. We propose a graphics processing unit (GPU)-based data mining method called DeF-GPU to efficiently and precisely identify HBV deletions from large NGS data sets, which generally contain millions of reads. To fit the single-instruction multiple-data model, sequencing reads are treated as the multiple data and the deletion-finding procedure as the single instruction. We use the Compute Unified Device Architecture (CUDA) to parallelize the procedure, and further validate DeF-GPU on 5 synthetic and 1 real data sets. Our results suggest that DeF-GPU outperforms the existing commonly used method Pindel and is able to exactly identify the deletions of our ground truth in a few seconds. The source code and other related materials are available at https://sourceforge.net/projects/defgpu/.

  12. Fast generation of computer-generated hologram by graphics processing unit

    NASA Astrophysics Data System (ADS)

    Matsuda, Sho; Fujii, Tomohiko; Yamaguchi, Takeshi; Yoshikawa, Hiroshi

    2009-02-01

    A cylindrical hologram is well known to be viewable over 360 degrees. Such a hologram requires very high pixel resolution, so a Computer-Generated Cylindrical Hologram (CGCH) demands a huge amount of calculation. In our previous research, we used a look-up table method for fast calculation on an Intel Pentium 4 at 2.8 GHz. It took 480 hours to calculate a high-resolution CGCH (504,000 x 63,000 pixels, with an average of 27,000 object points). To improve the quality of the reconstructed CGCH image, the fringe pattern requires higher spatial frequency and resolution; therefore, to increase the calculation speed, we have to change the calculation method. In this paper, to reduce the calculation time of a CGCH (912,000 x 108,000 pixels), we employ a Graphics Processing Unit (GPU); the same calculation would take 4,406 hours on a Xeon at 3.4 GHz. Since a GPU has many streaming processors and a parallel processing structure, it works as a high-performance parallel processor, and it delivers its maximum performance on two-dimensional and streaming data. Recently, GPUs have also become usable for general-purpose computation (GPGPU). For example, NVIDIA's GeForce 7 series became programmable with the Cg programming language, and the subsequent GeForce 8 series supports CUDA, the software development kit made by NVIDIA. Theoretically, the calculation ability of the GPU is quoted as 500 GFLOPS. From the experimental results, we achieved a calculation 47 times faster than our previous CPU-based work; the CGCH can therefore be generated in 95 hours, and the total time to calculate and print the CGCH is 110 hours.
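
    The reason hologram computation parallelizes so well is that every fringe pixel is an independent sum over object points. A hypothetical CUDA sketch of that inner loop, using a Fresnel-approximation phase and invented parameters (pixel pitch, HeNe wavelength), is shown below; the paper's actual implementation differs.

      // Hypothetical sketch of per-pixel fringe computation (not the paper's code).
      #include <cstdio>
      #include <cmath>
      #include <cuda_runtime.h>

      struct Point { float x, y, z, amp; };

      __global__ void cghFringe(const Point *pts, int nPts, float *fringe,
                                int width, int height, float pitch, float k) {
          int px = blockIdx.x * blockDim.x + threadIdx.x;
          int py = blockIdx.y * blockDim.y + threadIdx.y;
          if (px >= width || py >= height) return;

          float x = (px - width / 2) * pitch, y = (py - height / 2) * pitch;
          float sum = 0.f;
          for (int j = 0; j < nPts; ++j) {           // sum over object points
              float dx = x - pts[j].x, dy = y - pts[j].y;
              // Fresnel approximation of the propagation phase from point j.
              float phase = k * (dx * dx + dy * dy) / (2.f * pts[j].z);
              sum += pts[j].amp * __cosf(phase);
          }
          fringe[py * width + px] = sum;
      }

      int main() {
          const int W = 512, H = 512, N = 3;
          Point hp[N] = { {0, 0, 0.1f, 1}, {1e-3f, 0, 0.12f, 1}, {0, 1e-3f, 0.15f, 1} };
          Point *dp; float *df;
          cudaMalloc(&dp, sizeof hp); cudaMalloc(&df, W * H * sizeof(float));
          cudaMemcpy(dp, hp, sizeof hp, cudaMemcpyHostToDevice);

          dim3 b(16, 16), g((W + 15) / 16, (H + 15) / 16);
          float k = 2.f * 3.14159265f / 633e-9f;     // HeNe wavelength, assumed
          cghFringe<<<g, b>>>(dp, N, df, W, H, 8e-6f, k);
          cudaDeviceSynchronize();
          printf("fringe computed: %d x %d pixels, %d object points\n", W, H, N);
          cudaFree(dp); cudaFree(df);
          return 0;
      }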

  13. High-throughput sequence alignment using Graphics Processing Units

    PubMed Central

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-01-01

    Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356

  14. Superselective arterial embolization for the treatment of angiomyolipoma in a patient with a solitary kidney

    PubMed Central

    Góes, Adenauer Marinho de Oliveira; Jeha, Salim Abdon Haber; Salgado, José Rui Couto

    2016-01-01

    The authors report the case of a young female patient previously submitted to right nephrectomy for renal angiomyolipomas (AMLs) who presented two bulky angiomyolipomas in the remaining left kidney. The patient was referred by her urologist for endovascular treatment. Superselective embolization of one of the tumors, located in the lower renal pole in a subcapsular position, was performed; despite several attempts, catheterization selective enough to embolize the second angiomyolipoma (located in the upper renal pole) without a considerable volume of adjacent renal parenchyma suffering ischemia could not be achieved. The procedure and the patient's recovery were uneventful. The patient was discharged on the first postoperative day and has been followed up as an outpatient for 9 months without complications. A brief review of the indications, technical aspects and complications of the endovascular treatment of renal AMLs is presented, and the advantages of this technique compared with surgical resection of the tumors are discussed. PMID:29930580

  15. Launching Latin America: International and Domestic Factors in National Space Programs

    DTIC Science & Technology

    2014-12-01

    Rocket],” Globo.com, August 22, 2013, http://g1.globo.com/ ciencia -e-saude/noticia/2013/08/tragedia-em-alcantara-faz-dez-anos- e-brasil-ainda-sonha...October 2005, http://super.abril.com.br/ ciencia /sabotagem-tio-sam- 446333.shtml. 127 D’Alama, “Tragédia Em Alcântara.” 128 Associated Press...http://idbdocs.iadb.org/wsdocs/getdocument.aspx?docnum=33036507. 64 including the new Ministerio del Poder Popular para Ciencia , Tecnología e

  16. A GPU OpenCL based cross-platform Monte Carlo dose calculation engine (goMC)

    NASA Astrophysics Data System (ADS)

    Tian, Zhen; Shi, Feng; Folkerts, Michael; Qin, Nan; Jiang, Steve B.; Jia, Xun

    2015-09-01

    Monte Carlo (MC) simulation has been recognized as the most accurate dose calculation method for radiotherapy. However, the extremely long computation time impedes its clinical application. Recently, a lot of effort has been made to realize fast MC dose calculation on graphic processing units (GPUs). However, most of the GPU-based MC dose engines have been developed under NVidia’s CUDA environment. This limits the code portability to other platforms, hindering the introduction of GPU-based MC simulations to clinical practice. The objective of this paper is to develop a GPU OpenCL based cross-platform MC dose engine named goMC with coupled photon-electron simulation for external photon and electron radiotherapy in the MeV energy range. Compared to our previously developed GPU-based MC code named gDPM (Jia et al 2012 Phys. Med. Biol. 57 7783-97), goMC has two major differences. First, it was developed under the OpenCL environment for high code portability and hence could be run not only on different GPU cards but also on CPU platforms. Second, we adopted the electron transport model used in EGSnrc MC package and PENELOPE’s random hinge method in our new dose engine, instead of the dose planning method employed in gDPM. Dose distributions were calculated for a 15 MeV electron beam and a 6 MV photon beam in a homogenous water phantom, a water-bone-lung-water slab phantom and a half-slab phantom. Satisfactory agreement between the two MC dose engines goMC and gDPM was observed in all cases. The average dose differences in the regions that received a dose higher than 10% of the maximum dose were 0.48-0.53% for the electron beam cases and 0.15-0.17% for the photon beam cases. In terms of efficiency, goMC was ~4-16% slower than gDPM when running on the same NVidia TITAN card for all the cases we tested, due to both the different electron transport models and the different development environments. The code portability of our new dose engine goMC was validated by successfully running it on a variety of different computing devices including an NVidia GPU card, two AMD GPU cards and an Intel CPU processor. Computational efficiency among these platforms was compared.

  17. CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models

    NASA Astrophysics Data System (ADS)

    Komura, Yukihiro; Okabe, Yutaka

    2014-03-01

    We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). We here explain the details of sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions. Catalogue identifier: AERM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 5632 No. of bytes in distributed program, including test data, etc.: 14688 Distribution format: tar.gz Programming language: C, CUDA. Computer: System with an NVIDIA CUDA enabled GPU. Operating system: System with an NVIDIA CUDA enabled GPU. Classification: 23. External routines: NVIDIA CUDA Toolkit 3.0 or newer Nature of problem: Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices. Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [1] and that by Kalentev et al. [2]. Restrictions: The system size is limited depending on the memory of a GPU. Running time: For the parameters used in the sample programs, it takes about a minute for each program. Of course, it depends on the system size, the number of Monte Carlo steps, etc. References: [1] K.A. Hawick, A. Leist, and D. P. Playne, Parallel Computing 36 (2010) 655-678 [2] O. Kalentev, A. Rai, S. Kemnitzb, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615-620

  18. Viscoelastic Finite Difference Modeling Using Graphics Processing Units

    NASA Astrophysics Data System (ADS)

    Fabien-Ouellet, G.; Gloaguen, E.; Giroux, B.

    2014-12-01

    Full waveform seismic modeling requires a huge amount of computing power that still challenges today's technology. This limits the applicability of powerful processing approaches in seismic exploration like full-waveform inversion. This paper explores the use of Graphics Processing Units (GPU) to compute a time-domain finite-difference solution to the viscoelastic wave equation. The aim is to investigate whether the adoption of GPU technology can significantly reduce the computing time of simulations. The code presented herein is based on the freely accessible 2D software of Bohlen (2002), provided under the GNU General Public License (GPL). This implementation is based on a second-order centered differences scheme to approximate time derivatives, and on staggered-grid schemes with centered differences of order 2, 4, 6, 8, and 12 for spatial derivatives. The code is fully parallel and is written using the Message Passing Interface (MPI), and it thus supports simulations of vast seismic models on a cluster of CPUs. To port the code from Bohlen (2002) to GPUs, the OpenCL framework was chosen for its ability to work on both CPUs and GPUs and its adoption by most GPU manufacturers. In our implementation, OpenCL works in conjunction with MPI, which allows computations on a cluster of GPUs for large-scale model simulations. We tested our code for model sizes between 100² and 6000² elements. Comparison shows a decrease in computation time of more than two orders of magnitude between the GPU implementation run on an AMD Radeon HD 7950 and the CPU implementation run on a 2.26 GHz Intel Xeon Quad-Core. The speed-up varies with the order of the finite-difference approximation and generally increases for higher orders. Increasing speed-ups are also obtained for increasing model sizes, which can be explained by kernel overheads and by the delays introduced by memory transfers to and from the GPU through the PCI-E bus. Those tests indicate that the GPU memory size and the slow memory transfers are the limiting factors of our GPU implementation. Those results show the benefits of using GPUs instead of CPUs for time-domain finite-difference seismic simulations. The reductions in computation time and in hardware costs are significant and open the door for new approaches in seismic inversion.
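
    The kernel structure behind such speedups is a stencil update: each grid node is rewritten from a fixed neighborhood with no intra-step dependencies. As a hedged stand-in for the paper's viscoelastic staggered-grid scheme, the sketch below implements its simplest relative, a second-order 2D acoustic update, one thread per node; the buffer names and the Courant-like constant are our own.

      // Hedged stand-in for a staggered-grid viscoelastic scheme: a plain
      // second-order 2D acoustic finite-difference update, one thread per node.
      #include <cstdio>
      #include <cuda_runtime.h>

      __global__ void waveStep(const float *pPrev, const float *pCur, float *pNext,
                               int nx, int nz, float c2dt2_h2) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;   // x index
          int k = blockIdx.y * blockDim.y + threadIdx.y;   // z index
          if (i < 1 || k < 1 || i >= nx - 1 || k >= nz - 1) return;

          int id = k * nx + i;
          // Second-order centered Laplacian of the current pressure field.
          float lap = pCur[id - 1] + pCur[id + 1] + pCur[id - nx] + pCur[id + nx]
                    - 4.f * pCur[id];
          // p^{n+1} = 2 p^n - p^{n-1} + (c dt / h)^2 * Laplacian
          pNext[id] = 2.f * pCur[id] - pPrev[id] + c2dt2_h2 * lap;
      }

      int main() {
          const int nx = 256, nz = 256, nSteps = 100;
          size_t bytes = nx * nz * sizeof(float);
          float *p0, *p1, *p2;
          cudaMalloc(&p0, bytes); cudaMalloc(&p1, bytes); cudaMalloc(&p2, bytes);
          cudaMemset(p0, 0, bytes); cudaMemset(p1, 0, bytes); cudaMemset(p2, 0, bytes);

          float src = 1.f;                                  // impulse at the center
          cudaMemcpy(p1 + (nz / 2) * nx + nx / 2, &src, sizeof(float),
                     cudaMemcpyHostToDevice);

          dim3 b(16, 16), g((nx + 15) / 16, (nz + 15) / 16);
          for (int step = 0; step < nSteps; ++step) {
              waveStep<<<g, b>>>(p0, p1, p2, nx, nz, 0.2f); // (c dt / h)^2 = 0.2
              float *tmp = p0; p0 = p1; p1 = p2; p2 = tmp;  // rotate time levels
          }
          cudaDeviceSynchronize();
          printf("propagated %d steps on a %d x %d grid\n", nSteps, nx, nz);
          return 0;
      }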

  19. Musrfit-Real Time Parameter Fitting Using GPUs

    NASA Astrophysics Data System (ADS)

    Locans, Uldis; Suter, Andreas

    High transverse field μSR (HTF-μSR) experiments typically lead to rather large data sets, since it is necessary to follow the high frequencies present in the positron decay histograms. The analysis of these data sets can be very time consuming, usually due to the limited computational power of the hardware. To overcome the limited computing resources, the rotating reference frame (RRF) transformation is often used to reduce the data sets that need to be handled. This comes at a price the μSR community is typically not aware of: (i) due to the RRF transformation, the fitting parameter estimates are of poorer precision, i.e., more extended, expensive beamtime is needed; (ii) the RRF introduces systematic errors which hamper the statistical interpretation of χ² or the maximum log-likelihood. We briefly discuss these issues in a non-exhaustive, practical way. The one and only reason for the RRF transformation is insufficient computing power. Therefore, in this work, GPU (Graphics Processing Unit) based fitting was developed, which allows performing real-time full data analysis without the RRF. GPUs have become increasingly popular in scientific computing in recent years. Due to their highly parallel architecture, they provide the opportunity to accelerate many applications at considerably lower cost than upgrading the CPU computational power. With the emergence of frameworks such as CUDA and OpenCL, these devices have become more easily programmable. In this work, GPU support was added to Musrfit, a data analysis framework for μSR experiments. The new fitting algorithm uses CUDA or OpenCL to offload the most time-consuming parts of the calculations to Nvidia or AMD GPUs. With the current CPU implementation in Musrfit, parameter fitting can take hours for certain data sets, while the GPU version allows real-time data analysis on the same data sets. This work describes the challenges that arise in adding GPU support to Musrfit, as well as the results obtained with the GPU version. Speedups using the GPU were measured against the CPU implementation. Two different GPUs were used for the comparison: a high-end Nvidia Tesla K40c GPU designed for HPC applications and an AMD Radeon R9 390X GPU designed for the gaming market.
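
    The hot spot being offloaded is the χ² evaluation over millions of histogram bins. A hedged CUDA sketch of that pattern follows: each thread forms one squared residual against a toy exponentially damped cosine (a stand-in for a μSR polarization function; all parameter names are ours), and a shared-memory tree reduction sums per block, leaving only a short host-side sum per fit iteration.

      // Hedged sketch of a GPU chi-square evaluation (not the Musrfit source).
      #include <cstdio>
      #include <cuda_runtime.h>

      __global__ void chi2Kernel(const float *data, const float *err, int n,
                                 float asym, float freq, float lambda, float dt,
                                 float *partial) {
          extern __shared__ float s[];
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          float r = 0.f;
          if (i < n) {
              float t = i * dt;
              // Toy model: exponentially damped cosine.
              float model = asym * __expf(-lambda * t) * __cosf(freq * t);
              float d = (data[i] - model) / err[i];
              r = d * d;
          }
          s[threadIdx.x] = r;
          __syncthreads();
          // Shared-memory tree reduction (block size must be a power of two).
          for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
              if (threadIdx.x < stride) s[threadIdx.x] += s[threadIdx.x + stride];
              __syncthreads();
          }
          if (threadIdx.x == 0) partial[blockIdx.x] = s[0];
      }

      int main() {
          const int n = 1 << 20, block = 256, grid = (n + block - 1) / block;
          float *d, *e, *p;
          cudaMallocManaged(&d, n * sizeof(float));
          cudaMallocManaged(&e, n * sizeof(float));
          cudaMallocManaged(&p, grid * sizeof(float));
          for (int i = 0; i < n; ++i) { d[i] = 0.f; e[i] = 1.f; } // fake histogram
          chi2Kernel<<<grid, block, block * sizeof(float)>>>(
              d, e, n, 0.25f, 0.85f, 0.1f, 1e-3f, p);
          cudaDeviceSynchronize();
          double chi2 = 0;
          for (int b = 0; b < grid; ++b) chi2 += p[b];   // final sum on the host
          printf("chi2 = %f\n", chi2);
          return 0;
      }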

  20. A GPU OpenCL based cross-platform Monte Carlo dose calculation engine (goMC).

    PubMed

    Tian, Zhen; Shi, Feng; Folkerts, Michael; Qin, Nan; Jiang, Steve B; Jia, Xun

    2015-10-07

    Monte Carlo (MC) simulation has been recognized as the most accurate dose calculation method for radiotherapy. However, the extremely long computation time impedes its clinical application. Recently, a lot of effort has been made to realize fast MC dose calculation on graphic processing units (GPUs). However, most of the GPU-based MC dose engines have been developed under NVidia's CUDA environment. This limits the code portability to other platforms, hindering the introduction of GPU-based MC simulations to clinical practice. The objective of this paper is to develop a GPU OpenCL based cross-platform MC dose engine named goMC with coupled photon-electron simulation for external photon and electron radiotherapy in the MeV energy range. Compared to our previously developed GPU-based MC code named gDPM (Jia et al 2012 Phys. Med. Biol. 57 7783-97), goMC has two major differences. First, it was developed under the OpenCL environment for high code portability and hence could be run not only on different GPU cards but also on CPU platforms. Second, we adopted the electron transport model used in EGSnrc MC package and PENELOPE's random hinge method in our new dose engine, instead of the dose planning method employed in gDPM. Dose distributions were calculated for a 15 MeV electron beam and a 6 MV photon beam in a homogenous water phantom, a water-bone-lung-water slab phantom and a half-slab phantom. Satisfactory agreement between the two MC dose engines goMC and gDPM was observed in all cases. The average dose differences in the regions that received a dose higher than 10% of the maximum dose were 0.48-0.53% for the electron beam cases and 0.15-0.17% for the photon beam cases. In terms of efficiency, goMC was ~4-16% slower than gDPM when running on the same NVidia TITAN card for all the cases we tested, due to both the different electron transport models and the different development environments. The code portability of our new dose engine goMC was validated by successfully running it on a variety of different computing devices including an NVidia GPU card, two AMD GPU cards and an Intel CPU processor. Computational efficiency among these platforms was compared.

  1. INFLUENCE OF HEPATOCELLULAR CARCINOMA ETIOLOGY IN THE SURVIVAL AFTER RESECTION.

    PubMed

    Lopes, Felipe de Lucena Moreira; Coelho, Fabricio Ferreira; Kruger, Jaime Arthur Pirolla; Fonseca, Gilton Marques; Araujo, Raphael Leonardo Cunha de; Jeismann, Vagner Birk; Herman, Paulo

    2016-01-01

    Hepatocellular carcinoma (HCC) is the most frequent type of primary liver cancer, and its incidence has been increasing around the world in the last decades, making it the third cause of death by cancer in the world. Hepatic resection is one of the most effective treatments for HCC, with five-year survival rates of 50-70%, especially for patients with a single nodule and preserved liver function. Some studies have shown a worse prognosis for HCC patients whose etiology is viral. That raises the question of whether there is a difference in prognosis among the various causes of HCC. To compare the prognosis (overall and disease-free survival at five years) of patients undergoing hepatectomy for the treatment of HCC with respect to the various causes of liver disease. A review was performed of the medical records of patients undergoing hepatectomy between 2000 and 2014 for the treatment of HCC. Patients were divided into groups according to the cause of liver disease, and overall and disease-free survival were analyzed for comparison. There was no statistically significant difference in outcomes between the groups of patients divided according to the etiology of HCC. Overall and disease-free survival at five years in this sample were 49.9% and 40.7%, respectively. From the data of this sample, no prognostic difference was found among the groups of HCC patients of the various etiologies.

  2. Precession of the 3C 120 jet: 3D hydrodynamic simulations

    NASA Astrophysics Data System (ADS)

    Caproni, A.; de Gouveia dal Pino, E. M.; Abraham, Z.; Raga, A. C.

    2003-08-01

    Observations with very long baseline interferometry techniques have shown the existence of a relativistic jet with superluminal components in the central region of 3C 120. These components are ejected in distinct directions in the plane of the sky and with different apparent velocities. These characteristics were interpreted in previous works as effects of the precession of the relativistic jet. In this work, we performed three-dimensional simulations of the 3C 120 jet using the precession parameters determined in previous works and varying the initial characteristics of the jet and the ambient medium, such as number density and temperature. All simulations were carried out with the hydrodynamic code YGUAZÚ-A, assuming an adiabatic jet described by a relativistic equation of state. Because we are using a hydrodynamic code, we assume that the magnetic field intensity and the particle distribution, which are needed to calculate the synchrotron emission, are proportional to the hydrodynamic pressure. A comparison between two distinct scenarios, in which the jet material is ejected with constant velocity (continuous jet) and with velocity modulated by a sinusoidal pattern in time (intermittent jet), is presented and discussed. For jets exhibiting both precession and intermittency, with a velocity variation amplitude larger than ten percent of the mean injection velocity, the ballistic hypothesis, controlled by the intermittency, is more likely. On the other hand, for jets with precession but without intermittency (or with a lower velocity variability amplitude than in the previous case), the effect of precession on the jet morphology is not negligible. Therefore, in general, both effects (precession and ballistic motions) must be competing to shape the morphology of superluminal jets.

  3. Teaching astronomy and optics: can it be done in a contextualized way at the high-school level?

    NASA Astrophysics Data System (ADS)

    Sobrinho, A. A.; Jafelice, L. C.

    2003-08-01

    We discuss our participation in a training course for teachers of various high-school subjects. Our basic concern was to develop educational instruments suitable for bringing frequent student questions about astronomy and its relation to technology and society into the classroom, at this level of education and in a contextualized way. We addressed questions such as: the evolution of astronomy, its relations with other branches of human knowledge and the resulting applications; advances in the technology of optical instruments versus the importance of naked-eye observation of the sky; the relation between the human eye, the spyglass and the telescope; and the development of space technology and its influence on our daily lives. Our aim was to recover, historically and pedagogically, the applications and observations of the sky in the school setting, highlighting the relation between astronomical events, the human eye, mediating instruments, and their historical and social contexts. Products of this approach included the development and adaptation of various practical activities and instructional materials (e.g., styrofoam "mirrors" and "light rays" made of marbles; setups involving candles, lasers, lenses and mirrors; the disassembly and analysis of telescope parts; etc.). In addition, as another result of this work, we prepared texts on the history of astronomy and optics for classroom activities. With these actions we aim to make the physical concepts involved concrete, to exemplify contextualized and interdisciplinary teaching motivated by astronomical themes, and to encourage the practices and discussions carried out with the trainees to be transposed to the classroom. The teachers' reaction to the proposed activities was quite positive. All these aspects are discussed in detail in this work. (PPGECNM/UFRN; PRONEX/FINEP; NUPA/USP; Temáticos/FAPESP)

  4. Accelerating Pseudo-Random Number Generator for MCNP on GPU

    NASA Astrophysics Data System (ADS)

    Gong, Chunye; Liu, Jie; Chi, Lihua; Hu, Qingfeng; Deng, Li; Gong, Zhenghu

    2010-09-01

    Pseudo-random number generators (PRNG) are intensively used in many stochastic algorithms in particle simulations, artificial neural networks and other scientific computations. The PRNG in the Monte Carlo N-Particle Transport Code (MCNP) requires a long period, high quality, flexible jump-ahead and high speed. In this paper, we implement such a PRNG for MCNP on NVIDIA's GTX200 Graphics Processing Units (GPU) using the CUDA programming model. Results show that speedups of 3.80 to 8.10 times are achieved compared with 4- to 6-core CPUs, and that more than 679.18 million double-precision random numbers can be generated per second on the GPU.
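
    The "flexible jump" requirement is what makes an LCG practical on a GPU: thread t can leap directly to element t·n of the stream in O(log n) by repeated squaring (F. B. Brown's skip-ahead recurrence). The sketch below is our illustration with the well-known MMIX constants and a modulus of 2^64, not the MCNP generator's actual parameters.

      // Our sketch of parallel LCG streams via skip-ahead (not the MCNP source).
      #include <cstdio>
      #include <cuda_runtime.h>

      #define G  6364136223846793005ULL   // MMIX multiplier (assumed constants)
      #define C0 1442695040888963407ULL   // MMIX increment

      // Advance state x by n LCG steps in O(log n): x' = A*x + B (mod 2^64),
      // where A = g^n and B = c*(g^{n-1} + ... + 1), built by repeated squaring.
      __device__ unsigned long long lcgSkip(unsigned long long x,
                                            unsigned long long n) {
          unsigned long long A = 1ULL, B = 0ULL, h = G, f = C0;
          while (n) {
              if (n & 1ULL) { A = A * h; B = B * h + f; }
              f = f * (h + 1ULL);
              h = h * h;
              n >>= 1ULL;
          }
          return A * x + B;               // mod 2^64 via unsigned overflow
      }

      __global__ void generate(unsigned long long seed, unsigned long long perThread,
                               double *out, int n) {
          int tid = blockIdx.x * blockDim.x + threadIdx.x;
          if (tid >= n) return;
          // Give this thread its own disjoint subsequence of the stream.
          unsigned long long x = lcgSkip(seed, (unsigned long long)tid * perThread);
          double acc = 0.0;
          for (unsigned long long i = 0; i < perThread; ++i) {
              x = G * x + C0;
              acc += (x >> 11) * (1.0 / 9007199254740992.0); // 53-bit double in [0,1)
          }
          out[tid] = acc / perThread;     // per-thread mean, ~0.5 as a sanity check
      }

      int main() {
          const int nThreads = 1024;
          double *out;
          cudaMallocManaged(&out, nThreads * sizeof(double));
          generate<<<nThreads / 256, 256>>>(12345ULL, 1 << 16, out, nThreads);
          cudaDeviceSynchronize();
          printf("thread 0 mean = %f (expect ~0.5)\n", out[0]);
          return 0;
      }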

  5. Digital image processing using parallel computing based on CUDA technology

    NASA Astrophysics Data System (ADS)

    Skirnevskiy, I. P.; Pustovit, A. V.; Abdrashitova, M. O.

    2017-01-01

    This article describes the expediency of using a graphics processing unit (GPU) for big data processing in the context of digital image processing. It provides a short description of parallel computing technology and its usage in different areas, a definition of image noise, and a brief overview of some noise removal algorithms. It also describes some basic requirements that a noise removal algorithm should meet when applied to computed tomography projections. It provides a comparison of performance with and without the GPU, as well as with different distributions of the workload between CPU and GPU.
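
    A minimal example of the per-pixel kernel pattern the article discusses: one thread per pixel averaging a 3×3 neighborhood. The box filter is chosen only for brevity; a CT pipeline would use an edge-preserving filter, and all names here are illustrative.

      // Illustrative per-pixel noise-removal kernel: a 3x3 box filter.
      #include <cstdio>
      #include <cuda_runtime.h>

      __global__ void boxFilter3x3(const unsigned char *in, unsigned char *out,
                                   int w, int h) {
          int x = blockIdx.x * blockDim.x + threadIdx.x;
          int y = blockIdx.y * blockDim.y + threadIdx.y;
          if (x >= w || y >= h) return;

          int sum = 0, cnt = 0;
          for (int dy = -1; dy <= 1; ++dy)
              for (int dx = -1; dx <= 1; ++dx) {
                  int nx = x + dx, ny = y + dy;
                  if (nx >= 0 && ny >= 0 && nx < w && ny < h) { // clamp at borders
                      sum += in[ny * w + nx];
                      ++cnt;
                  }
              }
          out[y * w + x] = (unsigned char)(sum / cnt);
      }

      int main() {
          const int w = 640, h = 480;
          unsigned char *in, *out;
          cudaMallocManaged(&in, w * h);
          cudaMallocManaged(&out, w * h);
          for (int i = 0; i < w * h; ++i) in[i] = (unsigned char)(i * 31 % 256);

          dim3 b(16, 16), g((w + 15) / 16, (h + 15) / 16);
          boxFilter3x3<<<g, b>>>(in, out, w, h);
          cudaDeviceSynchronize();
          printf("filtered %d x %d image, out[0] = %d\n", w, h, out[0]);
          return 0;
      }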

  6. GAPD: a GPU-accelerated atom-based polychromatic diffraction simulation code.

    PubMed

    E, J C; Wang, L; Chen, S; Zhang, Y Y; Luo, S N

    2018-03-01

    GAPD, a graphics-processing-unit (GPU)-accelerated atom-based polychromatic diffraction simulation code for direct, kinematics-based, simulations of X-ray/electron diffraction of large-scale atomic systems with mono-/polychromatic beams and arbitrary plane detector geometries, is presented. This code implements GPU parallel computation via both real- and reciprocal-space decompositions. With GAPD, direct simulations are performed of the reciprocal lattice node of ultralarge systems (∼5 billion atoms) and diffraction patterns of single-crystal and polycrystalline configurations with mono- and polychromatic X-ray beams (including synchrotron undulator sources), and validation, benchmark and application cases are presented.
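
    The kinematic sum that such a code evaluates, A(q) = Σ_j f_j exp(iq·r_j) with I(q) = |A(q)|², is independent at every q-point, which is what real- and reciprocal-space decomposition exploits. Below is a hypothetical one-thread-per-q-point CUDA sketch with unit form factors and a toy cubic lattice; it is not the GAPD source.

      // Hypothetical kinematic diffraction kernel: one thread per q-point.
      #include <cstdio>
      #include <cuda_runtime.h>

      __global__ void kinematicIntensity(const float3 *r, int nAtoms,
                                         const float3 *q, int nQ, float *I) {
          int iq = blockIdx.x * blockDim.x + threadIdx.x;
          if (iq >= nQ) return;
          float re = 0.f, im = 0.f;
          for (int j = 0; j < nAtoms; ++j) {     // direct sum over atoms, f_j = 1
              float phase = q[iq].x * r[j].x + q[iq].y * r[j].y + q[iq].z * r[j].z;
              float s, c;
              __sincosf(phase, &s, &c);          // fast device sine/cosine
              re += c; im += s;
          }
          I[iq] = re * re + im * im;             // |A(q)|^2
      }

      int main() {
          const int nAtoms = 1000, nQ = 256;
          float3 *r, *q; float *I;
          cudaMallocManaged(&r, nAtoms * sizeof(float3));
          cudaMallocManaged(&q, nQ * sizeof(float3));
          cudaMallocManaged(&I, nQ * sizeof(float));
          for (int j = 0; j < nAtoms; ++j)       // 10x10x10 simple cubic lattice
              r[j] = make_float3(j % 10, (j / 10) % 10, j / 100);
          for (int k = 0; k < nQ; ++k)           // scan q along one axis
              q[k] = make_float3(2.f * 3.14159265f * k / nQ, 0.f, 0.f);

          kinematicIntensity<<<(nQ + 127) / 128, 128>>>(r, nAtoms, q, nQ, I);
          cudaDeviceSynchronize();
          printf("I(0) = %g (expect nAtoms^2 = %g)\n", I[0], 1e6);
          return 0;
      }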

  7. A numerical code for the simulation of non-equilibrium chemically reacting flows on hybrid CPU-GPU clusters

    NASA Astrophysics Data System (ADS)

    Kudryavtsev, Alexey N.; Kashkovsky, Alexander V.; Borisov, Semyon P.; Shershnev, Anton A.

    2017-10-01

    In the present work, a computer code RCFS for the numerical simulation of chemically reacting compressible flows on hybrid CPU/GPU supercomputers is developed. It solves the 3D unsteady Euler equations for multispecies chemically reacting flows in general curvilinear coordinates using shock-capturing TVD schemes. Time advancement is carried out using explicit Runge-Kutta TVD schemes. The program implementation uses the CUDA application programming interface to perform GPU computations. Data are distributed between GPUs via a domain decomposition technique. The developed code is verified on a number of test cases, including supersonic flow over a cylinder.

  8. Implementing a GPU-based numerical algorithm for modelling dynamics of a high-speed train

    NASA Astrophysics Data System (ADS)

    Sytov, E. S.; Bratus, A. S.; Yurchenko, D.

    2018-04-01

    This paper discusses the initiative of implementing a GPU-based numerical algorithm for studying various phenomena associated with the dynamics of high-speed railway transport. The proposed numerical algorithm for calculating the critical speed of the bogie is based on the first Lyapunov number. The numerical algorithm is validated against analytical results derived for a simple model. A dynamic model of a carriage connected to a new dual-wheelset flexible bogie is studied for linear and dry-friction damping. Numerical results obtained by CPU, MPU and GPU approaches are compared, and the appropriateness of these methods is discussed.

  9. A 3D front tracking method on a CPU/GPU system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bo, Wurigen; Grove, John

    2011-01-21

    We describe the method to port a sequential 3D interface tracking code to a GPU with CUDA. The interface is represented as a triangular mesh. Interface geometry properties and point propagation are performed on a GPU. Interface mesh adaptation is performed on a CPU. The convergence of the method is assessed from the test problems with given velocity fields. Performance results show overall speedups from 11 to 14 for the test problems under mesh refinement. We also briefly describe our ongoing work to couple the interface tracking method with a hydro solver.

  10. Improving Quantum Gate Simulation using a GPU

    NASA Astrophysics Data System (ADS)

    Gutierrez, Eladio; Romero, Sergio; Trenas, Maria A.; Zapata, Emilio L.

    2008-11-01

    Due to the increasing computing power of graphics processing units (GPU), they are becoming more and more popular for solving general-purpose algorithms. As the simulation of quantum computers is a problem of exponential complexity, it is advisable to perform the computation in parallel, such as on the SIMD multiprocessors present in recent GPUs. In this paper, we focus on an important quantum algorithm, the quantum Fourier transform (QFT), in order to evaluate different parallelization strategies on a novel GPU architecture. Our implementation makes use of the new CUDA software/hardware architecture recently developed by NVIDIA.
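
    The data-parallel core of such a simulation is gate application: a single-qubit gate touches the 2^n amplitudes in independent pairs differing only in the target bit, so one thread can own each pair. A hedged sketch for the Hadamard gate (a QFT building block; our toy, not the paper's implementation) follows.

      // Toy state-vector simulator step: apply a Hadamard gate, one thread per
      // amplitude pair (not the paper's code).
      #include <cstdio>
      #include <cmath>
      #include <cuda_runtime.h>

      __global__ void hadamard(float2 *state, int nQubits, int target) {
          long long pair = blockIdx.x * (long long)blockDim.x + threadIdx.x;
          long long nPairs = 1LL << (nQubits - 1);
          if (pair >= nPairs) return;

          long long mask = (1LL << target) - 1;
          long long i0 = ((pair & ~mask) << 1) | (pair & mask); // target bit = 0
          long long i1 = i0 | (1LL << target);                  // target bit = 1
          float2 a = state[i0], b = state[i1];
          const float s = 0.70710678f;                          // 1/sqrt(2)
          state[i0] = make_float2(s * (a.x + b.x), s * (a.y + b.y));
          state[i1] = make_float2(s * (a.x - b.x), s * (a.y - b.y));
      }

      int main() {
          const int nQubits = 10;
          long long dim = 1LL << nQubits;
          float2 *state;
          cudaMallocManaged(&state, dim * sizeof(float2));
          for (long long i = 0; i < dim; ++i) state[i] = make_float2(0.f, 0.f);
          state[0] = make_float2(1.f, 0.f);                     // |00...0>

          for (int t = 0; t < nQubits; ++t)                     // H on every qubit
              hadamard<<<(int)((dim / 2 + 255) / 256), 256>>>(state, nQubits, t);
          cudaDeviceSynchronize();
          printf("amplitude of |0>: %f (expect %f)\n", state[0].x,
                 1.0 / sqrt((double)dim));
          return 0;
      }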

  11. Interactive Light Stimulus Generation with High Performance Real-Time Image Processing and Simple Scripting.

    PubMed

    Szécsi, László; Kacsó, Ágota; Zeck, Günther; Hantz, Péter

    2017-01-01

    Light stimulation with precise and complex spatial and temporal modulation is demanded by a series of research fields like visual neuroscience, optogenetics, ophthalmology, and visual psychophysics. We developed a user-friendly and flexible stimulus-generating framework (GEARS, GPU-based Eye And Retina Stimulation Software), which offers access to GPU computing power and allows interactive modification of stimulus parameters during experiments. Furthermore, it has built-in support for driving external equipment, as well as for synchronization tasks, via USB ports. The use of GEARS does not require elaborate programming skills. The necessary scripting is visually aided by an intuitive interface, while the details of the underlying software and hardware components remain hidden. Internally, the software is a C++/Python hybrid using OpenGL graphics. Computations are performed on the GPU and are defined in the GLSL shading language. However, all GPU settings, including the GPU shader programs, are automatically generated by GEARS. This is configured through a method encountered in game programming, which allows high flexibility: stimuli are straightforwardly composed using a broad library of basic components. Stimulus rendering is implemented solely in C++, so intermediary libraries for interfacing could be omitted. This enables the program to perform computationally demanding tasks like en-masse random number generation or real-time image processing with local and global operations.

  12. ESPRIT-Like Two-Dimensional DOA Estimation for Monostatic MIMO Radar with Electromagnetic Vector Received Sensors under the Condition of Gain and Phase Uncertainties and Mutual Coupling

    PubMed Central

    Zhang, Yongshun; Zheng, Guimei; Feng, Cunqian; Tang, Jun

    2017-01-01

    In this paper, we focus on the problem of two-dimensional direction of arrival (2D-DOA) estimation for monostatic MIMO radar with electromagnetic vector received sensors (MIMO-EMVSs) under the condition of gain and phase uncertainties (GPU) and mutual coupling (MC). GPU would spoil the invariance property of the EMVSs in MIMO-EMVSs, so the effective ESPRIT algorithm cannot be used directly. We therefore put forward a C-SPD ESPRIT-like algorithm. It estimates the 2D-DOA and polarization station angle (PSA) based on the instrumental sensors method (ISM). The C-SPD ESPRIT-like algorithm can obtain good angle estimation accuracy without knowing the GPU. Furthermore, it can be applied to arbitrary array configurations and has low complexity, since it avoids the angle searching procedure. When MC and GPU exist together between the elements of the EMVSs, in order to keep our algorithm feasible, we derive a class of separated electromagnetic vector receivers and give the S-SPD ESPRIT-like algorithm. It can solve the problem of GPU and MC efficiently, and the array configuration can be arbitrary. The effectiveness of our proposed algorithms is verified by the simulation results. PMID:29072588

  13. ESPRIT-Like Two-Dimensional DOA Estimation for Monostatic MIMO Radar with Electromagnetic Vector Received Sensors under the Condition of Gain and Phase Uncertainties and Mutual Coupling.

    PubMed

    Zhang, Dong; Zhang, Yongshun; Zheng, Guimei; Feng, Cunqian; Tang, Jun

    2017-10-26

    In this paper, we focus on the problem of two-dimensional direction of arrival (2D-DOA) estimation for monostatic MIMO radar with electromagnetic vector received sensors (MIMO-EMVSs) under the condition of gain and phase uncertainties (GPU) and mutual coupling (MC). GPU would spoil the invariance property of the EMVSs in MIMO-EMVSs, so the effective ESPRIT algorithm cannot be used directly. We therefore put forward a C-SPD ESPRIT-like algorithm. It estimates the 2D-DOA and polarization station angle (PSA) based on the instrumental sensors method (ISM). The C-SPD ESPRIT-like algorithm can obtain good angle estimation accuracy without knowing the GPU. Furthermore, it can be applied to arbitrary array configurations and has low complexity, since it avoids the angle searching procedure. When MC and GPU exist together between the elements of the EMVSs, in order to keep our algorithm feasible, we derive a class of separated electromagnetic vector receivers and give the S-SPD ESPRIT-like algorithm. It can solve the problem of GPU and MC efficiently, and the array configuration can be arbitrary. The effectiveness of our proposed algorithms is verified by the simulation results.

  14. A GPU-accelerated 3D Coupled Sub-sample Estimation Algorithm for Volumetric Breast Strain Elastography

    PubMed Central

    Peng, Bo; Wang, Yuqi; Hall, Timothy J; Jiang, Jingfeng

    2017-01-01

    The primary objective of this work was to extend a previously published 2D coupled sub-sample tracking algorithm to 3D speckle tracking in the framework of ultrasound breast strain elastography. In order to overcome the heavy computational cost, we investigated the use of a graphics processing unit (GPU) to accelerate the 3D coupled sub-sample speckle tracking method. The performance of the proposed GPU implementation was tested using a tissue-mimicking (TM) phantom and in vivo breast ultrasound data, and was compared with that of the conventional 3D quadratic sub-sample estimation algorithm. On the basis of these evaluations, we concluded that the GPU implementation of this 3D sub-sample estimation algorithm can provide high-quality strain data (i.e. high correlation between the pre-deformation and the motion-compensated post-deformation RF echo data, and high contrast-to-noise-ratio strain images) compared to the conventional 3D quadratic sub-sample algorithm. Using the GPU implementation of the 3D speckle tracking algorithm, volumetric strain data can be obtained relatively quickly (approximately 20 seconds per 2.5 cm × 2.5 cm × 2.5 cm volume). PMID:28166493

  15. GPU-BSM: A GPU-Based Tool to Map Bisulfite-Treated Reads

    PubMed Central

    Manconi, Andrea; Orro, Alessandro; Manca, Emanuele; Armano, Giuliano; Milanesi, Luciano

    2014-01-01

    Cytosine DNA methylation is an epigenetic mark implicated in several biological processes. Bisulfite treatment of DNA is acknowledged as the gold-standard technique to study methylation. This technique introduces changes in the genomic DNA by converting cytosines to uracils, while 5-methylcytosines remain nonreactive. During PCR amplification, 5-methylcytosines are amplified as cytosine, whereas uracils and thymines are amplified as thymine. To detect the methylation levels, bisulfite-treated reads must be aligned against a reference genome. Mapping these reads to a reference genome represents a significant computational challenge, mainly due to the increased search space and the loss of information introduced by the treatment. To deal with this computational challenge we devised GPU-BSM, a tool based on modern Graphics Processing Units. Graphics Processing Units are hardware accelerators that are increasingly being used successfully to accelerate general-purpose scientific applications. GPU-BSM is able to map bisulfite-treated reads from both whole genome bisulfite sequencing and reduced representation bisulfite sequencing, and to estimate methylation levels. Due to the massive parallelization obtained by exploiting graphics cards, GPU-BSM aligns bisulfite-treated reads faster than other cutting-edge solutions, while outperforming most of them in terms of uniquely mapped reads. PMID:24842718
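
    The first stage of bisulfite mapping is an embarrassingly parallel recoding that suits a GPU well: every C is read as T (and, for the opposite strand, every G as A) so that treated reads become comparable with the reference. A hypothetical one-thread-per-base kernel is sketched below; GPU-BSM's alignment stage is of course far more involved.

      // Hypothetical bisulfite read-recoding kernel (not the GPU-BSM source).
      #include <cstdio>
      #include <cstring>
      #include <cuda_runtime.h>

      __global__ void bisulfiteRecode(const char *in, char *ct, char *ga, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          char b = in[i];
          ct[i] = (b == 'C') ? 'T' : b;   // C->T space (original top strand)
          ga[i] = (b == 'G') ? 'A' : b;   // G->A space (complementary strand)
      }

      int main() {
          const char *read = "ACGTCGGA";
          int n = strlen(read);
          char *in, *ct, *ga;
          cudaMallocManaged(&in, n + 1);
          cudaMallocManaged(&ct, n + 1);
          cudaMallocManaged(&ga, n + 1);
          memcpy(in, read, n + 1);

          bisulfiteRecode<<<1, 256>>>(in, ct, ga, n);
          cudaDeviceSynchronize();
          ct[n] = ga[n] = '\0';
          printf("read : %s\nC->T : %s\nG->A : %s\n", in, ct, ga);
          return 0;
      }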

  16. cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU.

    PubMed

    Zhang, Jing; Wang, Hao; Feng, Wu-Chun

    2017-01-01

    BLAST, short for Basic Local Alignment Search Tool, is a ubiquitous tool used in the life sciences for pairwise sequence search. However, with the advent of next-generation sequencing (NGS), whether at the outset or downstream from NGS, the exponential growth of sequence databases is outstripping our ability to analyze the data. While recent studies have utilized the graphics processing unit (GPU) to speed up the BLAST algorithm for searching protein sequences (i.e., BLASTP), these studies use coarse-grained parallelism, where one sequence alignment is mapped to only one thread. Such an approach does not efficiently utilize the capabilities of a GPU, particularly due to the irregularity of BLASTP in both execution paths and memory-access patterns. To address these shortcomings, we present a fine-grained approach to parallelize BLASTP, where each individual phase of the sequence search is mapped to many threads on a GPU. This approach, which we refer to as cuBLASTP, reorders data-access patterns and reduces divergent branches in the most time-consuming phases (i.e., hit detection and ungapped extension). In addition, cuBLASTP optimizes the remaining phases (i.e., gapped extension and alignment with traceback) on a multicore CPU and overlaps their execution with the phases running on the GPU.

  17. MrBayes tgMC3++: A High Performance and Resource-Efficient GPU-Oriented Phylogenetic Analysis Method.

    PubMed

    Ling, Cheng; Hamada, Tsuyoshi; Gao, Jingyang; Zhao, Guoguang; Sun, Donghong; Shi, Weifeng

    2016-01-01

    MrBayes is a widespread phylogenetic inference tool harnessing empirical evolutionary models and Bayesian statistics. However, the computational cost of the likelihood estimation is very expensive, resulting in undesirably long execution times. Although a number of multi-threaded optimizations have been proposed to speed up MrBayes, there are bottlenecks that severely limit the GPU thread-level parallelism of likelihood estimations. This study proposes a high-performance and resource-efficient method for the GPU-oriented parallelization of likelihood estimations. Instead of having to rely on empirical programming, the proposed novel decomposition storage model implements high-performance data transfers implicitly. In terms of performance improvement, a speedup factor of up to 178 can be achieved in the analysis of simulated datasets by four Tesla K40 cards. In comparison to the other publicly available GPU-oriented versions of MrBayes, the tgMC³++ method (proposed herein) outperforms the tgMC³ (v1.0), nMC³ (v2.1.1) and oMC³ (v1.00) methods by speedup factors of up to 1.6, 1.9 and 2.9, respectively. Moreover, tgMC³++ supports more evolutionary models and gamma categories, which previous GPU-oriented methods failed to include in their analyses.

  18. GPU based contouring method on grid DEM data

    NASA Astrophysics Data System (ADS)

    Tan, Liheng; Wan, Gang; Li, Feng; Chen, Xiaohui; Du, Wenlong

    2017-08-01

    This paper presents a novel method to generate contour lines from grid DEM data based on the programmable GPU pipeline. Previous contouring approaches often use the CPU to construct a finite element mesh from the raw DEM data and then extract contour segments from the elements; they also need a tracing or sorting strategy to generate the final continuous contours. These approaches can be CPU-intensive and time-consuming, and the generated contours are unsmooth if the raw data is sparsely distributed. Unlike the CPU approaches, we employ the GPU's vertex shader to generate a triangular mesh with arbitrary user-defined density, in which the height of each vertex is calculated through a third-order Cardinal spline function. Then, in the same frame, segments are extracted from the triangles by the geometry shader and transferred to the CPU side in an internal order in the GPU's transform feedback stage. Finally, we propose a "Grid Sorting" algorithm that produces the continuous contour lines by traversing the segments only once. Our method makes use of multiple stages of the GPU pipeline for computation, generates smooth contour lines, and is significantly faster than the previous CPU approaches. The algorithm can be easily implemented with the OpenGL 3.3 API or higher on consumer-level PCs.
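
    The densification step the paper assigns to the vertex shader reduces to cubic Cardinal spline interpolation of DEM heights. Below is a minimal CUDA sketch of that interpolation along one DEM row (names and the 1D restriction are assumptions; the published method runs on the full mesh inside the OpenGL pipeline):

      // Each thread computes one upsampled height between interior samples.
      // tension = 0.5 gives the Catmull-Rom special case of the Cardinal spline.
      __global__ void cardinalUpsample(const float* h, int n, float* out,
                                       int factor, float tension) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          int nOut = (n - 3) * factor;               // interior spans only, for brevity
          if (i >= nOut) return;
          int seg = i / factor + 1;                  // p1 index; p0..p3 stay in range
          float t = (i % factor) / (float)factor;
          float p0 = h[seg - 1], p1 = h[seg], p2 = h[seg + 1], p3 = h[seg + 2];
          float m1 = tension * (p2 - p0), m2 = tension * (p3 - p1);
          float t2 = t * t, t3 = t2 * t;
          out[i] = (2*t3 - 3*t2 + 1) * p1 + (t3 - 2*t2 + t) * m1
                 + (-2*t3 + 3*t2) * p2 + (t3 - t2) * m2;   // cubic Hermite form
      }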

  19. Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

    PubMed Central

    Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

    2014-01-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868

  20. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

    PubMed

    Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

    2014-07-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
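
    A minimal sketch of the per-device task assignment both records describe (structure and names assumed, not the authors' code): each GPU gets its own slice of the A-scan batch, kernel launches return immediately so the devices run concurrently, and a final loop synchronizes and frees every device.

      #include <cuda_runtime.h>

      __global__ void processAscans(float* d, int count, int samples) { /* FFT, log compression, etc. */ }

      void processOnAllGpus(const float* host, int nAscans, int samples) {
          int nGpu = 0;
          cudaGetDeviceCount(&nGpu);
          int per = (nAscans + nGpu - 1) / nGpu;           // A-scans per device
          float* dev[8] = {0};                             // assume at most 8 GPUs
          for (int g = 0; g < nGpu && g < 8; ++g) {
              cudaSetDevice(g);
              int count = nAscans - g * per < per ? nAscans - g * per : per;
              if (count <= 0) break;
              size_t bytes = (size_t)count * samples * sizeof(float);
              cudaMalloc(&dev[g], bytes);
              cudaMemcpyAsync(dev[g], host + (size_t)g * per * samples, bytes,
                              cudaMemcpyHostToDevice);     // overlaps with other devices
              processAscans<<<(count + 255) / 256, 256>>>(dev[g], count, samples);
          }
          for (int g = 0; g < nGpu && g < 8; ++g) {        // wait for every device, then free
              if (!dev[g]) continue;
              cudaSetDevice(g);
              cudaDeviceSynchronize();
              cudaFree(dev[g]);
          }
      }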

  1. GPU.proton.DOCK: Genuine Protein Ultrafast proton equilibria consistent DOCKing.

    PubMed

    Kantardjiev, Alexander A

    2011-07-01

    GPU.proton.DOCK (Genuine Protein Ultrafast proton equilibria consistent DOCKing) is a state-of-the-art service for in silico prediction of protein-protein interactions via rigorous and ultrafast docking code. It is unique in providing a stringent account of electrostatic self-consistency and of the mutual effects of the docking partners' proton equilibria. GPU.proton.DOCK is the first server offering such a crucial supplement to protein docking algorithms, a step toward more reliable and high-accuracy docking results. The code (especially the Fast Fourier Transform bottleneck and the electrostatic field computation) is parallelized to run on a GPU supercomputer. The high performance will be of use for large-scale structural bioinformatics and systems biology projects, thus bridging the physics of the interactions with the analysis of molecular networks. We propose workflows for exploring in silico charge mutagenesis effects. Special emphasis is given to the interface, which is intuitive and user-friendly. The input is comprised of the atomic coordinate files in PDB format. The advanced user is provided with a special input section for the addition of non-polypeptide charges, extra ionogenic groups with intrinsic pK(a) values, or fixed ions. The output is comprised of docked complexes in PDB format as well as an interactive visualization in a molecular viewer. The GPU.proton.DOCK server can be accessed at http://gpudock.orgchm.bas.bg/.
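
    The FFT bottleneck mentioned above is the standard correlation core of rigid-body docking: the score grid is the inverse transform of the receptor spectrum times the conjugate ligand spectrum. A hedged cuFFT-based sketch of that core (grid setup and names assumed; not the server's actual code, and error checking omitted):

      #include <cufft.h>
      #include <cuComplex.h>

      __global__ void conjMultiply(cufftComplex* r, const cufftComplex* l, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) r[i] = cuCmulf(r[i], cuConjf(l[i]));  // pointwise R * conj(L)
      }

      void correlate(cufftComplex* dRec, cufftComplex* dLig, int nx, int ny, int nz) {
          cufftHandle plan;
          cufftPlan3d(&plan, nx, ny, nz, CUFFT_C2C);
          cufftExecC2C(plan, dRec, dRec, CUFFT_FORWARD);   // in-place forward FFTs
          cufftExecC2C(plan, dLig, dLig, CUFFT_FORWARD);
          int n = nx * ny * nz;
          conjMultiply<<<(n + 255) / 256, 256>>>(dRec, dLig, n);
          cufftExecC2C(plan, dRec, dRec, CUFFT_INVERSE);   // unnormalized correlation scores
          cufftDestroy(plan);
      }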

  2. GPU-accelerated Kernel Regression Reconstruction for Freehand 3D Ultrasound Imaging.

    PubMed

    Wen, Tiexiang; Li, Ling; Zhu, Qingsong; Qin, Wenjian; Gu, Jia; Yang, Feng; Xie, Yaoqin

    2017-07-01

    The volume reconstruction method plays an important role in improving reconstructed volumetric image quality for freehand three-dimensional (3D) ultrasound imaging. By utilizing the capability of a programmable graphics processing unit (GPU), we can achieve real-time incremental volume reconstruction at a speed of 25-50 frames per second (fps). After incremental reconstruction and visualization, hole-filling is performed on the GPU to fill the remaining empty voxels. However, traditional pixel-nearest-neighbor hole-filling fails to reconstruct volumes with high image quality, while kernel regression provides an accurate volume reconstruction method for 3D ultrasound imaging at the cost of heavy computational complexity. In this paper, a GPU-based fast kernel regression method is proposed for high-quality volume reconstruction after the incremental reconstruction of freehand ultrasound. The experimental results show that improved image quality in terms of speckle reduction and detail preservation can be obtained with a parameter setting of kernel window size of [Formula: see text] and kernel bandwidth of 1.0. The computational performance of the proposed GPU-based method can be over 200 times faster than that on the central processing unit (CPU), and a volume with 50 million voxels in our experiment can be reconstructed within 10 seconds.
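
    As a hedged illustration of the data-parallel structure (the paper's kernel regression is more elaborate than a plain Gaussian weighting, and all names here are assumed), each thread fills one empty voxel from the weighted average of filled neighbors inside a (2r+1)³ window:

      __global__ void kernelRegressionFill(const float* vol, const unsigned char* filled,
                                           float* out, int nx, int ny, int nz,
                                           int r, float bw) {
          int x = blockIdx.x * blockDim.x + threadIdx.x;
          int y = blockIdx.y * blockDim.y + threadIdx.y;
          int z = blockIdx.z;
          if (x >= nx || y >= ny || z >= nz) return;
          long idx = (long)z * nx * ny + y * nx + x;
          if (filled[idx]) { out[idx] = vol[idx]; return; }   // keep measured voxels
          float num = 0.f, den = 0.f;
          for (int dz = -r; dz <= r; ++dz)
            for (int dy = -r; dy <= r; ++dy)
              for (int dx = -r; dx <= r; ++dx) {
                  int xx = x + dx, yy = y + dy, zz = z + dz;
                  if (xx < 0 || yy < 0 || zz < 0 || xx >= nx || yy >= ny || zz >= nz) continue;
                  long j = (long)zz * nx * ny + yy * nx + xx;
                  if (!filled[j]) continue;                   // only measured neighbors vote
                  float d2 = (float)(dx*dx + dy*dy + dz*dz);
                  float w = __expf(-d2 / (2.f * bw * bw));    // Gaussian kernel weight
                  num += w * vol[j]; den += w;
              }
          out[idx] = den > 0.f ? num / den : 0.f;
      }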

  3. Hierarchical Petascale Simulation Framework For Stress Corrosion Cracking

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grama, Ananth

    2013-12-18

    A number of major accomplishments resulted from the project. These include:
    • Data Structures, Algorithms, and Numerical Methods for Reactive Molecular Dynamics. We have developed a range of novel data structures, algorithms, and solvers (amortized ILU, Spike) for use with ReaxFF and charge equilibration.
    • Parallel Formulations of Reactive MD (Purdue Reactive Molecular Dynamics Package: PuReMD, PuReMD-GPU, and PG-PuReMD) for Message-Passing, GPU, and GPU-Cluster Platforms. We have developed efficient serial, parallel (MPI), GPU (CUDA), and GPU-cluster (MPI/CUDA) implementations. Our implementations have been demonstrated to be significantly better than the state of the art, both in terms of performance and scalability.
    • Comprehensive Validation in the Context of Diverse Applications. We have demonstrated the use of our software in diverse systems, including silica-water and silicon-germanium nanorods, and, as part of other projects, extended it to applications ranging from explosives (RDX) to lipid bilayers (biomembranes under oxidative stress).
    • Open Source Software Packages for Reactive Molecular Dynamics. All versions of our software have been released to the public domain. There are over 100 major research groups worldwide using our software.
    • Implementation into the Department of Energy LAMMPS Software Package. We have integrated our software into the Department of Energy LAMMPS software package.

  4. Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi.

    PubMed

    Leang, Sarom S; Rendell, Alistair P; Gordon, Mark S

    2014-03-11

    Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special-purpose devices or accelerators connected via an external interface such as a PCI bus. The NVIDIA Kepler Graphical Processing Unit (GPU) and the Intel Xeon Phi are two examples of such accelerators. Accelerators offer peak performances that can be well above those of the host processor. How to exploit this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to GPU and Phi systems. Double precision general matrix multiply operations are endemic in electronic structure calculations, especially in methods that include electron correlation, such as density functional theory, second order perturbation theory, and coupled cluster theory. The use of approaches that automatically determine whether to use the host or an accelerator, based on problem size, is explored, with computations occurring on the accelerator and/or the host. For data transfers over PCIe, the GPU provides the best overall performance for data sizes up to 4096 MB, with consistent upload and download rates between 5-5.6 GB/s and 5.4-6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.
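
    The size-based dispatch the paper explores can be sketched as below. The crossover threshold is an assumption for illustration (the paper determines it empirically), the naive host loop stands in for an optimized host BLAS, and cuBLAS is used for the accelerated path:

      #include <cublas_v2.h>
      #include <cuda_runtime.h>

      // Naive column-major host fallback standing in for an optimized host BLAS.
      static void host_dgemm(int n, const double* A, const double* B, double* C) {
          for (int i = 0; i < n; ++i)
              for (int j = 0; j < n; ++j) {
                  double s = 0.0;
                  for (int k = 0; k < n; ++k) s += A[i + k * n] * B[k + j * n];
                  C[i + j * n] = s;
              }
      }

      void dispatchDgemm(cublasHandle_t h, int n, const double* A, const double* B, double* C) {
          const int kThreshold = 512;                   // assumed crossover size
          if (n < kThreshold) { host_dgemm(n, A, B, C); return; }
          double *dA, *dB, *dC;
          size_t bytes = (size_t)n * n * sizeof(double);
          cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
          cublasSetMatrix(n, n, sizeof(double), A, n, dA, n);   // upload over PCIe
          cublasSetMatrix(n, n, sizeof(double), B, n, dB, n);
          const double one = 1.0, zero = 0.0;
          cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &one, dA, n, dB, n, &zero, dC, n);
          cublasGetMatrix(n, n, sizeof(double), dC, n, C, n);   // download result
          cudaFree(dA); cudaFree(dB); cudaFree(dC);
      }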

  5. An analytic linear accelerator source model for GPU-based Monte Carlo dose calculations.

    PubMed

    Tian, Zhen; Li, Yongbao; Folkerts, Michael; Shi, Feng; Jiang, Steve B; Jia, Xun

    2015-10-21

    Recently, there has been a lot of research interest in developing fast Monte Carlo (MC) dose calculation methods on graphics processing unit (GPU) platforms. A good linear accelerator (linac) source model is critical for both accuracy and efficiency. In principle, an analytical source model is preferable to a phase-space file-based model for GPU-based MC dose engines, in that data loading and CPU-GPU data transfer can be avoided. In this paper, we present an analytical field-independent source model specifically developed for GPU-based MC dose calculations, together with a GPU-friendly sampling scheme. A key concept called the phase-space ring (PSR) was proposed. Each PSR contains a group of particles that are of the same type, are close in energy, and reside in a narrow ring on the phase-space plane located just above the upper jaws. The model parameterizes the probability densities of particle location, direction and energy for each primary photon PSR, scattered photon PSR and electron PSR. Models of one 2D Gaussian distribution or multiple Gaussian components were employed to represent the particle direction distributions of these PSRs. A method was developed to analyze a reference phase-space file and derive the corresponding model parameters. To use our model efficiently in MC dose calculations on the GPU, we proposed a GPU-friendly sampling strategy, which ensures that the particles sampled and transported simultaneously are of the same type and close in energy, to alleviate GPU thread divergence. To test the accuracy of our model, dose distributions of a set of open fields in a water phantom were calculated using our source model and compared to those calculated using the reference phase-space files. For the high dose gradient regions, the average distance-to-agreement (DTA) was within 1 mm and the maximum DTA within 2 mm. For relatively low dose gradient regions, the root-mean-square (RMS) dose difference was within 1.1% and the maximum dose difference within 1.7%. The maximum relative difference of output factors was within 0.5%. Over 98.5% passing rate was achieved in 3D gamma-index tests with 2%/2 mm criteria in both an IMRT prostate patient case and a head-and-neck case. These results demonstrate the efficacy of our model in accurately representing a reference phase-space file. We have also tested the efficiency gain of our source model over our previously developed phase-space-let file source model, and found that the overall efficiency of dose calculation improved by ~1.3-2.2 times in water and patient cases when using our analytical model.
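
    The divergence-avoiding idea is that every thread in a launch samples the same PSR, so simultaneously transported particles share type and energy range. An illustrative sampling kernel (struct fields, the float4 packing, and all names are hypothetical, not the paper's parameterization):

      #include <curand_kernel.h>

      struct Psr { float eMin, eMax, rMin, rMax, muDir, sigmaDir; };

      __global__ void samplePsr(const Psr psr, float4* particles, int n, unsigned long seed) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          curandState st;
          curand_init(seed, i, 0, &st);
          float e = psr.eMin + (psr.eMax - psr.eMin) * curand_uniform(&st); // narrow energy bin
          float r = psr.rMin + (psr.rMax - psr.rMin) * curand_uniform(&st); // ring radius
          float phi = 6.2831853f * curand_uniform(&st);                     // azimuth on the ring
          float theta = psr.muDir + psr.sigmaDir * curand_normal(&st);      // Gaussian direction
          // illustrative packing: (x, y, polar direction, energy)
          particles[i] = make_float4(r * cosf(phi), r * sinf(phi), theta, e);
      }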

  6. FROM COMPLEX EVOLVING TO SIMPLE: CURRENT REVISIONAL AND ENDOSCOPIC PROCEDURES FOLLOWING BARIATRIC SURGERY.

    PubMed

    Zorron, Ricardo; Galvão-Neto, Manoel Passos; Campos, Josemberg; Branco, Alcides José; Sampaio, José; Junghans, Tido; Bothe, Claudia; Benzing, Christian; Krenzien, Felix

    Roux-en-Y gastric bypass (RYGB) is a standard therapy in bariatric surgery. Sleeve gastrectomy and gastric banding, although with good results in the literature, are showing higher rates of treatment failure to reduce obesity-associated morbidity and body weight. Other problems may occur after bariatric surgery, such as band erosion and gastroesophageal reflux disease refractory to medication. Therefore, laparoscopic conversion to RYGB can be an effective alternative, as long as specific indications for revision are fulfilled. The objective of this study was to analyse our own data and the literature on revisional bariatric procedures in order to evaluate the best alternatives to current practice. We report institutional experience and a systematic review of the literature on revisional bariatric surgery. Endoscopic procedures have recently been applied to ameliorate failures and complications of bariatric procedures. Therapy failure following RYGB occurs in up to 20% of cases. Transoral outlet reduction is currently an alternative method to reduce the gastrojejunal anastomosis. The diameter and volume of a sleeve gastrectomy can enlarge as well, which can be reduced by longitudinal endoscopic full-thickness sutures. Dumping syndrome and severe hypoglycemic episodes (neuroglycopenia) can be present in patients following RYGB. The hypoglycemic episodes have to be evaluated and can usually be treated conventionally. To avoid partial pancreatectomy or conversion to normal anatomy, a new laparoscopic approach with remnant gastric resection and jejunal interposition can alternatively be applied in non-responders. Hypoglycemic episodes are ameliorated while weight loss is sustained. Revisional and endoscopic procedures can be applied following bariatric surgery in patients with collateral symptoms or treatment failure. Conventional non-surgical approaches should have been applied intensively before revisional surgery is indicated. Former complex surgical revisional procedures are evolving into less complicated endoscopic solutions.

  7. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    NASA Astrophysics Data System (ADS)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications.
    Program summary
    Program title: SWsolver
    Catalogue identifier: AEGY_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GPL v3
    No. of lines in distributed program, including test data, etc.: 59 168
    No. of bytes in distributed program, including test data, etc.: 453 409
    Distribution format: tar.gz
    Programming language: C, CUDA
    Computer: Parallel computing clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
    Operating system: Linux
    Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell processors, and 1-32 NVIDIA GPUs.
    RAM: Tested on problems requiring up to 4 GB per compute node.
    Classification: 12
    External routines: MPI, CUDA, IBM Cell SDK
    Nature of problem: MPI-parallel simulation of the shallow water equations using a high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell processor, and NVIDIA GPU using CUDA.
    Solution method: SWsolver provides 3 implementations of a high-resolution 2D shallow water equation solver on regular Cartesian grids, for CPU, Cell processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
    Additional comments: Sub-program numdiff is used for the test run.
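
    To give a flavor of the structured-grid explicit update being parallelized, here is a minimal 1D shallow-water step with a Lax-Friedrichs flux (SWsolver itself uses a 2D high-resolution scheme plus MPI halo exchange; names and the 1D reduction are assumptions):

      // Conserved variables per cell: x = h (depth), y = hu (momentum).
      // dtdx = dt / dx; g is gravitational acceleration.
      __global__ void swStep(const float2* u, float2* uNew, int n, float dtdx, float g) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i == 0 || i >= n - 1) return;                 // skip boundary cells
          float2 l = u[i - 1], r = u[i + 1];
          float2 fl = make_float2(l.y, l.y * l.y / l.x + 0.5f * g * l.x * l.x);
          float2 fr = make_float2(r.y, r.y * r.y / r.x + 0.5f * g * r.x * r.x);
          // Lax-Friedrichs: average neighbors, subtract centered flux difference
          uNew[i].x = 0.5f * (l.x + r.x) - 0.5f * dtdx * (fr.x - fl.x);
          uNew[i].y = 0.5f * (l.y + r.y) - 0.5f * dtdx * (fr.y - fl.y);
      }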

  8. Asteroseismology and the COROT satellite

    NASA Astrophysics Data System (ADS)

    Andrade, L. B. P.; Janot Pacheco, E.

    2003-08-01

    This work focuses on pre-launch activities for the COROT satellite of the French space agency (CNES), scheduled for launch in 2005. The satellite will be dedicated to stellar seismology and the search for exoplanets. Our work program centers on two main points: (1) carrying out a detailed search of the COROT fields for astrophysical targets of special interest; (2) taking part in the preliminary spectroscopic analyses of selected targets to determine the physical parameters of the stars with the highest possible precision. At the present stage, priority was given to the first point of the project. A general survey was made of the astrophysical objects found in the two observation fields, centered at 06H50M and 18H50M, with radii of 10 minutes. It was concluded that B-Be stars should be observed in the seismology field, while white dwarfs should be observed in the exoplanet field. Objects to be observed were chosen so as to be close to the main targets of the satellite's central programs. In parallel, bibliographic studies and searches were carried out to understand the subjects of main interest, namely the non-radial pulsations of OB-Be stars.

  9. Use of a graphics processing unit (GPU) to facilitate real-time 3D graphic presentation of the patient skin-dose distribution during fluoroscopic interventional procedures

    PubMed Central

    Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R.

    2012-01-01

    We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient’s skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures. PMID:24027616
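
    The computation described is embarrassingly parallel over surface elements, which is why the GPU version is insensitive to graphic resolution. A hedged sketch of the per-element update (geometry reduced to a cone test plus inverse-square scaling; names assumed, not the DTS code):

      // One thread per patient-surface element: test against the beam cone from
      // the focal spot and accumulate inverse-square scaled dose if inside.
      __global__ void accumulateSkinDose(const float3* elems, float* dose, int n,
                                         float3 focus, float3 axis, float cosHalfAngle,
                                         float doseAtRef, float refDist) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          float3 v = make_float3(elems[i].x - focus.x, elems[i].y - focus.y,
                                 elems[i].z - focus.z);
          float d = sqrtf(v.x*v.x + v.y*v.y + v.z*v.z);
          float cosT = (v.x*axis.x + v.y*axis.y + v.z*axis.z) / d;  // axis assumed unit length
          if (cosT < cosHalfAngle) return;                 // element outside the beam
          dose[i] += doseAtRef * (refDist * refDist) / (d * d);     // inverse-square falloff
      }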

  10. Protein alignment algorithms with an efficient backtracking routine on multiple GPUs.

    PubMed

    Blazewicz, Jacek; Frohmberg, Wojciech; Kierzynka, Michal; Pesch, Erwin; Wojciechowski, Pawel

    2011-05-20

    Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform, but in most cases address the problem of sequence database scanning and compute only the alignment score, whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch and the Smith-Waterman algorithms with a backtracking procedure, which is needed to construct the alignment. In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. The tests performed show that the implementation, with performance up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU- and GPU-based solutions. Moreover, multiple-GPU support with load balancing makes the application very scalable. The article shows that the backtracking procedure of the sequence alignment algorithms may be designed to fit in with the GPU architecture. Therefore, our algorithm, apart from scores, is able to compute pairwise alignments. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. The tests performed show that the efficiency of the implementation is excellent, and the speed of our GPU-based algorithms can be increased almost linearly when using more than one graphics card.
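
    The key point, that traceback state can live on the GPU, can be shown with a greatly simplified sketch: one thread fills a Needleman-Wunsch matrix with a linear gap penalty and records a direction byte per cell for backtracking. The paper's implementation is far more sophisticated (affine gaps, intra-alignment parallelism, load balancing); all names and limits here are assumptions.

      #define MAXLEN 128

      // Launch with <<<1, 1>>> for one pair; dir must hold (MAXLEN+1)^2 bytes.
      __global__ void nwAlign(const char* a, const char* b, int la, int lb,
                              int match, int mism, int gap, char* dir, int* score) {
          int H[MAXLEN + 1];                           // one DP row; diagonal kept in a register
          for (int j = 0; j <= lb; ++j) H[j] = j * gap;
          for (int i = 1; i <= la; ++i) {
              int diagPrev = H[0];                     // H[i-1][0]
              H[0] = i * gap;
              for (int j = 1; j <= lb; ++j) {
                  int up = H[j] + gap, left = H[j - 1] + gap;
                  int dg = diagPrev + (a[i - 1] == b[j - 1] ? match : mism);
                  diagPrev = H[j];                     // save H[i-1][j] before overwrite
                  int best = dg; char d = 'D';
                  if (up > best)   { best = up;   d = 'U'; }
                  if (left > best) { best = left; d = 'L'; }
                  H[j] = best;
                  dir[i * (MAXLEN + 1) + j] = d;       // traceback direction stays on the GPU
              }
          }
          *score = H[lb];
          // Backtracking: start at (la, lb) and follow 'D'/'U'/'L' back to (0, 0).
      }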

  11. The normal vaginal and uterine bacterial microbiome in giant pandas (Ailuropoda melanoleuca).

    PubMed

    Yang, Xin; Cheng, Guangyang; Li, Caiwu; Yang, Jiang; Li, Jianan; Chen, Danyu; Zou, Wencheng; Jin, SenYan; Zhang, Hemin; Li, Desheng; He, Yongguo; Wang, Chengdong; Wang, Min; Wang, Hongning

    2017-06-01

    While the health effects of the colonization of the reproductive tracts of mammals by bacterial communities are widely known, there is a dearth of knowledge specifically in relation to giant panda microbiomes. In order to investigate the vaginal and uterine bacterial diversity of healthy giant pandas, we used high-throughput sequence analysis of portions of the 16S rRNA gene, based on samples taken from the vaginas (GPV group) and uteri (GPU group) of these animals. Results showed that the four most abundant phyla, which contained in excess of 98% of the total sequences, were Proteobacteria (59.2% for GPV and 51.4% for GPU), Firmicutes (34.4% for GPV and 23.3% for GPU), Actinobacteria (5.2% for GPV and 14.0% for GPU) and Bacteroidetes (0.3% for GPV and 10.3% for GPU). At the genus level, Escherichia was most abundant (11.0%) in the GPV, followed by Leuconostoc (8.7%), Pseudomonas (8.0%), Acinetobacter (7.3%), Streptococcus (6.3%) and Lactococcus (6.0%). In relation to the uterine samples, Janthinobacterium had the highest prevalence rate (20.2%), followed by Corynebacterium (13.2%), Streptococcus (19.6%), Psychrobacter (9.3%), Escherichia (7.5%) and Bacteroides (6.2%). Moreover, both Chao1 and abundance-based coverage estimator (ACE) species richness indices, which were operating at the same sequencing depth for each sample, demonstrated that GPV had more species richness than GPU, while Simpson and Shannon indices of diversity indicated that GPV had the higher bacterial diversity. These findings contribute to our understanding of the potential influence abnormal reproductive tract microbial communities have on negative pregnancy outcomes in giant pandas.

  12. CUDA-based high-performance computing of the S-BPF algorithm with no-waiting pipelining

    NASA Astrophysics Data System (ADS)

    Deng, Lin; Yan, Bin; Chang, Qingmei; Han, Yu; Zhang, Xiang; Xi, Xiaoqi; Li, Lei

    2015-10-01

    The backprojection-filtration (BPF) algorithm has become a good solution for local reconstruction in cone-beam computed tomography (CBCT). However, the reconstruction speed of BPF is a severe limitation for clinical applications. The selective-backprojection filtration (S-BPF) algorithm improves the parallel performance of BPF through selective backprojection. Furthermore, the general-purpose graphics processing unit (GP-GPU) is a popular tool for accelerating the reconstruction, and much work has been performed on optimizing the cone-beam backprojection. As the cone-beam backprojection becomes faster, data transport takes a much larger share of the reconstruction time than before. This paper focuses on minimizing the total reconstruction time of the S-BPF algorithm by hiding the data transport among hard disk, CPU and GPU. Based on an analysis of the S-BPF algorithm, several strategies are implemented: (1) asynchronous calls are used to overlap the execution of the CPU and the GPU; (2) an innovative strategy is applied to obtain the DBP image while hiding the transport time effectively; (3) two streams, one for data transport and one for calculation, are synchronized by cudaEvent in the inverse finite Hilbert transform on the GPU. Our main contribution is a reconstruction of the S-BPF algorithm in which the GPU calculates continuously with no data-transport cost: a 512³ volume is reconstructed in less than 0.7 s on a single Tesla K20 GPU from 182 projection views with 512² pixels per projection, about half the time cost of the implementation without the overlap behavior.
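
    A minimal sketch of the stream-and-event pattern the record describes (structure and names assumed, not the paper's code): copies run on one stream, kernels on another, and a cudaEvent makes the compute stream wait only for the chunk it needs. Host buffers should be pinned (cudaHostAlloc) for the copies to be truly asynchronous.

      #include <cuda_runtime.h>

      __global__ void backproject(const float* proj, float* vol, int n) { /* ... */ }

      void pipelinedReconstruction(float** hostChunks, float** devChunks,
                                   float* dVol, int nChunks, size_t chunkBytes, int n) {
          cudaStream_t copyS, compS;
          cudaEvent_t ready;
          cudaStreamCreate(&copyS); cudaStreamCreate(&compS);
          cudaEventCreate(&ready);
          for (int c = 0; c < nChunks; ++c) {
              cudaMemcpyAsync(devChunks[c], hostChunks[c], chunkBytes,
                              cudaMemcpyHostToDevice, copyS);  // upload on the copy stream
              cudaEventRecord(ready, copyS);
              cudaStreamWaitEvent(compS, ready, 0);            // compute waits for this chunk only
              backproject<<<(n + 255) / 256, 256, 0, compS>>>(devChunks[c], dVol, n);
          }
          cudaStreamSynchronize(compS);
          cudaStreamDestroy(copyS); cudaStreamDestroy(compS); cudaEventDestroy(ready);
      }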

  13. Discovering epistasis in large scale genetic association studies by exploiting graphics cards.

    PubMed

    Chen, Gary K; Guo, Yunfei

    2013-12-03

    Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today's paradigm of testing independent hypotheses on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes in disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway, and/or interact non-additively on log risk for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyses genome-wide can quickly become intractable, due to the fact that even modest-size SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, requiring tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed to explicitly overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPU). We include tutorials on GPU technology, which will convey why they are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores that are available on only a single GPU. Whereas high end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from nVidia feature over 2600 cores. Each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well due to the high degree of parallelism exposed in these problems. Papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations.
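
    The high degree of parallelism comes from the independence of SNP pairs: each pair can be scored by its own thread. A toy sketch of that structure (a crude case/control imbalance score on a 3×3 genotype table; real tools fit logistic models per pair, and all names are assumed):

      // Grid: blockIdx.y selects the first SNP, x dimension covers the second.
      __global__ void pairScan(const unsigned char* geno, const unsigned char* pheno,
                               int nSnp, int nInd, float* stat) {
          int i = blockIdx.y;                                 // first SNP
          int j = blockIdx.x * blockDim.x + threadIdx.x;      // second SNP
          if (j <= i || j >= nSnp) return;                    // upper triangle only
          int cases[9] = {0}, ctrls[9] = {0};
          for (int k = 0; k < nInd; ++k) {                    // build 3x3 genotype table
              int cell = geno[(long)i * nInd + k] * 3 + geno[(long)j * nInd + k];
              if (pheno[k]) ++cases[cell]; else ++ctrls[cell];
          }
          float s = 0.f;
          for (int c = 0; c < 9; ++c) s += fabsf((float)(cases[c] - ctrls[c]));
          stat[(long)i * nSnp + j] = s;                       // per-pair statistic
      }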

  14. Use of a graphics processing unit (GPU) to facilitate real-time 3D graphic presentation of the patient skin-dose distribution during fluoroscopic interventional procedures.

    PubMed

    Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R

    2012-02-23

    We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient's skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures.

  15. A GPU-Parallelized Eigen-Based Clutter Filter Framework for Ultrasound Color Flow Imaging.

    PubMed

    Chee, Adrian J Y; Yiu, Billy Y S; Yu, Alfred C H

    2017-01-01

    Eigen-filters with attenuation response adapted to clutter statistics in color flow imaging (CFI) have shown improved flow detection sensitivity in the presence of tissue motion. Nevertheless, their practical adoption in clinical use is not straightforward due to the high computational cost of solving eigendecompositions. Here, we provide a pedagogical description of how a real-time computing framework for eigen-based clutter filtering can be developed through a single-instruction, multiple-data (SIMD) computing approach that can be implemented on a graphics processing unit (GPU). Emphasis is placed on the single-ensemble-based eigen-filtering approach (Hankel singular value decomposition), since it is algorithmically compatible with GPU-based SIMD computing. The key algebraic principles and the corresponding SIMD algorithm are explained, and annotations on how such an algorithm can be rationally implemented on the GPU are presented. The real-time efficacy of our framework was experimentally investigated on a single GPU device (GTX Titan X), and the computing throughput for varying scan depths and slow-time ensemble lengths was studied. Using our eigen-processing framework, real-time video-range throughput (24 frames/s) can be attained for CFI frames with full view in the azimuth direction (128 scanlines), up to a scan depth of 5 cm (λ pixel axial spacing) for a slow-time ensemble length of 16 samples. The corresponding CFI image frames, with respect to the ones derived from non-adaptive polynomial regression clutter filtering, yielded enhanced flow detection sensitivity in vivo, as demonstrated in a carotid imaging case example. These findings indicate that GPU-enabled eigen-based clutter filtering can improve CFI flow detection performance in real time.
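
    The SIMD-friendly part of Hankel-SVD filtering is that every pixel's slow-time ensemble yields its own small Hankel matrix, built with pure indexing. A hedged sketch of that construction step (layout and names assumed; the per-pixel SVD that follows is not shown):

      // One thread per pixel: H[m][k] = x[m + k], so anti-diagonals are constant.
      // Requires rows + cols - 1 == ensLen.
      __global__ void buildHankel(const float2* ensemble, float2* hankel,
                                  int nPixels, int ensLen, int rows, int cols) {
          int px = blockIdx.x * blockDim.x + threadIdx.x;
          if (px >= nPixels) return;
          const float2* x = ensemble + (long)px * ensLen;  // complex IQ samples
          float2* H = hankel + (long)px * rows * cols;
          for (int m = 0; m < rows; ++m)
              for (int k = 0; k < cols; ++k)
                  H[m * cols + k] = x[m + k];
      }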

  16. Accelerated GPU based SPECT Monte Carlo simulations.

    PubMed

    Garcia, Marie-Paule; Bert, Julien; Benoit, Didier; Bardiès, Manuel; Visvikis, Dimitris

    2016-06-07

    Monte Carlo (MC) modelling is widely used in the field of single photon emission computed tomography (SPECT) as it is a reliable technique to simulate very high quality scans. This technique provides very accurate modelling of the radiation transport and particle interactions in a heterogeneous medium. Various MC codes exist for nuclear medicine imaging simulations. Recently, new strategies exploiting the computing capabilities of graphical processing units (GPU) have been proposed. This work aims at evaluating the accuracy of such GPU implementation strategies in comparison to standard MC codes in the context of SPECT imaging. GATE was considered the reference MC toolkit and used to evaluate the performance of newly developed GPU Geant4-based Monte Carlo simulation (GGEMS) modules for SPECT imaging. Radioisotopes with different photon energies were used with these various CPU and GPU Geant4-based MC codes in order to assess the best strategy for each configuration. Three different isotopes were considered: 99mTc, 111In and 131I, using a low energy high resolution (LEHR) collimator, a medium energy general purpose (MEGP) collimator and a high energy general purpose (HEGP) collimator, respectively. Point source, uniform source, cylindrical phantom and anthropomorphic phantom acquisitions were simulated using a model of the GE Infinia II 3/8" gamma camera. Both simulation platforms yielded a similar system sensitivity and image statistical quality for the various combinations. The overall acceleration factor between the GATE and GGEMS platforms derived from the same cylindrical phantom acquisition was between 18 and 27 for the different radioisotopes. Besides, a full MC simulation using an anthropomorphic phantom showed the full potential of the GGEMS platform, with a resulting acceleration factor up to 71. The good agreement with reference codes and the acceleration factors obtained support the use of GPU implementation strategies for improving computational efficiency of SPECT imaging simulations.
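
    The primitive underlying any GPU photon-transport code is the sampled free path. An illustrative step in a homogeneous medium (GGEMS-style codes add heterogeneous geometry, interaction physics and collimator/detector models on top; names assumed, and the curandState array is assumed pre-initialized):

      #include <curand_kernel.h>

      // Sample s = -ln(u)/mu from the exponential attenuation law and advance
      // each photon along its direction; one thread per photon history.
      __global__ void photonStep(float4* pos, const float4* dir, curandState* states,
                                 float mu, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          float u = curand_uniform(&states[i]);          // u in (0, 1]
          float s = -__logf(u) / mu;                     // exponential free path
          pos[i].x += s * dir[i].x;
          pos[i].y += s * dir[i].y;
          pos[i].z += s * dir[i].z;
      }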

  17. A comparison of native GPU computing versus OpenACC for implementing flow-routing algorithms in hydrological applications

    NASA Astrophysics Data System (ADS)

    Rueda, Antonio J.; Noguera, José M.; Luque, Adrián

    2016-02-01

    In recent years GPU computing has gained wide acceptance as a simple low-cost solution for speeding up computationally expensive processing in many scientific and engineering applications. However, in most cases accelerating a traditional CPU implementation for a GPU is a non-trivial task that requires a thorough refactorization of the code and specific optimizations that depend on the architecture of the device. OpenACC is a promising technology that aims at reducing the effort required to accelerate C/C++/Fortran code on an attached multicore device. Virtually, with this technology the CPU code only has to be augmented with a few compiler directives to identify the areas to be accelerated and the way in which data has to be moved between the CPU and GPU. Its potential benefits are multiple: better code readability, less development time, lower risk of errors and less dependency on the underlying architecture and future evolution of the GPU technology. Our aim with this work is to evaluate the pros and cons of using OpenACC against native GPU implementations in computationally expensive hydrological applications, using the classic D8 algorithm of O'Callaghan and Mark for river network extraction as case-study. We implemented the flow accumulation step of this algorithm on CPU, using OpenACC and two different CUDA versions, comparing the length and complexity of the code and its performance with different datasets. We find that although OpenACC cannot match the performance of a CUDA-optimized implementation (about 3.5× slower on average), it provides a significant performance improvement over a CPU implementation (2-6×) with far simpler code and less implementation effort.
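
    For context, the flow accumulation step can be cast in a data-parallel, Jacobi-style form: each cell re-sums the accumulation of the neighbors that drain into it, and the host relaunches until a pass changes nothing. A hedged CUDA sketch (this simplified formulation and all names are assumptions, not the paper's implementations):

      // flowDir holds each cell's D8 receiver as a neighbor index 0-7 (-1 = outlet).
      // The host swaps accIn/accOut between launches and stops when *changed == 0.
      __global__ void accumulate(const int* flowDir, const float* accIn, float* accOut,
                                 int w, int h, int* changed) {
          int x = blockIdx.x * blockDim.x + threadIdx.x;
          int y = blockIdx.y * blockDim.y + threadIdx.y;
          if (x >= w || y >= h) return;
          const int dx[8] = {1,1,0,-1,-1,-1,0,1}, dy[8] = {0,1,1,1,0,-1,-1,-1};
          float a = 1.0f;                                  // the cell's own unit of flow
          for (int d = 0; d < 8; ++d) {                    // neighbors draining into (x, y)
              int nx = x + dx[d], ny = y + dy[d];
              if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
              int recv = flowDir[ny * w + nx];
              if (recv >= 0 && nx + dx[recv] == x && ny + dy[recv] == y)
                  a += accIn[ny * w + nx];
          }
          if (a != accIn[y * w + x]) *changed = 1;         // benign race: any thread may set it
          accOut[y * w + x] = a;
      }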

  18. Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.

    PubMed

    Cawkwell, M J; Sanville, E J; Mniszewski, S M; Niklasson, Anders M N

    2012-11-13

    The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. This step was tackled historically via the diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm for the computation of the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). The performance in double and single precision arithmetic of a hybrid GPU/central processing unit (CPU) and a full GPU implementation of the SP2 algorithm exceeds those of a CPU-only implementation of the SP2 algorithm and traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 × 2000. Padding schemes for arrays allocated in the GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. The analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicates that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence on system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
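
    One common variant of the SP2 recursion maps the Hamiltonian's spectrum into [0, 1] and then repeatedly applies X ← X² or X ← 2X − X², chosen by the sign of the occupation error, so everything reduces to DGEMM. A hedged one-iteration sketch with cuBLAS (the trace evaluation, spectral mapping, and convergence test are left to the caller; not the authors' code):

      #include <cublas_v2.h>
      #include <cuda_runtime.h>

      // One SP2 step: X2 = X * X, then branch on traceErr = trace(X) - Ne.
      void sp2Iteration(cublasHandle_t h, double* dX, double* dX2, int n, double traceErr) {
          const double one = 1.0, zero = 0.0, two = 2.0, minusOne = -1.0;
          cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                      &one, dX, n, dX, n, &zero, dX2, n);            // X2 = X * X
          if (traceErr > 0.0)
              // occupation too high: X <- X^2 pulls small eigenvalues toward 0
              cudaMemcpy(dX, dX2, (size_t)n * n * sizeof(double), cudaMemcpyDeviceToDevice);
          else
              // occupation too low: X <- 2X - X^2 pushes large eigenvalues toward 1
              cublasDgeam(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n,
                          &two, dX, n, &minusOne, dX2, n, dX, n);
      }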

  19. Discovering epistasis in large scale genetic association studies by exploiting graphics cards

    PubMed Central

    Chen, Gary K.; Guo, Yunfei

    2013-01-01

    Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today's paradigm of testing independent hypotheses on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes in disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway, and/or interact non-additively on log risk for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyses genome-wide can quickly become intractable, due to the fact that even modest-size SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, requiring tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed to explicitly overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPU). We include tutorials on GPU technology, which will convey why they are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores that are available on only a single GPU. Whereas high end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from nVidia feature over 2600 cores. Each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well due to the high degree of parallelism exposed in these problems. Papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations. PMID:24348518

  20. Fast Simulation of Dynamic Ultrasound Images Using the GPU.

    PubMed

    Storve, Sigurd; Torp, Hans

    2017-10-01

    Simulated ultrasound data is a valuable tool for development and validation of quantitative image analysis methods in echocardiography. Unfortunately, simulation time can become prohibitive for phantoms consisting of a large number of point scatterers. The COLE algorithm by Gao et al. is a fast convolution-based simulator that trades simulation accuracy for improved speed. We present highly efficient parallelized CPU and GPU implementations of the COLE algorithm with an emphasis on dynamic simulations involving moving point scatterers. We argue that it is crucial to minimize the amount of data transfers from the CPU to achieve good performance on the GPU. We achieve this by storing the complete trajectories of the dynamic point scatterers as spline curves in the GPU memory. This leads to good efficiency when simulating sequences consisting of a large number of frames, such as B-mode and tissue Doppler data for a full cardiac cycle. In addition, we propose a phase-based subsample delay technique that efficiently eliminates flickering artifacts seen in B-mode sequences when COLE is used without enough temporal oversampling. To assess the performance, we used a laptop computer and a desktop computer, each equipped with a multicore Intel CPU and an NVIDIA GPU. Running the simulator on a high-end TITAN X GPU, we observed two orders of magnitude speedup compared to the parallel CPU version, three orders of magnitude speedup compared to simulation times reported by Gao et al. in their paper on COLE, and a speedup of 27000 times compared to the multithreaded version of Field II, using numbers reported in a paper by Jensen. We hope that by releasing the simulator as an open-source project we will encourage its use and further development.
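
    The design point the record highlights, keeping complete scatterer trajectories on the device as spline control points so per-frame positions never cross the PCIe bus, can be sketched as a per-scatterer spline evaluation. The Catmull-Rom form and the memory layout below are assumptions, not the released simulator's actual format:

      // One thread per scatterer: evaluate its position at normalized time t in [0, 1]
      // from nCtrl control points stored contiguously in GPU memory.
      __global__ void scattererPositions(const float3* ctrl, int nCtrl, int nScat,
                                         float t, float3* pos) {
          int s = blockIdx.x * blockDim.x + threadIdx.x;
          if (s >= nScat) return;
          float u = t * (nCtrl - 3);                      // map t onto interior segments
          int seg = min((int)u, nCtrl - 4);
          float f = u - seg;
          const float3* p = ctrl + (long)s * nCtrl + seg; // this scatterer's control points
          float f2 = f * f, f3 = f2 * f;
          float w0 = -0.5f*f3 + f2 - 0.5f*f;              // Catmull-Rom basis weights
          float w1 =  1.5f*f3 - 2.5f*f2 + 1.0f;
          float w2 = -1.5f*f3 + 2.0f*f2 + 0.5f*f;
          float w3 =  0.5f*f3 - 0.5f*f2;
          pos[s] = make_float3(w0*p[0].x + w1*p[1].x + w2*p[2].x + w3*p[3].x,
                               w0*p[0].y + w1*p[1].y + w2*p[2].y + w3*p[3].y,
                               w0*p[0].z + w1*p[1].z + w2*p[2].z + w3*p[3].z);
      }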

  1. DIAGNOSTIC ACCURACY OF BARIUM ENEMA FINDINGS IN HIRSCHSPRUNG'S DISEASE.

    PubMed

    Peyvasteh, Mehran; Askarpour, Shahnam; Ostadian, Nasrollah; Moghimi, Mohammad-Reza; Javaherizadeh, Hazhir

    2016-01-01

    Hirschsprung's disease is the most common cause of pediatric intestinal obstruction. Contrast enema is used for the evaluation of patients with this suspected diagnosis. The aim was to evaluate the sensitivity, specificity, positive predictive value, and negative predictive value of radiologic findings for the diagnosis of Hirschsprung's disease in patients who underwent barium enema. This cross-sectional study was carried out at Imam Khomeini Hospital for one year starting in April 2012. Sixty patients were enrolled. Inclusion criteria were: neonates with failure to pass meconium, abdominal distention, and refractory constipation that failed to respond to medical treatment. The transitional zone, delayed barium evacuation after 24 h, the rectosigmoid index (maximum width of the rectum divided by maximum width of the sigmoid; abnormal if <1), and mucosal irregularity (jejunization) were evaluated on barium enema. Biopsies were obtained at three separate locations above the dentate line. PPV, NPV, specificity, and sensitivity were calculated for each finding. The mean age of the cases with and without Hirschsprung's disease was 17.90±18.29 months and 17.8±18.34 months, respectively (p=0.983). The disease was confirmed in 30 cases (M=20, F=10). Failure to pass meconium was found in 21 (70%) cases. Sensitivity, specificity, PPV, and NPV were 90%, 80%, 81.8% and 88.8%, respectively, for the transitional zone; 76.7%, 83.3%, 78.1% and 82.1%, respectively, for the rectosigmoid index; 46.7%, 100%, 100% and 65.2%, respectively, for irregular contractions; and 23.3%, 100%, 100% and 56.6%, respectively, for mucosal irregularity. The most sensitive finding was the transitional zone. The most specific findings were irregular contractions and mucosal irregularity, followed by the cobblestone appearance.

  2. Diagnosis of aggressive subtypes of eyelid basal cell carcinoma by 2-mm punch biopsy: prospective and comparative study.

    PubMed

    Rossato, Luiz Angelo; Carneiro, Rachel Camargo; Macedo, Erick Marcet Santiago de; Lima, Patrícia Picciarelli de; Miyazaki, Ahlys Ayumi; Matayoshi, Suzana

    2016-01-01

    To compare the accuracy of preoperative 2-mm punch biopsy at one site and at two sites in the diagnosis of aggressive subtypes of eyelid basal cell carcinoma (BCC). We randomly assigned patients to Group 1 (biopsy at one site) and Group 2 (biopsy at two sites). We compared the biopsy results to the gold standard (pathology of the surgical specimen). We calculated the sensitivity, specificity, positive predictive value, negative predictive value, accuracy and Kappa coefficient to determine the level of agreement in both groups. We analyzed 105 lesions (Group 1: n = 44; Group 2: n = 61). The agreement was 54.5% in Group 1 and 73.8% in Group 2 (p = 0.041). There was no significant difference between the groups regarding the distribution of quantitative and qualitative variables (gender, age, disease duration, largest tumor diameter, area and margin involvement). Biopsy at two sites was twice as likely to agree with the gold standard as biopsy at a single site. The accuracy and the performance indicators were better for 2-mm punch biopsy at two sites than at one site for the diagnosis of aggressive subtypes of eyelid BCC.

  3. Teamwork in a coronary care unit: facilitating and hindering aspects.

    PubMed

    Goulart, Bethania Ferreira; Camelo, Silvia Helena Henriques; Simões, Ana Lúcia de Assis; Chaves, Lucieli Dias Pedreschi

    2016-01-01

    To identify, within a multidisciplinary team, the facilitating and hindering aspects of teamwork in a coronary care unit. A descriptive study, with qualitative and quantitative data, was carried out in the coronary care unit of a public hospital. The study population consisted of professionals working in the unit for at least one year; those who were on leave or could not be located were excluded. The critical incident technique was used for data collection, by means of semi-structured interviews. For data analysis, content analysis and the critical incident technique were applied. Participants were 45 professionals: 29 nursing professionals, 11 physicians, 4 physical therapists, and 1 psychologist. A total of 49 situations (77.6% with negative references), 385 behaviors (54.2% with positive references), and 182 consequences (71.9% with negative references) emerged. Positive references facilitate teamwork, whereas negative references hinder it. A collaborative/communicative interprofessional relationship was evidenced as a facilitator, whereas poor collaboration among agents and inadequate management were hindering aspects. Despite the prevalence of negative situations and consequences, the emphasis on positive behaviors reveals the efforts the agents make to overcome obstacles and carry out teamwork.

  4. The effect of oblateness on the equilibrium points and on the dynamics of coorbital systems

    NASA Astrophysics Data System (ADS)

    Mourão, D. C.; Winter, O. C.; Yokoyama, T.

    2003-08-01

    In this work we analyze the effect of the oblateness of the primary body on the Lagrangian equilibrium points and on the configuration of tadpole and horseshoe orbits. We focus on the co-orbital satellite systems of Saturn, since they lie relatively close to the planet, where the oblateness effect becomes more pronounced. The study is divided into three independent stages. In the first stage we analyze the equations of motion of the restricted three-body problem including the oblateness effect and, by balancing forces, obtain the new configuration of the Lagrangian equilibrium points. We conclude at this stage that the stable equilibrium points undergo a small displacement determined by the oblateness parameter and can no longer be represented by equilateral triangles. We apply this result to the co-orbital satellites of Tethys and Dione, finding equilibrium positions slightly shifted relative to the case without oblateness. In the second stage we address the Saturn-Janus-Epimetheus system; since it involves comparable masses, we chose to extend the equations of Yoder et al. (Icarus 53, pp. 431-443, 1983), which determine the equilibrium points and the angular oscillation amplitude of tadpole-horseshoe orbits for the non-restricted three-body problem, and in our study we include the effect of the oblateness of the primary body in these equations. We find that the angular distance between satellites at stable equilibrium decreases as the oblateness parameter of the primary body increases. Moreover, the tadpole-horseshoe transition orbit has a smaller angular width than in the case without oblateness. Finally, we perform numerical integrations for the actual co-orbital satellites of Saturn and compare them with the analytical results. In these integrations we simulate several tadpole and horseshoe orbits with different oblateness parameters, using initial conditions corrected for the presence of oblateness.
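
    As a toy illustration of the force-balance step described above, the sketch below locates the displaced triangular point numerically. It assumes one common oblate-primary formulation from the literature (mean motion n^2 = 1 + 3A/2, with A the oblateness parameter in units of the primaries' separation); the parameter values and this particular formulation are illustrative assumptions, not the authors' equations.

      import numpy as np
      from scipy.optimize import fsolve

      def grad_omega(p, mu=1e-3, A=0.01):
          # Gradient of the effective potential of the planar restricted
          # three-body problem with an oblate primary (assumed form:
          # Omega = n^2 (x^2+y^2)/2 + (1-mu)/r1 + mu/r2 + (1-mu) A / (2 r1^3)).
          x, y = p
          n2 = 1.0 + 1.5 * A
          r1 = np.hypot(x + mu, y)         # distance to primary at (-mu, 0)
          r2 = np.hypot(x - 1.0 + mu, y)   # distance to secondary at (1-mu, 0)
          c1 = (1.0 - mu) * (1.0 / r1**3 + 1.5 * A / r1**5)
          c2 = mu / r2**3
          return [n2 * x - c1 * (x + mu) - c2 * (x - 1.0 + mu),
                  n2 * y - c1 * y - c2 * y]

      # Start from the classical L4 and let the solver find the shifted point.
      L4_shifted = fsolve(grad_omega, [0.5, np.sqrt(3.0) / 2.0])
      print(L4_shifted)  # slightly displaced from the equilateral position when A > 0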

  5. Burnout Syndrome prevalence of on-call surgeons in a trauma reference hospital and its correlation with weekly workload: cross-sectional study.

    PubMed

    Novais, Rodrigo Nobre DE; Rocha, Louise Matos; Eloi, Raissa Jardelino; Santos, Luciano Menezes Dos; Ribeiro, Marina Viegas Moura Rezende; Ramos, Fernando Wagner DA Silva; Lima, Fernando José Camello DE; Sousa-Rodrigues, Célio Fernando DE; Barbosa, Fabiano Timbó

    2016-01-01

    Objective: to determine the prevalence of Burnout Syndrome (BS) among surgeons working in a referral hospital for trauma in Maceió and to evaluate the possible correlation between BS and weekly workload. Methods: cross-sectional study with 43 on-call surgeons at the Professor Osvaldo Brandão Vilela General State Hospital, Maceió, between July and December 2015. A self-administered form was used to evaluate BS, through the Maslach Burnout Inventory (MBI), and the socio-demographic characteristics of the participants. Spearman's test was used to compare BS and weekly workload, with a significance level of 5%. Results: among the surgeons studied, 95.35% were male and the mean age was 43.9 ± 8.95 years. The mean weekly on-call workload in trauma was 33.90 ± 16.82 hours. The frequency of high scores in at least one of the three MBI dimensions was 46.5%. Professional achievement was correlated with weekly workload (p = 0.020). Conclusion: the prevalence of Burnout Syndrome among on-call surgeons in a referral hospital for trauma was 46.5%, and in this sample there was a correlation between weekly workload and the syndrome.

  6. Smoking control: challenges and achievements.

    PubMed

    Silva, Luiz Carlos Corrêa da; Araújo, Alberto José de; Queiroz, Ângela Maria Dias de; Sales, Maria da Penha Uchoa; Castellano, Maria Vera Cruz de Oliveira

    2016-01-01

    Smoking is the most preventable and controllable health risk. Therefore, all health care professionals should give their utmost attention to, and be more focused on, the problem of smoking. Tobacco is a highly profitable product, because of its large-scale production and great number of consumers. Smoking control policies and treatment resources for smoking cessation have advanced in recent years, showing highly satisfactory results, particularly in Brazil. However, there is still a long way to go before smoking can be considered a controlled disease from a public health standpoint. The behavior of our society regarding smoking is already changing, albeit slowly, so pulmonologists have a very promising area in which to work with their patients and the general population. We must act with greater impetus in support of health care policies and social living standards that directly contribute to improving health and quality of life. In this respect, pulmonologists can play a greater role as they get more involved in treating smokers, strengthening anti-smoking laws, and demanding health care policies related to lung diseases.

  7. Experience with the Brazilian Network for Studies in Reproductive and Perinatal Health: the power of collaboration in postgraduate programs.

    PubMed

    Cecatti, José G; Silveira, Carla; Souza, Renato T; Fernandes, Karayna G; Surita, Fernanda G

    2015-01-01

    Scientific collaboration networks may be developed among countries, academic institutions and peer researchers. Once established, they contribute to knowledge dissemination and provide a strong structure for health research. Several advantages are attributed to working in networks: the inclusion of a higher number of subjects in studies; the generation of stronger evidence with greater representativeness of the population (secondary generalization and external validity); a higher likelihood that the resulting articles will be accepted by high-impact, wide-coverage journals; a higher likelihood of obtaining funding; easier data collection on rare conditions; and the inclusion of subjects from different ethnic groups and cultures, among others. In Brazil, the Brazilian Network for Studies on Reproductive and Perinatal Health was created in 2008 with the initial purpose of developing a national network of scientific cooperation for the surveillance of severe maternal morbidity. Since its establishment, five studies have been developed, some already finished and others nearing completion, and two new ones are being implemented. The activities of this Network have been very productive, with a positive impact not only on the Postgraduate Program of Obstetrics and Gynecology of the University of Campinas, its coordinating center, but also on the other participating centers. A considerable number of scientific articles have been published; master's dissertations and PhD theses have been defended, and post-doctorate programs completed, by students from several areas of health, from distinct regions and from many institutions across the country. This represents a high social impact, given the relevance of the studied topics for the country.

  8. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    PubMed

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and the associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on protein database search using the intertask parallelization technique, employing the GPU only to perform the SW computations one at a time. In this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for protein database search using the intratask parallelization technique on a CPU-GPU collaborative system. Before the SW computations are done on the GPU, a CPU-side procedure applies the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
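
    The core of the filtration idea is that a cheap distance computed from residue frequencies can lower-bound the true alignment cost, so most database sequences can be rejected before the expensive SW step. Below is a minimal Python sketch of that idea using a standard frequency-bound argument; the function names and the exact bound are illustrative assumptions, not the paper's implementation.

      from collections import Counter

      AMINO = "ACDEFGHIKLMNPQRSTVWY"

      def freq_vector(seq):
          # Residue-frequency profile of a protein sequence.
          c = Counter(seq)
          return [c.get(a, 0) for a in AMINO]

      def freq_distance(fa, fb):
          # Each single-residue edit changes the profile by at most 2 in
          # L1 norm, so half the L1 distance lower-bounds the edit count.
          return sum(abs(x - y) for x, y in zip(fa, fb)) // 2

      def filter_candidates(query, database, max_edits):
          # Only the survivors go on to the full SW computation on the GPU.
          fq = freq_vector(query)
          return [s for s in database
                  if freq_distance(fq, freq_vector(s)) <= max_edits]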

  9. Symplectic multi-particle tracking on GPUs

    NASA Astrophysics Data System (ADS)

    Liu, Zhicong; Qiang, Ji

    2018-05-01

    A symplectic multi-particle tracking model is implemented on Graphics Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model preserves phase-space structure and reduces non-physical effects in long-term simulation, which is important for beam property evaluation in particle accelerators. Though this model is computationally expensive, it is well suited to parallelization and can be accelerated significantly on GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both a single GPU and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with a single state-of-the-art Central Processing Unit (CPU) node of similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at the Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation reduces total computing time by more than a factor of two compared with the CPU implementation.
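
    The defining property of such a tracker is that each step is a composition of exact phase-space shears, so the map is symplectic by construction, and every particle updates independently (naturally one GPU thread per particle). A minimal single-step sketch in Python, assuming unit mass and a user-supplied force function (both illustrative assumptions, not the paper's maps), might look like this:

      import numpy as np

      def symplectic_step(x, p, dt, force):
          # Second-order kick-drift-kick (leapfrog). Each update is a
          # shear in phase space, so the composed map is exactly symplectic.
          p = p + 0.5 * dt * force(x)   # half kick
          x = x + dt * p                # drift (unit mass assumed)
          p = p + 0.5 * dt * force(x)   # half kick
          return x, p

      # Usage: arrays of shape (n_particles,) update in lockstep.
      x = np.linspace(-1.0, 1.0, 1_000_000)
      p = np.zeros_like(x)
      x, p = symplectic_step(x, p, 1e-3, lambda q: -q)  # linear focusing force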

  10. Particle-in-cell simulations with charge-conserving current deposition on graphic processing units

    NASA Astrophysics Data System (ADS)

    Ren, Chuang; Kong, Xianglong; Huang, Michael; Decyk, Viktor; Mori, Warren

    2011-10-01

    Recently, using CUDA, we have developed an electromagnetic Particle-in-Cell (PIC) code with charge-conserving current deposition for NVidia graphics processing units (GPUs) (Kong et al., Journal of Computational Physics 230, 1676 (2011)). On a Tesla M2050 (Fermi) card, the GPU PIC code achieves a one-particle-step process time of 1.2 - 3.2 ns in 2D and 2.3 - 7.2 ns in 3D, depending on plasma temperature. In this talk we will discuss novel algorithms for GPU PIC, including a charge-conserving current deposition scheme with little branching and parallel particle sorting. These algorithms make efficient use of the GPU shared memory. We will also discuss how to replace the computation kernels of existing parallel CPU codes while keeping their parallel structures. This work was supported by the U.S. Department of Energy under Grant Nos. DE-FG02-06ER54879 and DE-FC02-04ER54789 and by the NSF under Grant Nos. PHY-0903797 and CCF-0747324.
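
    Particle sorting is what keeps particles that share a grid cell adjacent in memory, so the deposition kernels touch shared memory coherently. Below is a serial Python sketch of the counting-sort pattern typically used for this; on the GPU, the histogram, prefix-sum and scatter passes each become a parallel kernel. The function and argument names are illustrative, not from the paper.

      import numpy as np

      def counting_sort_by_cell(cell_idx, n_cells, *attrs):
          counts = np.bincount(cell_idx, minlength=n_cells)        # histogram pass
          offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))  # prefix sum
          order = np.empty(cell_idx.size, dtype=np.int64)
          cursor = offsets.copy()
          for i, c in enumerate(cell_idx):                         # scatter pass
              order[cursor[c]] = i
              cursor[c] += 1
          # Reorder positions/velocities so same-cell particles are contiguous.
          return cell_idx[order], tuple(a[order] for a in attrs)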

  11. GPU computing of compressible flow problems by a meshless method with space-filling curves

    NASA Astrophysics Data System (ADS)

    Ma, Z. H.; Wang, H.; Pu, S. H.

    2014-04-01

    A graphics processing unit (GPU) implementation of a meshless method for solving compressible flow problems is presented in this paper. A least-squares fit is used to discretize the spatial derivatives of the Euler equations, and an upwind scheme is applied to estimate the flux terms. The compute unified device architecture (CUDA) C programming model is employed to port the meshless solver from CPU to GPU efficiently and flexibly. Considering the data locality of randomly distributed points, space-filling curves are adopted to renumber the points and thereby improve memory performance. Detailed evaluations are first carried out to assess the accuracy and conservation properties of the underlying numerical method. The GPU-accelerated flow solver is then used to solve external steady flows over aerodynamic configurations. Representative results are validated through extensive comparisons with experimental, finite volume or other available reference solutions. Performance analysis reveals that the running time of simulations is significantly reduced, with impressive speedups of more than an order of magnitude.
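
    Renumbering along a space-filling curve keeps points that are close in space close in memory, which improves cache behavior and memory coalescing for the scattered neighbor reads a meshless solver performs. Here is a small Python sketch of a Z-order (Morton) renumbering for 2D point clouds; the quantization depth and helper names are illustrative assumptions.

      import numpy as np

      def spread_bits(n):
          # Spread the bits of a 16-bit integer: abcd -> a0b0c0d0.
          n = n & 0xFFFF
          n = (n | (n << 8)) & 0x00FF00FF
          n = (n | (n << 4)) & 0x0F0F0F0F
          n = (n | (n << 2)) & 0x33333333
          n = (n | (n << 1)) & 0x55555555
          return n

      def morton_order(x, y, bits=16):
          # Quantize coordinates to a 2^bits grid and interleave the bits;
          # sorting by the resulting key yields the Z-order renumbering.
          xi = ((x - x.min()) / np.ptp(x) * (2**bits - 1)).astype(np.uint32)
          yi = ((y - y.min()) / np.ptp(y) * (2**bits - 1)).astype(np.uint32)
          keys = spread_bits(xi) | (spread_bits(yi) << 1)
          return np.argsort(keys)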

  12. GPU-based streaming architectures for fast cone-beam CT image reconstruction and demons deformable registration.

    PubMed

    Sharp, G C; Kandasamy, N; Singh, H; Folkert, M

    2007-10-07

    This paper shows how to significantly accelerate cone-beam CT reconstruction and 3D deformable image registration using the stream-processing model. We describe data-parallel designs for the Feldkamp, Davis and Kress (FDK) reconstruction algorithm and the demons deformable registration algorithm, suitable for use on a commodity graphics processing unit. The streaming versions of these algorithms are implemented using the Brook programming environment and executed on an NVidia 8800 GPU. Performance results using CT data of a preserved swine lung indicate that the GPU-based implementations of the FDK and demons algorithms achieve substantial speedups (up to 80 times for FDK and 70 times for demons) compared to an optimized reference implementation on a 2.8 GHz Intel processor. In addition, the accuracy of the GPU-based implementations was found to be excellent: compared with CPU-based implementations, the RMS differences were less than 0.1 Hounsfield unit for reconstruction and less than 0.1 mm for deformable registration.
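
    The demons algorithm suits streaming hardware because every voxel's displacement update depends only on local image values and gradients. Below is a minimal 2D Python sketch of one classic Thirion-style demons iteration, with Gaussian smoothing as the regularizer; the array layout and parameter values are illustrative assumptions, not the paper's exact design.

      import numpy as np
      from scipy.ndimage import gaussian_filter

      def demons_step(fixed, warped_moving, u, sigma=1.0):
          # Classic demons force: du = (m - f) grad(f) / (|grad f|^2 + (m - f)^2).
          # Each voxel updates independently, which is the streamed parallelism.
          g0, g1 = np.gradient(fixed)          # gradients along axis 0 and 1
          diff = warped_moving - fixed
          denom = g0**2 + g1**2 + diff**2
          denom[denom == 0] = 1.0              # avoid division by zero
          u[0] += diff * g0 / denom
          u[1] += diff * g1 / denom
          # Gaussian smoothing regularizes the displacement field.
          u[0] = gaussian_filter(u[0], sigma)
          u[1] = gaussian_filter(u[1], sigma)
          return u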

  13. Power and Performance Trade-offs for Space Time Adaptive Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gawande, Nitin A.; Manzano Franco, Joseph B.; Tumeo, Antonino

    Computational efficiency, i.e. performance relative to power or energy, is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations in CUDA and OpenMP on two computationally efficient architectures, an Intel Haswell Core i7-4770TE and an NVIDIA Kayla platform with a GK208 GPU. We analyze the power and performance of STAP's computationally intensive kernels across the two hardware testbeds, and show the impact and trade-offs of GPU optimization techniques. We show that data parallelism can be exploited for an efficient implementation on the Haswell CPU architecture, while the GPU architecture can process large data sets without an increase in power requirement. The use of shared memory has a significant impact on the power requirement of the GPU, and a balance between shared memory use and main memory access leads to improved performance in a typical STAP application.
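
    The computationally intensive core of STAP is dense linear algebra: estimating a space-time covariance matrix from training snapshots and solving for adaptive weights. A minimal Python sketch of that kernel is given below, with diagonal loading for numerical robustness; the sizes and loading factor are illustrative assumptions, not the paper's configuration.

      import numpy as np

      def stap_weights(snapshots, steering, loading=1e-3):
          # snapshots: (N, K) complex space-time training samples;
          # steering: (N,) space-time steering vector for the target.
          n, k = snapshots.shape
          R = snapshots @ snapshots.conj().T / k           # sample covariance
          R += loading * np.trace(R).real / n * np.eye(n)  # diagonal loading
          w = np.linalg.solve(R, steering)                 # R^{-1} s
          return w / (steering.conj() @ w)                 # MVDR normalization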

  14. The density matrix renormalization group algorithm on kilo-processor architectures: Implementation and trade-offs

    NASA Astrophysics Data System (ADS)

    Nemes, Csaba; Barcza, Gergely; Nagy, Zoltán; Legeza, Örs; Szolgay, Péter

    2014-06-01

    In the numerical analysis of strongly correlated quantum lattice models, one of the leading algorithms developed to balance the size of the effective Hilbert space against the accuracy of the simulation is the density matrix renormalization group (DMRG) algorithm, whose run-time is dominated by the iterative diagonalization of the Hamiltonian. As the most time-consuming step of the diagonalization can be expressed as a list of dense matrix operations, the DMRG is an appealing candidate to fully utilize the computing power residing in novel kilo-processor architectures. In this paper a hybrid CPU-GPU implementation is presented which exploits the power of both the CPU and the GPU and tolerates problems exceeding the GPU memory size. Furthermore, a new CUDA kernel has been designed for asymmetric matrix-vector multiplication to accelerate the rest of the diagonalization. Besides the evaluation of the GPU implementation, the practical limits of an FPGA implementation are also discussed.
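
    The reason this step maps well onto GPUs is that applying the Hamiltonian to a vector can be written as dense matrix products instead of an explicit sparse operator. Below is a small Python sketch of the underlying identity (a Kronecker-product matvec done via reshapes); the operator structure of a real DMRG step is more elaborate, so this is an illustration of the principle only.

      import numpy as np

      def kron_matvec(A, B, x):
          # Apply (A kron B) to a vector without forming the Kronecker
          # product: with row-major flattening,
          # (A kron B) vec(X) = vec(A @ X @ B.T).
          X = x.reshape(A.shape[1], B.shape[1])
          return (A @ X @ B.T).reshape(-1)

      # Quick check against the explicit product:
      rng = np.random.default_rng(0)
      A, B = rng.random((3, 4)), rng.random((5, 6))
      x = rng.random(4 * 6)
      assert np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x)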

  15. GPU-Based Real-Time Volumetric Ultrasound Image Reconstruction for a Ring Array

    PubMed Central

    Choe, Jung Woo; Nikoozadeh, Amin; Oralkan, Ömer; Khuri-Yakub, Butrus T.

    2014-01-01

    Synthetic phased array (SPA) beamforming with Hadamard coding and aperture weighting is an optimal option for real-time volumetric imaging with a ring array, a particularly attractive geometry in intracardiac and intravascular applications. However, the imaging frame rate of this method is limited by the immense computational load required in synthetic beamforming. For fast imaging with a ring array, we developed graphics processing unit (GPU)-based, real-time image reconstruction software that exploits massive data-level parallelism in beamforming operations. The GPU-based software reconstructs and displays three cross-sectional images at 45 frames per second (fps). This frame rate is 4.5 times higher than that for our previously-developed multi-core CPU-based software. In an alternative imaging mode, it shows one B-mode image rotating about the axis and its maximum intensity projection (MIP), processed at a rate of 104 fps. This paper describes the image reconstruction procedure on the GPU platform and presents the experimental images obtained using this software. PMID:23529080
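
    Synthetic-aperture beamforming is data-parallel at the voxel level: each output voxel sums delayed echoes independently of every other voxel, which is exactly the parallelism the GPU software exploits. Below is a minimal single-voxel delay-and-sum sketch in Python, assuming Hadamard decoding has already been applied and a simple round-trip (monostatic) delay model; the names and that delay model are illustrative assumptions, not the paper's beamformer.

      import numpy as np

      def delay_and_sum(rf, element_pos, voxel, fs, c=1540.0):
          # rf: (n_elements, n_samples) decoded receive data for one event;
          # element_pos: (n_elements, 3) positions on the ring; voxel: (3,).
          dist = np.linalg.norm(element_pos - voxel, axis=1)
          idx = np.round(2.0 * dist / c * fs).astype(int)   # round-trip delay
          idx = np.clip(idx, 0, rf.shape[1] - 1)
          return rf[np.arange(rf.shape[0]), idx].sum()      # coherent sum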

  16. Accelerating a three-dimensional eco-hydrological cellular automaton on GPGPU with OpenCL

    NASA Astrophysics Data System (ADS)

    Senatore, Alfonso; D'Ambrosio, Donato; De Rango, Alessio; Rongo, Rocco; Spataro, William; Straface, Salvatore; Mendicino, Giuseppe

    2016-10-01

    This work presents an effective implementation of a numerical model for complete eco-hydrological Cellular Automata modeling on Graphics Processing Units (GPUs) with OpenCL (Open Computing Language) for heterogeneous computation (i.e., on CPUs and/or GPUs). Several parallel implementations were carried out (e.g., use of fast local memory, loop unrolling), showing increasing performance improvements in terms of speedup, and some original optimization strategies were adopted. Moreover, numerical analysis of the results (i.e., comparison of CPU and GPU outcomes in terms of rounding errors) proved satisfactory. Experiments were carried out on a workstation with two CPUs (Intel Xeon E5440 at 2.83 GHz), one AMD R9 280X GPU and one nVIDIA Tesla K20c GPU. Results have been extremely positive, but further testing should be performed to assess the effectiveness of the adopted strategies on other complete models and their ability to fruitfully exploit the resources of parallel systems.
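
    A cellular automaton advances every cell from its neighbors' previous values, so one work-item per cell is the natural OpenCL decomposition, with the neighborhood staged in fast local memory. The Python sketch below shows the shape of one such synchronous update on a toy 2D diffusion-like rule; the rule and relaxation constant are illustrative assumptions, not the eco-hydrological model's transition function.

      import numpy as np

      def ca_step(h, k=0.2):
          # Each cell relaxes toward the mean of its von Neumann neighbours.
          # In the OpenCL version, each cell is one work-item and the
          # neighbourhood tile is staged in local memory.
          nbr = (np.roll(h, 1, 0) + np.roll(h, -1, 0) +
                 np.roll(h, 1, 1) + np.roll(h, -1, 1))
          return h + k * (nbr / 4.0 - h)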

  17. High Performance Computing of Meshless Time Domain Method on Multi-GPU Cluster

    NASA Astrophysics Data System (ADS)

    Ikuno, Soichiro; Nakata, Susumu; Hirokawa, Yuta; Itoh, Taku

    2015-01-01

    High performance computing of the Meshless Time Domain Method (MTDM) on a multi-GPU cluster, using the supercomputer HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences) at the University of Tsukuba, is investigated. Generally, the finite difference time domain (FDTD) method is adopted for the numerical simulation of electromagnetic wave propagation phenomena. However, the numerical domain must be divided into rectangular meshes, and it is difficult to apply the method to problems with complex domains. MTDM, on the other hand, can easily be adapted to such problems because it does not require meshes. In the present study, we implement MTDM on a multi-GPU cluster to speed up the method and numerically investigate its performance. To reduce the computation time, the communication between decomposed subdomains is hidden behind the perfectly matched layer (PML) calculation procedure. The results show that MTDM on 128 GPUs is 173 times faster than a single-CPU calculation.
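
    Hiding communication behind the PML work follows the standard non-blocking halo-exchange pattern: start the transfers, compute everything that does not need remote data, then finish the boundary once the halos arrive. A minimal mpi4py sketch of that pattern is below (one-dimensional decomposition, one neighbor shown for brevity); the callables and buffers are illustrative assumptions, not the paper's code.

      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()

      def timestep(interior_update, boundary_update, send_halo, recv_halo):
          # Post non-blocking halo exchange with the lower-rank neighbour.
          reqs = []
          if rank > 0:
              reqs.append(comm.Isend(send_halo, dest=rank - 1))
              reqs.append(comm.Irecv(recv_halo, source=rank - 1))
          interior_update()          # PML/interior work overlaps the transfer
          MPI.Request.Waitall(reqs)  # ensure halo data has arrived
          boundary_update()          # boundary nodes need the received halo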

  18. DEM GPU studies of industrial scale particle simulations for granular flow civil engineering applications

    NASA Astrophysics Data System (ADS)

    Pizette, Patrick; Govender, Nicolin; Wilke, Daniel N.; Abriak, Nor-Edine

    2017-06-01

    The use of the Discrete Element Method (DEM) for industrial civil engineering applications is currently limited by the computational demands of large numbers of particles. The graphics processing unit (GPU), with its highly parallel hardware architecture, shows potential to enable the solution of civil engineering problems using discrete granular approaches. In this study we demonstrate the practical utility of a validated GPU-enabled DEM modeling environment for simulating industrial-scale granular problems. As an illustration, the flow discharge of storage silos using 8 and 17 million particles is considered. DEM simulations were performed to investigate the influence of particle size (equivalent size for the 20/40-mesh gravel) and induced shear stress for two hopper shapes. The preliminary results indicate that the shape of the hopper significantly influences the discharge rates for the same material. Specifically, this work shows that GPU-enabled DEM modeling environments can model industrial-scale problems on a single portable computer within a day for 30 seconds of process time.
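
    The inner loop of a DEM step evaluates a pairwise contact law over millions of particle pairs, which is the part the GPU parallelizes. A minimal Python sketch of a linear spring-dashpot normal contact between two equal spheres follows; the stiffness and damping values are illustrative assumptions, and production codes add tangential friction and neighbor search.

      import numpy as np

      def normal_contact_force(xi, xj, vi, vj, radius, kn=1.0e5, cn=5.0):
          # Repulsion proportional to overlap, damping proportional to the
          # normal relative velocity; returns the force on particle i.
          d = xj - xi
          dist = np.linalg.norm(d)
          overlap = 2.0 * radius - dist
          if overlap <= 0.0:
              return np.zeros(3)             # not in contact
          n = d / dist                       # unit normal from i toward j
          vn = np.dot(vj - vi, n)            # normal relative velocity
          return -(kn * overlap - cn * vn) * n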

  19. Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods

    PubMed Central

    Smith, David S.; Gore, John C.; Yankeelov, Thomas E.; Welch, E. Brian

    2012-01-01

    Compressive sensing (CS) has been shown to enable dramatic acceleration of MRI acquisition in some applications. Being an iterative reconstruction technique, CS MRI reconstructions can be more time-consuming than traditional inverse Fourier reconstruction. We have accelerated our CS MRI reconstruction by factors of up to 27 by using a split Bregman solver combined with a graphics processing unit (GPU) computing platform. The increases in speed we find are similar to those we measure for matrix multiplication on this platform, suggesting that the split Bregman methods parallelize efficiently. We demonstrate that the combination of the rapid convergence of the split Bregman algorithm and the massively parallel strategy of GPU computing can enable real-time CS reconstruction of even acquisition data matrices of dimension 4096² or more, depending on available GPU VRAM. Reconstruction of two-dimensional data matrices of dimension 1024² and smaller took ~0.3 s or less, showing that this platform also provides very fast iterative reconstruction for small-to-moderate size images. PMID:22481908
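
    Split Bregman converges quickly because it separates the non-smooth l1 term, solved exactly by soft-thresholding, from a smooth quadratic subproblem, and both updates are built from pointwise operations and linear solves that parallelize well. As a hedged illustration, here is a minimal 1D total-variation denoiser using the Goldstein-Osher splitting; the CS MRI problem adds an undersampled Fourier data term, and all parameter values here are illustrative assumptions.

      import numpy as np

      def shrink(x, t):
          # Soft-thresholding: closed-form solution of the l1 subproblem.
          return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

      def tv_denoise_split_bregman(f, mu=10.0, lam=5.0, iters=50):
          # Minimize ||D u||_1 + mu/2 ||u - f||^2 via split Bregman.
          n = f.size
          D = np.eye(n, k=1)[:n - 1] - np.eye(n)[:n - 1]   # forward difference
          A = mu * np.eye(n) + lam * D.T @ D               # u-subproblem matrix
          u, d, b = f.copy(), np.zeros(n - 1), np.zeros(n - 1)
          for _ in range(iters):
              u = np.linalg.solve(A, mu * f + lam * D.T @ (d - b))
              Du = D @ u
              d = shrink(Du + b, 1.0 / lam)                # l1 update
              b += Du - d                                  # Bregman update
          return u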

  20. Stochastic DT-MRI connectivity mapping on the GPU.

    PubMed

    McGraw, Tim; Nadar, Mariappan

    2007-01-01

    We present a method for stochastic fiber tract mapping from diffusion tensor MRI (DT-MRI) implemented on graphics hardware. From the simulated fibers we compute a connectivity map that gives an indication of the probability that two points in the dataset are connected by a neuronal fiber path. A Bayesian formulation of the fiber model is given and it is shown that the inversion method can be used to construct plausible connectivity. An implementation of this fiber model on the graphics processing unit (GPU) is presented. Since the fiber paths can be stochastically generated independently of one another, the algorithm is highly parallelizable. This allows us to exploit the data-parallel nature of the GPU fragment processors. We also present a framework for the connectivity computation on the GPU. Our implementation allows the user to interactively select regions of interest and observe the evolving connectivity results during computation. Results are presented from the stochastic generation of over 250,000 fiber steps per iteration at interactive frame rates on consumer-grade graphics hardware.
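
    Each stochastic fiber is generated independently of the others, which is what makes the method embarrassingly parallel on graphics hardware. The Python sketch below shows one plausible form of a single stochastic tracking step: move along the principal eigenvector of the local diffusion tensor, perturbed by isotropic noise. The perturbation model and the tensor_at sampler are illustrative assumptions, not the paper's exact Bayesian model.

      import numpy as np

      def fiber_step(pos, tensor_at, step=0.5, noise=0.1, rng=None):
          # tensor_at(pos) is an assumed callable returning the local
          # 3x3 diffusion tensor (e.g., interpolated from the DT-MRI volume).
          rng = rng or np.random.default_rng()
          D = tensor_at(pos)
          _, V = np.linalg.eigh(D)                      # ascending eigenvalues
          direction = V[:, -1] + noise * rng.standard_normal(3)
          direction /= np.linalg.norm(direction)
          return pos + step * direction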
