Science.gov

Sample records for microarray preprocessing algorithms

  1. affyPara: a Bioconductor Package for Parallelized Preprocessing Algorithms of Affymetrix Microarray Data.

    PubMed

    Schmidberger, Markus; Vicedo, Esmeralda; Mansmann, Ulrich

    2009-07-22

    Microarray data repositories, as well as large clinical applications of gene expression, make it possible to analyse several hundred microarrays at a time. Preprocessing such large numbers of microarrays remains a challenge, as the algorithms are limited by the available computer hardware. For example, building classification or prognostic rules from large microarray sets is very time consuming, because preprocessing has to be part of the cross-validation and resampling strategy that is necessary to estimate a rule's prediction quality honestly. This paper proposes the new Bioconductor package affyPara for parallelized preprocessing of Affymetrix microarray data. The data can be partitioned across arrays, so parallelization of the algorithms is a straightforward consequence. Partitioning the data and distributing it to several nodes solves the main-memory problems and accelerates preprocessing by up to a factor of 20 for 200 or more arrays. affyPara is a free and open-source package, under the GPL license, available from the Bioconductor project at www.bioconductor.org. A user guide and examples are provided with the package.
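
    The core idea, partitioning the arrays across workers so that each node holds only a slice of the probe-by-array matrix, is easy to sketch. A minimal Python illustration (affyPara itself is an R package; the per-array background shift, worker count and matrix sizes below are illustrative assumptions, not the package's method):

      import numpy as np
      from concurrent.futures import ProcessPoolExecutor

      def preprocess_batch(batch: np.ndarray) -> np.ndarray:
          """Per-array preprocessing on one partition: a crude background
          shift plus log2 transform (illustrative only)."""
          background = batch.min(axis=0, keepdims=True)  # one value per array
          return np.log2(batch - background + 1.0)

      def parallel_preprocess(intensities: np.ndarray, n_workers: int = 4) -> np.ndarray:
          """Split the probe-by-array matrix column-wise so each worker sees
          only a slice of the arrays, then reassemble the result."""
          batches = np.array_split(intensities, n_workers, axis=1)
          with ProcessPoolExecutor(max_workers=n_workers) as pool:
              done = list(pool.map(preprocess_batch, batches))
          return np.hstack(done)

      if __name__ == "__main__":
          raw = np.random.lognormal(mean=7.0, sigma=1.0, size=(10000, 200))
          print(parallel_preprocess(raw).shape)  # (10000, 200)

    Strictly per-array steps parallelize this way directly; cross-array steps such as quantile normalization additionally require merging per-partition statistics across nodes.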

  2. A fully scalable online pre-processing algorithm for short oligonucleotide microarray atlases

    PubMed Central

    Lahti, Leo; Torrente, Aurora; Elo, Laura L.; Brazma, Alvis; Rung, Johan

    2013-01-01

    Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck to full utilization of these data resources. Although short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level techniques have been available for only a few platforms, based on pre-calculated probe effects from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm for probe-level analysis and pre-processing of large microarray atlases involving tens of thousands of arrays. In contrast to the alternatives, our algorithm scales up linearly with respect to sample size and is applicable to all short oligonucleotide platforms. The model can use the most comprehensive data collections available to date to pinpoint individual probes affected by noise and biases, providing tools to guide array design and quality control. This is the only available algorithm that can learn probe-level parameters through sequential hyperparameter updates on small consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray collections. PMID:23563154
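
    The memory argument is concrete: with sequential batch updates, the state stays proportional to the number of probes rather than the number of arrays. A hedged Python sketch of batch-wise running probe statistics (a stand-in for the paper's sequential hyperparameter updates; the class name and the mean/variance merge rule are my assumptions):

      import numpy as np

      class OnlineProbeModel:
          """Running per-probe mean and variance, updated one batch of
          arrays at a time; memory stays O(probes) however many arrays
          stream through."""

          def __init__(self, n_probes: int):
              self.n = 0
              self.mean = np.zeros(n_probes)
              self.m2 = np.zeros(n_probes)   # sum of squared deviations

          def update(self, batch: np.ndarray) -> None:
              """batch: probes x arrays slice of the atlas."""
              k = batch.shape[1]
              b_mean = batch.mean(axis=1)
              b_m2 = ((batch - b_mean[:, None]) ** 2).sum(axis=1)
              delta = b_mean - self.mean
              tot = self.n + k
              self.mean += delta * (k / tot)             # Chan et al. merge
              self.m2 += b_m2 + delta**2 * (self.n * k / tot)
              self.n = tot

          def variance(self) -> np.ndarray:
              return self.m2 / max(self.n - 1, 1)

      model = OnlineProbeModel(n_probes=1000)
      for _ in range(50):                                # 50 batches of 20 arrays
          model.update(np.random.randn(1000, 20))
      print(model.mean[:3], model.variance()[:3])        # close to 0 and 1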

  3. Comparison of microarray preprocessing methods.

    PubMed

    Shakya, K; Ruskin, H J; Kerr, G; Crane, M; Becker, J

    2010-01-01

    Data preprocessing is a crucial initial step in microarray technology before data analysis is performed. Many preprocessing methods have been proposed, but none has proved ideal to date. Frequently, datasets are limited by laboratory constraints, so guidelines on quality and robustness are needed to inform further experimentation while data remain restricted. In this paper, we compared the performance of four popular methods, namely MAS5, Li & Wong pmonly (LWPM), Li & Wong subtractMM (LWMM), and Robust Multichip Average (RMA). The comparison is based on analyses of laboratory-generated data from the Bioinformatics Lab, National Institute of Cellular Biotechnology (NICB), Dublin City University, Ireland. These experiments were designed to examine the effect of Bromodeoxyuridine (5-bromo-2-deoxyuridine, BrdU) treatment in deep lamellar keratoplasty (DLKP) cells. The methodology is to assess dispersion across the replicates and to analyze the false discovery rate. From the dispersion analysis, we found that variability is reduced more effectively by the LWPM and RMA methods. From the false-positive analysis, for both parametric and nonparametric approaches, LWMM is found to perform best. Based on a complementary q-value analysis, the LWMM approach is again the strongest candidate. The indications are that, while LWMM is marginally less effective than LWPM and RMA in terms of variance reduction, it has considerably better discrimination overall.
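
    Both evaluation criteria are easy to state concretely. A small Python sketch (the function names and the exact dispersion statistic are my assumptions; the paper's analysis is more detailed):

      import numpy as np

      def replicate_cv(expr: np.ndarray) -> np.ndarray:
          """Per-gene coefficient of variation across replicate arrays;
          lower overall dispersion suggests more effective variance
          reduction by the preprocessing method."""
          return expr.std(axis=1, ddof=1) / expr.mean(axis=1)

      def bh_qvalues(pvals: np.ndarray) -> np.ndarray:
          """Benjamini-Hochberg q-values for the false-discovery analysis."""
          m = len(pvals)
          order = np.argsort(pvals)
          ranked = pvals[order] * m / np.arange(1, m + 1)
          ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
          q = np.empty(m)
          q[order] = np.clip(ranked, 0, 1)
          return q

      # 50 strong signals among 950 nulls: expect roughly 50 discoveries
      p = np.concatenate([np.random.uniform(0, 1e-4, 50), np.random.uniform(size=950)])
      print((bh_qvalues(p) < 0.05).sum())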

  4. Affymetrix GeneChip microarray preprocessing for multivariate analyses.

    PubMed

    McCall, Matthew N; Almudevar, Anthony

    2012-09-01

    Affymetrix GeneChip microarrays are the most widely used high-throughput technology to measure gene expression, and a wide variety of preprocessing methods have been developed to transform probe intensities reported by a microarray scanner into gene expression estimates. There have been numerous comparisons of these preprocessing methods, focusing on the most common analyses: detection of differential expression and gene or sample clustering. More complex multivariate analyses, such as gene co-expression, differential co-expression, gene set analysis and network modeling, are becoming increasingly common; however, the same preprocessing methods are typically applied. In this article, we examine the effect of preprocessing methods on some of these multivariate analyses and provide guidance to the user as to which methods are most appropriate.

  5. A review of statistical methods for preprocessing oligonucleotide microarrays.

    PubMed

    Wu, Zhijin

    2009-12-01

    Microarrays have become an indispensable tool in biomedical research. This powerful technology not only makes it possible to quantify a large number of nucleic acid molecules simultaneously, but also produces data with many sources of noise. A number of preprocessing steps are therefore necessary to convert the raw data, usually in the form of hybridisation images, to measures of biological meaning that can be used in further statistical analysis. Preprocessing of oligonucleotide arrays includes image processing, background adjustment, data normalisation/transformation and sometimes summarisation when multiple probes are used to target one genomic unit. In this article, we review the issues encountered in each preprocessing step and introduce the statistical models and methods in preprocessing.
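
    The enumerated steps can be strung together in a toy pipeline. A minimal Python sketch (the crude background shift and the fixed probes-per-gene grouping are simplifying assumptions; production methods such as RMA use a convolution background model and median-polish summarisation):

      import numpy as np

      def quantile_normalize(x: np.ndarray) -> np.ndarray:
          """Force every array (column) to share the same intensity
          distribution: the classic normalization step for
          oligonucleotide arrays."""
          ranks = x.argsort(axis=0).argsort(axis=0)          # per-column ranks
          mean_quantiles = np.sort(x, axis=0).mean(axis=1)   # reference distribution
          return mean_quantiles[ranks]

      def preprocess(raw: np.ndarray, probes_per_gene: int = 11) -> np.ndarray:
          """Background-adjust, normalize, log-transform, then summarise
          groups of probes into one value per genomic unit."""
          adjusted = raw - raw.min(axis=0, keepdims=True) + 1.0  # crude background shift
          logged = np.log2(quantile_normalize(adjusted))
          n_genes = logged.shape[0] // probes_per_gene
          grouped = logged[: n_genes * probes_per_gene].reshape(n_genes, probes_per_gene, -1)
          return np.median(grouped, axis=1)                  # genes x arrays

      raw = np.random.lognormal(mean=7.0, sigma=1.0, size=(1100, 8))
      print(preprocess(raw).shape)                           # (100, 8)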

  6. Micro-Analyzer: automatic preprocessing of Affymetrix microarray data.

    PubMed

    Guzzi, Pietro Hiram; Cannataro, Mario

    2013-08-01

    A current trend in genomics is the investigation of cellular mechanisms using different technologies, in order to explain the relationships among genes, molecular processes and diseases. For instance, the combined use of gene-expression arrays and genomic arrays has been demonstrated to be an effective instrument in clinical practice. Consequently, different kinds of microarrays may be used in a single experiment, resulting in the production of different types of binary data (images and textual raw data). The analysis of microarray data requires an initial preprocessing phase that makes raw data suitable for use on existing analysis platforms, such as the TIGR M4 (TM4) Suite. An additional challenge for emerging data analysis platforms is the ability to treat these different microarray formats, coupled with clinical data, in a combined way. The resulting integrated data may include both numerical and symbolic data (e.g. gene expression and SNPs for molecular data) as well as temporal data (e.g. the response to a drug, time to progression and survival rate) for clinical data. Raw data preprocessing is a crucial step in analysis but is often performed in a manual and error-prone way using different software tools. Thus novel, platform-independent, and preferably open-source tools enabling the semi-automatic preprocessing and annotation of different microarray data are needed. This paper presents Micro-Analyzer (Microarray Analyzer), a cross-platform tool for the automatic normalization, summarization and annotation of Affymetrix gene expression and SNP binary data. It represents the evolution of the μ-CS tool, extending preprocessing to the SNP arrays that were not supported in μ-CS. Micro-Analyzer is provided as a standalone Java tool and enables users to read, preprocess and analyse binary microarray data (gene expression and SNPs) by invoking the TM4 platform. It avoids: (i) the manual invocation of external tools (e.g. the Affymetrix Power

  7. User-friendly solutions for microarray quality control and pre-processing on ArrayAnalysis.org.

    PubMed

    Eijssen, Lars M T; Jaillard, Magali; Adriaens, Michiel E; Gaj, Stan; de Groot, Philip J; Müller, Michael; Evelo, Chris T

    2013-07-01

    Quality control (QC) is crucial for any scientific method producing data. Applying adequate QC introduces new challenges in the genomics field, where large amounts of data are produced with complex technologies. For DNA microarrays, specific algorithms for QC and pre-processing, including normalization, have been developed by the scientific community, especially for expression chips of the Affymetrix platform. Many of these have been implemented in the statistical scripting language R and are available from the Bioconductor repository. However, application is hampered by a lack of integrative tools that can be used by users of any experience level. To fill this gap, we developed a freely available tool for QC and pre-processing of Affymetrix gene expression results, extending, integrating and harmonizing functionality of Bioconductor packages. The tool can be easily accessed through a wizard-like web portal at http://www.arrayanalysis.org or downloaded for local use in R. The portal provides extensive documentation, including user guides, interpretation help with real output illustrations and detailed technical documentation. It assists newcomers to the field in performing state-of-the-art QC and pre-processing, while offering data analysts an integral open-source package. Providing the scientific community with this easily accessible tool will help improve data quality, the reuse of data and the adoption of standards. PMID:23620278

  8. An Efficient and Configurable Preprocessing Algorithm to Improve Stability Analysis.

    PubMed

    Sesia, Ilaria; Cantoni, Elena; Cernigliaro, Alice; Signorile, Giovanna; Fantino, Gianluca; Tavella, Patrizia

    2016-04-01

    The Allan variance (AVAR) is widely used to measure the stability of experimental time series. Specifically, AVAR is commonly used in space applications such as monitoring the clocks of the global navigation satellite systems (GNSSs). In these applications, the experimental data present some peculiar aspects which are not generally encountered when measurements are carried out in a laboratory: space clock data can present outliers, jumps, and missing values, which corrupt the clock characterization. Efficient preprocessing is therefore fundamental to ensure a proper data analysis and to improve the stability estimation performed with AVAR or other similar variances. In this work, we propose a preprocessing algorithm and its implementation in robust software (in the MATLAB language) able to deal with time series of experimental data affected by nonstationarities and missing data; our method properly detects and removes anomalous behaviors, hence making the subsequent stability analysis more reliable. PMID:26540679
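
    For reference, the Allan variance at averaging time tau = m*tau0 is sigma_y^2(tau) = <(ybar_{i+1} - ybar_i)^2>/2, computed from consecutive tau-averages ybar_i of the fractional-frequency data. A Python sketch of the estimator, plus a toy stand-in for the gap-filling and outlier clipping the paper automates (the interpolation and the MAD threshold are my assumptions, not the authors' algorithm):

      import numpy as np

      def allan_variance(y: np.ndarray, m: int, tau0: float = 1.0):
          """Non-overlapped Allan variance of fractional-frequency data y
          at averaging time tau = m * tau0."""
          n_blocks = len(y) // m
          ybar = y[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
          return m * tau0, 0.5 * np.mean(np.diff(ybar) ** 2)

      def clean_series(y: np.ndarray, k: float = 5.0) -> np.ndarray:
          """Toy preprocessing: fill missing values by interpolation and
          clip outliers beyond k robust standard deviations (MAD)."""
          t = np.arange(len(y))
          ok = ~np.isnan(y)
          y = np.interp(t, t[ok], y[ok])           # fill gaps
          med = np.median(y)
          mad = np.median(np.abs(y - med)) * 1.4826
          return np.clip(y, med - k * mad, med + k * mad)

      y = clean_series(np.random.randn(10000))
      print(allan_variance(y, m=10))               # (10.0, ~0.1) for white noise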

  9. Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm.

    PubMed

    Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein

    2015-01-01

    DNA microarrays are a powerful approach for studying simultaneously the expression of thousands of genes in a single experiment. The average fluorescent intensity can be calculated for each spot in a microarray experiment, and the calculated intensity values closely track the expression levels of the corresponding genes. However, determining the appropriate position of every spot in microarray images is a main challenge, one that underpins the accurate classification of normal and abnormal (cancer) cells. In this paper, a preprocessing step is first performed to eliminate the noise and artifacts present in microarray cells using the nonlinear anisotropic diffusion filtering method. Then, the coordinate center of each spot is located using mathematical morphology operations. Finally, the position of each spot is determined exactly by applying a novel hybrid model based on principal component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in the SFCM algorithm improves the quality of complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on real microarray images available in the Stanford Microarray Database. Results show that the accuracy of microarray cell segmentation with the proposed algorithm reaches 100% and 98% for noiseless and noisy cells, respectively.
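
    The clustering engine at the heart of the method is fuzzy c-means, in which every pixel receives a graded membership in each cluster. A plain Python sketch for orientation; the paper's SFCM variant adds spatial neighbourhood terms and a Gaussian kernel that are omitted here:

      import numpy as np

      def fuzzy_cmeans(x, n_clusters=2, m=2.0, n_iter=100, seed=0):
          """Plain fuzzy c-means on x (samples x features); returns the
          membership matrix and cluster centers."""
          rng = np.random.default_rng(seed)
          u = rng.dirichlet(np.ones(n_clusters), size=len(x))  # memberships
          for _ in range(n_iter):
              um = u ** m
              centers = um.T @ x / um.sum(axis=0)[:, None]
              d = np.linalg.norm(x[:, None, :] - centers[None], axis=2) + 1e-12
              u = 1.0 / (d ** (2 / (m - 1)))                   # standard FCM update
              u /= u.sum(axis=1, keepdims=True)
          return u, centers

      pts = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
      u, c = fuzzy_cmeans(pts)
      print(c.round(1))            # centers near (0, 0) and (4, 4)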

  10. Finding differentially expressed genes in two-channel DNA microarray datasets: how to increase reliability of data preprocessing.

    PubMed

    Rotter, Ana; Hren, Matjaz; Baebler, Spela; Blejec, Andrej; Gruden, Kristina

    2008-09-01

    Given the great variety of preprocessing tools for two-channel expression microarray data, it is difficult to choose the most appropriate one for a given experimental setup. In our study, two independent two-channel in-house microarray experiments, as well as a publicly available dataset, were used to investigate the influence of the choice of preprocessing methods (background correction, normalization, and duplicate-spot correlation calculation) on the discovery of differentially expressed genes. We show that both the list of differentially expressed genes and the expression values of selected genes depend significantly on the preprocessing approach applied, with the choice of normalization method having the highest impact on the results. We propose a simple but efficient approach to increase the reliability of the results, in which two theoretically distinct normalization methods are applied to the same dataset. The intersection of the results, that is, of the lists of differentially expressed genes, is then used to obtain a more accurate estimate of the genes that were de facto differentially expressed.
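
    The proposed safeguard reduces to a set intersection. A hedged Python sketch (the two-sample t-test and alpha cutoff are illustrative assumptions standing in for the paper's differential-expression analysis):

      import numpy as np
      from scipy import stats

      def de_genes(group_a, group_b, alpha=0.05):
          """Genes called differentially expressed by a two-sample t-test
          between two groups of arrays (genes x samples)."""
          _, p = stats.ttest_ind(group_a, group_b, axis=1)
          return set(np.flatnonzero(p < alpha))

      # the same raw data preprocessed with two theoretically distinct
      # normalization methods (simulated here as slightly different matrices)
      rng = np.random.default_rng(1)
      expr_norm1 = rng.normal(size=(500, 12))
      expr_norm2 = expr_norm1 + rng.normal(scale=0.1, size=(500, 12))

      robust = de_genes(expr_norm1[:, :6], expr_norm1[:, 6:]) & \
               de_genes(expr_norm2[:, :6], expr_norm2[:, 6:])
      print(len(robust), "genes survive both normalizations")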

  11. Genetic Algorithm for Optimization: Preprocessing with n Dimensional Bisection and Error Estimation

    NASA Technical Reports Server (NTRS)

    Sen, S. K.; Shaykhian, Gholam Ali

    2006-01-01

    Knowledge of the appropriate values of the parameters of a genetic algorithm (GA), such as the population size, the shrunken search space containing the solution, and the crossover and mutation probabilities, is not available a priori for a general optimization problem. Recommended here is a polynomial-time preprocessing scheme, including an n-dimensional bisection, that determines the foregoing parameters before deciding upon an appropriate GA for all problems of a similar nature and type. Such preprocessing is not only fast but also enables us to obtain the global optimal solution, and reasonably narrow error bounds on it, with a high degree of confidence.

  12. A biomimetic algorithm for the improved detection of microarray features

    NASA Astrophysics Data System (ADS)

    Nicolau, Dan V., Jr.; Nicolau, Dan V.; Maini, Philip K.

    2007-02-01

    One of the major difficulties of microarray technology relates to the processing of large and, importantly, error-laden images of the spots on the chip surface. Whatever the source of these errors, those introduced in the first stage of data acquisition, segmentation, are passed down to all subsequent processing, with deleterious results. As it has recently been demonstrated that biological systems have evolved mathematically efficient algorithms, this contribution tests an algorithm that mimics a bacterial "patented" strategy for searching available space for nutrients in order to find, zero in on and eventually delimit the features present on the microarray surface.

  13. Cancer Classification in Microarray Data using a Hybrid Selective Independent Component Analysis and υ-Support Vector Machine Algorithm.

    PubMed

    Saberkari, Hamidreza; Shamsi, Mousa; Joroughi, Mahsa; Golabi, Faegheh; Sedaaghi, Mohammad Hossein

    2014-10-01

    Microarray data have an important role in the identification and classification of cancer tissues. The small number of microarray samples available in cancer research is a persistent concern that complicates classifier design. Preprocessing gene selection techniques should therefore be applied before classification to remove non-informative genes from the microarray data; an appropriate gene selection method can significantly improve the performance of cancer classification. In this paper, we use selective independent component analysis (SICA) to reduce the dimension of microarray data. This selective algorithm solves the instability problem encountered with conventional independent component analysis (ICA) methods. First, the independent components of each gene are analyzed, and a selective set of components that contribute little to the error of reconstructing new samples is retained. Then, several modified support vector machine (υ-SVM) sub-classifiers are trained simultaneously, and the sub-classifier with the highest recognition rate is selected. The proposed algorithm is applied to three cancer datasets (leukemia, breast cancer and lung cancer), and its results are compared with other existing methods. The results show that the proposed algorithm (SICA + υ-SVM) achieves higher accuracy and validity; on the lung cancer dataset it exhibits a relative improvement of 3.3% in correctness rate over the ICA + SVM and SVM algorithms.
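
    The overall architecture, ICA-based dimension reduction feeding a nu-type SVM, can be approximated with standard tooling. A sketch using scikit-learn (NuSVC stands in for the υ-SVM, SICA's component-selection step is omitted, and the data and parameters are toy assumptions):

      import numpy as np
      from sklearn.pipeline import make_pipeline
      from sklearn.decomposition import FastICA
      from sklearn.svm import NuSVC
      from sklearn.model_selection import cross_val_score

      # toy expression matrix: 60 samples x 2000 genes, binary labels
      rng = np.random.default_rng(0)
      X = rng.normal(size=(60, 2000))
      y = rng.integers(0, 2, size=60)

      # ICA reduces dimension; a nu-SVM classifies in component space
      clf = make_pipeline(FastICA(n_components=10, random_state=0),
                          NuSVC(nu=0.5))
      print(cross_val_score(clf, X, y, cv=5).mean())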

  14. An Automatic and Power Spectra-based Rotate Correcting Algorithm for Microarray Image.

    PubMed

    Deng, Ning; Duan, Huilong

    2005-01-01

    Microarray image analysis, an important aspect of microarray technology, involves processing vast amounts of data, and its speed is currently limited by excessive manual intervention. The geometric structure of a microarray requires that, during analysis, the image be aligned with the vertical scanning orientation; if the image is rotated or tilted, the analysis results may be incorrect. Although some automatic image analysis algorithms are in use for microarrays, few methods have been reported for correcting microarray image rotation. In this paper, an automatic rotation-correction algorithm based on image power spectra is presented, which addresses the deflection problem in microarray images. Evaluated on hundreds of clinical samples, the algorithm is shown to achieve high precision. Adopting this algorithm therefore allows the microarray image analysis procedure to be fully automated.
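
    The power-spectrum idea can be read as follows: a regular spot grid concentrates spectral energy along its principal axes, so the rotation angle can be estimated by scoring candidate directions through the spectrum's centre. A simplified Python sketch of that reading (the angle grid and scoring scheme are my assumptions, not the published algorithm):

      import numpy as np

      def estimate_rotation(img: np.ndarray, angles=np.arange(-10, 10.1, 0.1)):
          """Score each candidate angle by the power-spectrum energy along
          the corresponding central line; return the best angle (degrees)."""
          spec = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
          h, w = spec.shape
          cy, cx = h // 2, w // 2
          r = np.arange(1, min(cy, cx))
          best, best_score = 0.0, -np.inf
          for a in angles:
              t = np.deg2rad(a + 90.0)           # energy line normal to rows
              ys = (cy + r * np.sin(t)).astype(int)
              xs = (cx + r * np.cos(t)).astype(int)
              score = spec[ys, xs].sum()
              if score > best_score:
                  best, best_score = a, score
          return best

      img = np.zeros((128, 128))
      img[::8, :] = 1.0                          # aligned horizontal grid lines
      print(estimate_rotation(img))              # ~0.0 for an aligned grid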

  15. Image preprocessing for improving computational efficiency in implementation of restoration and superresolution algorithms.

    PubMed

    Sundareshan, Malur K; Bhattacharjee, Supratik; Inampudi, Radhika; Pang, Ho-Yuen

    2002-12-10

    Computational complexity is a major impediment to the real-time implementation of image restoration and superresolution algorithms in many applications. Although powerful restoration algorithms have been developed within the past few years utilizing sophisticated mathematical machinery (based on statistical optimization and convex set theory), these algorithms are typically iterative in nature and require a sufficient number of iterations to be executed to achieve the desired resolution improvement that may be needed to meaningfully perform postprocessing image exploitation tasks in practice. Additionally, recent technological breakthroughs have facilitated novel sensor designs (focal plane arrays, for instance) that make it possible to capture megapixel imagery data at video frame rates. A major challenge in the processing of these large-format images is to complete the execution of the image processing steps within the frame capture times and to keep up with the output rate of the sensor so that all data captured by the sensor can be efficiently utilized. Consequently, development of novel methods that facilitate real-time implementation of image restoration and superresolution algorithms is of significant practical interest and is the primary focus of this study. The key to designing computationally efficient processing schemes lies in strategically introducing appropriate preprocessing steps together with the superresolution iterations to tailor optimized overall processing sequences for imagery data of specific formats. For substantiating this assertion, three distinct methods for tailoring a preprocessing filter and integrating it with the superresolution processing steps are outlined. These methods consist of a region-of-interest extraction scheme, a background-detail separation procedure, and a scene-derived information extraction step for implementing a set-theoretic restoration of the image that is less demanding in computation compared with the

  16. Syndromic surveillance using veterinary laboratory data: data pre-processing and algorithm performance evaluation

    PubMed Central

    Dórea, Fernanda C.; McEwen, Beverly J.; McNab, W. Bruce; Revie, Crawford W.; Sanchez, Javier

    2013-01-01

    Diagnostic test orders to an animal laboratory were explored as a data source for monitoring trends in the incidence of clinical syndromes in cattle. Four years of real data and over 200 simulated outbreak signals were used to compare pre-processing methods that could remove temporal effects in the data, as well as temporal aberration detection algorithms that provided high sensitivity and specificity. Weekly differencing demonstrated solid performance in removing day-of-week effects, even in series with low daily counts. For aberration detection, the results indicated that no single algorithm showed performance superior to all others across the range of outbreak scenarios simulated. Exponentially weighted moving average charts and Holt–Winters exponential smoothing demonstrated complementary performance, with the latter offering an automated method to adjust to changes in the time series that will likely occur in the future. Shewhart charts provided lower sensitivity but earlier detection in some scenarios. Cumulative sum charts did not appear to add value to the system; however, the poor performance of this algorithm was attributed to characteristics of the data monitored. These findings indicate that automated monitoring aimed at early detection of temporal aberrations will likely be most effective when a range of algorithms are implemented in parallel. PMID:23576782
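
    Two of the best-performing components are compact enough to sketch. A Python illustration of weekly differencing followed by an EWMA control chart (estimating the baseline from the whole series and the k = 3 limit are simplifying assumptions; real systems use a training window):

      import numpy as np

      def weekly_difference(counts: np.ndarray, period: int = 7) -> np.ndarray:
          """Remove day-of-week effects by differencing each day against
          the same weekday one week earlier."""
          return counts[period:] - counts[:-period]

      def ewma_alarms(x: np.ndarray, lam: float = 0.3, k: float = 3.0) -> np.ndarray:
          """Exponentially weighted moving average chart; flag points whose
          EWMA statistic drifts more than k of its standard deviations."""
          mu, sd = x.mean(), x.std(ddof=1)
          z, alarms = mu, np.zeros(len(x), dtype=bool)
          for i, xi in enumerate(x):
              z = lam * xi + (1 - lam) * z
              sigma_z = sd * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * (i + 1))))
              alarms[i] = abs(z - mu) > k * sigma_z
          return alarms

      daily = np.random.poisson(20, 365).astype(float)
      daily[300:305] += 15                       # injected outbreak signal
      print(np.flatnonzero(ewma_alarms(weekly_difference(daily))))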

  17. Preprocessing of hyperspectral images: a comparative study of destriping algorithms for EO-1 Hyperion

    NASA Astrophysics Data System (ADS)

    Scheffler, Daniel; Karrasch, Pierre

    2013-10-01

    In this study, data from the EO-1 Hyperion instrument were used. Apart from atmospheric influences and topographic effects, these data are a good choice for demonstrating the preprocessing steps that target sensor-internal sources of error: diffuse sensor noise, the striping effect, the smile effect, the keystone effect and spatial misalignments between the detector arrays. In this paper, the authors focus on the striping effect by comparing and evaluating different algorithms, methods and configurations for correcting striping errors. Correction of striping effects is necessary because of imprecise calibration of the detector array; this inaccuracy especially affects the first 12 visible and near-infrared (VNIR) bands, as well as a large number of bands in the short-wave infrared (SWIR) array. Altogether, six destriping techniques were tested on a Hyperion dataset covering a test site in Central Europe. For the final evaluation, various analyses across all Hyperion channels were performed. The results show that some correction methods have almost no effect on the striping in the images; others eliminate the striping, but the analyses show that these algorithms also alter pixel values in adjacent areas that had not originally been disturbed by the striping effect.
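
    For orientation, the simplest member of the destriping family compared in such studies is column moment matching. A minimal Python sketch (a generic textbook variant, not one of the six techniques evaluated here):

      import numpy as np

      def moment_match_destripe(band: np.ndarray) -> np.ndarray:
          """Adjust each detector column (rows x columns image of one band)
          to the band's global mean and standard deviation, removing
          per-detector gain and offset stripes."""
          col_mu = band.mean(axis=0)
          col_sd = band.std(axis=0) + 1e-9
          return (band - col_mu) / col_sd * band.std() + band.mean()

      # columns with simulated gain/offset stripes become comparable
      band = np.random.randn(200, 100) * np.linspace(0.5, 2.0, 100) + np.linspace(-1, 1, 100)
      print(moment_match_destripe(band).std(axis=0).round(2)[:5])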

  18. Rank-based algorithms for analysis of microarrays

    NASA Astrophysics Data System (ADS)

    Liu, Wei-min; Mei, Rui; Bartell, Daniel M.; Di, Xiaojun; Webster, Teresa A.; Ryder, Tom

    2001-06-01

    Analysis of microarray data often involves extracting information from the raw intensities of spots or cells and making certain calls. Rank-based algorithms are powerful tools for providing probability values for hypothesis tests, especially when the distribution of the intensities is unknown. In our current gene expression arrays, a gene is detected by a set of probe pairs consisting of perfect-match and mismatch cells. The one-sided upper-tail Wilcoxon signed-rank test is used in our algorithms for absolute calls (whether a gene is detected or not) as well as comparative calls (whether a gene is increasing, decreasing, or showing no significant change in one sample compared with another). We also test the possibility of using only perfect-match cells to make calls. This paper focuses on absolute calls. We have developed error analysis methods and software tools that allow us to compare the accuracy of the calls in the presence or absence of mismatch cells at different target concentrations. The use of nonparametric rank-based tests is not limited to absolute and comparative calls on gene expression chips; they can also be applied to other oligonucleotide microarrays for genotyping and mutation detection, as well as to spotted arrays.
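
    A detection call of this kind can be reproduced with standard tools. A Python sketch of an upper-tail Wilcoxon signed-rank test on PM/MM pairs (the alpha cutoff and the toy intensities are assumptions, not the shipped thresholds):

      import numpy as np
      from scipy.stats import wilcoxon

      def detection_call(pm: np.ndarray, mm: np.ndarray, alpha: float = 0.04):
          """Absolute (present/absent) call for one probe set: one-sided
          upper-tail Wilcoxon signed-rank test that PM > MM."""
          stat, p = wilcoxon(pm, mm, alternative="greater")
          return ("present" if p < alpha else "absent"), p

      pm = np.array([820., 940., 760., 1100., 900., 870., 990., 1010.])
      mm = np.array([400., 560., 390., 620., 480., 510., 530., 600.])
      print(detection_call(pm, mm))     # ('present', small p-value)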

  19. Microarrays

    ERIC Educational Resources Information Center

    Plomin, Robert; Schalkwyk, Leonard C.

    2007-01-01

    Microarrays are revolutionizing genetics by making it possible to genotype hundreds of thousands of DNA markers and to assess the expression (RNA transcripts) of all of the genes in the genome. Microarrays are slides the size of a postage stamp that contain millions of DNA sequences to which single-stranded DNA or RNA can hybridize. This…

  20. Effective preprocessing in #SAT

    NASA Astrophysics Data System (ADS)

    Guo, Qin; Sang, Juan; He, Yong-mei

    2011-12-01

    Preprocessing #SAT instances can reduce their size considerably and decrease the solving time. In this paper we investigate the use of hyper-binary resolution and equality reduction to preprocess #SAT instances, and we present a preprocessing algorithm, Preprocess MC, which combines unit propagation, hyper-binary resolution, and equality reduction. Experiments show that these techniques not only reduce the size of a #SAT formula, but also improve the ability of model counters to solve #SAT problems.
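
    Of the three techniques combined, unit propagation is the simplest to show. A self-contained Python sketch on clauses represented as DIMACS-style signed-integer lists (a common convention, not necessarily the paper's encoding):

      def unit_propagate(clauses):
          """Repeatedly assign the literal of any unit clause and simplify:
          drop satisfied clauses, shorten clauses containing the negation."""
          clauses = [list(c) for c in clauses]
          assignment = {}
          while True:
              units = [c[0] for c in clauses if len(c) == 1]
              if not units:
                  return clauses, assignment
              lit = units[0]
              assignment[abs(lit)] = lit > 0
              new = []
              for c in clauses:
                  if lit in c:
                      continue                    # clause satisfied
                  reduced = [l for l in c if l != -lit]
                  if not reduced:                 # conflict: empty clause
                      return [[]], assignment
                  new.append(reduced)
              clauses = new

      # (x1) & (-x1 | x2) & (-x2 | x3 | x4)  ->  ([[3, 4]], {1: True, 2: True})
      print(unit_propagate([[1], [-1, 2], [-2, 3, 4]]))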

  21. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    PubMed

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Nature-inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. The Artificial Bee Colony (ABC) algorithm is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm, which combines a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm; the goal is to integrate the advantages of both. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. To test the accuracy of the proposed algorithm, extensive experiments were conducted on three binary microarray datasets (colon, leukemia, and lung) and three multi-class microarray datasets (SRBCT, lymphoma, and leukemia). Results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC), as well as with combinations of mRMR with a Genetic Algorithm (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes, indicating that GBC is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.

  22. Artifact Removal from Biosignal using Fixed Point ICA Algorithm for Pre-processing in Biometric Recognition

    NASA Astrophysics Data System (ADS)

    Mishra, Puneet; Singla, Sunil Kumar

    2013-01-01

    In the modern world of automation, biological signals, especially the Electroencephalogram (EEG) and Electrocardiogram (ECG), are gaining wide attention as a source of biometric information. Earlier studies have shown that EEG and ECG vary between individuals: every individual has a distinct EEG and ECG spectrum. The EEG (which can be recorded from the scalp due to the activity of millions of neurons) may contain noise signals such as eye blinks, eye movement, muscular movement and line noise, while the ECG may contain artifacts such as line noise, tremor artifacts and baseline wander. These noise signals must be separated from the EEG and ECG signals to obtain accurate results. This paper proposes a technique for the removal of eye-blink artifacts from EEG and ECG signals using the fixed-point, or FastICA, algorithm of Independent Component Analysis (ICA). For validation, the FastICA algorithm was applied to a synthetic signal prepared by adding random noise to an ECG signal; FastICA separates the signal into two independent components, the pure ECG and the artifact signal. The same algorithm was then applied to remove artifacts (Electrooculogram, or eye blink) from the EEG signal.
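
    The validation setup, mixing a clean source with an artifact and unmixing with FastICA, is easy to reproduce with scikit-learn. A toy Python sketch (the synthetic waveforms and mixing matrix are assumptions; real EEG/ECG processing involves more channels and more care):

      import numpy as np
      from sklearn.decomposition import FastICA

      # two mixed channels built from an "ECG-like" source and an artifact
      t = np.linspace(0, 10, 2000)
      ecg = np.sin(2 * np.pi * 1.2 * t) ** 15                # spiky pseudo-ECG
      blink = (np.abs(np.sin(2 * np.pi * 0.25 * t)) > 0.99).astype(float)
      sources = np.c_[ecg, blink]
      mixed = sources @ np.array([[1.0, 0.6], [0.4, 1.0]])   # linear mixing

      ica = FastICA(n_components=2, random_state=0)
      recovered = ica.fit_transform(mixed)   # columns ~ sources, up to scale/order
      print(recovered.shape)                 # (2000, 2)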

  23. LS-CAP: an algorithm for identifying cytogenetic aberrations in hepatocellular carcinoma using microarray data.

    PubMed

    He, Xianmin; Wei, Qing; Sun, Meiqian; Fu, Xuping; Fan, Sichang; Li, Yao

    2006-05-01

    Biological techniques such as array comparative genomic hybridization (CGH), fluorescence in situ hybridization (FISH) and Affymetrix single nucleotide polymorphism (SNP) arrays have been used to detect cytogenetic aberrations. However, on a genomic scale, these techniques are labor-intensive and time-consuming. Comparative genomic microarray analysis (CGMA) has been used to identify cytogenetic changes in hepatocellular carcinoma (HCC) using gene expression microarray data. However, the CGMA algorithm cannot give a precise localization of aberrations, fails to identify small cytogenetic changes, and exhibits false negatives and positives. Locally un-weighted smoothing cytogenetic aberration prediction (LS-CAP), based on local smoothing and the binomial distribution, can be expected to address these problems. The LS-CAP algorithm was built and applied to HCC microarray profiles. Eighteen cytogenetic abnormalities were identified; among them, 5 were reported previously and 12 were confirmed by CGH studies. LS-CAP effectively reduced false negatives and positives, and precisely located small fragments with cytogenetic aberrations.

  24. LANDSAT data preprocessing

    NASA Technical Reports Server (NTRS)

    Austin, W. W.

    1983-01-01

    The effects on LANDSAT data of a Sun angle correction, an intersatellite LANDSAT-2 and LANDSAT-3 data range adjustment, and an atmospheric correction algorithm were evaluated. Fourteen 1978 crop-year LACIE sites were used as the site data set. The preprocessing techniques were applied to multispectral scanner channel data, and the transformed data were plotted and used to analyze the effectiveness of the preprocessing techniques. Ratio transformations effectively reduce the need for preprocessing techniques to be applied directly to the data; subtractive transformations are more sensitive to Sun angle and atmospheric corrections than ratios. Preprocessing techniques other than those applied at the Goddard Space Flight Center should be applied only as a user option. Although performed on LANDSAT data, the study results are also applicable to meteorological satellite data.

  25. PMCR-Miner: parallel maximal confident association rules miner algorithm for microarray data set.

    PubMed

    Zakaria, Wael; Kotb, Yasser; Ghaleb, Fayed F M

    2015-01-01

    The MCR-Miner algorithm aims to mine all maximal high-confidence association rules from microarray up/down-expressed gene data sets. This paper introduces two new algorithms: IMCR-Miner and PMCR-Miner. The IMCR-Miner algorithm is an extension of the MCR-Miner algorithm with several improvements, which store the samples of each gene as a list of unsigned integers in order to benefit from bitwise operations. In addition, the IMCR-Miner algorithm overcomes drawbacks of the MCR-Miner algorithm by adding restrictions that avoid repeated comparisons. The PMCR-Miner algorithm is a parallel version of the new IMCR-Miner algorithm, based on shared-memory systems and task parallelism, so that no time is spent sharing and combining data between processors. Experimental results on real microarray data sets show that the PMCR-Miner algorithm is more efficient and scalable than its counterparts.
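
    The bitwise storage trick is worth seeing concretely: encoding, per gene, the set of samples in which it is up-regulated as a bitmask reduces rule support and confidence to an AND plus a popcount. A Python sketch (toy masks of my own; int.bit_count needs Python 3.10+):

      # bit i set = gene is up-regulated in sample i (8 samples here)
      genes = {
          "g1": 0b10110111,
          "g2": 0b10100101,
          "g3": 0b00100100,
      }

      def confidence(antecedent: str, consequent: str) -> float:
          """conf(A -> B) = support(A and B) / support(A), computed with
          bitwise AND and popcount, as in the IMCR-Miner storage scheme."""
          a, b = genes[antecedent], genes[consequent]
          return (a & b).bit_count() / a.bit_count()

      print(confidence("g1", "g2"))    # 4/6 ~ 0.667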

  26. Melanoma Prognostic Model Using Tissue Microarrays and Genetic Algorithms

    PubMed Central

    Gould Rothberg, Bonnie E.; Berger, Aaron J.; Molinaro, Annette M.; Subtil, Antonio; Krauthammer, Michael O.; Camp, Robert L.; Bradley, William R.; Ariyan, Stephan; Kluger, Harriet M.; Rimm, David L.

    2009-01-01

    Purpose: As a result of the questionable risk-to-benefit ratio of adjuvant therapies, stage II melanoma is currently managed by observation because available clinicopathologic parameters cannot identify the 20% to 60% of such patients likely to develop metastatic disease. Here, we propose a multimarker molecular prognostic assay that can help triage patients at increased risk of recurrence. Methods: Protein expression for 38 candidates relevant to melanoma oncogenesis was evaluated using the automated quantitative analysis (AQUA) method for immunofluorescence-based immunohistochemistry in formalin-fixed, paraffin-embedded specimens from a cohort of 192 primary melanomas collected during 1959 to 1994. The prognostic assay was built using a genetic algorithm and validated on an independent cohort of 246 serial primary melanomas collected from 1997 to 2004. Results: Multiple iterations of the genetic algorithm yielded a consistent five-marker solution. A favorable prognosis was predicted by ATF2 ln(non-nuclear/nuclear AQUA score ratio) of more than -0.052, p21WAF1 nuclear compartment AQUA score of more than 12.98, p16INK4A ln(non-nuclear/nuclear AQUA score ratio) of ≤ -0.083, β-catenin total AQUA score of more than 38.68, and fibronectin total AQUA score of ≤ 57.93. Primary tumors that met at least four of these five conditions were considered a low-risk group, and those that met three or fewer conditions formed a high-risk group (log-rank P < .0001). Multivariable proportional hazards analysis adjusting for clinicopathologic parameters shows that the high-risk group has significantly reduced survival in both the discovery (hazard ratio = 2.84; 95% CI, 1.46 to 5.49; P = .002) and validation (hazard ratio = 2.72; 95% CI, 1.12 to 6.58; P = .027) cohorts. Conclusion: This multimarker prognostic assay, an independent determinant of melanoma survival, might be beneficial in improving the selection of stage II patients for adjuvant therapy. PMID:19884546

  27. An algorithm for finding biologically significant features in microarray data based on a priori manifold learning.

    PubMed

    Hira, Zena M; Trigeorgis, George; Gillies, Duncan F

    2014-01-01

    Microarray databases are a large source of genetic data, which, upon proper analysis, could enhance our understanding of biology and medicine. Many microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied in order to classify different types of cancer or distinguish between cancerous and non-cancerous tissue. However, microarrays are high-dimensional datasets with high levels of noise, and this causes problems when using machine learning methods. A popular approach to this problem is to search for a set of features that will simplify the structure and to some degree remove the noise from the data. The most widely used approach to feature extraction is principal component analysis (PCA), which assumes a multivariate Gaussian model of the data. More recently, non-linear methods have been investigated. Among these, manifold learning algorithms, for example Isomap, aim to project the data from a higher-dimensional space onto a lower-dimensional one. We have proposed a priori manifold learning for finding a manifold in which a representative set of microarray data is fused with relevant data taken from the KEGG pathway database. Once the manifold has been constructed, the raw microarray data are projected onto it, and clustering and classification can take place. In contrast to earlier fusion-based methods, the prior knowledge from the KEGG database is not used in, and does not bias, the classification process; it merely acts as an aid to find the best space in which to search the data. In our experiments we have found that using our new manifold method gives better classification results than using either PCA or conventional Isomap. PMID:24595155
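
    The baseline the paper improves on, projecting expression profiles with plain Isomap and then clustering in the embedded space, looks like this in scikit-learn (toy data; the KEGG-informed construction of the manifold, which is the paper's contribution, is not reproduced here):

      import numpy as np
      from sklearn.manifold import Isomap
      from sklearn.cluster import KMeans

      # toy expression matrix: 100 samples x 500 genes
      X = np.random.default_rng(0).normal(size=(100, 500))

      # nonlinear dimensionality reduction, then clustering on the manifold
      embedding = Isomap(n_neighbors=10, n_components=3).fit_transform(X)
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
      print(embedding.shape, np.bincount(labels))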

  28. Krylov subspace algorithms for computing GeneRank for the analysis of microarray data mining.

    PubMed

    Wu, Gang; Zhang, Ying; Wei, Yimin

    2010-04-01

    GeneRank is a new engine technology for the analysis of microarray experiments. It combines gene expression information with a network structure derived from gene annotations or expression profile correlations. Using matrix decomposition techniques, we first give a matrix analysis of the GeneRank model. We reformulate the GeneRank vector as a linear combination of three parts in the general case, when the matrix in question is non-diagonalizable. We then propose two Krylov subspace methods for computing GeneRank. Numerical experiments show that, when the GeneRank problem is very large, the new algorithms are appropriate choices. PMID:20426695
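
    The GeneRank model solves the linear system (I - d W D^{-1}) r = (1 - d) e, where W is the gene network's adjacency matrix, D its degree matrix, e the expression evidence and d a damping factor. A Python sketch using GMRES, a generic Krylov subspace solver (the paper develops specialised Krylov methods; d = 0.85 and the toy network are assumptions):

      import numpy as np
      from scipy.sparse import csr_matrix, diags, identity
      from scipy.sparse.linalg import gmres

      def generank(W, expr, d=0.85):
          """Solve (I - d * W D^{-1}) r = (1 - d) * expr with GMRES."""
          deg = np.asarray(W.sum(axis=1)).ravel()
          Dinv = diags(1.0 / np.maximum(deg, 1.0))
          A = identity(W.shape[0]) - d * (W @ Dinv)
          r, info = gmres(A, (1 - d) * expr)
          assert info == 0                       # converged
          return r

      # tiny 4-gene network with symmetric links plus expression evidence
      W = csr_matrix(np.array([[0, 1, 1, 0],
                               [1, 0, 1, 0],
                               [1, 1, 0, 1],
                               [0, 0, 1, 0]], dtype=float))
      print(generank(W, np.array([1.0, 0.2, 0.5, 0.1])))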

  29. SPACE: an algorithm to predict and quantify alternatively spliced isoforms using microarrays.

    PubMed

    Anton, Miguel A; Gorostiaga, Dorleta; Guruceaga, Elizabeth; Segura, Victor; Carmona-Saez, Pedro; Pascual-Montano, Alberto; Pio, Ruben; Montuenga, Luis M; Rubio, Angel

    2008-01-01

    Exon and exon+junction microarrays are promising tools for studying alternative splicing. Current analytical tools applied to these arrays lack two relevant features: the ability to predict unknown spliced forms and the ability to quantify the concentration of known and unknown isoforms. SPACE is an algorithm that has been developed to (1) estimate the number of different transcripts expressed under several conditions, (2) predict the precursor mRNA splicing structure and (3) quantify the transcript concentrations including unknown forms. The results presented here show its robustness and accuracy for real and simulated data. PMID:18312629

  30. Robust gene signatures from microarray data using genetic algorithms enriched with biological pathway keywords.

    PubMed

    Luque-Baena, R M; Urda, D; Gonzalo Claros, M; Franco, L; Jerez, J M

    2014-06-01

    Genetic algorithms are widely used in the estimation of expression profiles from microarray data. However, these techniques are unable to produce stable and robust solutions suitable for use in clinical and biomedical studies. This paper presents a novel two-stage evolutionary strategy for gene feature selection, combining a genetic algorithm with biological information extracted from the KEGG database. A comparative study is carried out on public data from three different types of cancer (leukemia, lung cancer and prostate cancer). Even though the analyses use only features having KEGG information, the results demonstrate that this two-stage evolutionary strategy increases the consistency, robustness and accuracy of blind discrimination between relapsed and healthy individuals. This approach could therefore facilitate the definition of gene signatures for the clinical prognosis and diagnosis of cancers in the near future.

  31. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

    PubMed Central

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    The artificial bee colony (ABC) algorithm is a relatively recent swarm intelligence optimization approach. In this paper, we present the first attempt at applying the ABC algorithm to the analysis of a microarray gene expression profile. We propose a feature selection algorithm that combines minimum redundancy maximum relevance (mRMR) with the ABC algorithm, mRMR-ABC, to select informative genes from microarray profiles; the approach uses a support vector machine (SVM) to measure the classification accuracy of the selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm through extensive experiments on six binary and multiclass gene expression microarray datasets, and we compare it with previously known techniques. For the sake of a fair comparison using the same parameters, we reimplemented two of these techniques: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes when tested on these datasets and compared with previously suggested methods, indicating that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. PMID:25961028

  32. Gene order computation using Alzheimer's DNA microarray gene expression data and the Ant Colony Optimisation algorithm.

    PubMed

    Pang, Chaoyang; Jiang, Gang; Wang, Shipeng; Hu, Benqiong; Liu, Qingzhong; Deng, Youping; Huang, Xudong

    2012-01-01

    As Alzheimer's Disease (AD) is the most common form of dementia, the study of AD-related genes via biocomputation is an important research topic. One method of studying AD-related genes is to cluster similar genes together into a gene order. Gene ordering is a good clustering approach because its results can be globally optimal, whereas other clustering methods are only locally optimal. Here we use an Ant Colony Optimisation (ACO)-based algorithm to calculate the gene order from an Alzheimer's DNA microarray dataset, testing it with four distance measurements: Pearson distance, Spearman distance, Euclidean distance, and squared Euclidean distance. Our results indicate that different distance formulas generate gene orders of differing quality, and that the squared Euclidean distance approach produced the optimal AD-related gene order.

  33. A novel biclustering algorithm of binary microarray data: BiBinCons and BiBinAlter.

    PubMed

    Saber, Haifa Ben; Elloumi, Mourad

    2015-01-01

    The biclustering of microarray data has been the subject of extensive research. None of the existing biclustering algorithms is perfect, and the construction of biologically significant groups of biclusters for large microarray data remains a problem that requires continuous work. Biological validation of biclusters of microarray data is one of the most important open issues; so far, there are no general guidelines in the literature on how to biologically validate extracted biclusters. In this paper, we develop two biclustering algorithms for binary microarray data, BiBinCons and BiBinAlter, both adopting the Iterative Row and Column Clustering Combination (IRCCC) approach. BiBinAlter is an improvement of BiBinCons; it differs from BiBinCons by using the EvalStab and IndHomog evaluation functions in addition to the CroBin one (Bioinformatics 20:1993-2003, 2004). BiBinAlter can extract biclusters of good quality with better p-values.

  1. Sample phenotype clusters in high-density oligonucleotide microarray data sets are revealed using Isomap, a nonlinear algorithm

    PubMed Central

    Dawson, Kevin; Rodriguez, Raymond L; Malyj, Wasyl

    2005-01-01

    Background Life processes are determined by the organism's genetic profile and multiple environmental variables. However, the interaction between these factors is inherently non-linear [1]. Microarray data is one representation of the nonlinear interactions among genes, and between genes and environmental factors. Still, most microarray studies use linear methods for the interpretation of nonlinear data. In this study, we apply Isomap, a nonlinear method of dimensionality reduction, to analyze three independent large Affymetrix high-density oligonucleotide microarray data sets. Results Isomap discovered low-dimensional structures embedded in the Affymetrix microarray data sets. These structures correspond to and help to interpret biological phenomena present in the data. This analysis provides examples of temporal, spatial, and functional processes revealed by the Isomap algorithm. In a spinal cord injury data set, Isomap discovers the three main modalities of the experiment – location and severity of the injury and the time elapsed after the injury. In a multiple tissue data set, Isomap discovers a low-dimensional structure that corresponds to anatomical locations of the source tissues. This model is capable of describing low- and high-resolution differences in the same model, such as kidney-vs.-brain and differences between the nuclei of the amygdala, respectively. In a high-throughput drug screening data set, Isomap discovers the monocytic and granulocytic differentiation of myeloid cells and maps several chemical compounds on the two-dimensional model. Conclusion Visualization of Isomap models provides useful tools for exploratory analysis of microarray data sets. In most instances, Isomap models explain more of the variance present in the microarray data than PCA or MDS. Finally, Isomap is a promising new algorithm for class discovery and class prediction in high-density oligonucleotide data sets. PMID:16076401
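
    Applying Isomap to an expression matrix is a short exercise with scikit-learn; the sketch below, on placeholder data, shows the embedding step described above. The neighbor count and component number are assumptions, not the study's settings.

```python
# Embedding a samples-x-genes matrix with Isomap; the 2-D coordinates can
# then be plotted and colored by phenotype to reveal sample clusters.
import numpy as np
from sklearn.manifold import Isomap

X = np.random.default_rng(1).normal(size=(60, 2000))   # placeholder expression data
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)                                 # (60, 2)
```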

  2. Forward-Masked Frequency Selectivity Improvements in Simulated and Actual Cochlear Implant Users Using a Preprocessing Algorithm

    PubMed Central

    Jürgens, Tim

    2016-01-01

    Frequency selectivity can be quantified using masking paradigms, such as psychophysical tuning curves (PTCs). Normal-hearing (NH) listeners show sharp PTCs that are level- and frequency-dependent, whereas frequency selectivity is strongly reduced in cochlear implant (CI) users. This study aims at (a) assessing individual shapes of PTCs in CI users, (b) comparing these shapes to those of simulated CI listeners (NH listeners hearing through a CI simulation), and (c) increasing the sharpness of PTCs using a biologically inspired dynamic compression algorithm, BioAid, which has been shown to sharpen the PTC shape in hearing-impaired listeners. A three-alternative-forced-choice forward-masking technique was used to assess PTCs in 8 CI users (with their own speech processor) and 11 NH listeners (with and without listening through a vocoder to simulate electric hearing). CI users showed flat PTCs with large interindividual variability in shape, whereas simulated CI listeners had PTCs of the same average flatness, but more homogeneous shapes across listeners. The algorithm BioAid was used to process the stimuli before entering the CI users’ speech processor or the vocoder simulation. This algorithm was able to partially restore frequency selectivity in both groups, particularly in seven out of eight CI users, meaning significantly sharper PTCs than in the unprocessed condition. The results indicate that algorithms can improve the large-scale sharpness of frequency selectivity in some CI users. This finding may be useful for the design of sound coding strategies particularly for situations in which high frequency selectivity is desired, such as for music perception. PMID:27604785

  3. Forward-Masked Frequency Selectivity Improvements in Simulated and Actual Cochlear Implant Users Using a Preprocessing Algorithm.

    PubMed

    Langner, Florian; Jürgens, Tim

    2016-01-01

    Frequency selectivity can be quantified using masking paradigms, such as psychophysical tuning curves (PTCs). Normal-hearing (NH) listeners show sharp PTCs that are level- and frequency-dependent, whereas frequency selectivity is strongly reduced in cochlear implant (CI) users. This study aims at (a) assessing individual shapes of PTCs in CI users, (b) comparing these shapes to those of simulated CI listeners (NH listeners hearing through a CI simulation), and (c) increasing the sharpness of PTCs using a biologically inspired dynamic compression algorithm, BioAid, which has been shown to sharpen the PTC shape in hearing-impaired listeners. A three-alternative-forced-choice forward-masking technique was used to assess PTCs in 8 CI users (with their own speech processor) and 11 NH listeners (with and without listening through a vocoder to simulate electric hearing). CI users showed flat PTCs with large interindividual variability in shape, whereas simulated CI listeners had PTCs of the same average flatness, but more homogeneous shapes across listeners. The algorithm BioAid was used to process the stimuli before entering the CI users' speech processor or the vocoder simulation. This algorithm was able to partially restore frequency selectivity in both groups, particularly in seven out of eight CI users, meaning significantly sharper PTCs than in the unprocessed condition. The results indicate that algorithms can improve the large-scale sharpness of frequency selectivity in some CI users. This finding may be useful for the design of sound coding strategies particularly for situations in which high frequency selectivity is desired, such as for music perception. PMID:27604785

  5. The LeFE algorithm: embracing the complexity of gene expression in the interpretation of microarray data

    PubMed Central

    Eichler, Gabriel S; Reimers, Mark; Kane, David; Weinstein, John N

    2007-01-01

    Interpretation of microarray data remains a challenge, and most methods fail to consider the complex, nonlinear regulation of gene expression. To address that limitation, we introduce Learner of Functional Enrichment (LeFE), a statistical/machine learning algorithm based on Random Forest, and demonstrate it on several diverse datasets: smoker/never smoker, breast cancer classification, and cancer drug sensitivity. We also compare it with previously published algorithms, including Gene Set Enrichment Analysis. LeFE regularly identifies statistically significant functional themes consistent with known biology. PMID:17845722

  6. Evaluation of multivariate calibration models with different pre-processing and processing algorithms for a novel resolution and quantitation of spectrally overlapped quaternary mixture in syrup

    NASA Astrophysics Data System (ADS)

    Moustafa, Azza A.; Hegazy, Maha A.; Mohamed, Dalia; Ali, Omnia

    2016-02-01

    A novel approach for the resolution and quantitation of a severely overlapped quaternary mixture of carbinoxamine maleate (CAR), pholcodine (PHL), ephedrine hydrochloride (EPH) and sunset yellow (SUN) in syrup was demonstrated utilizing different spectrophotometric assisted multivariate calibration methods. The applied methods used different processing and pre-processing algorithms. The proposed methods were partial least squares (PLS), concentration residuals augmented classical least squares (CRACLS), and a novel method: continuous wavelet transforms coupled with partial least squares (CWT-PLS). These methods were applied to a training set in the concentration ranges of 40-100 μg/mL, 40-160 μg/mL, 100-500 μg/mL and 8-24 μg/mL for the four components, respectively. The methods did not require any preliminary separation step or chemical pretreatment. Their validity was evaluated with an external validation set. The selectivity of the developed methods was demonstrated by analyzing the drugs in their combined pharmaceutical formulation without any interference from additives. The obtained results were statistically compared with the official and reported methods, and no significant difference was observed regarding either accuracy or precision.
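
    Of the three calibration methods, plain PLS is readily sketched with scikit-learn; the placeholder matrices and component count below are assumptions, and the CWT and CRACLS variants are not reproduced.

```python
# PLS calibration of a quaternary mixture: A holds absorbance spectra
# (mixtures x wavelengths), C the known concentrations (mixtures x 4).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
A = rng.random((25, 400))      # placeholder training spectra
C = rng.random((25, 4))        # placeholder CAR/PHL/EPH/SUN concentrations

pls = PLSRegression(n_components=6).fit(A, C)
predicted = pls.predict(A)     # in practice, predict the external validation set
```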

  7. A regression-based differential expression detection algorithm for microarray studies with ultra-low sample size.

    PubMed

    Vasiliu, Daniel; Clamons, Samuel; McDonough, Molly; Rabe, Brian; Saha, Margaret

    2015-01-01

    Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely-used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.
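
    The PED classifier itself is not available in common Python libraries, so the sketch below substitutes an L1-penalized logistic classifier to illustrate the rank-genes-by-importance idea; the simulation-based cutoff procedure is omitted and all parameters are assumptions.

```python
# Rank genes by the coefficient magnitude of a penalized classifier;
# an L1 logistic model stands in for the authors' PED model here.
import numpy as np
from sklearn.linear_model import LogisticRegression

def rank_genes(X, y):
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    importance = np.abs(clf.coef_).ravel()
    return np.argsort(importance)[::-1]    # most important genes first
```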

  8. An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data

    PubMed Central

    Lecocke, Michael; Hess, Kenneth

    2007-01-01

    Background We consider both univariate- and multivariate-based feature selection for the problem of binary classification with microarray data. The idea is to determine whether the more sophisticated multivariate approach leads to better misclassification error rates because of the potential to consider jointly significant subsets of genes (but without overfitting the data). Methods We present an empirical study in which 10-fold cross-validation is applied externally to a univariate-based and two multivariate, genetic algorithm (GA)-based feature selection processes. These procedures are applied with respect to three supervised learning algorithms and six published two-class microarray datasets. Results Considering all datasets and learning algorithms, the average 10-fold external cross-validation error rates for the univariate-, single-stage GA-, and two-stage GA-based processes are 14.2%, 14.6%, and 14.2%, respectively. We also find that the optimism bias estimates from the GA analyses were half those of the univariate approach, but the selection bias estimates from the GA analyses were 2.5 times those of the univariate results. Conclusions We find that the 10-fold external cross-validation misclassification error rates were very comparable. Further, a two-stage GA approach did not demonstrate a significant advantage over a one-stage approach. We also find that the univariate approach had higher optimism bias and lower selection bias compared to both GA approaches. PMID:19458774
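
    The key design point, external cross-validation, means the feature selection is re-run inside every fold rather than once on the full data; a minimal sketch follows, with univariate selection standing in for the GA and all data placeholders assumed.

```python
# External 10-fold CV: selection happens inside each fold via a Pipeline,
# so the error estimate is not biased by the feature selection step.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 1000))            # placeholder expression data
y = rng.integers(0, 2, size=80)            # placeholder class labels

model = make_pipeline(SelectKBest(f_classif, k=50), SVC(kernel="linear"))
error = 1 - cross_val_score(model, X, y, cv=10).mean()
print(error)
```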

  9. Preprocessing and Quality Control Strategies for Illumina DASL Assay-Based Brain Gene Expression Studies with Semi-Degraded Samples.

    PubMed

    Chow, Maggie L; Winn, Mary E; Li, Hai-Ri; April, Craig; Wynshaw-Boris, Anthony; Fan, Jian-Bing; Fu, Xiang-Dong; Courchesne, Eric; Schork, Nicholas J

    2012-01-01

    Available statistical preprocessing or quality control analysis tools for gene expression microarray datasets are known to greatly affect downstream data analysis, especially when degraded samples, unique tissue samples, or novel expression assays are used. It is therefore important to assess the validity and impact of the assumptions built into preprocessing schemes for a dataset. We developed and assessed a data preprocessing strategy for use with the Illumina DASL-based gene expression assay with partially degraded postmortem prefrontal cortex samples. The samples were obtained from individuals with autism as part of an investigation of the pathogenic factors contributing to autism. Using statistical analysis methods and metrics such as those associated with multivariate distance matrix regression and mean inter-array correlation, we developed a DASL-based assay gene expression preprocessing pipeline to accommodate and detect problems with microarray-based gene expression values obtained with degraded brain samples. Key steps in the pipeline included outlier exclusion, data transformation and normalization, and batch effect and covariate corrections. Our goal was to produce a clean dataset for subsequent downstream differential expression analysis. We ultimately settled on available transformation and normalization algorithms in the R/Bioconductor package lumi based on an assessment of their use in various combinations: a log2-transformed, quantile-normalized, and batch- and seizure-corrected procedure was likely the most appropriate for our data. We empirically tested different components of the proposed strategy; the results suggest that a preprocessing strategy that effectively identifies outliers, normalizes the data, and corrects for batch effects can be applied to all studies, even those pursued with degraded samples.
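
    As a hedged illustration of two of these pipeline steps, the snippet below log2-transforms a probe-by-sample matrix and quantile-normalizes it in plain NumPy; the lumi package's exact implementation and the batch/covariate corrections are not reproduced.

```python
# Log2 transform followed by quantile normalization of a probes x samples
# matrix: each column is forced onto the mean empirical distribution.
import numpy as np

def quantile_normalize(raw):
    logged = np.log2(raw + 1.0)
    ranks = np.argsort(np.argsort(logged, axis=0), axis=0)   # per-column ranks
    mean_quantiles = np.sort(logged, axis=0).mean(axis=1)    # reference distribution
    return mean_quantiles[ranks]
```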

  10. A new method for gridding DNA microarrays.

    PubMed

    Charalambous, Christoforos C; Matsopoulos, George K

    2013-10-01

    In this paper, a new methodological scheme for the gridding of DNA microarrays is proposed. The scheme consists of a series of processes applied sequentially. Each DNA microarray image is pre-processed to remove any noise, and the center of each spot is detected using a template matching algorithm. Then, an initial gridding is automatically placed on the DNA microarray image by 'building' rectangular pyramids around the detected spot centers; the gridlines "move" between the pyramids, horizontally and vertically, forming this initial grid. Furthermore, a five-step refinement process is applied to correct gridding imperfections caused by the initial placement, both in cases with no spots and in cases where more than one spot is enclosed. The proposed gridding scheme is applied to DNA microarray images under known transformations and to real-world DNA data. Its performance is compared against the projection pursuit method, which is often used due to its speed and simplicity, as well as against a state-of-the-art method, the Optimal Multi-level Thresholding Gridding (OMTG). According to the obtained results, the proposed gridding scheme outperforms both methods, qualitatively and quantitatively.
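
    The spot-center detection step can be approximated with off-the-shelf template matching; the sketch below assumes scikit-image and illustrative threshold values, and omits the pyramid construction and grid refinement stages.

```python
# Detect candidate spot centers as local maxima of the normalized
# cross-correlation between the image and a spot template.
import numpy as np
from skimage.feature import match_template, peak_local_max

def spot_centers(image, template, min_distance=8, threshold=0.6):
    corr = match_template(image, template, pad_input=True)
    return peak_local_max(corr, min_distance=min_distance, threshold_abs=threshold)
```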

  11. A new method for class prediction based on signed-rank algorithms applied to Affymetrix® microarray experiments

    PubMed Central

    Rème, Thierry; Hose, Dirk; De Vos, John; Vassal, Aurélien; Poulain, Pierre-Olivier; Pantesco, Véronique; Goldschmidt, Hartmut; Klein, Bernard

    2008-01-01

    Background The huge amount of data generated by DNA chips is a powerful basis to classify various pathologies. However, constant evolution of microarray technology makes it difficult to mix data from different chip types for class prediction of limited sample populations. Affymetrix® technology provides both a quantitative fluorescence signal and a decision (detection call: absent or present) based on signed-rank algorithms applied to several hybridization repeats of each gene, with a per-chip normalization. We developed a new prediction method for class belonging based only on the detection call from recent Affymetrix chip types. Biological data were obtained by hybridization on U133A, U133B and U133Plus 2.0 microarrays of purified normal B cells and cells from three independent groups of multiple myeloma (MM) patients. Results After a call-based data reduction step to filter out non class-discriminative probe sets, the gene list obtained was reduced to a predictor with correction for multiple testing by iterative deletion of probe sets that sequentially improve inter-class comparisons and their significance. The error rate of the method was determined using leave-one-out and 5-fold cross-validation. It was successfully applied to (i) determine a sex predictor with the normal donor group classifying gender with no error in all patient groups except for male MM samples with a Y chromosome deletion, (ii) predict the immunoglobulin light and heavy chains expressed by the malignant myeloma clones of the validation group and (iii) predict sex, light and heavy chain nature for every new patient. Finally, this method was shown to be powerful when compared to the popular classification method Prediction Analysis of Microarray (PAM). Conclusion This normalization-free method is routinely used for quality control and correction of collection errors in patient reports to clinicians. It can be easily extended to multiple class prediction suitable for clinical groups, and looks

  12. Fuzzy-rough supervised attribute clustering algorithm and classification of microarray data.

    PubMed

    Maji, Pradipta

    2011-02-01

    One of the major tasks with gene expression data is to find groups of coregulated genes whose collective expression is strongly associated with sample categories. In this regard, a new clustering algorithm, termed fuzzy-rough supervised attribute clustering (FRSAC), is proposed to find such groups of genes. The proposed algorithm is based on the theory of fuzzy-rough sets and directly incorporates the information of sample categories into the gene clustering process. A new fuzzy-rough quantitative measure that incorporates sample-category information is introduced to measure the similarity among genes; the algorithm measures gene similarity with this measure, removes redundancy among the genes, and refines the clusters incrementally based on sample categories. The effectiveness of the proposed FRSAC algorithm, along with a comparison with existing supervised and unsupervised gene selection and clustering algorithms, is demonstrated on six cancer and two arthritis data sets based on the class separability index and the predictive accuracy of the naive Bayes classifier, the K-nearest neighbor rule, and the support vector machine. PMID:20542768

  13. Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data

    PubMed Central

    Baross, Ágnes; Delaney, Allen D; Li, H Irene; Nayar, Tarun; Flibotte, Stephane; Qian, Hong; Chan, Susanna Y; Asano, Jennifer; Ally, Adrian; Cao, Manqiu; Birch, Patricia; Brown-John, Mabel; Fernandes, Nicole; Go, Anne; Kennedy, Giulia; Langlois, Sylvie; Eydoux, Patrice; Friedman, JM; Marra, Marco A

    2007-01-01

    Background Genomic deletions and duplications are important in the pathogenesis of diseases, such as cancer and mental retardation, and have recently been shown to occur frequently in unaffected individuals as polymorphisms. Affymetrix GeneChip whole genome sampling analysis (WGSA) combined with 100 K single nucleotide polymorphism (SNP) genotyping arrays is one of several microarray-based approaches that are now being used to detect such structural genomic changes. The popularity of this technology and its associated open source data format have resulted in the development of an increasing number of software packages for the analysis of copy number changes using these SNP arrays. Results We evaluated four publicly available software packages for high throughput copy number analysis using synthetic and empirical 100 K SNP array data sets, the latter obtained from 107 mental retardation (MR) patients and their unaffected parents and siblings. We evaluated the software with regard to overall suitability for high-throughput 100 K SNP array data analysis; effectiveness of normalization, scaling with various reference sets, and feature extraction; and true and false positive rates of genomic copy number variant (CNV) detection. Conclusion We observed considerable variation among the numbers and types of candidate CNVs detected by different analysis approaches, and found that multiple programs were needed to find all real aberrations in our test set. The frequency of false positive deletions was substantial, but could be greatly reduced by using the SNP genotype information to confirm loss of heterozygosity. PMID:17910767

  14. Forensic considerations for preprocessing effects on clinical MDCT scans.

    PubMed

    Wade, Andrew D; Conlogue, Gerald J

    2013-05-01

    Manipulation of digital photographs destined for medico-legal inquiry must be thoroughly documented and presented with explanation of any manipulations. Unlike digital photography, computed tomography (CT) data must pass through an additional step before viewing. Reconstruction of raw data involves reconstruction algorithms to preprocess the raw information into display data. Preprocessing of raw data, although it occurs at the source, alters the images and must be accounted for in the same way as postprocessing. Repeated CT scans of a gunshot wound phantom were made using the Toshiba Aquilion 64-slice multidetector CT scanner. The appearance of fragments, high-density inclusion artifacts, and soft tissue were assessed. Preprocessing with different algorithms results in substantial differences in image output. It is important to appreciate that preprocessing affects the image, that it does so differently in the presence of high-density inclusions, and that preprocessing algorithms and scanning parameters may be used to overcome the resulting artifacts.

  15. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm

    PubMed Central

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu

    2016-01-01

    Among non-small cell lung cancer (NSCLC), adenocarcinoma (AC), and squamous cell carcinoma (SCC) are two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered as distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to a NSCLC gene expression dataset. When compared with several novel feature selection algorithms, such as LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is indeed a feature selection algorithm. Additionally, we applied SAMGSR to the AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The small overlap between these two resulting gene signatures illustrates that AC and SCC are indeed distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed. PMID:27446945

  16. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm.

    PubMed

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu; Tian, Suyan

    2016-01-01

    Among non-small cell lung cancer (NSCLC), adenocarcinoma (AC), and squamous cell carcinoma (SCC) are two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered as distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to a NSCLC gene expression dataset. When compared with several novel feature selection algorithms, such as LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is indeed a feature selection algorithm. Additionally, we applied SAMGSR to the AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The small overlap between these two resulting gene signatures illustrates that AC and SCC are indeed distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed. PMID:27446945

  17. Preoperative overnight parenteral nutrition (TPN) improves skeletal muscle protein metabolism indicated by microarray algorithm analyses in a randomized trial.

    PubMed

    Iresjö, Britt-Marie; Engström, Cecilia; Lundholm, Kent

    2016-06-01

    Loss of muscle mass is associated with increased risk of morbidity and mortality in hospitalized patients. Uncertainties about the efficiency of short-term artificial nutrition remain, specifically regarding improvement of protein balance in skeletal muscles. In this study, algorithmic microarray analysis was applied to map cellular changes related to muscle protein metabolism in human skeletal muscle tissue during provision of overnight preoperative total parenteral nutrition (TPN). Twenty-two patients (11/group) scheduled for upper GI surgery due to malignant or benign disease received a continuous peripheral all-in-one TPN infusion (30 kcal/kg/day, 0.16 gN/kg/day) or saline infusion for 12 h prior to operation. Biopsies from the rectus abdominis muscle were taken at the start of operation for isolation of muscle RNA. RNA expression microarray analyses were performed with Agilent Sureprint G3, 8 × 60K arrays using one-color labeling. 447 mRNAs were differentially expressed between study and control patients (P < 0.1). mRNAs related to ribosomal biogenesis, mRNA processing, and translation were upregulated during overnight nutrition, particularly anabolic signaling via S6K1 (P < 0.01-0.1). Transcripts of genes associated with lysosomal degradation showed consistently lower expression during TPN, while mRNAs for ubiquitin-mediated degradation of proteins, as well as transcripts related to intracellular signaling pathways (PI3 kinase/MAP kinase), were either increased or decreased. In conclusion, overnight standard TPN infusion at a constant rate altered mRNAs associated with mTOR signaling, increased initiation of protein translation, and suppressed autophagy/lysosomal degradation of proteins. This indicates that overnight preoperative parenteral nutrition is effective in promoting muscle protein metabolism. PMID:27273879

  19. The preprocessed doacross loop

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi

    1990-01-01

    Dependencies between loop iterations cannot always be characterized during program compilation. Doacross loops typically make use of a priori knowledge of inter-iteration dependencies to carry out required synchronizations. A type of doacross loop is proposed that allows the scheduling of iterations of a loop among processors without advance knowledge of inter-iteration dependencies. The proposed method requires that parallelizable preprocessing and postprocessing steps be carried out during program execution.

  20. Comparing Binaural Pre-processing Strategies III

    PubMed Central

    Warzybok, Anna; Ernst, Stephan M. A.

    2015-01-01

    A comprehensive evaluation of eight signal pre-processing strategies, including directional microphones, coherence filters, single-channel noise reduction, binaural beamformers, and their combinations, was undertaken with normal-hearing (NH) and hearing-impaired (HI) listeners. Speech reception thresholds (SRTs) were measured in three noise scenarios (multitalker babble, cafeteria noise, and single competing talker). Predictions of three common instrumental measures were compared with the general perceptual benefit caused by the algorithms. The individual SRTs measured without pre-processing and individual benefits were objectively estimated using the binaural speech intelligibility model. Ten listeners with NH and 12 HI listeners participated. The participants varied in age and pure-tone threshold levels. Although HI listeners required a better signal-to-noise ratio to obtain 50% intelligibility than listeners with NH, no differences in SRT benefit from the different algorithms were found between the two groups. With the exception of single-channel noise reduction, all algorithms showed an improvement in SRT of between 2.1 dB (in cafeteria noise) and 4.8 dB (in single competing talker condition). Model predictions with binaural speech intelligibility model explained 83% of the measured variance of the individual SRTs in the no pre-processing condition. Regarding the benefit from the algorithms, the instrumental measures were not able to predict the perceptual data in all tested noise conditions. The comparable benefit observed for both groups suggests a possible application of noise reduction schemes for listeners with different hearing status. Although the model can predict the individual SRTs without pre-processing, further development is necessary to predict the benefits obtained from the algorithms at an individual level. PMID:26721922

  1. Retinex Preprocessing for Improved Multi-Spectral Image Classification

    NASA Technical Reports Server (NTRS)

    Thompson, B.; Rahman, Z.; Park, S.

    2000-01-01

    The goal of multi-image classification is to identify and label "similar regions" within a scene. The ability to correctly classify a remotely sensed multi-image of a scene is affected by the ability of the classification process to adequately compensate for the effects of atmospheric variations and sensor anomalies. Better classification may be obtained if the multi-image is preprocessed before classification, so as to reduce the adverse effects of image formation. In this paper, we discuss the overall impact on multi-spectral image classification when the retinex image enhancement algorithm is used to preprocess multi-spectral images. The retinex is a multi-purpose image enhancement algorithm that performs dynamic range compression, reduces the dependence on lighting conditions, and generally enhances apparent spatial resolution. The retinex has been successfully applied to the enhancement of many different types of grayscale and color images. We show in this paper that retinex preprocessing improves the spatial structure of multi-spectral images and thus provides better within-class variations than would otherwise be obtained without the preprocessing. For a series of multi-spectral images obtained with diffuse and direct lighting, we show that without retinex preprocessing the class spectral signatures vary substantially with the lighting conditions. Whereas multi-dimensional clustering without preprocessing produced one-class homogeneous regions, the classification on the preprocessed images produced multi-class non-homogeneous regions. This lack of homogeneity is explained by the interaction between different agronomic treatments applied to the regions: the preprocessed images are closer to ground truth. The principle advantage that the retinex offers is that for different lighting conditions classifications derived from the retinex preprocessed images look remarkably "similar", and thus more consistent, whereas classifications derived from the original
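
    For orientation, a single-scale retinex is essentially the log of the image minus the log of a Gaussian-blurred surround; the multi-scale retinex used in work like this averages several such scales. The sketch below, with an assumed sigma, shows only the single-scale core.

```python
# Single-scale retinex on one spectral band: dynamic range compression and
# reduced dependence on illumination via a log-ratio to a blurred surround.
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(band, sigma=80.0):
    band = band.astype(float)
    return np.log1p(band) - np.log1p(gaussian_filter(band, sigma))
```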

  2. Exploration, visualization, and preprocessing of high-dimensional data.

    PubMed

    Wu, Zhijin; Wu, Zhiqiang

    2010-01-01

    The rapid advances in biotechnology have given rise to a variety of high-dimensional data. Many of these data, including DNA microarray data, mass spectrometry protein data, and high-throughput screening (HTS) assay data, are generated by complex experimental procedures that involve multiple steps such as sample extraction, purification and/or amplification, labeling, fragmentation, and detection. Therefore, the quantity of interest is not directly obtained and a number of preprocessing procedures are necessary to convert the raw data into the format with biological relevance. This also makes exploratory data analysis and visualization essential steps to detect possible defects, anomalies or distortion of the data, to test underlying assumptions and thus ensure data quality. The characteristics of the data structure revealed in exploratory analysis often motivate decisions in preprocessing procedures to produce data suitable for downstream analysis. In this chapter we review the common techniques in exploring and visualizing high-dimensional data and introduce the basic preprocessing procedures.

  3. A data-driven algorithm for offline pupil signal preprocessing and eyeblink detection in low-speed eye-tracking protocols.

    PubMed

    Pedrotti, Marco; Lei, Shengguang; Dzaack, Jeronimo; Rötting, Matthias

    2011-06-01

    Event detection is the conversion of raw eye-tracking data into events--such as fixations, saccades, glissades, blinks, and so forth--that are relevant for researchers. In eye-tracking studies, event detection algorithms can have a serious impact on higher level analyses, although most studies do not accurately report their settings. We developed a data-driven eyeblink detection algorithm (Identification-Artifact Correction [I-AC]) for 50-Hz eye-tracking protocols. I-AC works by first correcting blink-related artifacts within pupil diameter values and then estimating blink onset and offset. Artifact correction is achieved with data-driven thresholds, and more reliable pupil data are output. Blink parameters are defined according to previous studies on blink-related visual suppression. Blink detection performance was tested with experimental data by visually checking the actual correspondence between I-AC output and participants' eye images, recorded by the eyetracker simultaneously with gaze data. Results showed a 97% correct detection percentage.
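
    A hedged sketch of the data-driven flavor of such artifact detection is given below: the threshold is derived from the pupil signal itself (median plus a multiple of the median absolute deviation of frame-to-frame changes). This captures the general idea, not the published I-AC procedure.

```python
# Flag pupil-diameter samples whose frame-to-frame change exceeds a
# threshold computed from the signal itself; zeros mean lost tracking.
import numpy as np

def detect_artifacts(pupil, k=3.0):
    diff = np.abs(np.diff(pupil, prepend=pupil[0]))
    mad = np.median(np.abs(diff - np.median(diff)))
    threshold = np.median(diff) + k * mad     # data-driven, not a fixed value
    return (diff > threshold) | (pupil <= 0)  # boolean artifact/blink mask
```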

  4. Preprocessing of NMR metabolomics data.

    PubMed

    Euceda, Leslie R; Giskeødegård, Guro F; Bathen, Tone F

    2015-05-01

    Metabolomics involves the large scale analysis of metabolites and thus, provides information regarding cellular processes in a biological sample. Independently of the analytical technique used, a vast amount of data is always acquired when carrying out metabolomics studies; this results in complex datasets with large amounts of variables. This type of data requires multivariate statistical analysis for its proper biological interpretation. Prior to multivariate analysis, preprocessing of the data must be carried out to remove unwanted variation such as instrumental or experimental artifacts. This review aims to outline the steps in the preprocessing of NMR metabolomics data and describe some of the methods to perform these. Since using different preprocessing methods may produce different results, it is important that an appropriate pipeline exists for the selection of the optimal combination of methods in the preprocessing workflow.
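
    As a small illustration of two steps named in this workflow, the sketch below applies total-area normalization followed by unit-variance (auto) scaling; which combination is optimal is, as the review stresses, study-dependent.

```python
# Total-area normalization, then autoscaling of a samples x bins matrix
# of NMR intensities; both are common, interchangeable preprocessing steps.
import numpy as np

def normalize_and_scale(spectra):
    normalized = spectra / spectra.sum(axis=1, keepdims=True)  # total-area norm
    centered = normalized - normalized.mean(axis=0)
    return centered / centered.std(axis=0, ddof=1)             # unit variance
```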

  5. Preprocessing of raw metabonomic data.

    PubMed

    Vettukattil, Riyas

    2015-01-01

    Recent advances in metabolic profiling techniques allow global profiling of metabolites in cells, tissues, or organisms, using a wide range of analytical techniques such as nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS). The raw data acquired from these instruments are abundant with technical and structural complexity, which makes it statistically difficult to extract meaningful information. Preprocessing involves various computational procedures where data from the instruments (gas chromatography (GC)/liquid chromatography (LC)-MS, NMR spectra) are converted into a usable form for further analysis and biological interpretation. This chapter covers the common data preprocessing techniques used in metabonomics and is primarily focused on baseline correction, normalization, scaling, peak alignment, detection, and quantification. Recent years have witnessed development of several software tools for data preprocessing, and an overview of the frequently used tools in data preprocessing pipeline is covered.

  6. Context-based preprocessing of molecular docking data

    PubMed Central

    2013-01-01

    Background Data preprocessing is a major step in data mining. In data preprocessing, several known techniques can be applied, or new ones developed, to improve data quality such that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for generation of comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach to mine data from molecular docking simulation results. The test cases used a fully-flexible receptor (FFR) model of Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands. Results We generated an initial set of attributes as well as their respective instances. To improve this initial set, we applied two selection strategies. The first was based on our context-based approach while the second used the CFS (Correlation-based Feature Selection) machine learning algorithm. Additionally, we produced an extra dataset containing features selected by combining our context strategy and the CFS algorithm. To demonstrate the effectiveness of the proposed method, we evaluated its performance based on various predictive (RMSE, MAE, Correlation, and Nodes) and context (Precision, Recall and FScore) measures. Conclusions Statistical analysis of the results shows that the proposed context-based data preprocessing approach significantly improves predictive and context measures and outperforms the CFS algorithm. Context-based data preprocessing improves mining results by producing superior interpretable models, which makes it well-suited for practical applications in molecular docking simulations using FFR models. PMID:24564276

  7. Compact Circuit Preprocesses Accelerometer Output

    NASA Technical Reports Server (NTRS)

    Bozeman, Richard J., Jr.

    1993-01-01

    Compact electronic circuit transfers dc power to, and preprocesses ac output of, accelerometer and associated preamplifier. Incorporated into accelerometer case during initial fabrication or retrofit onto commercial accelerometer. Made of commercial integrated circuits and other conventional components; made smaller by use of micrologic and surface-mount technology.

  8. Optimal Preprocessing Of GPS Data

    NASA Technical Reports Server (NTRS)

    Wu, Sien-Chong; Melbourne, William G.

    1994-01-01

    Improved technique for preprocessing data from Global Positioning System (GPS) receiver reduces processing time and number of data to be stored. Technique optimal in sense it maintains strength of data. Also sometimes increases ability to resolve ambiguities in numbers of cycles of received GPS carrier signals.

  9. Optimal Preprocessing Of GPS Data

    NASA Technical Reports Server (NTRS)

    Wu, Sien-Chong; Melbourne, William G.

    1994-01-01

    Improved technique for preprocessing data from Global Positioning System receiver reduces processing time and number of data to be stored. Optimal in sense that it maintains strength of data. Also increases ability to resolve ambiguities in numbers of cycles of received GPS carrier signals.

  10. Arabic handwritten: pre-processing and segmentation

    NASA Astrophysics Data System (ADS)

    Maliki, Makki; Jassim, Sabah; Al-Jawad, Naseer; Sellahewa, Harin

    2012-06-01

    This paper is concerned with pre-processing and segmentation tasks that influence the performance of Optical Character Recognition (OCR) systems and handwritten/printed text recognition. In Arabic, these tasks are adversely affected by the fact that many words are made up of sub-words, that many sub-words have one or more associated diacritics not connected to the sub-word's body, and that multiple sub-words may overlap. To overcome these problems we investigate and develop segmentation techniques that first segment a document into sub-words, link the diacritics with their sub-words, and remove possible overlapping between words and sub-words. We also investigate two approaches for the pre-processing tasks of estimating sub-word baselines and of determining parameters that yield appropriate slope correction and slant removal. We investigate the use of linear regression on sub-word pixels to determine their central x and y coordinates, as well as their high-density part, and develop a new incremental rotation procedure, performed on sub-words, that determines the best rotation angle needed to realign baselines. We demonstrate the benefits of these proposals by conducting extensive experiments on publicly available and in-house databases. These algorithms help improve character segmentation accuracy by transforming handwritten Arabic text into a form that can benefit from analysis of printed text.
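
    The baseline-estimation idea lends itself to a short sketch: fit a least-squares line through the ink pixels of a sub-word and derive the rotation angle that would make that line horizontal. The function below is an illustrative assumption, not the paper's exact procedure.

```python
# Estimate a sub-word's baseline slope by linear regression on its ink
# pixels; rotating by the negative of this angle realigns the baseline.
import numpy as np

def baseline_angle(mask):
    ys, xs = np.nonzero(mask)              # mask: 2-D boolean array of ink pixels
    slope, _intercept = np.polyfit(xs, ys, 1)
    return np.degrees(np.arctan(slope))
```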

  11. Analysis of High-Throughput ELISA Microarray Data

    SciTech Connect

    White, Amanda M.; Daly, Don S.; Zangar, Richard C.

    2011-02-23

    Our research group develops analytical methods and software for the high-throughput analysis of quantitative enzyme-linked immunosorbent assay (ELISA) microarrays. ELISA microarrays differ from DNA microarrays in several fundamental aspects and most algorithms for analysis of DNA microarray data are not applicable to ELISA microarrays. In this review, we provide an overview of the steps involved in ELISA microarray data analysis and how the statistically sound algorithms we have developed provide an integrated software suite to address the needs of each data-processing step. The algorithms discussed are available in a set of open-source software tools (http://www.pnl.gov/statistics/ProMAT).

  12. Protein Microarrays

    NASA Astrophysics Data System (ADS)

    Ricard-Blum, S.

    Proteins are key actors in the life of the cell, involved in many physiological and pathological processes. Since variations in the expression of messenger RNA are not systematically correlated with variations in the protein levels, the latter better reflect the way a cell functions. Protein microarrays thus supply complementary information to DNA chips. They are used in particular to analyse protein expression profiles, to detect proteins within complex biological media, and to study protein-protein interactions, which give information about the functions of those proteins [3-9]. They have the same advantages as DNA microarrays for high-throughput analysis, miniaturisation, and the possibility of automation. Section 18.1 gives a brief overview of proteins. Following this, Sect. 18.2 describes how protein microarrays can be made on flat supports, explaining how proteins can be produced and immobilised on a solid support, and discussing the different kinds of substrate and detection method. Section 18.3 discusses the particular format of protein microarrays in suspension. The diversity of protein microarrays and their applications are then reported in Sect. 18.4, with applications to therapeutics (protein-drug interactions) and diagnostics. The prospects for future developments of protein microarrays are then outlined in the conclusion. The bibliography provides an extensive list of reviews and detailed references for those readers who wish to go further in this area. Indeed, the aim of the present chapter is not to give an exhaustive or detailed analysis of the state of the art, but rather to provide the reader with the basic elements needed to understand how proteins are designed and used.

  13. Image preprocessing study on KPCA-based face recognition

    NASA Astrophysics Data System (ADS)

    Li, Xuan; Li, Dehua

    2015-12-01

    Face recognition, as an important biometric identification method with friendly, natural and convenient advantages, has received more and more attention. This paper studies a face recognition system including face detection, feature extraction and recognition, focusing on the related theory and key techniques of the various preprocessing methods in the face detection process, and on the different recognition results obtained with different preprocessing methods when the KPCA method is used. We choose the YCbCr color space for skin segmentation and integral projection for face location. We use erosion and dilation (the opening and closing operations) and an illumination compensation method to preprocess the face images, and then apply a face recognition method based on kernel principal component analysis (KPCA); the experiments were carried out on a typical face database, with all algorithms implemented on the MATLAB platform. Experimental results show that, under certain conditions, the kernel-based extension of PCA, as a nonlinear feature extraction method, makes the extracted features represent the original image information better and can obtain a higher recognition rate. In the image preprocessing stage, we found that different operations on the images can produce different results, and hence different recognition rates in the recognition stage. At the same time, in kernel principal component analysis the degree of the polynomial kernel function can affect the recognition result.
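
    The KPCA feature-extraction step itself is compact with scikit-learn; the sketch below uses a polynomial kernel whose degree is the "power" the abstract refers to. The data shape and parameter values are placeholders.

```python
# KPCA features for face recognition: a polynomial kernel projects the
# preprocessed images nonlinearly; a classifier is trained on the output.
import numpy as np
from sklearn.decomposition import KernelPCA

faces = np.random.default_rng(4).random((100, 64 * 64))  # placeholder face images
kpca = KernelPCA(n_components=40, kernel="poly", degree=3)
features = kpca.fit_transform(faces)       # degree is the tunable "power"
```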

  14. Preprocessing and analysis of the ECG signals

    NASA Astrophysics Data System (ADS)

    Zhu, Jianmin; Zhang, Xiaolan; Wang, Zhongyu; Wang, Xiaoling

    2008-10-01

    To meet the requirements of automatic analysis and to suppress high-frequency interference in ECG signals, this paper applies a low-pass filter to preprocess the ECG signals and proposes a QRS complex detection method based on the wavelet transform. The method uses the Marr wavelet to decompose and filter the ECG signals with the Mallat algorithm, exploiting the relationship between the wavelet transform and signal singularity to detect the QRS complex with an amplitude threshold method at scale 3, and to detect the P wave and R wave at scale 4. Meanwhile, a composite detection method is used for re-detection, thus improving the detection accuracy. Finally, records from the widely accepted MIT/BIH ECG database are used to test the algorithm, and the results show that the correct detection ratio of this algorithm exceeds 99.8 percent. The detection method in this paper is simple, fast, and easy to realize in real-time detection systems used for clinical diagnosis.
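
    As a hedged approximation of the wavelet step, the sketch below computes a continuous wavelet transform with the Mexican-hat ("Marr") wavelet using PyWavelets and thresholds the coefficient amplitude at a single scale; the paper's Mallat-based scale-3/scale-4 logic is only loosely mirrored, and the scale and threshold factor are assumptions.

```python
# QRS candidates as local maxima of Mexican-hat CWT coefficients that
# exceed a mean-plus-k-sigma amplitude threshold at one chosen scale.
import numpy as np
import pywt

def detect_qrs(ecg, scale=8, k=2.5):
    coefs, _ = pywt.cwt(ecg, scales=[scale], wavelet="mexh")
    c = np.abs(coefs[0])
    threshold = c.mean() + k * c.std()
    is_peak = (c[1:-1] > threshold) & (c[1:-1] >= c[:-2]) & (c[1:-1] >= c[2:])
    return np.flatnonzero(is_peak) + 1      # sample indices of QRS candidates
```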

  15. Tissue Microarrays.

    PubMed

    Dancau, Ana-Maria; Simon, Ronald; Mirlacher, Martina; Sauter, Guido

    2016-01-01

    Modern next-generation sequencing and microarray technologies allow for the simultaneous analysis of all human genes on the DNA, RNA, miRNA, and methylation level. Studies using such techniques have led to the identification of hundreds of genes with a potential role in cancer or other diseases. The validation of all of these candidate genes requires in situ analysis of high numbers of clinical tissue samples. The tissue microarray (TMA) technology greatly facilitates such analysis. In this method, minute tissue samples (typically 0.6 mm in diameter) from up to 1000 different tissues can be analyzed on one microscope glass slide. All in situ methods suitable for histological studies can be applied to TMAs without major changes of protocols, including immunohistochemistry, fluorescence in situ hybridization, and RNA in situ hybridization. Because all tissues are analyzed simultaneously with the same batch of reagents, TMA studies provide an unprecedented degree of standardization, speed, and cost efficiency.

  16. Chromosome Microarray.

    PubMed

    Anderson, Sharon

    2016-01-01

    Over the last half century, knowledge about genetics, genetic testing, and its complexity has flourished. Completion of the Human Genome Project provided a foundation upon which the accuracy of genetics, genomics, and the integration of bioinformatics knowledge and testing has grown exponentially. What is lagging, however, are efforts to reach and engage nurses about this rapidly changing field. The purpose of this article is to familiarize nurses with several frequently ordered genetic tests, including chromosome analysis and fluorescence in situ hybridization, followed by a comprehensive review of chromosome microarray. It describes the complexity of microarray testing, including how testing is performed and how results are analyzed. A case report demonstrates how this technology is applied in clinical practice and reveals the benefits and limitations of this scientific and bioinformatic genetic technology. Clinical implications for maternal-child nurses across practice levels are discussed. PMID:27276104

  17. Microarray data analysis and mining approaches.

    PubMed

    Cordero, Francesca; Botta, Marco; Calogero, Raffaele A

    2007-12-01

    Microarray based transcription profiling is now a consolidated methodology and has widespread use in areas such as pharmacogenomics, diagnostics and drug target identification. Large-scale microarray studies are also becoming crucial to a new way of conceiving experimental biology. A main issue in microarray transcription profiling is data analysis and mining. When microarrays became a methodology of general use, considerable effort was made to produce algorithms and methods for the identification of differentially expressed genes. More recently, the focus has switched to algorithms and database development for microarray data mining. Furthermore, the evolution of microarray technology is allowing researchers to grasp the regulative nature of transcription, integrating basic expression analysis with mRNA characteristics, i.e. exon-based arrays, and with DNA characteristics, i.e. comparative genomic hybridization, single nucleotide polymorphism, tiling and promoter structure. In this article, we will review approaches used to detect differentially expressed genes and to link differential expression to specific biological functions.

  18. Automated Microarray Image Analysis Toolbox for MATLAB

    SciTech Connect

    White, Amanda M.; Daly, Don S.; Willse, Alan R.; Protic, Miroslava; Chandler, Darrell P.

    2005-09-01

    The Automated Microarray Image Analysis (AMIA) Toolbox for MATLAB is a flexible, open-source microarray image analysis tool that allows the user to customize the analysis of sets of microarray images. The tool provides several methods of identifying and quantifying spot statistics, as well as extensive diagnostic statistics and images to identify poor data quality or processing. The open nature of this software allows researchers to understand the algorithms used to provide intensity estimates and to modify them easily if desired.

  19. Adaptive fingerprint image enhancement with emphasis on preprocessing of data.

    PubMed

    Bartůnek, Josef Ström; Nilsson, Mikael; Sällberg, Benny; Claesson, Ingvar

    2013-02-01

    This article proposes several improvements to an adaptive fingerprint enhancement method that is based on contextual filtering. The term adaptive implies that the parameters of the method are automatically adjusted based on the input fingerprint image. Five processing blocks comprise the adaptive fingerprint enhancement method, four of which are updated in our proposed system; hence, the proposed overall system is novel. The four updated processing blocks are: 1) preprocessing; 2) global analysis; 3) local analysis; and 4) matched filtering. In the preprocessing and local analysis blocks, a nonlinear dynamic range adjustment method is used. In the global analysis and matched filtering blocks, different forms of order-statistical filters are applied. These processing blocks yield an improved and new adaptive fingerprint image processing method. The performance of the updated processing blocks is presented in the evaluation part of this paper. The algorithm is evaluated against the NIST-developed NBIS software for fingerprint recognition on the FVC databases.
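
    A minimal sketch of two of the building blocks named above, a nonlinear dynamic-range adjustment and an order-statistical (percentile) filter, is given below. SciPy is assumed, and the gamma value, window size, and percentile are illustrative, not the paper's settings.

        import numpy as np
        from scipy.ndimage import percentile_filter

        def adjust_dynamic_range(img, gamma=0.6):
            # Nonlinear stretch of a grayscale image rescaled to [0, 1].
            img = (img - img.min()) / (img.max() - img.min())
            return img ** gamma

        fingerprint = np.random.rand(256, 256)  # stand-in fingerprint image
        pre = adjust_dynamic_range(fingerprint)
        # Order-statistic filtering (the 50th percentile is a median filter).
        smoothed = percentile_filter(pre, percentile=50, size=5)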

  20. Research on pre-processing of QR Code

    NASA Astrophysics Data System (ADS)

    Sun, Haixing; Xia, Haojie; Dong, Ning

    2013-10-01

    QR codes encode many kinds of information thanks to their advantages: large storage capacity, high reliability, ultra-high-speed omnidirectional reading, small printing size, and highly efficient representation of Chinese characters. In order to obtain a clearer binarized image from a complex background and to improve the recognition rate of QR codes, this paper investigates pre-processing methods for QR (Quick Response) codes and presents algorithms and results of image pre-processing for QR code recognition. The conventional method is improved by modifying Sauvola's adaptive binarization method. Additionally, a QR code extraction step that adapts to different image sizes and a flexible image correction approach are introduced, improving the efficiency and accuracy of QR code image processing.
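
    A minimal sketch of Sauvola adaptive binarization, the method modified above, assuming scikit-image; the file name, window size, and k value are illustrative.

        from skimage import io
        from skimage.filters import threshold_sauvola

        image = io.imread("qr_sample.png", as_gray=True)  # hypothetical input
        # Local adaptive threshold computed over a sliding window.
        thresh = threshold_sauvola(image, window_size=25, k=0.2)
        binary = image > thresh  # True = background, False = QR modules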

  1. Feature detection techniques for preprocessing proteomic data.

    PubMed

    Sellers, Kimberly F; Miecznikowski, Jeffrey C

    2010-01-01

    Numerous gel-based and nongel-based technologies are used to detect protein changes potentially associated with disease. The raw data, however, abound with technical and structural complexities, making statistical analysis a difficult task. Low-level analysis issues (including normalization, background correction, gel and/or spectral alignment, feature detection, and image registration) are substantial problems that need to be addressed, because any higher-level data analyses are contingent on appropriate and statistically sound low-level procedures. Feature detection approaches are particularly interesting due to the increased computational speed associated with subsequent calculations. Such summary data corresponding to image features provide a significant reduction in overall data size and structure while retaining key information. In this paper, we focus on recent advances in feature detection as a tool for preprocessing proteomic data. This work highlights existing and newly developed feature detection algorithms for proteomic datasets, particularly relating to time-of-flight mass spectrometry and two-dimensional gel electrophoresis. Note, however, that the associated data structures (i.e., spectral data and images containing spots) used as input to these methods arise from all of the gel-based and nongel-based technologies discussed in this manuscript, and thus the discussed methods are broadly applicable.

  2. DNA Microarrays

    NASA Astrophysics Data System (ADS)

    Nguyen, C.; Gidrol, X.

    Genomics has revolutionised biological and biomedical research. This revolution was predictable on the basis of its two driving forces: the ever increasing availability of genome sequences and the development of new technology able to exploit them. Up until now, technical limitations meant that molecular biology could only analyse one or two parameters per experiment, providing relatively little information compared with the great complexity of the systems under investigation. This gene by gene approach is inadequate to understand biological systems containing several thousand genes. It is essential to have an overall view of the DNA, RNA, and relevant proteins. A simple inventory of the genome is not sufficient to understand the functions of the genes, or indeed the way that cells and organisms work. For this purpose, functional studies based on whole genomes are needed. Among these new large-scale methods of molecular analysis, DNA microarrays provide a way of studying the genome and the transcriptome. The idea of integrating a large amount of data derived from a support with very small area has led biologists to call these chips, borrowing the term from the microelectronics industry. At the beginning of the 1990s, the development of DNA chips on nylon membranes [1, 2], then on glass [3] and silicon [4] supports, made it possible for the first time to carry out simultaneous measurements of the equilibrium concentration of all the messenger RNA (mRNA) or transcribed RNA in a cell. These microarrays offer a wide range of applications, in both fundamental and clinical research, providing a method for genome-wide characterisation of changes occurring within a cell or tissue, as for example in polymorphism studies, detection of mutations, and quantitative assays of gene copies. With regard to the transcriptome, it provides a way of characterising differentially expressed genes, profiling given biological states, and identifying regulatory channels.

  3. hemaClass.org: Online One-By-One Microarray Normalization and Classification of Hematological Cancers for Precision Medicine

    PubMed Central

    Falgreen, Steffen; Ellern Bilgrau, Anders; Brøndum, Rasmus Froberg; Hjort Jakobsen, Lasse; Have, Jonas; Lindblad Nielsen, Kasper; El-Galaly, Tarec Christoffer; Bødker, Julie Støve; Schmitz, Alexander; H. Young, Ken; Johnsen, Hans Erik; Dybkær, Karen; Bøgsted, Martin

    2016-01-01

    Background Dozens of omics-based cancer classification systems have been introduced with prognostic, diagnostic, and predictive capabilities. However, they often employ complex algorithms and are only applicable to whole cohorts of patients, making them difficult to apply in a personalized clinical setting. Results This prompted us to create hemaClass.org, an online web application providing an easy interface to one-by-one RMA normalization of microarrays and subsequent risk classification of diffuse large B-cell lymphoma (DLBCL) into cell-of-origin and chemotherapeutic sensitivity classes. Classification results for one-by-one array pre-processing with and without a laboratory-specific RMA reference dataset were compared to cohort-based classifiers in 4 publicly available datasets. Classifications showed high agreement between one-by-one and whole-cohort pre-processed data when a laboratory-specific reference set was supplied. The website is essentially the R package hemaClass accompanied by a Shiny web application. The well-documented package can be used to run the website locally or to use the developed methods programmatically. Conclusions The website and R package are relevant for biological and clinical lymphoma researchers using Affymetrix U-133 Plus 2 arrays, as they provide reliable and swift methods for calculation of disease subclasses. The proposed one-by-one pre-processing method is relevant for all researchers using microarrays. PMID:27701436
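
    The one-by-one idea can be illustrated with a short numpy sketch: quantiles are learned once from a laboratory-specific reference cohort, and each new array is then normalized against them in isolation. This mimics reference-based quantile normalization in the RMA spirit; it is not the hemaClass implementation itself, and the data are stand-ins.

        import numpy as np

        def learn_reference_quantiles(cohort):
            # cohort: (n_arrays, n_probes); mean sorted intensity per rank.
            return np.sort(cohort, axis=1).mean(axis=0)

        def normalize_one(array, ref_quantiles):
            # Map each probe to the reference quantile of its rank.
            ranks = np.argsort(np.argsort(array))
            return ref_quantiles[ranks]

        cohort = np.random.lognormal(size=(50, 1000))  # stand-in reference
        ref = learn_reference_quantiles(cohort)
        new_array = np.random.lognormal(size=1000)     # one incoming array
        normalized = normalize_one(new_array, ref)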

  4. Evaluation of the efficiency of continuous wavelet transform as processing and preprocessing algorithm for resolution of overlapped signals in univariate and multivariate regression analyses; an application to ternary and quaternary mixtures

    NASA Astrophysics Data System (ADS)

    Hegazy, Maha A.; Lotfy, Hayam M.; Mowaka, Shereen; Mohamed, Ekram Hany

    2016-07-01

    Wavelets have been adapted for a vast number of signal-processing applications due to the amount of information that can be extracted from a signal. In this work, a comparative study was conducted on the efficiency of the continuous wavelet transform (CWT) as a signal-processing tool in univariate regression and as a pre-processing tool in multivariate analysis using partial least squares (CWT-PLS). These were applied to complex spectral signals of ternary and quaternary mixtures. The CWT-PLS method succeeded in the simultaneous determination of a quaternary mixture of drotaverine (DRO), caffeine (CAF), paracetamol (PAR), and p-aminophenol (PAP, the major impurity of paracetamol). In contrast, the univariate CWT failed to determine the quaternary mixture components simultaneously; it was able to determine only PAR and PAP, as well as the ternary mixtures of DRO, CAF, and PAR and of CAF, PAR, and PAP. During the CWT calculations, different wavelet families were tested. The univariate CWT method was validated according to the ICH guidelines. For the development of the CWT-PLS model, a calibration set was prepared by means of an orthogonal experimental design, and the absorption spectra were recorded and processed by CWT. The CWT-PLS model was constructed by regression between the wavelet-coefficient and concentration matrices, and validation was performed with both cross validation and external validation sets. Both methods were successfully applied to the determination of the studied drugs in pharmaceutical formulations.
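
    A minimal sketch of CWT as a pre-processing step for PLS, assuming PyWavelets and scikit-learn; the wavelet family, scale, component count, and data shapes are illustrative stand-ins for real absorbance spectra and concentration matrices.

        import numpy as np
        import pywt
        from sklearn.cross_decomposition import PLSRegression

        def cwt_features(X, scale=16, wavelet="gaus2"):
            # Replace each spectrum by its wavelet coefficients at one scale.
            return np.vstack([pywt.cwt(x, scales=[scale], wavelet=wavelet)[0][0]
                              for x in X])

        spectra = np.random.rand(40, 500)  # stand-in absorbance spectra
        conc = np.random.rand(40, 4)       # stand-in concentration matrix
        pls = PLSRegression(n_components=4)
        pls.fit(cwt_features(spectra), conc)
        pred = pls.predict(cwt_features(np.random.rand(5, 500)))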

  5. Computational biology of genome expression and regulation--a review of microarray bioinformatics.

    PubMed

    Wang, Junbai

    2008-01-01

    Microarray technology is used widely in various biomedical research areas; the corresponding microarray data analysis is an essential step toward making the best use of array technologies. Here we review two components of microarray data analysis: low-level microarray data analysis, which emphasizes the design, quality control, and preprocessing of microarray experiments, and high-level microarray data analysis, which focuses on domain-specific microarray applications such as tumor classification, biomarker prediction, analysis of array CGH experiments, and reverse engineering of gene expression networks. Additionally, we review recent developments in building predictive models in genome expression and regulation studies. This review may help biologists grasp a basic knowledge of microarray bioinformatics as well as its potential impact on the future evolution of biomedical research fields.

  6. Aptamer Microarrays

    SciTech Connect

    Angel-Syrett, Heather; Collett, Jim; Ellington, Andrew D.

    2009-01-02

    In vitro selection can yield specific, high-affinity aptamers. We and others have devised methods for the automated selection of aptamers, and have begun to use these reagents for the construction of arrays. Arrayed aptamers have proven to be almost as sensitive as their solution phase counterparts, and when ganged together can provide both specific and general diagnostic signals for proteins and other analytes. We describe here technical details regarding the production and processing of aptamer microarrays, including blocking, washing, drying, and scanning. We will also discuss the challenges involved in developing standardized and reproducible methods for binding and quantitating protein targets. While signals from fluorescent analytes or sandwiches are typically captured, it has proven possible for immobilized aptamers to be uniquely coupled to amplification methods not available to protein reagents, thus allowing for protein-binding signals to be greatly amplified. Into the future, many of the biosensor methods described in this book can potentially be adapted to array formats, thus further expanding the utility of and applications for aptamer arrays.

  7. PREPROCESSING MAGNETIC FIELDS WITH CHROMOSPHERIC LONGITUDINAL FIELDS

    SciTech Connect

    Yamamoto, Tetsuya T.; Kusano, K.

    2012-06-20

    Nonlinear force-free field (NLFFF) extrapolation is a powerful tool for modeling the magnetic field in the solar corona. However, since the photospheric magnetic field does not in general satisfy the force-free condition, some kind of processing is required to assimilate data into the model. In this paper, we report the results of a new preprocessing method for NLFFF extrapolation. Through this preprocessing, we expect to obtain magnetic field data similar to those in the chromosphere. In our preprocessing, we add a new term concerning chromospheric longitudinal fields to the optimization function proposed by Wiegelmann et al. We perform a survey of six free parameters to find minimum force- and torque-freeness with the simulated-annealing method. The analyzed data are a photospheric vector magnetogram of AR 10953 observed with the Hinode spectropolarimeter and a chromospheric longitudinal magnetogram observed with the SOLIS spectropolarimeter. It is found that some preprocessed fields show the smallest force- and torque-freeness and are very similar to the chromospheric longitudinal fields. On the other hand, other preprocessed fields show noisy maps, although their force- and torque-freeness are of the same order. By analyzing the preprocessed noisy maps in wave-number space, we found that small and large wave-number components balance out in the force-free index. We also discuss the iteration limit of our simulated-annealing method and magnetic structure broadening in the chromosphere.
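
    The parameter-survey step can be sketched as a generic simulated-annealing search over the free preprocessing parameters, minimizing a combined force-freeness and torque-freeness penalty. The penalty function below is a purely hypothetical stand-in, and SciPy's dual_annealing is used in place of the authors' own annealing code.

        import numpy as np
        from scipy.optimize import dual_annealing

        def penalty(w):
            # Stand-in for L_force(w) + L_torque(w) on preprocessed data.
            return float(np.sum((w - np.array([1, 2, 0.5, 0.1, 3, 1.5])) ** 2))

        bounds = [(0.0, 5.0)] * 6  # six free preprocessing parameters
        result = dual_annealing(penalty, bounds, seed=0)
        best_params = result.x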

  8. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  9. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories

    PubMed Central

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-01-01

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp. PMID:27657141

  10. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-01-01

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp. PMID:27657141

  11. Effects of preprocessing Landsat MSS data on derived features

    NASA Technical Reports Server (NTRS)

    Parris, T. M.; Cicone, R. C.

    1983-01-01

    Important to the use of multitemporal Landsat MSS data for earth resources monitoring, such as agricultural inventories, is the ability to minimize the effects of varying atmospheric and satellite viewing conditions while extracting physically meaningful features from the data. In general, approaches to the preprocessing problem have been derived from either physical or statistical models. This paper compares three proposed algorithms: XSTAR haze correction, Color Normalization, and Multiple Acquisition Mean Level Adjustment. These techniques represent physical, statistical, and hybrid physical-statistical models, respectively. The comparisons are made in the context of three feature extraction techniques: the Tasseled Cap, the Cate Color Cube, and the Normalized Difference.

  12. An Automated, Adaptive Framework for Optimizing Preprocessing Pipelines in Task-Based Functional MRI.

    PubMed

    Churchill, Nathan W; Spring, Robyn; Afshin-Pour, Babak; Dong, Fan; Strother, Stephen C

    2015-01-01

    BOLD fMRI is sensitive to blood-oxygenation changes correlated with brain function; however, it is limited by relatively weak signal and significant noise confounds. Many preprocessing algorithms have been developed to control noise and improve signal detection in fMRI. Although the chosen set of preprocessing and analysis steps (the "pipeline") significantly affects signal detection, pipelines are rarely quantitatively validated in the neuroimaging literature, due to complex preprocessing interactions. This paper outlines and validates an adaptive resampling framework for evaluating and optimizing preprocessing choices by optimizing data-driven metrics of task prediction and spatial reproducibility. Compared to standard "fixed" preprocessing pipelines, this optimization approach significantly improves independent validation measures of within-subject test-retest, and between-subject activation overlap, and behavioural prediction accuracy. We demonstrate that preprocessing choices function as implicit model regularizers, and that improvements due to pipeline optimization generalize across a range of simple to complex experimental tasks and analysis models. Results are shown for brief scanning sessions (<3 minutes each), demonstrating that with pipeline optimization, it is possible to obtain reliable results and brain-behaviour correlations in relatively small datasets.
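
    The selection principle, ranking candidate pipelines by data-driven prediction and split-half reproducibility metrics, can be sketched as follows. The pipelines and data are hypothetical stand-ins; real use would substitute actual preprocessing steps, activation maps, and classifier accuracies.

        import numpy as np

        rng = np.random.default_rng(0)
        half1, half2 = rng.normal(size=200), rng.normal(size=200)

        def pipeline_a(x):  # hypothetical preprocessing variant A
            return x, 0.70  # (activation map, prediction accuracy)

        def pipeline_b(x):  # hypothetical preprocessing variant B
            return x + 0.1, 0.75

        def score(pipeline):
            m1, acc1 = pipeline(half1)
            m2, acc2 = pipeline(half2)
            reproducibility = np.corrcoef(m1, m2)[0, 1]
            prediction = 0.5 * (acc1 + acc2)
            return prediction + reproducibility  # simple combined criterion

        best = max([pipeline_a, pipeline_b], key=score)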

  13. An Automated, Adaptive Framework for Optimizing Preprocessing Pipelines in Task-Based Functional MRI

    PubMed Central

    Churchill, Nathan W.; Spring, Robyn; Afshin-Pour, Babak; Dong, Fan; Strother, Stephen C.

    2015-01-01

    BOLD fMRI is sensitive to blood-oxygenation changes correlated with brain function; however, it is limited by relatively weak signal and significant noise confounds. Many preprocessing algorithms have been developed to control noise and improve signal detection in fMRI. Although the chosen set of preprocessing and analysis steps (the “pipeline”) significantly affects signal detection, pipelines are rarely quantitatively validated in the neuroimaging literature, due to complex preprocessing interactions. This paper outlines and validates an adaptive resampling framework for evaluating and optimizing preprocessing choices by optimizing data-driven metrics of task prediction and spatial reproducibility. Compared to standard “fixed” preprocessing pipelines, this optimization approach significantly improves independent validation measures of within-subject test-retest, and between-subject activation overlap, and behavioural prediction accuracy. We demonstrate that preprocessing choices function as implicit model regularizers, and that improvements due to pipeline optimization generalize across a range of simple to complex experimental tasks and analysis models. Results are shown for brief scanning sessions (<3 minutes each), demonstrating that with pipeline optimization, it is possible to obtain reliable results and brain-behaviour correlations in relatively small datasets. PMID:26161667

  14. Effect of Preprocessing for Result of Independent component analysis.

    PubMed

    Zhang, Yun

    2005-01-01

    In recent years, scientists, physicians, and researchers in biomedical engineering and related fields have concentrated on studying the bioelectrical activity of different cortical areas of the human brain under various evoked and cognitive stimulation conditions, in an effort to probe human psychology and physiology and to control the external environment. Independent component analysis (ICA) is a tool that can help people distinguish and understand various EEG signals; even for signals about which very little is known, ICA can provide a good explanation. In this paper, a new algorithm is introduced for preprocessing the data that are subsequently analyzed by independent component analysis. This algorithm not only accelerates the decomposition of the independent components, but also yields a higher amplitude in the extraction of steady-state visual evoked potentials.
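
    The classical pre-processing steps for ICA, centering and whitening, can be sketched with numpy and scikit-learn as follows; the channel and sample counts are stand-ins, and this illustrates the standard procedure rather than the specific algorithm proposed in the paper.

        import numpy as np
        from sklearn.decomposition import FastICA

        X = np.random.rand(64, 1000)           # stand-in EEG: channels x samples
        X = X - X.mean(axis=1, keepdims=True)  # centering

        # PCA whitening: decorrelate channels and scale to unit variance.
        eigvals, eigvecs = np.linalg.eigh(np.cov(X))
        whitener = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
        Xw = whitener @ X

        # FastICA on the pre-whitened data (whitening is skipped inside).
        ica = FastICA(whiten=False, random_state=0, max_iter=500)
        sources = ica.fit_transform(Xw.T)      # samples x components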

  15. Groundtruth approach to accurate quantitation of fluorescence microarrays

    SciTech Connect

    Mascio-Kegelmeyer, L; Tomascik-Cheeseman, L; Burnett, M S; van Hummelen, P; Wyrobek, A J

    2000-12-01

    To more accurately measure fluorescent signals from microarrays, we calibrated our acquisition and analysis systems by using groundtruth samples comprised of known quantities of red and green gene-specific DNA probes hybridized to cDNA targets. We imaged the slides with a full-field, white light CCD imager and analyzed them with our custom analysis software. Here we compare, for multiple genes, results obtained with and without preprocessing (alignment, color crosstalk compensation, dark field subtraction, and integration time). We also evaluate the accuracy of various image processing and analysis techniques (background subtraction, segmentation, quantitation and normalization). This methodology calibrates and validates our system for accurate quantitative measurement of microarrays. Specifically, we show that preprocessing the images produces results significantly closer to the known ground-truth for these samples.

  16. Microarrays, Integrated Analytical Systems

    NASA Astrophysics Data System (ADS)

    Combinatorial chemistry is used to find materials that form sensor microarrays. This book discusses the fundamentals and then proceeds to the many applications of microarrays, from measuring gene expression (DNA microarrays) to protein-protein interactions, peptide chemistry, carbohydrate chemistry, electrochemical detection, and microfluidics.

  17. Preprocessing Moist Lignocellulosic Biomass for Biorefinery Feedstocks

    SciTech Connect

    Neal Yancey; Christopher T. Wright; Craig Conner; J. Richard Hess

    2009-06-01

    Biomass preprocessing is one of the primary operations in the feedstock assembly system of a lignocellulosic biorefinery. Preprocessing is generally accomplished using industrial grinders to format biomass materials into a suitable biorefinery feedstock for conversion to ethanol and other bioproducts. Many factors affect machine efficiency and the physical characteristics of preprocessed biomass. For example, moisture content of the biomass as received from the point of production has a significant impact on overall system efficiency and can significantly affect the characteristics (particle size distribution, flowability, storability, etc.) of the size-reduced biomass. Many different grinder configurations are available on the market, each with advantages under specific conditions. Ultimately, the capacity and/or efficiency of the grinding process can be enhanced by selecting the grinder configuration that optimizes grinder performance based on moisture content and screen size. This paper discusses the relationships of biomass moisture with respect to preprocessing system performance and product physical characteristics and compares data obtained on corn stover, switchgrass, and wheat straw as model feedstocks during Vermeer HG 200 grinder testing. During the tests, grinder screen configuration and biomass moisture content were varied and tested to provide a better understanding of their relative impact on machine performance and the resulting feedstock physical characteristics and uniformity relative to each crop tested.

  18. Efficient Preprocessing technique using Web log mining

    NASA Astrophysics Data System (ADS)

    Raiyani, Sheetal A.; jain, Shailendra

    2012-11-01

    Web usage mining can be described as the discovery and analysis of user access patterns through mining of log files and associated data from a particular website. Numerous visitors interact daily with web sites around the world; enormous amounts of data are generated, and this information can be very valuable to a company for understanding customers' behavior. This paper presents a complete preprocessing scheme comprising data cleaning and user and session identification activities to improve the quality of the data. User identification, a key issue in the preprocessing phase, aims to identify unique web users. Traditional user identification is based on the site structure, supported by heuristic rules, which reduces its efficiency. To address this difficulty we introduce a proposed technique, DUI (Distinct User Identification), based on IP address, agent, and session time, together with the pages referred within the desired session time. The technique can be used in counter-terrorism, fraud detection, and detection of unusual access to secure data, and, through detection of users' regular access behavior, can improve the overall design and performance of subsequent preprocessing.
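
    A minimal sketch of the DUI idea, distinguishing users by (IP address, agent) pairs and splitting each user's visits into sessions by a timeout, is shown below. The record fields and the 30-minute threshold are illustrative assumptions.

        from collections import defaultdict

        SESSION_TIMEOUT = 30 * 60  # seconds

        def identify(records):
            # records: iterable of dicts with keys ip, agent, timestamp, page
            users = defaultdict(list)  # (ip, agent) -> time-ordered visits
            for r in sorted(records, key=lambda r: r["timestamp"]):
                users[(r["ip"], r["agent"])].append(r)
            sessions = []
            for key, visits in users.items():
                current = [visits[0]]
                for prev, cur in zip(visits, visits[1:]):
                    if cur["timestamp"] - prev["timestamp"] > SESSION_TIMEOUT:
                        sessions.append((key, current))
                        current = []
                    current.append(cur)
                sessions.append((key, current))
            return users, sessions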

  19. Manufacturing of microarrays.

    PubMed

    Petersen, David W; Kawasaki, Ernest S

    2007-01-01

    DNA microarray technology has become a powerful tool in the arsenal of the molecular biologist. Capitalizing on high precision robotics and the wealth of DNA sequences annotated from the genomes of a large number of organisms, the manufacture of microarrays is now possible for the average academic laboratory with the funds and motivation. Microarray production requires attention to both biological and physical resources, including DNA libraries, robotics, and qualified personnel. While the fabrication of microarrays is a very labor-intensive process, production of quality microarrays individually tailored on a project-by-project basis will help researchers shed light on future scientific questions.

  20. Complex and magnitude-only preprocessing of 2D and 3D BOLD fMRI data at 7 T.

    PubMed

    Barry, Robert L; Strother, Stephen C; Gore, John C

    2012-03-01

    A challenge to ultra high field functional magnetic resonance imaging is the predominance of noise associated with physiological processes unrelated to tasks of interest. This degradation in data quality may be partially reversed using a series of preprocessing algorithms designed to retrospectively estimate and remove the effects of these noise sources. However, such algorithms are routinely validated only in isolation, and thus consideration of their efficacies within realistic preprocessing pipelines and on different data sets is often overlooked. We investigate the application of eight possible combinations of three pseudo-complementary preprocessing algorithms - phase regression, Stockwell transform filtering, and retrospective image correction - to suppress physiological noise in 2D and 3D functional data at 7 T. The performance of each preprocessing pipeline was evaluated using data-driven metrics of reproducibility and prediction. The optimal preprocessing pipeline for both 2D and 3D functional data included phase regression, Stockwell transform filtering, and retrospective image correction. This result supports the hypothesis that a complex preprocessing pipeline is preferable to a magnitude-only pipeline, and suggests that functional magnetic resonance imaging studies should retain complex images and externally monitor subjects' respiratory and cardiac cycles so that these supplementary data may be used to retrospectively reduce noise and enhance overall data quality.
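
    Of the three algorithms, phase regression is the simplest to sketch: the phase time course is regressed out of the magnitude time course voxel by voxel. The stand-in data below illustrate the idea only; this is not the exact 7 T pipeline.

        import numpy as np

        def phase_regress(mag, phase):
            # mag, phase: (n_timepoints,) series for one voxel
            design = np.column_stack([np.ones_like(phase), phase])
            beta, *_ = np.linalg.lstsq(design, mag, rcond=None)
            fitted = design @ beta
            return mag - (fitted - fitted.mean())  # noise-reduced magnitude

        t = np.linspace(0, 60, 300)
        phase = 0.1 * np.sin(2 * np.pi * 0.3 * t)  # respiration-like phase
        mag = 100 + 5 * phase + np.random.normal(0, 0.5, t.size)
        cleaned = phase_regress(mag, phase)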

  1. The Effects of Pre-processing Strategies for Pediatric Cochlear Implant Recipients

    PubMed Central

    Rakszawski, Bernadette; Wright, Rose; Cadieux, Jamie H.; Davidson, Lisa S.; Brenner, Christine

    2016-01-01

    Background Cochlear implants (CIs) have been shown to improve children's speech recognition over traditional amplification when severe to profound sensorineural hearing loss is present. Despite improvements, understanding speech at low intensity levels or in the presence of background noise remains difficult. In an effort to improve speech understanding in challenging environments, Cochlear Ltd. offers pre-processing strategies that apply various algorithms prior to mapping the signal to the internal array. Two of these strategies are Autosensitivity Control™ (ASC) and Adaptive Dynamic Range Optimization (ADRO®). Based on previous research, the manufacturer's default pre-processing strategy for pediatric everyday programs combines ASC+ADRO®. Purpose The purpose of this study is to compare pediatric speech perception performance across various pre-processing strategies while applying a specific programming protocol that uses increased threshold (T) levels to ensure access to very low-level sounds. Research Design This was a prospective, cross-sectional, observational study. Participants completed speech perception tasks in four pre-processing conditions: no pre-processing, ADRO®, ASC, and ASC+ADRO®. Study Sample Eleven pediatric Cochlear Ltd. cochlear implant users were recruited: six bilateral, one unilateral, and four bimodal. Intervention Four programs with the participants' everyday map were loaded into the processor, with a different pre-processing strategy applied in each of the four positions: no pre-processing, ADRO®, ASC, and ASC+ADRO®. Data Collection and Analysis Participants repeated CNC words presented at 50 and 70 dB SPL in quiet and HINT sentences presented adaptively with competing R-Space noise at 60 and 70 dB SPL. Each measure was completed as participants listened with each of the four pre-processing strategies listed above. Test order and condition were randomized. A repeated-measures analysis of variance (ANOVA) was used to compare performance across the pre-processing conditions.

  2. Advances in Image Pre-Processing to Improve Automated 3d Reconstruction

    NASA Astrophysics Data System (ADS)

    Ballabeni, A.; Apollonio, F. I.; Gaiani, M.; Remondino, F.

    2015-02-01

    Tools and algorithms for automated image processing and 3D reconstruction have become more and more available, giving the possibility to process any dataset of unoriented and markerless images. Typically, dense 3D point clouds (or textured 3D polygonal models) are produced in reasonable processing time. In this paper, we evaluate how the radiometric pre-processing of image datasets (particularly in RAW format) can help to improve the performance of state-of-the-art automated image processing tools. Besides a review of common pre-processing methods, an efficient pipeline based on color enhancement, image denoising, RGB-to-gray conversion and image content enrichment is presented. The performed tests, partly reported for the sake of space, demonstrate how an effective image pre-processing, which considers the entire dataset under analysis, can improve the automated orientation procedure and dense 3D point cloud reconstruction, even in poor-texture scenarios.
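
    A minimal sketch of such a pipeline (denoising, luminance enhancement, and RGB-to-gray conversion), assuming OpenCV; the file names and parameter values are illustrative, not the paper's exact pipeline.

        import cv2

        img = cv2.imread("facade_0001.jpg")  # hypothetical input image
        # Non-local-means denoising on the color image.
        img = cv2.fastNlMeansDenoisingColored(img, None, 5, 5, 7, 21)
        # Contrast-limited histogram equalization on the luminance channel.
        lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
        l, a, b = cv2.split(lab)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        img = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)),
                           cv2.COLOR_LAB2BGR)
        # RGB-to-gray conversion before feature extraction.
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        cv2.imwrite("facade_0001_pre.jpg", gray)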

  3. The preprocessing of multispectral data. II. [of Landsat satellite

    NASA Technical Reports Server (NTRS)

    Quiel, F.

    1976-01-01

    It is pointed out that a correction of atmospheric effects is an important requirement for a full utilization of the possibilities provided by preprocessing techniques. The most significant characteristics of original and preprocessed data are considered, taking into account the solution of classification problems by means of the preprocessing procedure. Improvements obtainable with different preprocessing techniques are illustrated with the aid of examples involving Landsat data regarding an area in Colorado.

  4. Reliable RANSAC Using a Novel Preprocessing Model

    PubMed Central

    Wang, Xiaoyan; Zhang, Hui; Liu, Sheng

    2013-01-01

    Geometric assumption and verification with RANSAC has become a crucial step in establishing correspondences between local features, due to its wide applications in biomedical feature analysis and vision computing. However, conventional RANSAC is very time-consuming because of redundant sampling, especially when dealing with numerous matching pairs. This paper presents a novel preprocessing model that extracts a reduced set of reliable correspondences from the initial matching dataset. Both geometric model generation and verification are carried out on this reduced set, which leads to considerable speedups. This paper then proposes a reliable RANSAC framework using the preprocessing model, which was implemented and verified using Harris and SIFT features, respectively. Compared with traditional RANSAC, experimental results show that our method is more efficient. PMID:23509601
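
    The "reduce first, then sample" idea can be sketched as follows: correspondences are pre-filtered by a descriptor-distance ratio before RANSAC runs on the smaller set. A 2-D line model stands in for the geometric model here, and the thresholds are illustrative.

        import numpy as np

        rng = np.random.default_rng(1)

        def prefilter(matches, ratios, max_ratio=0.7):
            # Keep only distinctive matches (Lowe-style ratio test).
            return matches[ratios < max_ratio]

        def ransac_line(points, n_iter=200, tol=0.05):
            best = np.zeros(len(points), dtype=bool)
            for _ in range(n_iter):
                p1, p2 = points[rng.choice(len(points), 2, replace=False)]
                dx, dy = p2 - p1
                norm = np.hypot(dx, dy)
                if norm == 0:
                    continue
                # Perpendicular distance of every point to the candidate line.
                dist = np.abs((points[:, 0] - p1[0]) * dy
                              - (points[:, 1] - p1[1]) * dx) / norm
                inliers = dist < tol
                if inliers.sum() > best.sum():
                    best = inliers
            return best

        x = np.linspace(0, 1, 80)
        pts = np.vstack([np.column_stack([x, 0.5 * x]),
                         rng.uniform(size=(20, 2))])  # line plus outliers
        inliers = ransac_line(pts)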

  5. Microarrays in hematology.

    PubMed

    Walker, Josef; Flower, Darren; Rigley, Kevin

    2002-01-01

    Microarrays are fast becoming routine tools for the high-throughput analysis of gene expression in a wide range of biologic systems, including hematology. Although a number of approaches can be taken when implementing microarray-based studies, all are capable of providing important insights into biologic function. Although some technical issues have not been resolved, microarrays will continue to make a significant impact on hematologically important research. PMID:11753074

  6. Antibiotic treatment algorithm development based on a microarray nucleic acid assay for rapid bacterial identification and resistance determination from positive blood cultures.

    PubMed

    Rödel, Jürgen; Karrasch, Matthias; Edel, Birgit; Stoll, Sylvia; Bohnert, Jürgen; Löffler, Bettina; Saupe, Angela; Pfister, Wolfgang

    2016-03-01

    Rapid diagnosis of bloodstream infections remains a challenge for the early targeting of an antibiotic therapy in sepsis patients. In recent studies, the reliability of the Nanosphere Verigene Gram-positive and Gram-negative blood culture (BC-GP and BC-GN) assays for the rapid identification of bacteria and resistance genes directly from positive BCs has been demonstrated. In this work, we have developed a model to define treatment recommendations by combining Verigene test results with knowledge on local antibiotic resistance patterns of bacterial pathogens. The data of 275 positive BCs were analyzed. Two hundred sixty-three isolates (95.6%) were included in the Verigene assay panels, and 257 isolates (93.5%) were correctly identified. The agreement of the detection of resistance genes with subsequent phenotypic susceptibility testing was 100%. The hospital antibiogram was used to develop a treatment algorithm on the basis of Verigene results that may contribute to a faster patient management. PMID:26712265

  7. Acquisition and preprocessing of LANDSAT data

    NASA Technical Reports Server (NTRS)

    Horn, T. N.; Brown, L. E.; Anonsen, W. H. (Principal Investigator)

    1979-01-01

    The original configuration of the GSFC data acquisition, preprocessing, and transmission subsystem, designed to provide LANDSAT data inputs to the LACIE system at JSC, is described. Enhancements made to support LANDSAT -2, and modifications for LANDSAT -3 are discussed. Registration performance throughout the 3 year period of LACIE operations satisfied the 1 pixel root-mean-square requirements established in 1974, with more than two of every three attempts at data registration proving successful, notwithstanding cosmetic faults or content inadequacies to which the process is inherently susceptible. The cloud/snow rejection rate experienced throughout the last 3 years has approached 50%, as expected in most LANDSAT data use situations.

  8. Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines.

    PubMed

    del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J; Raboso, Mariano

    2015-06-17

    Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters, and finally classifies them based on a Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering; segmentation, based on a Gaussian Mixture Model (GMM), to separate the person from the background; masking, to reduce the dimensions of the images; and binarization, to reduce the size of each image. An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements.
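
    A minimal sketch of the pipeline's core, GMM-based foreground segmentation followed by a linear SVM, assuming scikit-learn; the image sizes and labels are stand-ins for real acoustic images and enrolled identities.

        import numpy as np
        from sklearn.mixture import GaussianMixture
        from sklearn.svm import SVC

        def segment(image):
            # Two-component GMM on pixel intensities: person vs. background.
            gmm = GaussianMixture(n_components=2, random_state=0)
            labels = gmm.fit_predict(image.reshape(-1, 1))
            fg = labels == np.argmax(gmm.means_.ravel())  # brighter component
            return fg.reshape(image.shape)

        images = np.random.rand(20, 32, 32)  # stand-in acoustic images
        X = np.array([segment(im).ravel() for im in images], dtype=float)
        y = np.arange(20) % 2                # stand-in identities
        clf = SVC(kernel="linear").fit(X, y)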

  9. Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines

    PubMed Central

    del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J.; Raboso, Mariano

    2015-01-01

    Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them, based on Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering, segmentation—based on a Gaussian Mixture Model (GMM) to separate the person from the background, masking—to reduce the dimensions of images—and binarization—to reduce the size of each image. An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements. PMID:26091392

  10. Microarray Analysis in Glioblastomas

    PubMed Central

    Bhawe, Kaumudi M.; Aghi, Manish K.

    2016-01-01

    Microarray analysis in glioblastomas is done using either cell lines or patient samples as starting material. A survey of the current literature points to transcript-based microarrays and immunohistochemistry (IHC)-based tissue microarrays as being the preferred methods of choice in cancers of neurological origin. Microarray analysis may be carried out for various purposes including the following: (i) to correlate gene expression signatures of glioblastoma cell lines or tumors with response to chemotherapy (DeLay et al., Clin Cancer Res 18(10):2930-2942, 2012); (ii) to correlate gene expression patterns with biological features like proliferation or invasiveness of the glioblastoma cells (Jiang et al., PLoS One 8(6):e66008, 2013); (iii) to discover new tumor classificatory systems based on gene expression signature, and to correlate therapeutic response and prognosis with these signatures (Huse et al., Annu Rev Med 64(1):59-70, 2013; Verhaak et al., Cancer Cell 17(1):98-110, 2010). While investigators can sometimes use archived tumor gene expression data available from repositories such as the NCBI Gene Expression Omnibus to answer their questions, new arrays must often be run to adequately answer specific questions. Here, we provide a detailed description of microarray methodologies, how to select the appropriate methodology for a given question, and analytical strategies that can be used. Experimental methodology for protein microarrays is outside the scope of this chapter, but basic sample preparation techniques for transcript-based microarrays are included here. PMID:26113463

  11. Microarray Analysis in Glioblastomas.

    PubMed

    Bhawe, Kaumudi M; Aghi, Manish K

    2016-01-01

    Microarray analysis in glioblastomas is done using either cell lines or patient samples as starting material. A survey of the current literature points to transcript-based microarrays and immunohistochemistry (IHC)-based tissue microarrays as being the preferred methods of choice in cancers of neurological origin. Microarray analysis may be carried out for various purposes including the following: i. To correlate gene expression signatures of glioblastoma cell lines or tumors with response to chemotherapy (DeLay et al., Clin Cancer Res 18(10):2930-2942, 2012). ii. To correlate gene expression patterns with biological features like proliferation or invasiveness of the glioblastoma cells (Jiang et al., PLoS One 8(6):e66008, 2013). iii. To discover new tumor classificatory systems based on gene expression signature, and to correlate therapeutic response and prognosis with these signatures (Huse et al., Annu Rev Med 64(1):59-70, 2013; Verhaak et al., Cancer Cell 17(1):98-110, 2010). While investigators can sometimes use archived tumor gene expression data available from repositories such as the NCBI Gene Expression Omnibus to answer their questions, new arrays must often be run to adequately answer specific questions. Here, we provide a detailed description of microarray methodologies, how to select the appropriate methodology for a given question, and analytical strategies that can be used. Experimental methodology for protein microarrays is outside the scope of this chapter, but basic sample preparation techniques for transcript-based microarrays are included here. PMID:26113463

  12. How to pre-process Raman spectra for reliable and stable models?

    PubMed

    Bocklitz, Thomas; Walter, Angela; Hartmann, Katharina; Rösch, Petra; Popp, Jürgen

    2011-10-17

    Raman spectroscopy in combination with chemometrics is gaining more and more importance for answering biological questions. This results from the fact that Raman spectroscopy is non-invasive and marker-free, and water does not corrupt Raman spectra significantly. However, besides the Raman fingerprint information, Raman spectra contain other contributions, such as fluorescence background, Gaussian noise, cosmic spikes, and other effects dependent on experimental parameters, which have to be removed prior to the analysis in order to ensure that the analysis is based on the Raman measurements and not on other effects. Here we present a comprehensive study of the influence of pre-processing procedures on statistical models. We show that a large number of possible and physically meaningful pre-processing procedures lead to poor results. Furthermore, a method based on genetic algorithms (GAs) is introduced, which chooses the spectral pre-processing according to the analysis task at hand without trying all possible pre-processing approaches (grid search). This is demonstrated for the two most common tasks, namely a multivariate calibration model and two classification models. However, the presented approach can be applied in general whenever there is a computational measure that can be optimized. The suggested GA procedure results in models that have a higher precision and are more stable against corrupting effects.
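
    The selection principle can be sketched by scoring candidate pre-processing chains with a cross-validated model; for brevity, an exhaustive search over a tiny candidate set stands in for the paper's genetic-algorithm search, and the data and chains are illustrative.

        import numpy as np
        from itertools import product
        from scipy.signal import savgol_filter
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        def snv(X):
            # Row-wise standardization of each spectrum.
            return (X - X.mean(1, keepdims=True)) / X.std(1, keepdims=True)

        def baseline(X):
            # Subtract a per-spectrum linear baseline.
            t = np.arange(X.shape[1])
            coef = np.polyfit(t, X.T, 1)
            return X - (np.outer(coef[0], t) + coef[1][:, None])

        X = np.random.rand(60, 300)  # stand-in Raman spectra
        y = np.arange(60) % 2        # stand-in class labels
        steps = {"none": lambda X: X, "snv": snv, "baseline": baseline,
                 "smooth": lambda X: savgol_filter(X, 11, 3, axis=1)}

        def cv_score(chain):
            Xp = steps[chain[1]](steps[chain[0]](X))
            clf = LogisticRegression(max_iter=1000)
            return cross_val_score(clf, Xp, y, cv=5).mean()

        best_chain = max(product(steps, repeat=2), key=cv_score)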

  13. Study of data preprocess for HJ-1A satellite HSI image

    NASA Astrophysics Data System (ADS)

    Gao, Hail-liang; Gu, Xing-fa; Yu, Tao; He, Hua-ying; Zhu, Ling-ya; Wang, Feng

    2015-08-01

    The Hyper Spectral Imager (HSI) is the first Chinese space-borne hyperspectral sensor, aboard the HJ-1A satellite. We have developed a data preprocessing flow for HSI images, which includes destriping, atmospheric correction, and spectral filtering. In this paper, the product levels of HSI images are introduced first, and a destriping method for HSI level 2 images is proposed. An atmospheric correction method based on radiative transfer is then summarized to retrieve ground reflectance from HSI images. Furthermore, a new spectral filtering method for the ground reflectance spectra obtained after atmospheric correction is proposed, based on a reference ground spectral database. Finally, an HSI image acquired over Lake Dali in Inner Mongolia was used to evaluate the effect of the preprocessing methods. The HSI image after destriping was compared with the original HSI image, showing that the stripe noise has been removed effectively. Both un-smoothed reflectance spectra and spectra smoothed using the method proposed in this paper are compared with the reflectance spectra derived with the well-known FLAASH method. The results show that the spectra become much smoother after applying the spectral filtering algorithm, and that the spectra obtained with this new preprocessing method are similar to those of the FLAASH method.
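
    Destriping for push-broom imagers is often done by column-wise moment matching; the sketch below applies this common, simplified approach to one stand-in band of an image cube (it is not the paper's exact destriping method).

        import numpy as np

        def destripe(band):
            # Equalize each detector column's mean/std to band-wide values.
            col_mean = band.mean(axis=0)
            col_std = band.std(axis=0)
            col_std[col_std == 0] = 1.0
            return (band - col_mean) / col_std * band.std() + band.mean()

        stripes = np.linspace(0.0, 0.5, 128)       # per-column offsets
        band = np.random.rand(200, 128) + stripes  # striped stand-in band
        clean = destripe(band)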

  14. User microprogrammable processors for high data rate telemetry preprocessing

    NASA Technical Reports Server (NTRS)

    Pugsley, J. H.; Ogrady, E. P.

    1973-01-01

    The use of microprogrammable processors for the preprocessing of high data rate satellite telemetry is investigated. The following topics are discussed along with supporting studies: (1) evaluation of commercial microprogrammable minicomputers for telemetry preprocessing tasks; (2) microinstruction sets for telemetry preprocessing; and (3) the use of multiple minicomputers to achieve high data processing. The simulation of small microprogrammed processors is discussed along with examples of microprogrammed processors.

  15. Microarrays in Glycoproteomics Research

    PubMed Central

    Yue, Tingting; Haab, Brian B.

    2009-01-01

    Microarrays have been extremely useful for investigating binding interactions among diverse types of molecular species, the main advantage being the ability to examine many interactions using small amounts of samples and reagents. Microarrays are increasingly being used to advance research in the field of glycobiology, which is the study of the nature and function of carbohydrates in health and disease. Several types of microarrays are used in the study of glycans and proteins in glycobiology, including glycan arrays to study the recognition of carbohydrates, lectin arrays to determine carbohydrate expression on purified proteins or on cells, and antibody arrays to examine the variation in particular glycan structures on specific proteins. This review covers the technology and applications of these types of microarrays, as well as their use for obtaining complementary information on various aspects of glycobiology. PMID:19389548

  16. Functional Protein Microarray Technology

    PubMed Central

    Hu, Shaohui; Xie, Zhi; Qian, Jiang; Blackshaw, Seth; Zhu, Heng

    2010-01-01

    Functional protein microarrays are emerging as a promising new tool for large-scale and high-throughput studies. In this article, we review their applications in basic proteomics research, where various types of assays have been developed to probe binding activities to other biomolecules, such as proteins, DNA, RNA, small molecules, and glycans. We also report recent progress in using functional protein microarrays to profile protein posttranslational modifications, including phosphorylation, ubiquitylation, acetylation, and nitrosylation. Finally, we discuss the potential of functional protein microarrays in biomarker identification and clinical diagnostics. We strongly believe that functional protein microarrays will soon become an indispensable and invaluable tool in proteomics research and systems biology. PMID:20872749

  17. A preprocessing tool for removing artifact from cardiac RR interval recordings using three-dimensional spatial distribution mapping.

    PubMed

    Stapelberg, Nicolas J C; Neumann, David L; Shum, David H K; McConnell, Harry; Hamilton-Craig, Ian

    2016-04-01

    Artifact is common in cardiac RR interval data that is recorded for heart rate variability (HRV) analysis. A novel algorithm for artifact detection and interpolation in RR interval data is described. It is based on spatial distribution mapping of RR interval magnitude and relationships to adjacent values in three dimensions. The characteristics of normal physiological RR intervals and artifact intervals were established using 24-h recordings from 20 technician-assessed human cardiac recordings. The algorithm was incorporated into a preprocessing tool and validated using 30 artificial RR (ARR) interval data files, to which known quantities of artifact (0.5%, 1%, 2%, 3%, 5%, 7%, 10%) were added. The impact of preprocessing ARR files with 1% added artifact was also assessed using 10 time domain and frequency domain HRV metrics. The preprocessing tool was also used to preprocess 69 24-h human cardiac recordings. The tool was able to remove artifact from technician-assessed human cardiac recordings (sensitivity 0.84, SD = 0.09, specificity of 1.00, SD = 0.01) and artificial data files. The removal of artifact had a low impact on time domain and frequency domain HRV metrics (ranging from 0% to 2.5% change in values). This novel preprocessing tool can be used with human 24-h cardiac recordings to remove artifact while minimally affecting physiological data and therefore having a low impact on HRV measures of that data.
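
    As a simpler stand-in for the three-dimensional spatial-distribution mapping described above, the sketch below flags RR intervals that deviate strongly from a local median and linearly interpolates across them; the 20% tolerance and window size are illustrative.

        import numpy as np
        from scipy.signal import medfilt

        def clean_rr(rr, tol=0.2, window=11):
            local_median = medfilt(rr, kernel_size=window)
            artifact = np.abs(rr - local_median) > tol * local_median
            idx = np.arange(len(rr))
            cleaned = rr.copy()
            cleaned[artifact] = np.interp(idx[artifact], idx[~artifact],
                                          rr[~artifact])
            return cleaned, artifact

        rr = np.random.normal(800, 30, 1000)    # stand-in RR intervals (ms)
        rr[[100, 400, 401]] = [300, 1600, 250]  # injected artifacts
        cleaned, flags = clean_rr(rr)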

  18. DNA Microarray Technology

    SciTech Connect

    WERNER-WASHBURNE, MARGARET; DAVIDSON, GEORGE S.

    2002-01-01

    Collaboration between Sandia National Laboratories and the University of New Mexico Biology Department resulted in the capability to train students in microarray techniques and the interpretation of data from microarray experiments. These studies provide for a better understanding of the role of stationary phase and the gene regulation involved in exit from stationary phase, which may eventually have important clinical implications. Importantly, this research trained numerous students and is the basis for three new Ph.D. projects.

  19. DNA microarrays in neuropsychopharmacology.

    PubMed

    Marcotte, E R; Srivastava, L K; Quirion, R

    2001-08-01

    Recent advances in experimental genomics, coupled with the wealth of sequence information available for a variety of organisms, have the potential to transform the way pharmacological research is performed. At present, high-density DNA microarrays allow researchers to quickly and accurately quantify gene-expression changes in a massively parallel manner. Although now well established in other biomedical fields, such as cancer and genetics research, DNA microarrays have only recently begun to make significant inroads into pharmacology. To date, the major focus in this field has been on the general application of DNA microarrays to toxicology and drug discovery and design. This review summarizes the major microarray findings of relevance to neuropsychopharmacology, as a prelude to the design and analysis of future basic and clinical microarray experiments. The ability of DNA microarrays to monitor gene expression simultaneously in a large-scale format is helping to usher in a post-genomic age, where simple constructs about the role of nature versus nurture are being replaced by a functional understanding of gene expression in living organisms. PMID:11479006

  20. Preprocessing and Analysis of Digitized ECGs

    NASA Astrophysics Data System (ADS)

    Villalpando, L. E. Piña; Kurmyshev, E.; Ramírez, S. Luna; Leal, L. Delgado

    2008-08-01

    In this work we propose a methodology and MATLAB programs that perform the preprocessing and analysis of the derivative D1 of ECGs. The program corrects the isoelectric line for each beat, calculates the average cardiac frequency and its standard deviation, and generates a file with the amplitudes of the P, Q, and T waves as well as the important segments and intervals of each beat. The software normalizes beats to a standard rate of 80 beats per minute; the superposition of beats is done by centering the R waves, before and after normalizing the amplitude of each beat. The data and graphics provide relevant information to the physician for diagnosis. In addition, some results are displayed similar to those presented by a Holter recording.
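
    The beat normalization and superposition steps can be sketched with numpy and SciPy as follows: each R-to-R segment is resampled to the length of one beat at the standard 80 bpm rate and amplitude-normalized before stacking. The sampling rate and R-peak positions are stand-ins.

        import numpy as np
        from scipy.signal import resample

        FS = 250                        # sampling rate in Hz (assumption)
        TARGET_LEN = int(FS * 60 / 80)  # samples per beat at 80 bpm

        def superpose(ecg, r_peaks):
            beats = []
            for r0, r1 in zip(r_peaks, r_peaks[1:]):
                beat = resample(ecg[r0:r1], TARGET_LEN)  # rate normalization
                beats.append(beat / np.abs(beat).max())  # amplitude scaling
            return np.vstack(beats)  # rows aligned on the R wave

        ecg = np.random.rand(5000)           # stand-in ECG trace
        r_peaks = np.arange(100, 4900, 200)  # stand-in R positions
        mean_beat = superpose(ecg, r_peaks).mean(axis=0)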

  1. Measurement data preprocessing in a radar-based system for monitoring of human movements

    NASA Astrophysics Data System (ADS)

    Morawski, Roman Z.; Miȩkina, Andrzej; Bajurko, Paweł R.

    2015-02-01

    The importance of research on new technologies that could be employed in care services for elderly people is highlighted. The need to examine the applicability of various sensor systems for non-invasive monitoring of the movements and vital bodily functions, such as heart beat or breathing rhythm, of elderly persons in their home environment is justified. An extensive overview of the literature concerning existing monitoring techniques is provided. A technological potential behind radar sensors is indicated. A new class of algorithms for preprocessing of measurement data from impulse radar sensors, when applied for elderly people monitoring, is proposed. Preliminary results of numerical experiments performed on those algorithms are demonstrated.

  2. Data preprocessing methods of FT-NIR spectral data for the classification cooking oil

    NASA Astrophysics Data System (ADS)

    Ruah, Mas Ezatul Nadia Mohd; Rasaruddin, Nor Fazila; Fong, Sim Siong; Jaafar, Mohd Zuli

    2014-12-01

    This work describes data pre-processing methods for FT-NIR spectroscopy datasets of cooking oil and its quality parameters using chemometric techniques. Pre-processing of near-infrared (NIR) spectral data has become an integral part of chemometrics modelling. Hence, this work investigates the utility and effectiveness of pre-processing algorithms, namely row scaling, column scaling and single scaling with Standard Normal Variate (SNV). The combinations of these scaling methods have an impact on exploratory analysis and on classification via Principal Component Analysis (PCA) plots. The samples were divided into palm oil and non-palm cooking oil. The classification model was built using FT-NIR cooking oil spectra datasets in absorbance mode over the range 4000 cm-1 to 14000 cm-1. A Savitzky-Golay derivative was applied before developing the classification model. The data were then separated into a training set and a test set using the Duplex method, with the size of each class kept equal to 2/3 of the class with the minimum number of samples. A t-statistic was then employed as the variable selection method in order to select which variables are significant for the classification models. The data pre-processing options were evaluated by means of the modified silhouette width (mSW), PCA and the percentage correctly classified (%CC). The results show that different data processing strategies lead to substantial differences in model performance; the effects of row scaling, column standardisation and single scaling with SNV are indicated by mSW and %CC. With two PCs, all five classifiers gave a high %CC except quadratic discriminant analysis.
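
    A minimal sketch of the SNV scaling and Savitzky-Golay derivative steps named above (the window length and polynomial order are illustrative assumptions):

        import numpy as np
        from scipy.signal import savgol_filter

        def snv(spectra):
            """Standard Normal Variate: centre and scale each spectrum
            (row) by its own mean and standard deviation."""
            x = np.asarray(spectra, dtype=float)
            return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

        def preprocess(spectra, window=15, polyorder=2):
            """SNV scaling followed by a first-derivative Savitzky-Golay
            filter along the wavenumber axis, as applied before PCA."""
            return savgol_filter(snv(spectra), window, polyorder, deriv=1, axis=1)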

  3. Identification of Pou5f1, Sox2, and Nanog downstream target genes with statistical confidence by applying a novel algorithm to time course microarray and genome-wide chromatin immunoprecipitation data

    PubMed Central

    Sharov, Alexei A; Masui, Shinji; Sharova, Lioudmila V; Piao, Yulan; Aiba, Kazuhiro; Matoba, Ryo; Xin, Li; Niwa, Hitoshi; Ko, Minoru SH

    2008-01-01

    Background Target genes of the transcription factor (TF) Pou5f1 (Oct3/4 or Oct4), which is essential for pluripotency maintenance and self-renewal of embryonic stem (ES) cells, have previously been identified based on their response to Pou5f1 manipulation and on the occurrence of chromatin-immunoprecipitation (ChIP) binding sites in promoters. However, many responding genes with binding sites may not be direct targets, because the response may be mediated by other genes and a ChIP-binding site may not be functional in terms of transcription regulation. Results To reduce the number of false positives, we propose to separate responding genes into groups according to direction, magnitude, and time of response, and to apply the false discovery rate (FDR) criterion to each group individually. Applying this novel algorithm with stringent statistical criteria (FDR < 0.2) to a compendium of published and new microarray data (3, 6, 12, and 24 hr after Pou5f1 suppression) and published ChIP data, we identified 420 tentative target genes (TTGs) for Pou5f1. The majority of TTGs (372) were down-regulated after Pou5f1 suppression, indicating that Pou5f1 functions as an activator of gene expression when it binds to promoters. Interestingly, many activated genes are potent suppressors of transcription, including polycomb genes, zinc finger TFs, chromatin remodeling factors, and suppressors of signaling. Similar analysis showed that Sox2 and Nanog also function mostly as transcription activators in cooperation with Pou5f1. Conclusion We have identified the most reliable sets of direct target genes for key pluripotency genes – Pou5f1, Sox2, and Nanog – and found that they predominantly function as activators of downstream gene expression. Thus, most genes related to cell differentiation are suppressed indirectly. PMID:18522731
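
    A schematic NumPy sketch of the group-wise FDR idea: genes are split by direction/magnitude/time of response and the Benjamini-Hochberg criterion is applied within each group separately (the group labels and the FDR < 0.2 cut-off follow the abstract; the rest is illustrative):

        import numpy as np

        def bh_fdr(pvals):
            """Benjamini-Hochberg adjusted p-values (q-values)."""
            p = np.asarray(pvals, dtype=float)
            order = np.argsort(p)
            ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
            q = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
            out = np.empty_like(q)
            out[order] = np.minimum(q, 1.0)
            return out

        def targets_by_group(pvals, groups, fdr=0.2):
            """Apply the FDR criterion within each response group
            (direction, magnitude, and time of response) individually."""
            pvals, groups = np.asarray(pvals), np.asarray(groups)
            hits = np.zeros(len(pvals), dtype=bool)
            for g in np.unique(groups):
                idx = np.where(groups == g)[0]
                hits[idx] = bh_fdr(pvals[idx]) < fdr
            return hits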

  4. Data Analysis Strategies for Protein Microarrays

    PubMed Central

    Díez, Paula; Dasilva, Noelia; González-González, María; Matarraz, Sergio; Casado-Vela, Juan; Orfao, Alberto; Fuentes, Manuel

    2012-01-01

    Microarrays constitute a new platform which allows the discovery and characterization of proteins. According to different features, such as content, surface or detection system, there are many types of protein microarrays which can be applied for the identification of disease biomarkers and the characterization of protein expression patterns. However, the analysis and interpretation of the amount of information generated by microarrays remain a challenge. Further data analysis strategies are essential to obtain representative and reproducible results. The experimental design is therefore key, since the number of samples and dyes, among other aspects, defines the appropriate analysis method to be used. In this sense, several algorithms have been proposed so far to overcome the analytical difficulties derived from fluorescence overlapping and/or background noise. Each kind of microarray is developed to fulfill a specific purpose. Therefore, the selection of appropriate analytical and data analysis strategies is crucial to achieve successful biological conclusions. In the present review, we focus on current algorithms and main strategies for data interpretation. PMID:27605336

  5. SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read

    PubMed Central

    2010-01-01

    Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing exacerbates this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads, and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify the pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148

  6. Nanotechnologies in protein microarrays.

    PubMed

    Krizkova, Sona; Heger, Zbynek; Zalewska, Marta; Moulick, Amitava; Adam, Vojtech; Kizek, Rene

    2015-01-01

    Protein microarray technology has become an important research tool for the study and detection of proteins, protein-protein interactions and a number of other applications. The utilization of nanoparticle-based materials and nanotechnology-based techniques for immobilization extends the surface available for biomolecule immobilization, resulting in enhanced substrate binding properties, decreased background signals and enhanced reporter systems for more sensitive assays. Generally, multiple nanotechnology-based techniques are combined in contemporary microarray systems. In this review, applications of nanoparticles and nanotechnologies in creating protein microarrays and in protein immobilization and detection are summarized. We anticipate that advanced nanotechnologies can be exploited to expand the promising fields of protein identification and the monitoring of protein-protein or drug-protein interactions and protein structures. PMID:26039143

  7. Base resolution methylome profiling: considerations in platform selection, data preprocessing and analysis

    PubMed Central

    Sun, Zhifu; Cunningham, Julie; Slager, Susan; Kocher, Jean-Pierre

    2015-01-01

    Bisulfite treatment-based methylation microarray (mainly the Illumina 450K Infinium array) and next-generation sequencing (reduced representation bisulfite sequencing, Agilent SureSelect Human Methyl-Seq, NimbleGen SeqCap Epi CpGiant or whole-genome bisulfite sequencing) are commonly used for base resolution DNA methylome research. Although multiple tools and methods have been developed and used for the data preprocessing and analysis, confusion remains for these platforms, including how and whether the 450k array should be normalized; which platform should be used to better fit researchers' needs; and which statistical models would be more appropriate for differential methylation analysis. This review presents the commonly used platforms and compares the pros and cons of each in methylome profiling. We then discuss approaches to study design, data normalization, bias correction and model selection for differentially methylated individual CpGs and regions. PMID:26366945

  8. A perceptual preprocess method for 3D-HEVC

    NASA Astrophysics Data System (ADS)

    Shi, Yawen; Wang, Yongfang; Wang, Yubing

    2015-08-01

    A perceptual preprocessing method for 3D-HEVC coding is proposed in this paper. First, we propose a new just-noticeable-difference (JND) model, which accounts for the luminance contrast masking effect, the spatial and temporal masking effects, saliency characteristics and depth information. We utilize the spectral residual approach to obtain the saliency map and build a visual saliency factor based on it. In order to distinguish the sensitivity of objects at different depths, we segment each texture frame into foreground and background with an automatic threshold selection algorithm using the corresponding depth information, and then build a depth weighting factor. A JND modulation factor, built as a linear combination of the visual saliency factor and the depth weighting factor, adjusts the JND threshold. We then apply the proposed JND model to 3D-HEVC for residual filtering and distortion coefficient processing: the residual value is set to zero if the JND threshold is greater than the residual value, and the JND threshold is subtracted from the residual value otherwise. Experimental results demonstrate that the proposed method achieves an average bit rate reduction of 15.11% compared to the original coding scheme with HTM12.1, while maintaining the same subjective quality.
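
    The residual filtering rule above is a thresholding operation; a one-function sketch (treating negative residuals symmetrically by magnitude, which the abstract implies but does not state):

        import numpy as np

        def jnd_filter_residual(residual, jnd):
            """Set a residual to zero where the JND threshold exceeds its
            magnitude; otherwise shrink the magnitude by the threshold."""
            r = np.asarray(residual, dtype=float)
            t = np.asarray(jnd, dtype=float)
            mag = np.abs(r)
            return np.where(mag <= t, 0.0, np.sign(r) * (mag - t))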

  9. AMIC@: All MIcroarray Clusterings @ once.

    PubMed

    Geraci, Filippo; Pellegrini, Marco; Renda, M Elena

    2008-07-01

    The AMIC@ Web Server offers a light-weight multi-method clustering engine for microarray gene-expression data. AMIC@ is a highly interactive tool that stresses user-friendliness and robustness by adopting AJAX technology, thus allowing an effective interleaved execution of different clustering algorithms and inspection of results. Among the salient features AMIC@ offers are: (i) automatic file format detection, (ii) suggestions on the number of clusters using a variant of the stability-based method of Tibshirani et al., (iii) intuitive visual inspection of the data via heatmaps and (iv) measurement of clustering quality using cluster homogeneity. Large data sets can be processed efficiently by selecting algorithms (such as FPF-SB and k-Boost) specifically designed for this purpose. In the case of very large data sets, the user can opt for a batch-mode use of the system by means of the Clustering wizard, which runs all algorithms at once and delivers the results via email. AMIC@ is freely available and open to all users with no login requirement at the following URL: http://bioalgo.iit.cnr.it/amica.

  10. Enhanced bone structural analysis through pQCT image preprocessing.

    PubMed

    Cervinka, T; Hyttinen, J; Sievanen, H

    2010-05-01

    Several factors, including preprocessing of the image, can affect the reliability of pQCT-measured bone traits, such as cortical area and trabecular density. Using repeated scans of four different liquid phantoms and repeated in vivo scans of distal tibiae from 25 subjects, the performance of two novel preprocessing methods, based on the down-sampling of the grayscale intensity histogram and the statistical approximation of image data, was compared to 3 x 3 and 5 x 5 median filtering. According to phantom measurements, the signal-to-noise ratio in the raw pQCT images (XCT 3000) was low (approximately 20 dB), which posed a challenge for preprocessing. Concerning the cortical analysis, the reliability coefficient (R) was 67% for the raw image and increased to 94-97% after preprocessing, without apparent preference for any method. Concerning the trabecular density, the R-values were already high (approximately 99%) in the raw images, leaving virtually no room for improvement. However, some coarse structural patterns could be seen in the preprocessed images, in contrast to a disperse distribution of density levels in the raw image. In conclusion, preprocessing cannot suppress the high noise level to the extent that the analysis of mean trabecular density is essentially improved, whereas preprocessing can enhance cortical bone analysis and also facilitate coarse structural analyses of the trabecular region.
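
    The 3 x 3 and 5 x 5 median-filtering baselines can be reproduced directly with SciPy; the two novel methods are only named, not specified, in the abstract, so only this baseline is sketched:

        import numpy as np
        from scipy.ndimage import median_filter

        def preprocess_pqct(image, size=3):
            """Baseline pQCT preprocessing: size x size median filtering
            (size = 3 or 5) to suppress impulsive noise before cortical
            and trabecular analysis."""
            return median_filter(np.asarray(image, dtype=float), size=size)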

  11. Software for pre-processing Illumina next-generation sequencing short read sequences

    PubMed Central

    2014-01-01

    Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157 H7. Results Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacteria genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference

  12. Microarrays for Undergraduate Classes

    ERIC Educational Resources Information Center

    Hancock, Dale; Nguyen, Lisa L.; Denyer, Gareth S.; Johnston, Jill M.

    2006-01-01

    A microarray experiment is presented that, in six laboratory sessions, takes undergraduate students from the tissue sample right through to data analysis. The model chosen, the murine erythroleukemia cell line, can be easily cultured in sufficient quantities for class use. Large changes in gene expression can be induced in these cells by…

  13. Enhancing Interdisciplinary Mathematics and Biology Education: A Microarray Data Analysis Course Bridging These Disciplines

    PubMed Central

    Evans, Irene M.

    2010-01-01

    BIO2010 put forth the goal of improving the mathematical educational background of biology students. The analysis and interpretation of microarray high-dimensional data can be very challenging and is best done by a statistician and a biologist working and teaching in a collaborative manner. We set up such a collaboration and designed a course on microarray data analysis. We started using Genome Consortium for Active Teaching (GCAT) materials and Microarray Genome and Clustering Tool software and added R statistical software along with Bioconductor packages. In response to student feedback, one microarray data set was fully analyzed in class, starting from preprocessing to gene discovery to pathway analysis using the latter software. A class project was to conduct a similar analysis where students analyzed their own data or data from a published journal paper. This exercise showed the impact that filtering, preprocessing, and different normalization methods had on gene inclusion in the final data set. We conclude that this course achieved its goals to equip students with skills to analyze data from a microarray experiment. We offer our insight about collaborative teaching as well as how other faculty might design and implement a similar interdisciplinary course. PMID:20810954

  14. Analysis of microarray experiments of gene expression profiling

    PubMed Central

    Tarca, Adi L.; Romero, Roberto; Draghici, Sorin

    2008-01-01

    The study of gene expression profiling of cells and tissue has become a major tool for discovery in medicine. Microarray experiments allow description of genome-wide expression changes in health and disease. The results of such experiments are expected to change the methods employed in the diagnosis and prognosis of disease in obstetrics and gynecology. Moreover, an unbiased and systematic study of gene expression profiling should allow the establishment of a new taxonomy of disease for obstetric and gynecologic syndromes. Thus, a new era is emerging in which reproductive processes and disorders could be characterized using molecular tools and fingerprinting. The design, analysis, and interpretation of microarray experiments require specialized knowledge that is not part of the standard curriculum of our discipline. This article describes the types of studies that can be conducted with microarray experiments (class comparison, class prediction, class discovery). We discuss key issues pertaining to experimental design, data preprocessing, and gene selection methods. Common types of data representation are illustrated. Potential pitfalls in the interpretation of microarray experiments, as well as the strengths and limitations of this technology, are highlighted. This article is intended to assist clinicians in appraising the quality of the scientific evidence now reported in the obstetric and gynecologic literature. PMID:16890548

  15. A survey of visual preprocessing and shape representation techniques

    NASA Technical Reports Server (NTRS)

    Olshausen, Bruno A.

    1988-01-01

    Many recent theories and methods proposed for visual preprocessing and shape representation are summarized. The survey brings together research from the fields of biology, psychology, computer science, electrical engineering, and most recently, neural networks. It was motivated by the need to preprocess images for a sparse distributed memory (SDM), but the techniques presented may also prove useful for applying other associative memories to visual pattern recognition. The material of this survey is divided into three sections: an overview of biological visual processing; methods of preprocessing (extracting parts of shape, texture, motion, and depth); and shape representation and recognition (form invariance, primitives and structural descriptions, and theories of attention).

  16. Comparing Binaural Pre-processing Strategies I

    PubMed Central

    Krawczyk-Becker, Martin; Marquardt, Daniel; Völker, Christoph; Hu, Hongmei; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Ernst, Stephan M. A.; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

    2015-01-01

    In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios. PMID:26721920

  17. Characterizing the continuously acquired cardiovascular time series during hemodialysis, using median hybrid filter preprocessing noise reduction

    PubMed Central

    Wilson, Scott; Bowyer, Andrea; Harrap, Stephen B

    2015-01-01

    The clinical characterization of cardiovascular dynamics during hemodialysis (HD) has important pathophysiological implications in terms of diagnostic, cardiovascular risk assessment, and treatment efficacy perspectives. Currently the diagnosis of significant intradialytic systolic blood pressure (SBP) changes among HD patients is imprecise and opportunistic, reliant upon the presence of hypotensive symptoms in conjunction with coincident but isolated noninvasive brachial cuff blood pressure (NIBP) readings. Considering hemodynamic variables as a time series makes a continuous recording approach more desirable than intermittent measures; however, in the clinical environment, the data signal is susceptible to corruption due to both impulsive and Gaussian-type noise. Signal preprocessing is an attractive solution to this problem. Prospectively collected continuous noninvasive SBP data over the short-break intradialytic period in ten patients was preprocessed using a novel median hybrid filter (MHF) algorithm and compared with 50 time-coincident pairs of intradialytic NIBP measures from routine HD practice. The median hybrid preprocessing technique for continuously acquired cardiovascular data yielded a dynamic regression without significant noise and artifact, suitable for high-level profiling of time-dependent SBP behavior. Signal accuracy is highly comparable with standard NIBP measurement, with the added clinical benefit of dynamic real-time hemodynamic information. PMID:25678827
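
    The abstract leaves the MHF design unspecified; below is a minimal sketch of the textbook FIR median hybrid filter (the median of a left-window mean, the current sample, and a right-window mean), offered as a plausible stand-in rather than the authors' filter:

        import numpy as np

        def median_hybrid_filter(x, k=5):
            """FIR median hybrid filter: for each sample, take the median
            of (mean of the k samples before, the sample itself, mean of
            the k samples after). Robust to impulsive artifacts while
            smoothing Gaussian-type noise."""
            x = np.asarray(x, dtype=float)
            y = x.copy()
            for i in range(k, len(x) - k):
                left = x[i - k:i].mean()
                right = x[i + 1:i + k + 1].mean()
                y[i] = np.median([left, x[i], right])
            return y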

  18. Flexibility and utility of pre-processing methods in converting STXM setups for ptychography - Final Paper

    SciTech Connect

    Fromm, Catherine

    2015-08-20

    Ptychography is an advanced diffraction-based imaging technique that can achieve resolution of 5 nm and below. It is done by scanning a sample through a beam of focused x-rays using discrete yet overlapping scan steps. Scattering data is collected on a CCD camera, and the phase of the scattered light is reconstructed with sophisticated iterative algorithms. Because the experimental setup is similar, ptychography setups can be created by retrofitting existing STXM beam lines with new hardware. A further challenge comes in the reconstruction of the collected scattering images. Scattering data must be adjusted and packaged with experimental parameters to calibrate the reconstruction software. The necessary pre-processing of data prior to reconstruction is unique to each beamline setup, and even to the optical alignments used on a particular day. Pre-processing software must be developed to be flexible and efficient in order to allow experimenters appropriate control and freedom in the analysis of their hard-won data. This paper describes the implementation of pre-processing software which successfully connects the data collection steps to the reconstruction steps, letting the user accomplish accurate and reliable ptychography.

  1. Multi-channel high-speed CMOS image acquisition and pre-processing system

    NASA Astrophysics Data System (ADS)

    Sun, Chun-feng; Yuan, Feng; Ding, Zhen-liang

    2008-10-01

    A new multi-channel high-speed CMOS image acquisition and pre-processing system is designed to realize image acquisition, data transmission, time-sequential control and simple image processing with a high-speed CMOS image sensor. The modular structure and the LVDS and ping-pong cache techniques used in the image data acquisition sub-system ensure real-time data acquisition and transmission. Furthermore, a new adaptive-threshold histogram equalization algorithm, based on the reassignment of redundant gray levels, is incorporated in the image preprocessing module of the FPGA. An iterative method is used to set the threshold value, and redundant gray levels are redistributed rationally in proportion to the gray-level interval. Over-enhancement of the background is restrained and the risk of merging foreground details is reduced. Experimental results show that the system can realize image acquisition, transmission, memory and pre-processing at data rates up to 590 MPixels/s, and supports the design and realization of subsequent systems.
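
    The redundant-gray-level reassignment is specific to this system and not fully specified in the abstract; for orientation, the plain global histogram equalization that the proposed algorithm modifies can be sketched as:

        import numpy as np

        def histogram_equalize(img, levels=256):
            """Plain global histogram equalization of an integer-valued
            image; the paper's variant additionally reassigns redundant
            gray levels in proportion to the gray-level interval."""
            hist = np.bincount(img.ravel(), minlength=levels)
            cdf = hist.cumsum().astype(float)
            cdf_min = cdf[cdf > 0].min()
            lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min)
                                   * (levels - 1)), 0, levels - 1)
            return lut.astype(img.dtype)[img]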

  2. Solid Earth ARISTOTELES mission data preprocessing simulation of gravity gradiometer

    NASA Astrophysics Data System (ADS)

    Avanzi, G.; Stolfa, R.; Versini, B.

    Data preprocessing for the ARISTOTELES mission, which measures the Earth gravity gradient in a near-polar orbit, was studied. The mission measures the gravity field at sea level through indirect measurements performed in orbit, so the evaluation steps consist of processing data from GRADIO accelerometer measurements. Due to the physical phenomena involved in the data collection experiment, it is possible to isolate, at an initial stage, a preprocessing of the gradiometer data that is based only on GRADIO measurements and does not need detailed knowledge of the attitude and attitude-rate sensor outputs. This preprocessing produces intermediate quantities used in later stages of the reduction. Software was designed and run to evaluate, for this level of data reduction, the achievable accuracy as a function of the knowledge of instrument and satellite status parameters. The architecture of this element of the preprocessing is described.

  3. Automated preprocessing of spaceborne SAR data

    NASA Technical Reports Server (NTRS)

    Curlander, J. C.; Wu, C.; Pang, A.

    1982-01-01

    An efficient algorithm has been developed for estimation of the echo phase delay in spaceborne synthetic aperture radar (SAR) data. This algorithm utilizes the spacecraft ephemeris data and the radar echo data to produce estimates of two parameters: (1) the centroid of the Doppler frequency spectrum f(d) and (2) the Doppler frequency rate. Results are presented from tests conducted with Seasat SAR data. The test data indicate that estimation accuracies of 3 Hz for f(d) and 0.3 Hz/sec for the Doppler frequency rate are attainable. The clutterlock and autofocus techniques used for estimation of f(d) and the Doppler frequency rate, respectively, are discussed, and the algorithm developed for optimal implementation of these techniques is presented.
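
    The estimator itself is not given in the abstract; a standard pulse-pair clutterlock estimate of the Doppler centroid, sketched here under that assumption (the complex range-compressed block and prf are hypothetical inputs):

        import numpy as np

        def doppler_centroid(data, prf):
            """Pulse-pair clutterlock estimate of the Doppler centroid
            f(d). `data` is a complex range-compressed SAR block of shape
            (azimuth_pulses, range_bins); `prf` is the pulse repetition
            frequency. The estimate is ambiguous modulo the PRF."""
            # average correlation between consecutive azimuth samples
            acc = np.vdot(data[:-1].ravel(), data[1:].ravel())
            return prf * np.angle(acc) / (2.0 * np.pi)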

  4. Transform preprocessing for neural networks for object recognition and localization with sonar

    NASA Astrophysics Data System (ADS)

    Barshan, Billur; Ayrulu, Birsel

    2003-04-01

    We investigate the pre-processing of sonar signals prior to using neural networks for robust differentiation of commonly encountered features in indoor environments. Amplitude and time-of-flight measurement patterns acquired from a real sonar system are pre-processed using various techniques including wavelet transforms, Fourier and fractional Fourier transforms, and Kohonen's self-organizing feature map. Modular and non-modular neural network structures trained with the back-propagation and generating-shrinking algorithms are used to incorporate learning in the identification of parameter relations for target primitives. Networks trained with the generating-shrinking algorithm demonstrate better generalization and interpolation capability and a faster convergence rate. The use of neural networks trained with the back-propagation algorithm, usually with fractional Fourier transform or wavelet pre-processing, results in near-perfect differentiation, around 85% correct range estimation and around 95% correct azimuth estimation, which would be satisfactory in a wide range of applications. Neural networks can differentiate more targets, employing only a single sensor node, with a higher correct differentiation percentage than achieved with previously reported methods employing multiple sensor nodes. The success of the neural network approach shows that the sonar signals do contain sufficient information to differentiate a considerable number of target types, but the previously reported methods are unable to resolve this identifying information. This work can find application in areas where recognition of patterns hidden in sonar signals is required. Some examples are system control based on acoustic signal detection and identification, map building, navigation, obstacle avoidance, and target-tracking applications for mobile robots and other intelligent systems.

  5. Microarrays under the microscope

    PubMed Central

    Wildsmith, S E; Elcock, F J

    2001-01-01

    Microarray technology is a rapidly advancing area, which is gaining popularity in many biological disciplines from drug target identification to predictive toxicology. Over the past few years, there has been a dramatic increase in the number of methods and techniques available for carrying out this form of gene expression analysis. The techniques and associated peripherals, such as slide types, deposition methods, robotics, and scanning equipment, are undergoing constant improvement, helping to drive the technology forward in terms of robustness and ease of use. These rapid developments, combined with the number of options available and the associated hyperbole, can prove daunting for the new user. This review aims to guide the researcher through the various steps of conducting microarray experiments, from initial strategy to analysing the data, with critical examination of the benefits and disadvantages along the way. PMID:11212888

  6. Navigating public microarray databases.

    PubMed

    Penkett, Christopher J; Bähler, Jürg

    2004-01-01

    With the ever-escalating amount of data being produced by genome-wide microarray studies, it is of increasing importance that these data are captured in public databases so that researchers can use this information to complement and enhance their own studies. Many groups have set up databases of expression data, ranging from large repositories, which are designed to comprehensively capture all published data, through to more specialized databases. The public repositories, such as ArrayExpress at the European Bioinformatics Institute contain complete datasets in raw format in addition to processed data, whilst the specialist databases tend to provide downstream analysis of normalized data from more focused studies and data sources. Here we provide a guide to the use of these public microarray resources.

  7. Improving tissue segmentation of human brain MRI through preprocessing by the Gegenbauer reconstruction method.

    PubMed

    Archibald, Rick; Chen, Kewei; Gelb, Anne; Renaut, Rosemary

    2003-09-01

    The Gegenbauer image reconstruction method, previously shown to improve the quality of magnetic resonance images, is utilized in this study as a segmentation preprocessing step. It is demonstrated that, for all simulated and real magnetic resonance images used in this study, the Gegenbauer reconstruction method improves the accuracy of segmentation. Although it is more desirable to use the k-space data for the Gegenbauer reconstruction method, only information acquired from MR images is necessary for the reconstruction, making the procedure completely self-contained and viable for all human brain segmentation algorithms. PMID:14527609

  8. Comparing Binaural Pre-processing Strategies III: Speech Intelligibility of Normal-Hearing and Hearing-Impaired Listeners.

    PubMed

    Völker, Christoph; Warzybok, Anna; Ernst, Stephan M A

    2015-01-01

    A comprehensive evaluation of eight signal pre-processing strategies, including directional microphones, coherence filters, single-channel noise reduction, binaural beamformers, and their combinations, was undertaken with normal-hearing (NH) and hearing-impaired (HI) listeners. Speech reception thresholds (SRTs) were measured in three noise scenarios (multitalker babble, cafeteria noise, and single competing talker). Predictions of three common instrumental measures were compared with the general perceptual benefit caused by the algorithms. The individual SRTs measured without pre-processing and individual benefits were objectively estimated using the binaural speech intelligibility model. Ten listeners with NH and 12 HI listeners participated. The participants varied in age and pure-tone threshold levels. Although HI listeners required a better signal-to-noise ratio to obtain 50% intelligibility than listeners with NH, no differences in SRT benefit from the different algorithms were found between the two groups. With the exception of single-channel noise reduction, all algorithms showed an improvement in SRT of between 2.1 dB (in cafeteria noise) and 4.8 dB (in single competing talker condition). Model predictions with binaural speech intelligibility model explained 83% of the measured variance of the individual SRTs in the no pre-processing condition. Regarding the benefit from the algorithms, the instrumental measures were not able to predict the perceptual data in all tested noise conditions. The comparable benefit observed for both groups suggests a possible application of noise reduction schemes for listeners with different hearing status. Although the model can predict the individual SRTs without pre-processing, further development is necessary to predict the benefits obtained from the algorithms at an individual level.

  9. Spectrum Preprocessing in the OPAD System

    NASA Technical Reports Server (NTRS)

    Katsinis, Constantine

    1998-01-01

    conversion of the software into the real-time, production version. Specifically, a section of the software devoted to the preprocessing of the spectra has been converted into the C language. In addition, parts of this software which may be improved have been identified, and recommendations are given to improve the functionality and ease of use of the new version.

  10. Design of radial basis function neural network classifier realized with the aid of data preprocessing techniques: design and analysis

    NASA Astrophysics Data System (ADS)

    Oh, Sung-Kwun; Kim, Wook-Dong; Pedrycz, Witold

    2016-05-01

    In this paper, we introduce a new architecture of optimized Radial Basis Function neural network classifier developed with the aid of fuzzy clustering and data preprocessing techniques and discuss its comprehensive design methodology. In the preprocessing part, the Linear Discriminant Analysis (LDA) or Principal Component Analysis (PCA) algorithm forms a front end of the network. The transformed data produced here are used as the inputs of the network. In the premise part, the Fuzzy C-Means (FCM) algorithm determines the receptive field associated with the condition part of the rules. The connection weights of the classifier are of functional nature and come as polynomial functions forming the consequent part. The Particle Swarm Optimization algorithm optimizes a number of essential parameters needed to improve the accuracy of the classifier. Those optimized parameters include the type of data preprocessing, the dimensionality of the feature vectors produced by the LDA (or PCA), the number of clusters (rules), the fuzzification coefficient used in the FCM algorithm and the orders of the polynomials of the networks. The performance of the proposed classifier is reported for several benchmark datasets and compared with the performance of other classifiers reported in previous studies.
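
    A compact scikit-learn sketch of the pipeline shape (dimensionality-reduction front end, cluster-derived Gaussian receptive fields, linear read-out). KMeans stands in for FCM, a ridge read-out for the polynomial consequents, and fixed hyperparameters for the PSO search, so this illustrates the architecture rather than reproducing the authors' design:

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.decomposition import PCA
        from sklearn.linear_model import RidgeClassifier

        def rbf_features(Z, centers, width=1.0):
            """Gaussian receptive-field activations around the centers."""
            d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * width ** 2))

        def fit_rbf_classifier(X, y, n_components=5, n_clusters=8):
            pca = PCA(n_components=n_components).fit(X)
            Z = pca.transform(X)
            km = KMeans(n_clusters=n_clusters, n_init=10).fit(Z)
            clf = RidgeClassifier().fit(rbf_features(Z, km.cluster_centers_), y)
            return lambda Xn: clf.predict(
                rbf_features(pca.transform(Xn), km.cluster_centers_))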

  11. Comparing Binaural Pre-processing Strategies II

    PubMed Central

    Hu, Hongmei; Krawczyk-Becker, Martin; Marquardt, Daniel; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Bomke, Katrin; Plotz, Karsten; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

    2015-01-01

    Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary, complex, and included a significant amount of reverberation. Other aspects, such as the perfectly frontal target position, were idealized laboratory settings, allowing the algorithms to perform better than in corresponding real-world conditions. Eight bilaterally implanted CI users, wearing devices from three manufacturers, participated in the study. In all noise conditions, a substantial improvement in SRT50 compared to the unprocessed signal was observed for most of the algorithms tested, with the largest improvements generally provided by binaural minimum variance distortionless response (MVDR) beamforming algorithms. The largest overall improvement in speech intelligibility was achieved by an adaptive binaural MVDR in a spatially separated, single competing talker noise scenario. A no-pre-processing condition and adaptive differential microphones without a binaural link served as the two baseline conditions. SRT50 improvements provided by the binaural MVDR beamformers surpassed the performance of the adaptive differential microphones in most cases. Speech intelligibility improvements predicted by instrumental measures were shown to account for some but not all aspects of the perceptually obtained SRT50 improvements measured in bilaterally implanted CI users. PMID:26721921

  12. Severe acute respiratory syndrome diagnostics using a coronavirus protein microarray.

    PubMed

    Zhu, Heng; Hu, Shaohui; Jona, Ghil; Zhu, Xiaowei; Kreiswirth, Nate; Willey, Barbara M; Mazzulli, Tony; Liu, Guozhen; Song, Qifeng; Chen, Peng; Cameron, Mark; Tyler, Andrea; Wang, Jian; Wen, Jie; Chen, Weijun; Compton, Susan; Snyder, Michael

    2006-03-14

    To monitor severe acute respiratory syndrome (SARS) infection, a coronavirus protein microarray that harbors proteins from SARS coronavirus (SARS-CoV) and five additional coronaviruses was constructed. These microarrays were used to screen approximately 400 Canadian sera from the SARS outbreak, including samples from confirmed SARS-CoV cases, respiratory illness patients, and healthcare professionals. A computer algorithm that uses multiple classifiers to predict samples from SARS patients was developed and used to predict 206 sera from Chinese fever patients. The test assigned patients into two distinct groups: those with antibodies to SARS-CoV and those without. The microarray also identified patients with sera reactive against other coronavirus proteins. Our results correlated well with an indirect immunofluorescence test and demonstrated that viral infection can be monitored for many months after infection. We show that protein microarrays can serve as a rapid, sensitive, and simple tool for large-scale identification of viral-specific antibodies in sera.

  13. EARLINET Single Calculus Chain - technical - Part 1: Pre-processing of raw lidar data

    NASA Astrophysics Data System (ADS)

    D'Amico, Giuseppe; Amodeo, Aldo; Mattis, Ina; Freudenthaler, Volker; Pappalardo, Gelsomina

    2016-02-01

    In this paper we describe an automatic tool for the pre-processing of aerosol lidar data called ELPP (EARLINET Lidar Pre-Processor). It is one of two calculus modules of the EARLINET Single Calculus Chain (SCC), the automatic tool for the analysis of EARLINET data. ELPP is an open source module that executes instrumental corrections and data handling of the raw lidar signals, making the lidar data ready to be processed by the optical retrieval algorithms. According to the specific lidar configuration, ELPP automatically performs dead-time correction, atmospheric and electronic background subtraction, gluing of lidar signals, and trigger-delay correction. Moreover, the signal-to-noise ratio of the pre-processed signals can be improved by means of configurable time integration of the raw signals and/or spatial smoothing. ELPP delivers the statistical uncertainties of the final products by means of error propagation or Monte Carlo simulations. During the development of ELPP, particular attention has been paid to making the tool flexible enough to handle all lidar configurations currently used within the EARLINET community. Moreover, it has been designed in a modular way to allow an easy extension to lidar configurations not yet implemented. The primary goal of ELPP is to enable the application of quality-assured procedures in the lidar data analysis starting from the raw lidar data. This provides the added value of full traceability of each delivered lidar product. Several tests have been performed to check the proper functioning of ELPP. The whole SCC has been tested with the same synthetic data sets, which were used for the EARLINET algorithm inter-comparison exercise. ELPP has been successfully employed for the automatic near-real-time pre-processing of the raw lidar data measured during several EARLINET inter-comparison campaigns as well as during intense field campaigns.
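
    Two of the corrections performed by ELPP have standard textbook forms; a minimal sketch assuming a non-paralyzable photon-counting channel and a background estimated from the far-range tail of the averaged profile:

        import numpy as np

        def deadtime_correct(count_rate, tau):
            """Non-paralyzable dead-time correction: recover the true
            rate from the measured photon count rate (Hz) and the
            system dead time tau (s)."""
            r = np.asarray(count_rate, dtype=float)
            return r / (1.0 - r * tau)

        def subtract_background(profile, n_far_bins=500):
            """Estimate the atmospheric + electronic background from the
            far-range tail of the averaged signal and subtract it."""
            p = np.asarray(profile, dtype=float)
            return p - p[-n_far_bins:].mean()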

  14. EARLINET Single Calculus Chain - technical - Part 1: Pre-processing of raw lidar data

    NASA Astrophysics Data System (ADS)

    D'Amico, G.; Amodeo, A.; Mattis, I.; Freudenthaler, V.; Pappalardo, G.

    2015-10-01

    In this paper we describe an automatic tool for the pre-processing of lidar data called ELPP (EARLINET Lidar Pre-Processor). It is one of two calculus modules of the EARLINET Single Calculus Chain (SCC), the automatic tool for the analysis of EARLINET data. The ELPP is an open source module that executes instrumental corrections and data handling of the raw lidar signals, making the lidar data ready to be processed by the optical retrieval algorithms. According to the specific lidar configuration, the ELPP automatically performs dead-time correction, atmospheric and electronic background subtraction, gluing of lidar signals, and trigger-delay correction. Moreover, the signal-to-noise ratio of the pre-processed signals can be improved by means of configurable time integration of the raw signals and/or spatial smoothing. The ELPP delivers the statistical uncertainties of the final products by means of error propagation or Monte Carlo simulations. During the development of the ELPP module, particular attention has been paid to making the tool flexible enough to handle all lidar configurations currently used within the EARLINET community. Moreover, it has been designed in a modular way to allow an easy extension to lidar configurations not yet implemented. The primary goal of the ELPP module is to enable the application of quality-assured procedures in the lidar data analysis starting from the raw lidar data. This provides the added value of full traceability of each delivered lidar product. Several tests have been performed to check the proper functioning of the ELPP module. The whole SCC has been tested with the same synthetic data sets, which were used for the EARLINET algorithm inter-comparison exercise. The ELPP module has been successfully employed for the automatic near-real-time pre-processing of the raw lidar data measured during several EARLINET inter-comparison campaigns as well as during intense field campaigns.

  15. Surface chemistries for antibody microarrays

    SciTech Connect

    Seurynck-Servoss, Shannon L.; Baird, Cheryl L.; Rodland, Karin D.; Zangar, Richard C.

    2007-05-01

    Enzyme-linked immunosorbent assay (ELISA) microarrays promise to be a powerful tool for the detection of disease biomarkers. The original technology for printing ELISA microarray chips and capturing antibodies on slides was derived from the DNA microarray field. However, due to the need to maintain antibody structure and function when immobilized, surface chemistries used for DNA microarrays are not always appropriate for ELISA microarrays. In order to identify better surface chemistries for antibody capture, a number of commercial companies and academic research groups have developed new slide types that could improve antibody function in microarray applications. In this review we compare and contrast the commercially available slide chemistries, as well as highlight some promising recent advances in the field.

  16. Tiling Microarray Analysis Tools

    SciTech Connect

    Nix, Davis Austin

    2005-05-04

    TiMAT is a package of 23 command line Java applications for use in the analysis of Affymetrix tiled genomic microarray data. TiMAT enables: 1) rebuilding the genome annotation for entire tiled arrays (repeat filtering, chromosomal coordinate assignment); 2) post-processing of oligo intensity values (quantile normalization, median scaling, PMMM transformation); 3) significance testing (Wilcoxon rank sum and signed rank tests, intensity difference and ratio tests) and interval refinement (filtering based on multiple statistics, overlap comparisons); 4) data visualization (detailed thumbnail/zoomed view with Interval Plots and data export to Affymetrix's Integrated Genome Browser) and data reports (spreadsheet summaries and detailed profiles).

  17. Data preprocessing method for liquid chromatography-mass spectrometry based metabolomics.

    PubMed

    Wei, Xiaoli; Shi, Xue; Kim, Seongho; Zhang, Li; Patrick, Jeffrey S; Binkley, Joe; McClain, Craig; Zhang, Xiang

    2012-09-18

    A set of data preprocessing algorithms for peak detection and peak list alignment is reported for the analysis of liquid chromatography-mass spectrometry (LC-MS)-based metabolomics data. For spectrum deconvolution, peak picking is achieved at the extracted ion chromatogram (XIC) level. To estimate and remove the noise in XICs, each XIC is first segmented into several peak groups based on the continuity of scan number, and the noise level is estimated from all the XIC signals except the regions potentially containing metabolite ion peaks. After removing noise, the peaks of molecular ions are detected using both the first and the second derivatives, followed by an efficient exponentially-modified-Gaussian-based peak deconvolution method for peak fitting. A two-stage alignment algorithm is also developed, in which the retention times of all peaks are first transferred into the z-score domain and the peaks are aligned based on the measure of their mixture scores after retention time correction using partial linear regression. Analysis of a set of spike-in LC-MS data from three groups of samples containing 16 metabolite standards mixed with metabolite extract from mouse livers demonstrates that the developed data preprocessing method performs better than two existing popular data analysis packages, MZmine2.6 and XCMS(2), for peak picking, peak list alignment, and quantification.
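
    A schematic sketch of the derivative-based apex detection on a single XIC; the smoothing window, the noise rule and the SNR cut are placeholders for the estimators described above, not the published implementation:

        import numpy as np
        from scipy.signal import savgol_filter

        def pick_peaks(xic, window=9, snr=3.0):
            """Detect candidate apexes on a smoothed XIC: a falling zero
            crossing of the first derivative, confirmed by a negative
            second derivative, kept only above an SNR-based cut."""
            x = np.asarray(xic, dtype=float)
            y = savgol_filter(x, window, 3)
            d1 = savgol_filter(x, window, 3, deriv=1)
            d2 = savgol_filter(x, window, 3, deriv=2)
            noise = np.median(np.abs(y - np.median(y)))   # crude MAD noise level
            apex = (d1[:-1] > 0) & (d1[1:] <= 0)          # + to - zero crossing
            idx = np.where(apex)[0]
            return idx[(d2[idx] < 0) & (y[idx] > snr * noise)]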

  18. CLUM: a cluster program for analyzing microarray data.

    PubMed

    Irigoien, I; Fernandez, E; Vives, S; Arenas, C

    2008-08-01

    Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems. Cluster analysis has proven to be a very useful tool for investigating the structure of microarray data. This paper presents a program for clustering microarray data which is based on the so-called path distance. The algorithm gives at each step a partition into two clusters, and no prior assumptions on the structure of the clusters are required. It assigns each object (gene or sample) to only one cluster and gives the global optimum for the function that quantifies the adequacy of a given partition of the sample into k clusters. The program was tested on experimental data sets, showing the robustness of the algorithm. PMID:18825964

  19. Ecotoxicogenomics: Microarray interlaboratory comparability.

    PubMed

    Vidal-Dorsch, Doris E; Bay, Steven M; Moore, Shelly; Layton, Blythe; Mehinto, Alvine C; Vulpe, Chris D; Brown-Augustine, Marianna; Loguinov, Alex; Poynton, Helen; Garcia-Reyero, Natàlia; Perkins, Edward J; Escalon, Lynn; Denslow, Nancy D; Cristina, Colli-Dula R; Doan, Tri; Shukradas, Shweta; Bruno, Joy; Brown, Lorraine; Van Agglen, Graham; Jackman, Paula; Bauer, Megan

    2016-02-01

    Transcriptomic analysis can complement traditional ecotoxicology data by providing mechanistic insight, and by identifying sub-lethal organismal responses and contaminant classes underlying observed toxicity. Before transcriptomic information can be used in monitoring and risk assessment, it is necessary to determine its reproducibility and to detect key steps impacting the reliable identification of differentially expressed genes. A custom 15K-probe microarray was used to conduct transcriptomics analyses across six laboratories with estuarine amphipods exposed to cyfluthrin-spiked or control sediments (10 days). Two sample types were generated: one consisted of total RNA extracts (Ex) from exposed and control samples (extracted by one laboratory), and the other consisted of exposed and control whole-body amphipods (WB) from which each laboratory extracted RNA. Our findings indicate that gene expression microarray results are repeatable. Differentially expressed data had a higher degree of repeatability across all laboratories in samples with similar RNA quality (Ex) than in WB samples with more variable RNA quality. Despite such variability, a subset of genes was consistently identified as differentially expressed across all laboratories and sample types. We found that the differences among the individual laboratory results can be attributed to several factors, including RNA quality and technical expertise, but the overall results can be improved by following consistent protocols and with appropriate training.

  1. Integrating data from heterogeneous DNA microarray platforms.

    PubMed

    Valente, Eduardo; Rocha, Miguel

    2015-01-01

    DNA microarrays are one of the most used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the power of statistical tests while reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprises a set of tasks that range from the re-annotation of the features used in gene expression to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, comprising a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data are used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration considers data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases such as The Cancer Genome Atlas and Gene Expression Omnibus. PMID:26673932
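
    A minimal sketch of two integration steps named above, quantile normalization across samples and per-batch gene mean-centering (the latter a deliberately simple stand-in for heavier batch-effect methods such as ComBat):

        import numpy as np

        def quantile_normalize(X):
            """Quantile-normalize a genes x samples matrix: give every
            sample (column) the same empirical distribution, namely the
            mean quantile profile across samples."""
            ranks = np.argsort(np.argsort(X, axis=0), axis=0)
            mean_profile = np.sort(X, axis=0).mean(axis=1)
            return mean_profile[ranks]

        def center_batches(X, batches):
            """Remove additive batch effects by re-centering each gene
            within each batch onto the overall gene mean."""
            batches = np.asarray(batches)
            Xc = X.astype(float).copy()
            grand = Xc.mean(axis=1, keepdims=True)
            for b in np.unique(batches):
                cols = batches == b
                Xc[:, cols] += grand - Xc[:, cols].mean(axis=1, keepdims=True)
            return Xc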

  2. The Genopolis Microarray Database

    PubMed Central

    Splendiani, Andrea; Brandizi, Marco; Even, Gael; Beretta, Ottavio; Pavelka, Norman; Pelizzola, Mattia; Mayhaus, Manuel; Foti, Maria; Mauri, Giancarlo; Ricciardi-Castagnoli, Paola

    2007-01-01

    Background Gene expression databases are key resources for microarray data management and analysis, and the importance of proper annotation of their content is well understood. Public repositories as well as microarray database systems that can be implemented by single laboratories exist. However, there is not yet a tool that can easily support a collaborative environment where different users with different rights of access to data can interact to define a common, highly coherent content. The scope of the Genopolis database is to provide a resource that allows different groups performing microarray experiments related to a common subject to create a common coherent knowledge base and to analyse it. The Genopolis database has been implemented as a dedicated system for the scientific community studying dendritic and macrophage cell functions and host-parasite interactions. Results The Genopolis Database system allows the community to build an object-based MIAME-compliant annotation of their experiments and to store images, raw and processed data from the Affymetrix GeneChip® platform. It supports dynamic definition of controlled vocabularies and provides automated and supervised steps to control the coherence of data and annotations. It allows precise control of the visibility of the database content to different subgroups in the community and facilitates exports of its content to public repositories. It provides an interactive user interface for data analysis: this allows users to visualize data matrices based on functional lists and sample characterization, and to navigate to other data matrices defined by similarity of expression values as well as functional characterizations of the genes involved. A collaborative environment is also provided for the definition and sharing of functional annotation by users. Conclusion The Genopolis Database supports a community in building a common coherent knowledge base and analysing it. This fills a gap between a local

  3. An Introduction to MAMA (Meta-Analysis of MicroArray data) System.

    PubMed

    Zhang, Zhe; Fenstermacher, David

    2005-01-01

    Analyzing microarray data across multiple experiments has proven advantageous. To support this kind of analysis, we are developing a software system called MAMA (Meta-Analysis of MicroArray data). MAMA utilizes a client-server architecture with a relational database on the server side for the storage of microarray datasets collected from various resources. The client side is an application running on the end user's computer that allows the user to manipulate microarray data and analytical results locally. The MAMA implementation will integrate several analytical methods, including meta-analysis, within an open-source framework offering other developers the flexibility to plug in additional statistical algorithms.

  4. DNA Microarray-Based Diagnostics.

    PubMed

    Marzancola, Mahsa Gharibi; Sedighi, Abootaleb; Li, Paul C H

    2016-01-01

    The DNA microarray technology is currently a useful biomedical tool which has been developed for a variety of diagnostic applications. However, the development pathway has not been smooth and the technology has faced some challenges. The reliability of the microarray data and also the clinical utility of the results in the early days were criticized. These criticisms added to the severe competition from other techniques, such as next-generation sequencing (NGS), impacting the growth of microarray-based tests in the molecular diagnostic market. Thanks to advances in the underlying technologies as well as the tremendous efforts of the research community and commercial vendors, these challenges have mostly been addressed. Nowadays, the microarray platform has achieved sufficient standardization and method validation as well as efficient probe printing, liquid handling and signal visualization. Integration of the various steps of the microarray assay into a harmonized and miniaturized handheld lab-on-a-chip (LOC) device has been a goal for the microarray community. In this respect, notable progress has been achieved in coupling the DNA microarray with the liquid manipulation microsystem as well as the supporting subsystems that will generate a stand-alone LOC device. In this chapter, we discuss the major challenges that microarray technology has faced in its almost two decades of development and also describe the solutions to overcome these challenges. In addition, we review the advancements of the technology, especially the progress toward developing LOC devices for DNA diagnostic applications.

  5. A brief introduction to tiling microarrays: principles, concepts, and applications.

    PubMed

    Lemetre, Christophe; Zhang, Zhengdong D

    2013-01-01

    Technological achievements have always contributed to the advancement of biomedical research. It has never been more so than in recent times, when the development and application of innovative cutting-edge technologies have transformed biology into a data-rich quantitative science. This stunning revolution in biology primarily ensued from the emergence of microarrays over two decades ago. The completion of whole-genome sequencing projects and the advance in microarray manufacturing technologies enabled the development of tiling microarrays, which gave unprecedented genomic coverage. Since their first description, several types of application of tiling arrays have emerged, each aiming to tackle a different biological problem. Although numerous algorithms have already been developed to analyze microarray data, new method development is still needed not only for better performance but also for integration of available microarray data sets, which without doubt constitute one of the largest collections of biological data ever generated. In this chapter we first introduce the principles behind the emergence and the development of tiling microarrays, and then discuss with some examples how they are used to investigate different biological problems.

  6. Microarray oligonucleotide probe designer (MOPeD): A web service

    PubMed Central

    Patel, Viren C; Mondal, Kajari; Shetty, Amol Carl; Horner, Vanessa L; Bedoyan, Jirair K; Martin, Donna; Caspary, Tamara; Cutler, David J; Zwick, Michael E

    2011-01-01

    Methods of genomic selection that combine high-density oligonucleotide microarrays with next-generation DNA sequencing allow investigators to characterize genomic variation in selected portions of complex eukaryotic genomes. Yet choosing which specific oligonucleotides to use can pose a major technical challenge. To address this issue, we have developed a software package called MOPeD (Microarray Oligonucleotide Probe Designer), which automates the process of designing genomic selection microarrays. This web-based software allows individual investigators to design custom genomic selection microarrays optimized for synthesis with Roche NimbleGen’s maskless photolithography. Design parameters include uniqueness of the probe sequences, melting temperature, hairpin formation, and the presence of single nucleotide polymorphisms. We generated probe databases for the human, mouse, and rhesus macaque genomes and conducted experimental validation of MOPeD-designed microarrays in human samples by sequencing the human X chromosome exome, where relevant sequence metrics indicated superior performance relative to a microarray designed by the Roche NimbleGen proprietary algorithm. We also performed validation in the mouse to identify known mutations contained within a 487-kb region from mouse chromosome 16, the mouse chromosome 16 exome (1.7 Mb), and the mouse chromosome 12 exome (3.3 Mb). Our results suggest that the open source MOPeD software package and website (http://moped.genetics.emory.edu/) will be a valuable resource for investigators in their sequence-based studies of complex eukaryotic genomes. PMID:21379402
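
    MOPeD's design engine is not reproduced here, but the kinds of probe filters the abstract names (melting temperature, GC content, hairpin formation) are easy to sketch. The Python snippet below uses the rough Wallace rule for Tm and a naive self-complementarity scan; the thresholds are hypothetical illustration values, not MOPeD's parameters.

```python
def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    # Wallace rule, a rough Tm estimate for short oligos: 2(A+T) + 4(G+C)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def has_hairpin(seq, stem=4):
    # crude screen: a stem-length substring whose reverse complement also
    # occurs downstream could let the probe fold back on itself
    comp = str.maketrans("ACGT", "TGCA")
    for i in range(len(seq) - stem + 1):
        rc = seq[i:i + stem].translate(comp)[::-1]
        if rc in seq[i + stem:]:
            return True
    return False

def acceptable_probe(seq, tm_range=(52, 60), gc_range=(0.4, 0.6)):
    return (tm_range[0] <= wallace_tm(seq) <= tm_range[1]
            and gc_range[0] <= gc_content(seq) <= gc_range[1]
            and not has_hairpin(seq))

print(acceptable_probe("ATGCGTACGTTAGCCAT"))
```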

  7. Living-Cell Microarrays

    PubMed Central

    Yarmush, Martin L.; King, Kevin R.

    2011-01-01

    Living cells are remarkably complex. To unravel this complexity, living-cell assays have been developed that allow delivery of experimental stimuli and measurement of the resulting cellular responses. High-throughput adaptations of these assays, known as living-cell microarrays, which are based on microtiter plates, high-density spotting, microfabrication, and microfluidics technologies, are being developed for two general applications: (a) to screen large-scale chemical and genomic libraries and (b) to systematically investigate the local cellular microenvironment. These emerging experimental platforms offer exciting opportunities to rapidly identify genetic determinants of disease, to discover modulators of cellular function, and to probe the complex and dynamic relationships between cells and their local environment. PMID:19413510

  8. Tiling Microarray Analysis Tools

    2005-05-04

    TiMAT is a package of 23 command-line Java applications for use in the analysis of Affymetrix tiled genomic microarray data. TiMAT enables: 1) rebuilding the genome annotation for entire tiled arrays (repeat filtering, chromosomal coordinate assignment); 2) post-processing of oligo intensity values (quantile normalization, median scaling, PMMM transformation); 3) significance testing (Wilcoxon rank sum and signed rank tests, intensity difference and ratio tests) and interval refinement (filtering based on multiple statistics, overlap comparisons); 4) data visualization (detailed thumbnail/zoomed view with Interval Plots and data export to Affymetrix's Integrated Genome Browser) and data reports (spreadsheet summaries and detailed profiles).
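
    Of the post-processing steps TiMAT performs, quantile normalization is the most widely reused and is easy to sketch. Below is a minimal NumPy implementation (ties handled naively); it illustrates the general technique, not TiMAT's Java code.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns of x (probes x arrays): every
    array is forced onto the mean empirical distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # per-array ranks
    mean_of_sorted = np.sort(x, axis=0).mean(axis=1)   # reference distribution
    return mean_of_sorted[ranks]

arrays = np.array([[5.0, 4.0, 3.0],
                   [2.0, 1.0, 4.0],
                   [3.0, 4.0, 6.0],
                   [4.0, 2.0, 8.0]])
print(quantile_normalize(arrays))   # identical value sets in every column
```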

  9. The minimal preprocessing pipelines for the Human Connectome Project.

    PubMed

    Glasser, Matthew F; Sotiropoulos, Stamatios N; Wilson, J Anthony; Coalson, Timothy S; Fischl, Bruce; Andersson, Jesper L; Xu, Junqian; Jbabdi, Saad; Webster, Matthew; Polimeni, Jonathan R; Van Essen, David C; Jenkinson, Mark

    2013-10-15

    The Human Connectome Project (HCP) faces the challenging task of bringing multiple magnetic resonance imaging (MRI) modalities together in a common automated preprocessing framework across a large cohort of subjects. The MRI data acquired by the HCP differ in many ways from data acquired on conventional 3 Tesla scanners and often require newly developed preprocessing methods. We describe the minimal preprocessing pipelines for structural, functional, and diffusion MRI that were developed by the HCP to accomplish many low level tasks, including spatial artifact/distortion removal, surface generation, cross-modal registration, and alignment to standard space. These pipelines are specially designed to capitalize on the high quality data offered by the HCP. The final standard space makes use of a recently introduced CIFTI file format and the associated grayordinate spatial coordinate system. This allows for combined cortical surface and subcortical volume analyses while reducing the storage and processing requirements for high spatial and temporal resolution data. Here, we provide the minimum image acquisition requirements for the HCP minimal preprocessing pipelines and additional advice for investigators interested in replicating the HCP's acquisition protocols or using these pipelines. Finally, we discuss some potential future improvements to the pipelines.

  10. OPSN: The IMS COMSYS 1 and 2 Data Preprocessing System.

    ERIC Educational Resources Information Center

    Yu, John

    The Instructional Management System (IMS) developed by the Southwest Regional Laboratory (SWRL) processes student and teacher-generated data through the use of an optical scanner that produces a magnetic tape (Scan Tape) for input to IMS. A series of computer routines, OPSN, preprocesses the Scan Tape and prepares the data for transmission to the…

  11. An effective measured data preprocessing method in electrical impedance tomography.

    PubMed

    Yu, Chenglong; Yue, Shihong; Wang, Jianpei; Wang, Huaxiang

    2014-01-01

    As an advanced process detection technology, electrical impedance tomography (EIT) has received wide attention and study in industrial fields. However, EIT techniques are greatly limited by low spatial resolution. This problem may result from incorrect preprocessing of the measured data and the lack of a general criterion to evaluate different preprocessing procedures. In this paper, an EIT data preprocessing method based on taking roots of all measured data ("all-rooting") is proposed and evaluated by two indexes constructed on the all-rooted EIT measurements. By finding the optimums of the two indexes, the proposed method can be applied to improve EIT imaging spatial resolution. In terms of a theoretical model, the optimal rooting exponents for the two indexes range over [0.23, 0.33] and [0.22, 0.35], respectively. The factors that affect the correctness of the proposed method are also analyzed. Preprocessing of measured data is necessary and helpful for any imaging process; thus, the proposed method can be widely used in imaging applications. Experimental results validate the two proposed indexes.

  12. An Effective Measured Data Preprocessing Method in Electrical Impedance Tomography

    PubMed Central

    Yu, Chenglong; Yue, Shihong; Wang, Jianpei; Wang, Huaxiang

    2014-01-01

    As an advanced process detection technology, electrical impedance tomography (EIT) has received wide attention and study in industrial fields. However, EIT techniques are greatly limited by low spatial resolution. This problem may result from incorrect preprocessing of the measured data and the lack of a general criterion to evaluate different preprocessing procedures. In this paper, an EIT data preprocessing method based on taking roots of all measured data ("all-rooting") is proposed and evaluated by two indexes constructed on the all-rooted EIT measurements. By finding the optimums of the two indexes, the proposed method can be applied to improve EIT imaging spatial resolution. In terms of a theoretical model, the optimal rooting exponents for the two indexes range over [0.23, 0.33] and [0.22, 0.35], respectively. The factors that affect the correctness of the proposed method are also analyzed. Preprocessing of measured data is necessary and helpful for any imaging process; thus, the proposed method can be widely used in imaging applications. Experimental results validate the two proposed indexes. PMID:25165735
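
    A sketch of the core transform may help. Assuming "all-rooting" means raising every boundary measurement to a fractional exponent, with the reported optimum lying roughly in [0.22, 0.35], a minimal Python version looks like this; the exponent interpretation and the toy data are assumptions.

```python
import numpy as np

def all_root(measurements, r=0.28):
    """Raise all boundary measurements to a fractional power r; r around
    0.22-0.35 is the optimum range reported above. This compresses the
    dynamic range so small voltages carry more weight in reconstruction."""
    m = np.asarray(measurements, dtype=float)
    return np.sign(m) * np.abs(m) ** r

# toy frame of boundary voltages spanning several orders of magnitude
frame = np.array([1e-4, 5e-3, 2e-2, 0.1, 0.8])
print(all_root(frame))   # much flatter, compressed dynamic range
```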

  13. Fuzzy logic for elimination of redundant information of microarray data.

    PubMed

    Huerta, Edmundo Bonilla; Duval, Béatrice; Hao, Jin-Kao

    2008-06-01

    Gene subset selection is essential for classification and analysis of microarray data. However, gene selection is known to be a very difficult task since gene expression data not only have high dimensionality, but also contain redundant information and noise. To cope with these difficulties, this paper introduces a fuzzy logic based pre-processing approach composed of two main steps. First, we use fuzzy inference rules to transform the gene expression levels of a given dataset into fuzzy values. Then we apply a similarity relation to these fuzzy values to define fuzzy equivalence groups, each group containing strongly similar genes. Dimension reduction is achieved by considering for each group of similar genes a single representative based on mutual information. To assess the usefulness of this approach, extensive experiments were carried out on three well-known public datasets with a combined classification model using three statistic filters and three classifiers. PMID:18973862
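
    The following Python sketch mimics the three steps of this pre-processing approach: fuzzifying expression values, grouping strongly similar genes, and choosing a mutual-information-based representative per group. The min-max fuzzification, the correlation-based similarity, and the thresholds are simplifying assumptions, not the paper's exact fuzzy inference rules.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def fuzzify(expr):
    """Min-max map each gene's expression onto [0, 1] 'high expression'
    membership values (a stand-in for the paper's inference rules)."""
    low, high = expr.min(axis=0), expr.max(axis=0)
    return (expr - low) / (high - low + 1e-12)

def group_similar_genes(fuzzy, threshold=0.8):
    """Greedy grouping: genes whose fuzzy profiles correlate above the
    threshold join the same equivalence group."""
    corr = np.corrcoef(fuzzy.T)
    unassigned, groups = set(range(fuzzy.shape[1])), []
    while unassigned:
        seed = unassigned.pop()
        group = [seed] + [g for g in unassigned if corr[seed, g] >= threshold]
        unassigned -= set(group)
        groups.append(group)
    return groups

def representative(expr, group, labels, bins=3):
    """Pick the member sharing the most mutual information with the
    class labels (after simple equal-width discretization)."""
    def mi(g):
        edges = np.histogram_bin_edges(expr[:, g], bins)[1:-1]
        return mutual_info_score(labels, np.digitize(expr[:, g], edges))
    return max(group, key=mi)

rng = np.random.default_rng(1)
expr = rng.normal(size=(30, 12))
labels = rng.integers(0, 2, size=30)
for grp in group_similar_genes(fuzzify(expr)):
    print(grp, "->", representative(expr, grp, labels))
```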

  14. Microarray platform for omics analysis

    NASA Astrophysics Data System (ADS)

    Mecklenburg, Michael; Xie, Bin

    2001-09-01

    Microarray technology has revolutionized genetic analysis. However, limitations in genome analysis have led to renewed interest in establishing 'omic' strategies. As we enter the post-genomic era, new microarray technologies are needed to address these new classes of 'omic' targets, such as proteins, as well as lipids and carbohydrates. We have developed a microarray platform that combines self-assembling monolayers with the biotin-streptavidin system to provide a robust, versatile immobilization scheme. A hydrophobic film is patterned on the surface, creating an array of tension wells that eliminates evaporation effects, thereby reducing the shear stress to which biomolecules are exposed during immobilization. The streptavidin linker layer makes it possible to adapt and/or develop microarray based assays using virtually any class of biomolecules, including carbohydrates, peptides, antibodies, and receptors, as well as the more traditional DNA-based arrays. Our microarray technology is designed to furnish seamless compatibility across the various 'omic' platforms by providing a common blueprint for fabricating and analyzing arrays. The prototype microarray uses a microscope slide footprint patterned with 2 by 96 flat wells. Data on the microarray platform will be presented.

  15. Development, Characterization and Experimental Validation of a Cultivated Sunflower (Helianthus annuus L.) Gene Expression Oligonucleotide Microarray

    PubMed Central

    Fernandez, Paula; Soria, Marcelo; Blesa, David; DiRienzo, Julio; Moschen, Sebastian; Rivarola, Maximo; Clavijo, Bernardo Jose; Gonzalez, Sergio; Peluffo, Lucila; Príncipi, Dario; Dosio, Guillermo; Aguirrezabal, Luis; García-García, Francisco; Conesa, Ana; Hopp, Esteban; Dopazo, Joaquín; Heinz, Ruth Amelia; Paniego, Norma

    2012-01-01

    Oligonucleotide-based microarrays with accurate gene coverage represent a key strategy for transcriptional studies in orphan species such as sunflower, H. annuus L., which lacks full genome sequences. The goal of this study was the development and functional annotation of a comprehensive sunflower unigene collection and the design and validation of a custom sunflower oligonucleotide-based microarray. A large-scale EST (>130,000 ESTs) curation, assembly and sequence annotation was performed using Blast2GO (www.blast2go.de). The EST assembly comprises 41,013 putative transcripts (12,924 contigs and 28,089 singletons). The resulting Sunflower Unigen Resource (SUR version 1.0) was used to design an oligonucleotide-based Agilent microarray for cultivated sunflower. This microarray includes a total of 42,326 features: 1,417 Agilent controls, 74 control probes for sunflower replicated 10 times (740 controls) and 40,169 different non-control probes. Microarray performance was validated using a model experiment examining the induction of senescence by water deficit. Pre-processing and differential expression analysis of the Agilent microarrays were performed using the Bioconductor limma package. The analyses based on p-values calculated by eBayes (p<0.01) allowed the detection of 558 differentially expressed genes between water stress and control conditions; of these, ten genes were further validated by qPCR. Over-represented ontologies were identified using FatiScan in the Babelomics suite. This work generated a curated and reliable sunflower unigene collection, and a custom, validated sunflower oligonucleotide-based microarray using Agilent technology. Both the curated unigene collection and the validated oligonucleotide microarray provide key resources for sunflower genome analysis, transcriptional studies, and molecular breeding for crop improvement. PMID:23110046

  16. Development, characterization and experimental validation of a cultivated sunflower (Helianthus annuus L.) gene expression oligonucleotide microarray.

    PubMed

    Fernandez, Paula; Soria, Marcelo; Blesa, David; DiRienzo, Julio; Moschen, Sebastian; Rivarola, Maximo; Clavijo, Bernardo Jose; Gonzalez, Sergio; Peluffo, Lucila; Príncipi, Dario; Dosio, Guillermo; Aguirrezabal, Luis; García-García, Francisco; Conesa, Ana; Hopp, Esteban; Dopazo, Joaquín; Heinz, Ruth Amelia; Paniego, Norma

    2012-01-01

    Oligonucleotide-based microarrays with accurate gene coverage represent a key strategy for transcriptional studies in orphan species such as sunflower, H. annuus L., which lacks full genome sequences. The goal of this study was the development and functional annotation of a comprehensive sunflower unigene collection and the design and validation of a custom sunflower oligonucleotide-based microarray. A large-scale EST (>130,000 ESTs) curation, assembly and sequence annotation was performed using Blast2GO (www.blast2go.de). The EST assembly comprises 41,013 putative transcripts (12,924 contigs and 28,089 singletons). The resulting Sunflower Unigen Resource (SUR version 1.0) was used to design an oligonucleotide-based Agilent microarray for cultivated sunflower. This microarray includes a total of 42,326 features: 1,417 Agilent controls, 74 control probes for sunflower replicated 10 times (740 controls) and 40,169 different non-control probes. Microarray performance was validated using a model experiment examining the induction of senescence by water deficit. Pre-processing and differential expression analysis of the Agilent microarrays were performed using the Bioconductor limma package. The analyses based on p-values calculated by eBayes (p<0.01) allowed the detection of 558 differentially expressed genes between water stress and control conditions; of these, ten genes were further validated by qPCR. Over-represented ontologies were identified using FatiScan in the Babelomics suite. This work generated a curated and reliable sunflower unigene collection, and a custom, validated sunflower oligonucleotide-based microarray using Agilent technology. Both the curated unigene collection and the validated oligonucleotide microarray provide key resources for sunflower genome analysis, transcriptional studies, and molecular breeding for crop improvement. PMID:23110046
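
    The study's differential expression analysis used the Bioconductor limma package with eBayes moderated statistics. As a rough Python stand-in for readers outside R, the sketch below applies ordinary per-probe t-tests with Benjamini-Hochberg correction to simulated data; it does not reproduce limma's moderated variance model.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
control = rng.normal(0.0, 1.0, size=(5, 1000))   # 5 arrays x 1000 probes
stress = rng.normal(0.0, 1.0, size=(5, 1000))
stress[:, :50] += 2.0                            # 50 truly responsive probes

# per-probe two-sample t-test, then FDR control across all probes
t, p = stats.ttest_ind(stress, control, axis=0)
reject, p_adj, _, _ = multipletests(p, alpha=0.01, method="fdr_bh")
print("differentially expressed probes:", int(reject.sum()))
```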

  17. Tactile on-chip pre-processing with techniques from artificial retinas

    NASA Astrophysics Data System (ADS)

    Maldonado-Lopez, R.; Vidal-Verdu, F.; Linan, G.; Roca, E.; Rodriguez-Vazquez, A.

    2005-06-01

    Interest in tactile sensors is increasing as their use in complex unstructured environments is demanded, as in telepresence, minimally invasive surgery, robotics, etc. The matrix of pressure data these devices provide can be processed with many image processing algorithms to extract the required information. However, as in the case of vision chips or artificial retinas, problems arise when the array size and the computational complexity increase. Looking at the skin, the information collected by every mechanoreceptor is not carried raw to the brain for processing; rather, complex pre-processing is performed to fit the limited throughput of the nervous system. This is especially important for tasks demanding high bandwidth. Experimental work reports that the neural response of skin mechanoreceptors encodes the change in local shape from an offset level rather than the absolute force or pressure distribution. This is also the behavior of the retina, which implements a spatio-temporal averaging. We propose the same strategy for tactile preprocessing, and we show preliminary results when it is applied to slip detection, which involves fast real-time processing.
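
    A minimal sketch of such retina-like pre-processing: encode each tactile frame as its deviation from an adaptation (offset) level, then spatially average, so that local changes such as a slip event dominate the output. The frame size, offset model, and simulated event are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def retina_like_preprocess(frame, offset, size=3):
    """Encode the local change from an offset level, then spatially
    average, mimicking the retina-style smoothing described above."""
    change = frame - offset                 # deviation from adaptation level
    return uniform_filter(change, size=size)

rng = np.random.default_rng(3)
offset = np.zeros((16, 16))
frames = [offset + rng.normal(0, 0.05, (16, 16)) for _ in range(5)]
frames[3][8:12, 8:12] += 1.0                # a simulated slip event
for k, f in enumerate(frames):
    out = retina_like_preprocess(f, offset)
    print(k, float(np.abs(out).max()))      # the event stands out at frame 3
```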

  18. Detect Key Gene Information in Classification of Microarray Data

    NASA Astrophysics Data System (ADS)

    Liu, Yihui

    2008-12-01

    We detect key information in high-dimensional microarray profiles based on wavelet analysis and a genetic algorithm. First, the wavelet transform is employed to extract approximation coefficients at the 2nd level, which removes noise and reduces dimensionality. A genetic algorithm (GA) is then performed to select the optimized features. Experiments are performed on four datasets, and the results show that approximation coefficients are an efficient way to characterize microarray data. Furthermore, in order to detect the key genes in the classification of cancer tissue, we reconstruct the approximation part of the gene profiles from the orthogonal approximation coefficients. The significant genes are selected from this reconstructed approximation information using the genetic algorithm. Experiments show that good classification performance is achieved with the selected key genes.
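
    The first step, extracting level-2 approximation coefficients, can be sketched with the PyWavelets package; the wavelet choice (db4) and profile length are assumptions, and the GA-based feature selection is omitted.

```python
import numpy as np
import pywt

rng = np.random.default_rng(4)
profile = rng.normal(size=1024)            # one expression profile

# level-2 decomposition; coeffs[0] holds the 2nd-level approximation
coeffs = pywt.wavedec(profile, "db4", level=2)
approx = coeffs[0]
print(len(profile), "->", len(approx))     # ~4x dimensionality reduction

# reconstruct the approximation part only (detail coefficients zeroed)
coeffs_approx_only = [approx] + [np.zeros_like(c) for c in coeffs[1:]]
reconstructed = pywt.waverec(coeffs_approx_only, "db4")
```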

  19. Microarray Analysis of Microbial Weathering

    NASA Astrophysics Data System (ADS)

    Olsson-Francis, K.; van Houdt, R.; Leys, N.; Mergeay, M.; Cockell, C. S.

    2010-04-01

    Microarray analysis of the heavy-metal-resistant bacterium Cupriavidus metallidurans CH34 was used to investigate the genes involved in weathering. The results demonstrated that large porin and membrane transporter genes were upregulated.

  20. DNA microarray data and contextual analysis of correlation graphs

    PubMed Central

    Rougemont, Jacques; Hingamp, Pascal

    2003-01-01

    Background DNA microarrays are used to produce large sets of expression measurements from which specific biological information is sought. Their analysis requires efficient and reliable algorithms for dimensional reduction, classification and annotation. Results We study networks of co-expressed genes obtained from DNA microarray experiments. The mathematical concept of curvature on graphs is used to group genes or samples into clusters, to which relevant gene or sample annotations are automatically assigned. Application to publicly available yeast and human lymphoma data demonstrates the reliability of the method in spite of its simplicity, especially with respect to the small number of parameters involved. Conclusions We provide a method for automatically determining relevant gene clusters among the many genes monitored with microarrays. The automatic annotations and the graphical interface improve the readability of the data. A C++ implementation, called Trixy, is available. PMID:12720549

  1. [Protein microarrays and personalized medicine].

    PubMed

    Yu, Xiabo; Schneiderhan-Marra, Nicole; Joos, Thomas O

    2011-01-01

    Over the last 10 years, DNA microarrays have achieved a robust analytical performance, enabling their use for analyzing the whole transcriptome or for screening thousands of single-nucleotide polymorphisms in a single experiment. DNA microarrays allow scientists to correlate gene expression signatures with disease progression, to screen for disease-specific mutations, and to treat patients according to their individual genetic profiles; however, the real key is proteins and their manifold functions. It is necessary to achieve a greater understanding of not only protein function and abundance but also their role in the development of diseases. Protein concentrations have been shown to reflect the physiological and pathologic state of an organ, tissue, or cells far more directly than DNA, and proteins can be profiled effectively with protein microarrays, which require only a small amount of sample material. Protein microarrays have become well-established tools in basic and applied research, and the first products have already entered the in vitro diagnostics market. This review focuses on protein microarray applications for biomarker discovery and validation, disease diagnosis, and use within the area of personalized medicine. Protein microarrays have proved to be reliable research tools in screening for a multitude of parameters with only a minimal quantity of sample and have enormous potential in applications for diagnostic and personalized medicine.

  2. Linguistic Preprocessing and Tagging for Problem Report Trend Analysis

    NASA Technical Reports Server (NTRS)

    Beil, Robert J.; Malin, Jane T.

    2012-01-01

    Mr. Robert Beil, Systems Engineer at Kennedy Space Center (KSC), requested the NASA Engineering and Safety Center (NESC) develop a prototype tool suite that combines complementary software technology used at Johnson Space Center (JSC) and KSC for problem report preprocessing and semantic tag extraction, to improve input to data mining and trend analysis. This document contains the outcome of the assessment and the Findings, Observations and NESC Recommendations.

  3. Integration of geometric modeling and advanced finite element preprocessing

    NASA Technical Reports Server (NTRS)

    Shephard, Mark S.; Finnigan, Peter M.

    1987-01-01

    The structure of a geometry-based finite element preprocessing system is presented. The key features of the system are the use of geometric operators to support all geometric calculations required for analysis model generation, and the use of a hierarchic boundary-based data structure for the major data sets within the system. The approach presented can support the finite element modeling procedures used today as well as the fully automated procedures under development.

  4. Data preprocessing method for fluorescence molecular tomography using a priori information provided by CT.

    PubMed

    Fu, Jianwei; Yang, Xiaoquan; Meng, Yuanzheng; Luo, Qingming; Gong, Hui

    2012-01-01

    The combined system of micro-CT and fluorescence molecular tomography (FMT) offers a new tool to provide anatomical and functional information of small animals in a single study. To take advantage of the combined system, a data preprocessing method is proposed to extract the valid data for FMT reconstruction algorithms using a priori information provided by CT. The boundary information of the animal and animal holder is extracted from reconstructed CT volume data. A ray tracing method is used to trace the path of the excitation beam, calculate the locations and directions of the optional sources, and determine whether the optional sources are valid. To accurately calculate the projections of the detectors on optical images and judge their validity, a combination of perspective projection and inverse ray tracing methods is adopted to offer optimal performance. The imaging performance of the combined system with the presented method is validated through experimental rat imaging.

  5. Improving Drift Correction by Double Projection Preprocessing in Gas Sensor Arrays

    NASA Astrophysics Data System (ADS)

    Padilla, M.; Perera, A.; Montoliu, I.; Chaudry, A.; Persaud, K.; Marco, S.

    2009-05-01

    It is well known that gas chemical sensors are strongly affected by drift. Drift consists of changes in sensor responses over time, which render initial statistical models for gas or odor recognition useless after a period of weeks. Instruments based on gas sensor arrays therefore need periodic calibrations, which are expensive and laborious. Many different statistical methods have been proposed to extend the time between recalibrations. In this work, a simple preprocessing technique based on a double projection is proposed as a prior step to a subsequent drift correction algorithm (in this particular case, Direct Orthogonal Signal Correction). This method substantially improves the time stability of the data relative to that obtained by using the drift correction method alone. The performance of the technique is evaluated on a dataset composed of measurements of three analytes taken with a polymer sensor array over ten months.
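
    The paper's double projection plus Direct Orthogonal Signal Correction is not reproduced here, but the general idea of projecting drift out of sensor-array data can be sketched with the classic component-correction approach: estimate the dominant drift direction from reference measurements and remove it by orthogonal projection. The reference-gas design and data below are simulated assumptions.

```python
import numpy as np

def component_correction(X, X_ref, n_drift=1):
    """Remove the leading drift direction(s) estimated on reference-gas
    measurements (classic component correction; a simplified stand-in
    for the double-projection + DOSC scheme discussed above)."""
    Xr = X_ref - X_ref.mean(axis=0)
    # leading right singular vectors of the reference data = drift directions
    _, _, vt = np.linalg.svd(Xr, full_matrices=False)
    P = vt[:n_drift].T                      # (n_sensors, n_drift)
    return X - X @ P @ P.T                  # project drift out of all data

rng = np.random.default_rng(5)
drift = np.outer(np.linspace(0, 1, 100), rng.normal(size=8))   # slow drift
signal = rng.normal(size=(100, 8))
X = signal + 5 * drift
X_ref = 5 * drift[::10] + rng.normal(0, 0.1, size=(10, 8))     # reference gas
print(np.std(X), "->", np.std(component_correction(X, X_ref)))
```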

  6. Review of feed forward neural network classification preprocessing techniques

    NASA Astrophysics Data System (ADS)

    Asadi, Roya; Kareem, Sameem Abdul

    2014-06-01

    The best feature of artificial-intelligence Feed Forward Neural Network (FFNN) classification models is learning from input data through their weights. Data preprocessing and pre-training are the contributing factors in developing efficient techniques for low training time and high classification accuracy. In this study, we investigate and review powerful preprocessing functions for FFNN models. Currently, weights are initialized at random, which is a main source of problems. Multilayer auto-encoder networks, the most recent such technique, are, like other related techniques, unable to fully solve these problems. Weight Linear Analysis (WLA) is a combination of data preprocessing and pre-training that generates real weights through the use of normalized input values. By using WLA, the FFNN model increases classification accuracy and improves training time in a single epoch, without iterative training cycles, gradients of the mean-square-error function, or weight updates. The results of comparison and evaluation show that WLA is a powerful technique in the FFNN classification area.

  7. Image pre-processing for optimizing automated photogrammetry performances

    NASA Astrophysics Data System (ADS)

    Guidi, G.; Gonizzi, S.; Micoli, L. L.

    2014-05-01

    The purpose of this paper is to analyze how optical pre-processing with polarizing filters and digital pre-processing with HDR imaging may improve the automated 3D modeling pipeline based on SFM and Image Matching, with special emphasis on optically non-cooperative surfaces of shiny or dark materials. Because of the automatic detection of homologous points, the presence of highlights due to shiny materials, or of nearly uniform dark patches produced by low-reflectance materials, may produce erroneous matching involving wrong 3D point estimations, and consequently holes and topological errors in the mesh generated from the associated dense 3D cloud. This is due to the limited dynamic range of the 8-bit digital images that are matched to each other to generate 3D data. The same 256 levels can be employed more usefully if the actual dynamic range is compressed, avoiding luminance clipping in the darker and lighter image areas. Such an approach is considered here both using optical filtering and using HDR processing with tone mapping, with experimental evaluation on different Cultural Heritage objects characterized by non-cooperative optical behavior. Three test images of each object were captured from different positions, changing the shooting conditions (filter/no filter) and the image processing (no processing/HDR processing), in order to have the same three camera orientations with different optical and digital pre-processing, and applying the same automated process to each photo set.
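
    As an illustration of the digital pre-processing branch, the OpenCV sketch below merges three bracketed exposures into an HDR radiance map and tone-maps it back to 8 bits before feeding the SFM pipeline. The file names and exposure times are hypothetical, and the authors' exact tone-mapping operator is not specified here.

```python
import cv2
import numpy as np

# three bracketed exposures of the same object (hypothetical file names)
files = ["shot_short.jpg", "shot_mid.jpg", "shot_long.jpg"]
times = np.array([1 / 200, 1 / 50, 1 / 10], dtype=np.float32)
images = [cv2.imread(f) for f in files]

# merge to a floating-point HDR radiance map, then compress the dynamic
# range back into 8 bits with a global tone-mapping operator
hdr = cv2.createMergeDebevec().process(images, times)
ldr = cv2.createTonemap(gamma=2.2).process(hdr)
out = np.clip(ldr * 255, 0, 255).astype(np.uint8)
cv2.imwrite("shot_tonemapped.jpg", out)   # input for the SFM pipeline
```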

  8. Optimization of miRNA-seq data preprocessing.

    PubMed

    Tam, Shirley; Tsao, Ming-Sound; McPherson, John D

    2015-11-01

    The past two decades of microRNA (miRNA) research have solidified the role of these small non-coding RNAs as key regulators of many biological processes and promising biomarkers for disease. The concurrent development in high-throughput profiling technology has further advanced our understanding of the impact of their dysregulation on a global scale. Currently, next-generation sequencing is the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before conducting downstream analyses. Often overlooked, data preprocessing is an essential step in data analysis: the presence of unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign) and normalization methods (counts-per-million, total count scaling, upper quartile scaling, Trimmed Mean of M, DESeq, linear regression, cyclic loess and quantile) with respect to the final miRNA count data distribution, variance, bias and accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for the extraction and interpretation of miRNA count data from small RNA-sequencing experiments.
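
    Two of the normalization methods compared, counts-per-million and upper-quartile scaling, are simple enough to sketch in a few lines of NumPy; the toy count matrix is an assumption.

```python
import numpy as np

def counts_per_million(counts):
    """CPM: scale each library (column) to a total of one million reads."""
    lib_sizes = counts.sum(axis=0, keepdims=True).astype(float)
    return counts / lib_sizes * 1e6

def upper_quartile(counts):
    """Upper-quartile scaling on the non-zero counts of each library."""
    scaled = counts.astype(float).copy()
    for j in range(counts.shape[1]):
        nz = counts[counts[:, j] > 0, j]
        scaled[:, j] /= np.percentile(nz, 75)
    return scaled

counts = np.array([[10, 100], [0, 5], [90, 400], [25, 80]])
print(counts_per_million(counts))
print(upper_quartile(counts))
```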

  9. Analysis-Driven Lossy Compression of DNA Microarray Images.

    PubMed

    Hernández-Cabronero, Miguel; Blanes, Ian; Pinho, Armando J; Marcellin, Michael W; Serra-Sagristà, Joan

    2016-02-01

    DNA microarrays are one of the fastest-growing new technologies in the field of genetic research, and DNA microarray images continue to grow in number and size. Since analysis techniques are under active and ongoing development, storage, transmission and sharing of DNA microarray images need to be addressed, with compression playing a significant role. However, existing lossless coding algorithms yield only limited compression performance (compression ratios below 2:1), whereas lossy coding methods may introduce unacceptable distortions in the analysis process. This work introduces a novel Relative Quantizer (RQ), which employs non-uniform quantization intervals designed for improved compression while bounding the impact on the DNA microarray analysis. This quantizer constrains the maximum relative error introduced into quantized imagery, devoting higher precision to pixels critical to the analysis process. For suitable parameter choices, the resulting variations in the DNA microarray analysis are less than half of those inherent to the experimental variability. Experimental results reveal that appropriate analysis can still be performed for average compression ratios exceeding 4.5:1.
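
    The principle behind a relative quantizer, geometrically spaced intervals that bound the relative (rather than absolute) error, can be sketched as follows. This illustrates the idea only; the paper's exact RQ design, including the precision devoted to analysis-critical pixels, is not reproduced.

```python
import numpy as np

def relative_quantize(img, max_rel_err=0.05):
    """Quantize positive intensities on geometrically spaced intervals
    [r^k, r^(k+1)) with r = (1+e)/(1-e), reconstructing at r^k * (1+e),
    so that the relative error never exceeds e."""
    e = max_rel_err
    r = (1 + e) / (1 - e)
    x = np.maximum(img.astype(float), 1.0)        # avoid log(0)
    k = np.floor(np.log(x) / np.log(r))
    return r ** k * (1 + e)

img = np.array([1, 7, 60, 512, 4095, 65535], dtype=float)
q = relative_quantize(img)
print(np.abs(q - img) / img)                      # all bounded by ~0.05
```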

  10. Classification of large microarray datasets using fast random forest construction.

    PubMed

    Manilich, Elena A; Özsoyoğlu, Z Meral; Trubachev, Valeriy; Radivoyevitch, Tomas

    2011-04-01

    Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance or impact of each factor on a predicted class label. These characteristics make the algorithm ideal for microarray data. It was shown to build models with high accuracy when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining over large datasets, as they require that the entire dataset remains permanently in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. The implementation's excellent computational performance makes the algorithm useful for interactive data analyses and data mining.
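
    A minimal scikit-learn sketch of random forest classification on microarray-shaped data (far more genes than samples), including the out-of-bag accuracy and the variable-importance ranking the abstract highlights; the data are simulated and this is not the authors' optimized implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# microarray-like data: many more features (genes) than observations
rng = np.random.default_rng(6)
X = rng.normal(size=(60, 5000))
y = rng.integers(0, 2, size=60)
X[y == 1, :10] += 1.5                      # 10 genuinely informative genes

forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                oob_score=True, random_state=0)
forest.fit(X, y)
print("out-of-bag accuracy:", round(forest.oob_score_, 3))
# variable importance: the informative genes should rank near the top
top = np.argsort(forest.feature_importances_)[::-1][:10]
print("top-ranked genes:", sorted(top))
```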

  11. Comparison of multivariate preprocessing techniques as applied to electronic tongue based pattern classification for black tea.

    PubMed

    Palit, Mousumi; Tudu, Bipan; Bhattacharyya, Nabarun; Dutta, Ankur; Dutta, Pallab Kumar; Jana, Arun; Bandyopadhyay, Rajib; Chatterjee, Anutosh

    2010-08-18

    In an electronic tongue, preprocessing of raw data precedes pattern analysis, and the choice of the appropriate preprocessing technique is crucial for the performance of the pattern classifier. While attempting to classify different grades of black tea using a voltammetric electronic tongue, different preprocessing techniques have been explored, and a comparison of their performances is presented in this paper. The preprocessing techniques are compared first by a quantitative measurement of separability followed by principal component analysis; then two different supervised pattern recognition models based on neural networks are used to evaluate the performance of the preprocessing techniques.

  12. Segmentation of prostate cancer tissue microarray images

    NASA Astrophysics Data System (ADS)

    Cline, Harvey E.; Can, Ali; Padfield, Dirk

    2006-02-01

    Prostate cancer is diagnosed by histopathology interpretation of hematoxylin and eosin (H and E)-stained tissue sections. Gland and nuclei distributions vary with the disease grade. The morphological features vary with the advance of cancer as the epithelial regions grow into the stroma. An efficient pathology slide image analysis method involved using a tissue microarray with known disease stages. Digital 24-bit RGB images were acquired for each tissue element on the slide with both 10X and 40X objectives. Initial segmentation at low magnification was accomplished using prior spectral characteristics from a training tissue set composed of four tissue clusters: glands, epithelia, stroma and nuclei. The segmentation method was automated by using the training RGB values as an initial guess and iterating the averaging process 10 times to find the four cluster centers. Labels were assigned to the nearest cluster center in red-blue spectral feature space. An automatic threshold algorithm separated the glands from the tissue. A visual pseudo-color representation of 60 segmented tissue microarray images was generated, where white, pink, red, and blue represent glands, epithelia, stroma and nuclei, respectively. The higher-magnification images provided refined nuclei morphology. The nuclei were detected with an RGB color space principal component analysis that resulted in a grey-scale image. Shape metrics such as compactness, elongation, and minimum and maximum diameters were calculated based on the eigenvalues of the best-fitting ellipses to the nuclei.
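
    The described segmentation, training-derived cluster centers refined by ten averaging iterations with pixels labeled by the nearest center, is essentially k-means seeded with prior class means. A scikit-learn sketch follows; the training RGB means and the toy tile are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical prior RGB means for the four tissue classes
training_centers = np.array([[230, 225, 235],   # glands (near white)
                             [200, 150, 180],   # epithelia (pink)
                             [170,  80, 110],   # stroma (red)
                             [ 80,  70, 150]],  # nuclei (blue)
                            dtype=float)

def segment(rgb_image):
    """Label every pixel with its nearest, iteratively refined tissue
    cluster, seeding k-means with the training RGB means and running
    ten averaging iterations as described above."""
    pixels = rgb_image.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=4, init=training_centers, n_init=1, max_iter=10)
    labels = km.fit_predict(pixels)
    return labels.reshape(rgb_image.shape[:2])

rng = np.random.default_rng(7)
fake_tile = rng.integers(0, 256, size=(64, 64, 3)).astype(float)
print(np.bincount(segment(fake_tile).ravel(), minlength=4))
```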

  13. Inferring genetic networks from microarray data.

    SciTech Connect

    May, Elebeoba Eni; Davidson, George S.; Martin, Shawn Bryan; Werner-Washburne, Margaret C.; Faulon, Jean-Loup Michel

    2004-06-01

    In theory, it should be possible to infer realistic genetic networks from time series microarray data. In practice, however, network discovery has proved problematic. The three major challenges are: (1) inferring the network; (2) estimating the stability of the inferred network; and (3) making the network visually accessible to the user. Here we describe a method, tested on publicly available time series microarray data, which addresses these concerns. The inference of genetic networks from genome-wide experimental data is an important biological problem which has received much attention. Approaches to this problem have typically included application of clustering algorithms [6]; the use of Boolean networks [12, 1, 10]; the use of Bayesian networks [8, 11]; and the use of continuous models [21, 14, 19]. Overviews of the problem and general approaches to network inference can be found in [4, 3]. Our approach to network inference is similar to earlier methods in that we use both clustering and Boolean network inference. However, we have attempted to extend the process to better serve the end-user, the biologist. In particular, we have incorporated a system to assess the reliability of our network, and we have developed tools which allow interactive visualization of the proposed network.

  14. The Effect of LC-MS Data Preprocessing Methods on the Selection of Plasma Biomarkers in Fed vs. Fasted Rats.

    PubMed

    Gürdeniz, Gözde; Kristensen, Mette; Skov, Thomas; Dragsted, Lars O

    2012-01-18

    The metabolic composition of plasma is affected by the time passed since the last meal and by individual variation in metabolite clearance rates. Rat plasma in fed and fasted states was analyzed with liquid chromatography quadrupole time-of-flight mass spectrometry (LC-QTOF) for an untargeted investigation of these metabolite patterns. The dataset was used to investigate the effect of data preprocessing on biomarker selection using three different software packages, MarkerLynx, MZmine, and XCMS, along with a customized preprocessing method that performs binning of m/z channels followed by summation through retention time. Direct comparison of selected features representing the fed or fasted state showed large differences between the software packages. Many false-positive markers were obtained from the custom data preprocessing compared with the dedicated packages, while MarkerLynx provided better coverage of markers. However, marker selection was more reliable with the gap-filling (or peak-finding) algorithms present in MZmine and XCMS. Further identification of the putative markers revealed that many of the differences between the selected markers were due to variations in features representing adducts or daughter ions of the same metabolites, or compounds from the same chemical subclasses, e.g., lyso-phosphatidylcholines (LPCs) and lyso-phosphatidylethanolamines (LPEs). We conclude that despite considerable differences in the performance of the preprocessing tools, we could extract the same biological information with any of them. Carnitine, branched-chain amino acids, LPCs and LPEs were identified by all methods as markers of the fed state, whereas acetylcarnitine was abundant during fasting in rats.
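
    The customized preprocessing method, binning m/z channels and summing through retention time, can be sketched in a few NumPy lines; the bin width and the simulated centroid list are assumptions.

```python
import numpy as np

def bin_and_sum(mz, intensity, mz_step=0.1):
    """Bin m/z channels, then sum intensity within each channel across
    the whole run (summation through retention time collapses the RT
    dimension entirely), producing one feature vector per sample."""
    edges = np.arange(mz.min(), mz.max() + mz_step, mz_step)
    idx = np.clip(np.digitize(mz, edges) - 1, 0, len(edges) - 1)
    features = np.zeros(len(edges))
    np.add.at(features, idx, intensity)
    return features

rng = np.random.default_rng(8)
mz = rng.uniform(100, 1000, 5000)          # centroided peaks from one run
intensity = rng.exponential(1e4, 5000)
print(bin_and_sum(mz, intensity).shape)
```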

  15. Comparing Bacterial DNA Microarray Fingerprints

    SciTech Connect

    Willse, Alan R.; Chandler, Darrell P.; White, Amanda M.; Protic, Miroslava; Daly, Don S.; Wunschel, Sharon C.

    2005-08-15

    Detecting subtle genetic differences between microorganisms is an important problem in molecular epidemiology and microbial forensics. In a typical investigation, gel electrophoresis is used to compare randomly amplified DNA fragments between microbial strains, where the patterns of DNA fragment sizes are proxies for a microbe's genotype. The limited genomic sample captured on a gel is often insufficient to discriminate nearly identical strains. This paper examines the application of microarray technology to DNA fingerprinting as a high-resolution alternative to gel-based methods. The so-called universal microarray, which uses short oligonucleotide probes that do not target specific genes or species, is intended to be applicable to all microorganisms because it does not require prior knowledge of genomic sequence. In principle, closely related strains can be distinguished if the number of probes on the microarray is sufficiently large, i.e., if the genome is sufficiently sampled. In practice, we confront noisy data, imperfectly matched hybridizations, and a high-dimensional inference problem. We describe the statistical problems of microarray fingerprinting, outline similarities with and differences from more conventional microarray applications, and illustrate the statistical fingerprinting problem for 10 closely related strains from three Bacillus species, and 3 strains from non-Bacillus species.

  16. A new approach to pre-processing digital image for wavelet-based watermark

    NASA Astrophysics Data System (ADS)

    Agreste, Santa; Andaloro, Guido

    2008-11-01

    The growth of the Internet has increased the phenomenon of digital piracy of multimedia objects such as software, images, video, audio and text. It is therefore strategic to identify and develop methods and numerical algorithms, stable and with low computational cost, that address these problems. We describe a digital watermarking algorithm for color image protection and authenticity: robust, non-blind, and wavelet-based. The use of the Discrete Wavelet Transform is motivated by its good time-frequency features and its good match with Human Visual System directives. These two combined elements are important for building an invisible and robust watermark. Moreover, our algorithm can work with any image, thanks to an image pre-processing step that includes resizing techniques adapting the original image's size for the wavelet transform. The watermark signal is calculated in correlation with the image features and statistical properties. In the detection step we apply a re-synchronization between the original and watermarked image according to the Neyman-Pearson statistical criterion. Experimentation on a large set of different images has shown the watermark to be resistant to geometric, filtering, and StirMark attacks, with a low false-alarm rate.
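
    A toy non-blind wavelet watermark in the same spirit can be sketched with PyWavelets: embed a mark multiplicatively in the level-1 approximation sub-band and detect it by correlation against the original. The Haar wavelet, embedding strength, and detection statistic are simplifying assumptions, not the authors' algorithm.

```python
import numpy as np
import pywt

def embed_watermark(image, watermark, alpha=0.05):
    """Embed a watermark into the approximation sub-band of a 2-D Haar
    DWT (a minimal non-blind scheme, not the paper's exact method)."""
    cA, (cH, cV, cD) = pywt.dwt2(image.astype(float), "haar")
    wm = np.resize(watermark, cA.shape)          # fit mark to sub-band size
    cA_marked = cA * (1 + alpha * wm)            # multiplicative embedding
    return pywt.idwt2((cA_marked, (cH, cV, cD)), "haar")

def detect_watermark(marked, original, watermark, alpha=0.05):
    """Non-blind detection: correlate the extracted difference signal
    with the known watermark."""
    cA_m, _ = pywt.dwt2(marked.astype(float), "haar")
    cA_o, _ = pywt.dwt2(original.astype(float), "haar")
    extracted = (cA_m - cA_o) / (alpha * np.maximum(cA_o, 1e-9))
    wm = np.resize(watermark, cA_o.shape)
    return np.corrcoef(extracted.ravel(), wm.ravel())[0, 1]

rng = np.random.default_rng(9)
img = rng.uniform(0, 255, size=(128, 128))
mark = rng.choice([-1.0, 1.0], size=64 * 64)
marked = embed_watermark(img, mark)
print(round(detect_watermark(marked, img, mark), 3))   # close to 1.0
```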

  17. Protein microarrays: prospects and problems.

    PubMed

    Kodadek, T

    2001-02-01

    Protein microarrays are potentially powerful tools in biochemistry and molecular biology. Two types of protein microarrays are defined. One, termed a protein function array, will consist of thousands of native proteins immobilized in a defined pattern. Such arrays can be utilized for massively parallel testing of protein function, hence the name. The other type is termed a protein-detecting array. This will consist of large numbers of arrayed protein-binding agents. These arrays will allow for expression profiling to be done at the protein level. In this article, some of the major technological challenges to the development of protein arrays are discussed, along with potential solutions.

  18. Preprocessing and parameterizing bioimpedance spectroscopy measurements by singular value decomposition.

    PubMed

    Nejadgholi, Isar; Caytak, Herschel; Bolic, Miodrag; Batkin, Izmail; Shirmohammadi, Shervin

    2015-05-01

    In several applications of bioimpedance spectroscopy, the measured spectrum is parameterized by being fitted into the Cole equation. However, the extracted Cole parameters seem to be inconsistent from one measurement session to another, which leads to a high standard deviation of the extracted parameters. This inconsistency is modeled with a source of random variations added to the voltage measurement carried out in the time domain. These random variations may originate from biological variations that are irrelevant to the evidence that we are investigating. Yet, they affect the voltage measured by a bioimpedance device, based on which the magnitude and phase of impedance are calculated. By means of simulated data, we showed that Cole parameters are highly affected by this type of variation. We further showed that singular value decomposition (SVD) is an effective tool for parameterizing bioimpedance measurements, which results in more consistent parameters than Cole parameters. We propose to apply SVD as a preprocessing method to reconstruct denoised bioimpedance measurements. In order to evaluate the method, we calculated the relative difference between parameters extracted from noisy and clean simulated bioimpedance spectra. Both the mean and the standard deviation of this relative difference are shown to decrease effectively when Cole parameters are extracted from preprocessed data in comparison to being extracted from raw measurements. We evaluated the performance of the proposed method in distinguishing three arm positions, for a set of experiments including eight subjects. It is shown that Cole parameters of different positions are not distinguishable when extracted from raw measurements. However, one arm position can be distinguished based on SVD scores. Moreover, all three positions are shown to be distinguished by two parameters, R0/R∞ and Fc, when Cole parameters are extracted from preprocessed measurements. These results suggest that SVD could be considered as an
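
    The SVD preprocessing idea, reconstructing repeated measurements from their dominant singular components so that irrelevant random variation is discarded, can be sketched as follows; the spectrum model and rank choice are assumptions.

```python
import numpy as np

def svd_denoise(measurements, rank=1):
    """Reconstruct repeated bioimpedance spectra from their leading
    singular components, discarding small random variations."""
    U, s, Vt = np.linalg.svd(measurements, full_matrices=False)
    s[rank:] = 0.0                      # keep only the dominant structure
    return U @ np.diag(s) @ Vt

rng = np.random.default_rng(10)
freqs = np.logspace(3, 6, 50)                     # 1 kHz - 1 MHz sweep
clean = 1.0 / np.sqrt(1.0 + (freqs / 1e5) ** 2)   # smooth spectrum shape
sessions = np.vstack([clean + rng.normal(0, 0.02, 50) for _ in range(8)])
denoised = svd_denoise(sessions)
print(np.abs(sessions - clean).mean(), "->", np.abs(denoised - clean).mean())
```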

  19. Preprocessing and Analysis of LC-MS-Based Proteomic Data.

    PubMed

    Tsai, Tsung-Heng; Wang, Minkun; Ressom, Habtom W

    2016-01-01

    Liquid chromatography coupled with mass spectrometry (LC-MS) has been widely used for profiling protein expression levels. This chapter is focused on LC-MS data preprocessing, which is a crucial step in the analysis of LC-MS based proteomics. We provide a high-level overview, highlight associated challenges, and present a step-by-step example for analysis of data from LC-MS based untargeted proteomic study. Furthermore, key procedures and relevant issues with the subsequent analysis by multiple reaction monitoring (MRM) are discussed.

  20. Thermodynamic Post-Processing versus GC-Content Pre-Processing for DNA Codes Satisfying the Hamming Distance and Reverse-Complement Constraints.

    PubMed

    Tulpan, Dan; Smith, Derek H; Montemanni, Roberto

    2014-01-01

    Stochastic, meta-heuristic and linear construction algorithms for the design of DNA strands satisfying Hamming distance and reverse-complement constraints often use a GC-content constraint to pre-process the DNA strands. Since GC-content is a poor predictor of DNA strand hybridization strength, the strands can be filtered by post-processing using thermodynamic calculations. An alternative approach is considered here, where the algorithms are modified to remove consideration of GC-content and rely on post-processing alone to obtain large sets of DNA strands with satisfactory melting temperatures. The two approaches (pre-processing GC-content and post-processing melting temperatures) are compared and are shown to be complementary when large DNA sets are desired.
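
    For concreteness, a small Python checker for the two combinatorial constraints named in the title (pairwise Hamming distance and distance to reverse complements) is sketched below; the example code set and the distance d are arbitrary.

```python
COMP = str.maketrans("ACGT", "TGCA")

def reverse_complement(s):
    return s.translate(COMP)[::-1]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def satisfies_constraints(code, d):
    """Check that every pair of strands is at Hamming distance >= d,
    and that every strand is at distance >= d from every reverse
    complement, including its own. (H(b, RC(a)) equals H(a, RC(b)),
    so checking one order per pair suffices.)"""
    for i, a in enumerate(code):
        if hamming(a, reverse_complement(a)) < d:
            return False
        for b in code[i + 1:]:
            if hamming(a, b) < d or hamming(a, reverse_complement(b)) < d:
                return False
    return True

code = ["ACCTGAGT", "TGGACTCA", "CACAGTGA"]
print(satisfies_constraints(code, d=4))
```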

  1. Relevant and significant supervised gene clusters for microarray cancer classification.

    PubMed

    Maji, Pradipta; Das, Chandra

    2012-06-01

    An important application of microarray data in functional genomics is to classify samples according to their gene expression profiles such as to classify cancer versus normal samples or to classify different types or subtypes of cancer. One of the major tasks with gene expression data is to find co-regulated gene groups whose collective expression is strongly associated with sample categories. In this regard, a gene clustering algorithm is proposed to group genes from microarray data. It directly incorporates the information of sample categories in the grouping process for finding groups of co-regulated genes with strong association to the sample categories, yielding a supervised gene clustering algorithm. The average expression of the genes from each cluster acts as its representative. Some significant representatives are taken to form the reduced feature set to build the classifiers for cancer classification. The mutual information is used to compute both gene-gene redundancy and gene-class relevance. The performance of the proposed method, along with a comparison with existing methods, is studied on six cancer microarray data sets using the predictive accuracy of naive Bayes classifier, K-nearest neighbor rule, and support vector machine. An important finding is that the proposed algorithm is shown to be effective for identifying biologically significant gene clusters with excellent predictive capability. PMID:22552589

  2. Validation of MIMGO: a method to identify differentially expressed GO terms in a microarray dataset

    PubMed Central

    2012-01-01

    Background We previously proposed an algorithm for the identification of GO terms that commonly annotate genes whose expression is upregulated or downregulated in some microarray data compared with in other microarray data. We call these “differentially expressed GO terms” and have named the algorithm “matrix-assisted identification method of differentially expressed GO terms” (MIMGO). MIMGO can also identify microarray data in which genes annotated with a differentially expressed GO term are upregulated or downregulated. However, MIMGO has not yet been validated on a real microarray dataset using all available GO terms. Findings We combined Gene Set Enrichment Analysis (GSEA) with MIMGO to identify differentially expressed GO terms in a yeast cell cycle microarray dataset. GSEA followed by MIMGO (GSEA + MIMGO) correctly identified (p < 0.05) microarray data in which genes annotated to differentially expressed GO terms are upregulated. We found that GSEA + MIMGO was slightly less effective than, or comparable to, GSEA (Pearson), a method that uses Pearson’s correlation as a metric, at detecting true differentially expressed GO terms. However, unlike other methods including GSEA (Pearson), GSEA + MIMGO can comprehensively identify the microarray data in which genes annotated with a differentially expressed GO term are upregulated or downregulated. Conclusions MIMGO is a reliable method to identify differentially expressed GO terms comprehensively. PMID:23232071

  3. Microarray Developed on Plastic Substrates.

    PubMed

    Bañuls, María-José; Morais, Sergi B; Tortajada-Genaro, Luis A; Maquieira, Ángel

    2016-01-01

    There is huge potential interest in using synthetic polymers as versatile solid supports for analytical microarraying. Chemical modification of polycarbonate (PC) for covalent immobilization of probes, micro-printing of protein or nucleic acid probes, development of indirect immunoassays, and development of hybridization protocols are described and discussed. PMID:26614067

  4. Microfluidic microarray systems and methods thereof

    SciTech Connect

    West, Jay A. A.; Hukari, Kyle W.; Hux, Gary A.

    2009-04-28

    Disclosed are systems that include a manifold in fluid communication with a microfluidic chip having a microarray, an illuminator, and a detector in optical communication with the microarray. Methods for using these systems for biological detection are also disclosed.

  5. [The net analyte preprocessing combined with radial basis partial least squares regression applied in noninvasive measurement of blood glucose].

    PubMed

    Li, Qing-Bo; Huang, Zheng-Wei

    2014-02-01

    In order to improve the prediction accuracy of quantitative analysis models for near-infrared spectroscopy of blood glucose, this paper combines the net analyte preprocessing (NAP) algorithm with radial basis function partial least squares (RBFPLS) regression to build a nonlinear modeling method suited to human glucose measurement, named NAP-RBFPLS. First, NAP is used to preprocess the near-infrared spectra of blood glucose so as to extract, from the original spectra, only the information related to the glucose signal. This weakens the chance correlations between glucose changes and interfering factors caused by the absorption of water, albumin, hemoglobin, fat, and other blood components, by changes in body temperature, by drift of the measuring instruments, and by changes in the measurement environment and conditions. A nonlinear quantitative analysis model is then built on the NAP-processed spectra to capture the nonlinear relationship between glucose concentration and the near-infrared spectra that is caused by strong scattering in body tissue. The new method is compared with three other quantitative analysis models built on partial least squares (PLS), net analyte preprocessing partial least squares (NAP-PLS), and RBFPLS, respectively. The experimental results show that the nonlinear calibration model combining the NAP algorithm with RBFPLS regression greatly improves the prediction accuracy on the prediction sets, indicating that this nonlinear modeling method has practical applications in research on non-invasive detection of human glucose concentrations.
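
    The core of the net-analyte-signal idea behind NAP can be sketched as a projection onto the subspace orthogonal to known interferent spectra; the algorithm in the paper is more elaborate, so the following is only a minimal sketch under that simplifying assumption.

        import numpy as np

        def nap_projector(interferents):
            """interferents: (k, n_wavelengths) spectra of water, albumin, etc."""
            n = interferents.shape[1]
            return np.eye(n) - np.linalg.pinv(interferents) @ interferents

        def nap_preprocess(spectra, interferents):
            P = nap_projector(interferents)   # symmetric orthogonal projector
            return spectra @ P                # keep only the analyte-related part

        rng = np.random.default_rng(1)
        interferents = rng.normal(size=(4, 100))   # 4 known interferent spectra
        spectra = rng.normal(size=(20, 100))       # 20 measured spectra
        print(nap_preprocess(spectra, interferents).shape)   # (20, 100)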

  6. Technical Advances of the Recombinant Antibody Microarray Technology Platform for Clinical Immunoproteomics

    PubMed Central

    Delfani, Payam; Dexlin Mellby, Linda; Nordström, Malin; Holmér, Andreas; Ohlsson, Mattias; Borrebaeck, Carl A. K.; Wingren, Christer

    2016-01-01

    In the quest for deciphering disease-associated biomarkers, high-performing tools for multiplexed protein expression profiling of crude clinical samples will be crucial. Affinity proteomics, mainly represented by antibody-based microarrays, have during recent years been established as a proteomic tool providing unique opportunities for parallelized protein expression profiling. But despite the progress, several main technical features and assay procedures remains to be (fully) resolved. Among these issues, the handling of protein microarray data, i.e. the biostatistics parts, is one of the key features to solve. In this study, we have therefore further optimized, validated, and standardized our in-house designed recombinant antibody microarray technology platform. To this end, we addressed the main remaining technical issues (e.g. antibody quality, array production, sample labelling, and selected assay conditions) and most importantly key biostatistics subjects (e.g. array data pre-processing and biomarker panel condensation). This represents one of the first antibody array studies in which these key biostatistics subjects have been studied in detail. Here, we thus present the next generation of the recombinant antibody microarray technology platform designed for clinical immunoproteomics. PMID:27414037

  8. Radar data pre-processing for reliable rain field estimation

    NASA Astrophysics Data System (ADS)

    Daliakopoulos, Ioannis N.; Tsanis, Ioannis K.

    2010-05-01

    A comparative analysis of different pre-processing methods applied to radar data for minimizing the uncertainty of the produced Z-R relationship is conducted. The study focuses on measurements from three ground precipitation stations located in close proximity to the Souda Bay C-band radar in Crete, Greece. While precipitation and reflectivity measurements were both collected at almost synchronized 10-minute intervals, uncertainties related to timing issues are discussed and measurements are aggregated to various scales up to 12 hours. Reflectivity measurements are also transformed and resampled in space, from polar coordinates to regular grids of 500 m to 5000 m resolution. The tradeoffs of both spatial and temporal transformation are discussed. The data are also filtered for noise using simple thresholding, the Wiener filter, and combinations of both methods. The effects of the three pre-processing procedures are studied with respect to the final fit of the data to acceptable Z-R equations for the generation of reliable precipitation fields.
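
    For readers unfamiliar with Z-R fitting, the power law Z = a·R^b is commonly estimated by linear regression in log space; a minimal sketch (not tied to the paper's data) follows.

        import numpy as np

        def fit_zr(Z, R):
            """Z in mm^6/m^3, R in mm/h; returns (a, b) of Z = a * R**b."""
            b, log_a = np.polyfit(np.log(R), np.log(Z), 1)
            return np.exp(log_a), b

        rng = np.random.default_rng(2)
        R = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0])
        Z = 200.0 * R ** 1.6 * np.exp(rng.normal(0.0, 0.1, R.size))
        print(fit_zr(Z, R))   # close to the Marshall-Palmer values (200, 1.6)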

  9. The Microarray Revolution: Perspectives from Educators

    ERIC Educational Resources Information Center

    Brewster, Jay L.; Beason, K. Beth; Eckdahl, Todd T.; Evans, Irene M.

    2004-01-01

    In recent years, microarray analysis has become a key experimental tool, enabling the analysis of genome-wide patterns of gene expression. This review approaches the microarray revolution with a focus upon four topics: 1) the early development of this technology and its application to cancer diagnostics; 2) a primer of microarray research,…

  10. A comparative study on preprocessing techniques in diabetic retinopathy retinal images: illumination correction and contrast enhancement.

    PubMed

    Rasta, Seyed Hossein; Partovi, Mahsa Eisazadeh; Seyedarabi, Hadi; Javadzadeh, Alireza

    2015-01-01

    To investigate the effect of preprocessing techniques, including contrast enhancement and illumination correction, on retinal image quality, a comparative study was carried out. We studied and implemented several illumination correction and contrast enhancement techniques on color retinal images to find the best technique for optimum image enhancement. To compare and choose the best illumination correction technique, we analyzed the corrected red and green components of the color retinal images statistically and visually. The two contrast enhancement techniques were analyzed using a vessel segmentation algorithm by calculating sensitivity and specificity. The statistical evaluation of the illumination correction techniques was carried out by calculating coefficients of variation. The dividing method, using a median filter to estimate background illumination, showed the lowest coefficient of variation in the red component. The quotient and homomorphic filtering methods, after the dividing method, also presented good results based on their low coefficients of variation. Contrast limited adaptive histogram equalization (CLAHE) increased the sensitivity of the vessel segmentation algorithm by up to 5% at the same accuracy, and showed higher sensitivity than the polynomial transformation operator as a contrast enhancement technique for vessel segmentation. Three techniques, the dividing method using a median filter to estimate background, the quotient-based method, and homomorphic filtering, were found to be effective illumination correction techniques based on the statistical evaluation. Applying a local contrast enhancement technique such as CLAHE to fundus images showed good potential for enhancing vasculature segmentation.
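
    A minimal sketch of the two winning techniques named above, the dividing method with a median-filter background estimate followed by CLAHE, using OpenCV; the kernel size, clip settings, and input file are assumptions, not the paper's parameters.

        import cv2
        import numpy as np

        def correct_and_enhance(bgr):
            green = bgr[:, :, 1]                    # vessels contrast best here
            background = cv2.medianBlur(green, 51)  # coarse illumination estimate
            corrected = cv2.divide(green, background, scale=128)  # dividing method
            clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
            return clahe.apply(corrected)

        img = cv2.imread("fundus.png")              # hypothetical input image
        cv2.imwrite("fundus_enhanced.png", correct_and_enhance(img))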

  11. Design and implementation of a preprocessing system for a sodium lidar

    NASA Technical Reports Server (NTRS)

    Voelz, D. G.; Sechrist, C. F., Jr.

    1983-01-01

    A preprocessing system, designed and constructed for use with the University of Illinois sodium lidar system, was developed to increase the altitude resolution and range of the lidar system and also to decrease the processing burden of the main lidar computer. The preprocessing system hardware and the software required to implement the system are described. Some preliminary results of an airborne sodium lidar experiment conducted with the preprocessing system installed in the sodium lidar are presented.

  12. Simple and Effective Way for Data Preprocessing Selection Based on Design of Experiments.

    PubMed

    Gerretzen, Jan; Szymańska, Ewa; Jansen, Jeroen J; Bart, Jacob; van Manen, Henk-Jan; van den Heuvel, Edwin R; Buydens, Lutgarde M C

    2015-12-15

    The selection of optimal preprocessing is among the main bottlenecks in chemometric data analysis. Preprocessing currently is a burden, since a multitude of different preprocessing methods is available for, e.g., baseline correction, smoothing, and alignment, but it is not clear beforehand which method(s) should be used for which data set. The process of preprocessing selection is often limited to trial-and-error and is therefore considered somewhat subjective. In this paper, we present a novel, simple, and effective approach for preprocessing selection. The defining feature of this approach is a design of experiments. On the basis of the design, the model performance of a few well-chosen preprocessing methods and combinations thereof (called strategies) is evaluated. Interpretation of the main effects and interactions subsequently enables the selection of an optimal preprocessing strategy. The presented approach is applied to eight different spectroscopic data sets, covering both calibration and classification challenges. We show that the approach is able to select a preprocessing strategy which improves model performance by at least 50% compared to the raw data; in most cases, it leads to a strategy very close to the true optimum. Our approach makes preprocessing selection fast, insightful, and objective. PMID:26632985
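
    The idea can be sketched as a two-level full-factorial design whose factors switch candidate preprocessing steps on or off, scored by cross-validated model performance; the paper's designs and method set are richer, so everything below (step choices, PLS model, scoring) is illustrative.

        import itertools
        import numpy as np
        from scipy.signal import savgol_filter
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import cross_val_score

        def snv(X):  # standard normal variate, row-wise
            return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

        def first_derivative(X):  # Savitzky-Golay first derivative
            return savgol_filter(X, window_length=11, polyorder=2, deriv=1, axis=1)

        STEPS = {"snv": snv, "deriv": first_derivative}   # illustrative factors

        def evaluate_strategies(X, y):
            results = {}
            for flags in itertools.product([0, 1], repeat=len(STEPS)):
                Xp = X.copy()
                for on, fn in zip(flags, STEPS.values()):
                    if on:
                        Xp = fn(Xp)
                score = cross_val_score(PLSRegression(n_components=5), Xp, y, cv=5,
                                        scoring="neg_root_mean_squared_error")
                results[flags] = score.mean()
            return results   # the best (highest) score marks the optimal strategy

        rng = np.random.default_rng(3)
        X = rng.normal(size=(60, 200)).cumsum(axis=1)   # smooth synthetic "spectra"
        y = X[:, 100] + rng.normal(0, 0.1, size=60)
        print(max(evaluate_strategies(X, y).items(), key=lambda kv: kv[1]))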

  14. Ontology-Based Analysis of Microarray Data.

    PubMed

    Agapito, Giuseppe; Milano, Marianna

    2016-01-01

    The importance of semantic-based methods and algorithms for the analysis and management of biological data is growing for two main reasons: from the biological side, the knowledge contained in ontologies is increasingly accurate and complete; from the computational side, recent algorithms use such knowledge in valuable ways. Here we focus on semantic-based management and analysis of protein interaction networks, referring to all approaches that analyze protein-protein interaction data using knowledge encoded in biological ontologies. Semantic approaches to studying high-throughput data have long been used to mine genomic and expression data. Recently, the emergence of network approaches for investigating molecular machineries has in parallel stimulated the introduction of semantic-based techniques for the analysis and management of network data. Applying these computational approaches to the study of microarray data can broaden their application scenarios and, at the same time, help in understanding disease development and progression.

  15. Study on Construction of a Medical X-Ray Direct Digital Radiography System and Hybrid Preprocessing Methods

    PubMed Central

    Ren, Yong; Wu, Sheng; Wang, Mijian; Cen, Zhongjie

    2014-01-01

    We construct a medical X-ray direct digital radiography (DDR) system based on a CCD (charge-coupled device) camera. For the original images captured from X-ray exposure, the computer first performs image flat-field correction and gamma correction, and then carries out image contrast enhancement. A hybrid image contrast enhancement algorithm, based on the sharp frequency localization contourlet transform (SFL-CT) and contrast limited adaptive histogram equalization (CLAHE), is proposed and verified on clinical DDR images. Experimental results show that, for medical X-ray DDR images, the proposed comprehensive preprocessing algorithm not only greatly enhances contrast and detail information, but also improves the resolution capability of the DDR system. PMID:25013452
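
    The first two stages named above, flat-field correction and gamma correction, can be sketched in a few lines; the calibration frames and gamma value here are assumptions, not the paper's settings.

        import numpy as np

        def flat_field(raw, dark, flat):
            """Divide out detector gain using dark- and flat-frame calibration."""
            gain = (flat - dark).mean() / np.clip(flat - dark, 1e-6, None)
            return np.clip((raw - dark) * gain, 0.0, None)

        def gamma_correct(img, gamma=0.6):
            return (img / img.max()) ** gamma   # gamma < 1 lifts low-dose regions

        rng = np.random.default_rng(4)
        dark = rng.normal(10.0, 1.0, (64, 64))
        flat = 100.0 + rng.normal(0.0, 5.0, (64, 64))
        raw = dark + 0.5 * (flat - dark)        # synthetic half-exposure frame
        print(gamma_correct(flat_field(raw, dark, flat)).shape)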

  16. Microarray technology for use in molecular epidemiology.

    PubMed

    Vernon, Suzanne D; Whistler, Toni

    2007-01-01

    Microarrays are a powerful laboratory tool for the simultaneous assessment of the activity of thousands of genes. Remarkable advances in biological sample collection, preparation, and automation of hybridization have enabled the application of microarray technology to large, population-based studies. Microarrays now have the potential to serve as screening tools for the detection of altered gene expression activity that might contribute to diseases in human populations. Reproducible and reliable microarray results depend on multiple factors. In this chapter, biological sample parameters that should be considered for any microarray experiment are introduced. Then, the microarray technology that we have successfully applied to the limited biological samples from all our molecular epidemiology studies is detailed. This reproducible and reliable approach to using microarrays should be applicable to any biological question asked.

  17. Microarray analysis in pulmonary hypertension.

    PubMed

    Hoffmann, Julia; Wilhelm, Jochen; Olschewski, Andrea; Kwapiszewska, Grazyna

    2016-07-01

    Microarrays are a powerful and effective tool that allows the detection of genome-wide gene expression differences between controls and disease conditions. They have been broadly applied to investigate the pathobiology of diverse forms of pulmonary hypertension, namely group 1, including patients with idiopathic pulmonary arterial hypertension, and group 3, including pulmonary hypertension associated with chronic lung diseases such as chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis. To date, numerous human microarray studies have been conducted to analyse global (lung homogenate samples), compartment-specific (laser capture microdissection), cell type-specific (isolated primary cells) and circulating cell (peripheral blood) expression profiles. Combined, they provide important information on development, progression and the end-stage disease. In the future, system biology approaches, expression of noncoding RNAs that regulate coding RNAs, and direct comparison between animal models and human disease might be of importance. PMID:27076594

  18. Phenotypic MicroRNA Microarrays

    PubMed Central

    Kwon, Yong-Jun; Heo, Jin Yeong; Kim, Hi Chul; Kim, Jin Yeop; Liuzzi, Michel; Soloveva, Veronica

    2013-01-01

    Microarray technology has become a very popular approach in cases where multiple experiments need to be conducted repeatedly or done with a variety of samples. In our lab, we are applying our high-density spot microarray approach to microscopy visualization of the effects of transiently introduced siRNA or cDNA on cellular morphology or phenotype. In this publication, we discuss the possibility of using this micro-scale high-throughput process to study the role of microRNAs in the biology of selected cellular models. After reverse-transfection of microRNAs and siRNA, the microRNAs regulated NF-κB expression comparably to the siRNA, as reflected in the resulting cellular phenotype. The ability to print microRNA molecules for reverse transfection into cells opens up a wide horizon for phenotypic high-content screening of microRNA libraries using cellular disease models.

  19. Self-Assembling Protein Microarrays

    NASA Astrophysics Data System (ADS)

    Ramachandran, Niroshan; Hainsworth, Eugenie; Bhullar, Bhupinder; Eisenstein, Samuel; Rosen, Benjamin; Lau, Albert Y.; Walter, Johannes C.; LaBaer, Joshua

    2004-07-01

    Protein microarrays provide a powerful tool for the study of protein function. However, they are not widely used, in part because of the challenges in producing proteins to spot on the arrays. We generated protein microarrays by printing complementary DNAs onto glass slides and then translating target proteins with mammalian reticulocyte lysate. Epitope tags fused to the proteins allowed them to be immobilized in situ. This obviated the need to purify proteins, avoided protein stability problems during storage, and captured sufficient protein for functional studies. We used the technology to map pairwise interactions among 29 human DNA replication initiation proteins, recapitulate the regulation of Cdt1 binding to select replication proteins, and map its geminin-binding domain.

  1. Hyperspectral microarray scanning: impact on the accuracy and reliability of gene expression data

    PubMed Central

    Timlin, Jerilyn A; Haaland, David M; Sinclair, Michael B; Aragon, Anthony D; Martinez, M Juanita; Werner-Washburne, Margaret

    2005-01-01

    Background Commercial microarray scanners and software cannot distinguish between spectrally overlapping emission sources, and hence cannot accurately identify or correct for emissions not originating from the labeled cDNA. We employed our hyperspectral microarray scanner coupled with multivariate data analysis algorithms that independently identify and quantitate emissions from all sources to investigate three artifacts that reduce the accuracy and reliability of microarray data: skew toward the green channel, dye separation, and variable background emissions. Results Here we demonstrate that several common microarray artifacts resulted from the presence of emission sources other than the labeled cDNA that can dramatically alter the accuracy and reliability of the array data. The microarrays utilized in this study were representative of a wide cross-section of the microarrays currently employed in genomic research. These findings reinforce the need for careful attention to detail to recognize and subsequently eliminate or quantify the presence of extraneous emissions in microarray images. Conclusion Hyperspectral scanning together with multivariate analysis offers a unique and detailed understanding of the sources of microarray emissions after hybridization. This opportunity to simultaneously identify and quantitate contaminant and background emissions in microarrays markedly improves the reliability and accuracy of the data and permits a level of quality control of microarray emissions previously unachievable. Using these tools, we can not only quantify the extent and contribution of extraneous emission sources to the signal, but also determine the consequences of failing to account for them and gain the insight necessary to adjust preparation protocols to prevent such problems from occurring. PMID:15888208

  2. Radar image preprocessing. [of SEASAT-A SAR data

    NASA Technical Reports Server (NTRS)

    Frost, V. S.; Stiles, J. A.; Holtzman, J. C.; Held, D. N.

    1980-01-01

    Standard image processing techniques are not applicable to radar images because of the coherent nature of the sensor. Therefore there is a need to develop preprocessing techniques for radar images which will then allow these standard methods to be applied. A random field model for radar image data is developed. This model describes the image data as the result of a multiplicative-convolved process. Standard techniques, those based on additive noise and homomorphic processing are not directly applicable to this class of sensor data. Therefore, a minimum mean square error (MMSE) filter was designed to treat this class of sensor data. The resulting filter was implemented in an adaptive format to account for changes in local statistics and edges. A radar image processing technique which provides the MMSE estimate inside homogeneous areas and tends to preserve edge structure was the result of this study. Digitally correlated Seasat-A synthetic aperture radar (SAR) imagery was used to test the technique.
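
    The paper's filter is a specific adaptive MMSE design; as a sketch of the same estimator family, a Lee-style MMSE speckle filter driven by local statistics looks like this (window size and noise level are assumptions).

        import numpy as np
        from scipy.ndimage import uniform_filter

        def lee_filter(img, size=7, noise_var=0.25):
            """Adaptive MMSE estimate under multiplicative speckle."""
            mean = uniform_filter(img, size)
            sq_mean = uniform_filter(img * img, size)
            var = np.clip(sq_mean - mean ** 2, 0.0, None)
            # Estimated variance of the underlying scene (Lee, 1980 form).
            var_x = np.clip((var - noise_var * mean ** 2) / (1.0 + noise_var),
                            0.0, None)
            k = var_x / (var + 1e-12)  # -> 0 in homogeneous areas, -> 1 at edges
            return mean + k * (img - mean)

        rng = np.random.default_rng(5)
        clean = np.ones((64, 64))
        clean[:, 32:] = 4.0                                        # a step edge
        speckled = clean * rng.gamma(4.0, 1.0 / 4.0, clean.shape)  # 4-look speckle
        print(np.abs(lee_filter(speckled) - clean).mean())         # residual error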

  3. Statistics in experimental design, preprocessing, and analysis of proteomics data.

    PubMed

    Jung, Klaus

    2011-01-01

    High-throughput experiments in proteomics, such as 2-dimensional gel electrophoresis (2-DE) and mass spectrometry (MS), usually yield high-dimensional data sets of expression values for hundreds or thousands of proteins, which are, however, observed on only a relatively small number of biological samples. Statistical methods for the planning and analysis of experiments are important to avoid false conclusions and to obtain tenable results. In this chapter, the most frequent experimental designs for proteomics experiments are illustrated. In particular, the focus is put on studies for the detection of differentially regulated proteins. Furthermore, issues of sample size planning, statistical analysis of expression levels, and methods for data preprocessing are covered.

  4. Data acquisition and preprocessing techniques for remote sensing field research

    NASA Technical Reports Server (NTRS)

    Biehl, L. L.; Robinson, B. F.

    1983-01-01

    A crops and soils data base has been developed at Purdue University's Laboratory for Applications of Remote Sensing using spectral and agronomic measurements made by several government and university researchers. The data are being used to (1) quantitatively determine the relationships of spectral and agronomic characteristics of crops and soils, (2) define future sensor systems, and (3) develop advanced data analysis techniques. Researchers follow defined data acquisition and preprocessing techniques to provide fully annotated and calibrated sets of spectral, agronomic, and meteorological data. These procedures enable researchers to combine their data with that acquired by other researchers for remote sensing research. The key elements or requirements for developing a field research data base of spectral data that can be transported across sites and years are appropriate experiment design, accurate spectral data calibration, defined field procedures, and thorough experiment documentation.

  5. Microarrays, antiobesity and the liver

    PubMed Central

    Castro-Chávez, Fernando

    2013-01-01

    In this review, the microarray technology and especially oligonucleotide arrays are exemplified with a practical example taken from the perilipin−/− mice, using the dChip software, which is available for non-lucrative purposes. It was found that the liver of perilipin−/− mice was healthy and normal, even under a high-fat diet, when compared with the results published for the scd1−/− mice, which under high-fat diets had a darker liver, suggestive of hepatic steatosis. Scd1 is required for the biosynthesis of monounsaturated fatty acids and plays a key role in the hepatic synthesis of triglycerides and of very-low-density lipoproteins. Both models of obesity resistance share many similar phenotypic antiobesity features; however, the perilipin−/− mice had a significant downregulation of the stearoyl-CoA desaturases scd1 and scd2 in white adipose tissue, but a normal level of both genes in the liver, even under a high-fat diet. Here, different microarray methodologies are discussed, together with some of the most recent discoveries and perspectives regarding the use of microarrays, with an emphasis on obesity gene expression, and a personal remark on my findings of increased expression of hemoglobin transcripts and other hemo-related genes, and of leukocyte-like genes, in the white adipose tissue of the perilipin−/− mice. In conclusion, microarrays have much to offer in comparative studies such as those in antiobesity research, and they are methodologies well suited to new molecular discoveries. PMID:15657555

  6. Preprocessing of Satellite Data for Urban Object Extraction

    NASA Astrophysics Data System (ADS)

    Krauß, T.

    2015-03-01

    Very high resolution (VHR) DSMs (digital surface models) derived from stereo or multi-stereo images from current VHR satellites like WorldView-2 or Pléiades can be produced up to the ground sampling distance (GSD) of the sensors, in the range of 50 cm to 1 m. From such DSMs, the digital terrain model (DTM) representing the ground and also a so-called nDEM (normalized digital elevation model) describing the height of objects above the ground can be derived. In parallel, these sensors deliver multispectral imagery which can be used for a spectral classification of the imagery. Fusion of the multispectral classification and the nDEM allows a simple classification and detection of urban objects. In further processing steps, these detected urban objects can be modeled and exported in a suitable description language like CityGML. In this work we present the pre-processing steps up to the classification and detection of the urban objects; the modeling is not part of this work. The pre-processing steps described here briefly cover the coregistration of the input images and the generation of the DSM. In more detail, the improvement of the DSM, the extraction of the DTM and nDEM, the multispectral classification, and the object detection and extraction are explained. The described methods are applied to two test regions from two satellites: first the center of Munich acquired by WorldView-2, and second the center of Melbourne acquired by Pléiades. For both acquisitions, a stereo pair from the panchromatic bands is used for creation of the DSM and the pan-sharpened multispectral images are used for spectral classification. Finally, the quality of the detected urban objects is discussed.

  7. Lectin microarrays for glycomic analysis.

    PubMed

    Gupta, Garima; Surolia, Avadhesha; Sampathkumar, Srinivasa-Gopalan

    2010-08-01

    Glycomics is the study of the comprehensive structural elucidation and characterization of all glycoforms found in nature, and of their dynamic spatiotemporal changes associated with biological processes. The glycocalyx of mammalian cells actively participates in cell-cell, cell-matrix, and cell-pathogen interactions, which impact embryogenesis, growth and development, homeostasis, infection and immunity, signaling, malignancy, and metabolic disorders. Relative to genomics and proteomics, glycomics is just growing out of infancy, with great potential in biomedicine for biomarker discovery, diagnosis, and treatment. However, the immense diversity and complexity of glycan structures and their multiple modes of interaction with proteins pose great challenges for the development of analytical tools for delineating structure-function relationships and understanding the glyco-code. Several tools are being developed for glycan profiling based on chromatography, mass spectrometry, glycan microarrays, and glyco-informatics. Lectins, which have long been used in glyco-immunology, printed on a microarray provide a versatile platform for rapid, high-throughput analysis of the glycoforms of biological samples. Herein, we summarize technological advances in lectin microarrays and critically review their impact on glycomics analysis. Challenges remain in terms of expansion to include non-plant-derived lectins, standardization for routine clinical use, development of recombinant lectins, and exploration of the plant kingdom for discovery of novel lectins. PMID:20726799

  9. New supervised alignment method as a preprocessing tool for chromatographic data in metabolomic studies.

    PubMed

    Struck, Wiktoria; Wiczling, Paweł; Waszczuk-Jankowska, Małgorzata; Kaliszan, Roman; Markuszewski, Michał Jan

    2012-09-21

    The purpose of this work was to develop a new alignment algorithm, called supervised alignment, and to compare its performance with correlation optimized warping. The supervised alignment is based on a "supervised" selection of a few common peaks present in each chromatogram. The selected peaks are aligned based on the difference in retention time of the selected analytes between the sample and the reference chromatogram. The retention times of the fragments between known peaks are subsequently linearly interpolated. The performance of the proposed algorithm has been tested on a series of simulated and experimental chromatograms. The simulated chromatograms comprised analytes with systematic or random retention time shifts. The experimental chromatographic (RP-HPLC) data were obtained during the analysis of nucleosides from 208 urine samples and contain both systematic and random displacements. The time required to complete the alignment, the overall complexity of both algorithms, and their performance, measured by the average correlation coefficients, are compared to assess the two methods. In the case of systematic shifts, both methods lead to successful alignment. However, for random shifts, correlation optimized warping requires more time than the supervised alignment (a few hours versus a few minutes) and the quality of the alignment, described by the correlation coefficient of the newly aligned matrix, is worse: 0.8593 versus 0.9629. For the experimental dataset, the supervised alignment successfully aligned the 208 samples using 10 previously identified peaks. Knowledge of the retention times of a few analytes in the data sets is necessary to perform the supervised alignment for both systematic and random shifts. The supervised alignment method is a faster, more effective, and simpler preprocessing method than correlation optimized warping.
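
    A minimal sketch of the supervised-alignment idea: warp retention times so that a few known shared peaks land on a reference chromatogram, interpolating linearly between anchors (np.interp supplies the piecewise-linear map). The peak positions and shapes below are synthetic.

        import numpy as np

        def supervised_align(time, signal, sample_peaks, reference_peaks):
            # Anchor the run ends too, so the piecewise-linear map covers the
            # whole chromatogram, then resample onto the reference time grid.
            sp = np.concatenate(([time[0]], sample_peaks, [time[-1]]))
            rp = np.concatenate(([time[0]], reference_peaks, [time[-1]]))
            warped = np.interp(time, sp, rp)   # sample time -> reference time
            return np.interp(time, warped, signal)

        time = np.linspace(0.0, 10.0, 2000)
        reference_peaks = np.array([2.0, 5.0, 8.0])
        sample_peaks = reference_peaks + np.array([0.10, -0.15, 0.05])  # shifts
        heights = [1.0, 0.7, 0.5]
        signal = sum(h * np.exp(-0.5 * ((time - p) / 0.05) ** 2)
                     for h, p in zip(heights, sample_peaks))
        aligned = supervised_align(time, signal, sample_peaks, reference_peaks)
        print(time[np.argmax(aligned)])   # tallest peak moves from 2.10 to ~2.0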

  10. Ensemble preprocessing of near-infrared (NIR) spectra for multivariate calibration.

    PubMed

    Xu, Lu; Zhou, Yan-Ping; Tang, Li-Juan; Wu, Hai-Long; Jiang, Jian-Hui; Shen, Guo-Li; Yu, Ru-Qin

    2008-06-01

    Preprocessing of raw near-infrared (NIR) spectral data is indispensable in multivariate calibration when the measured spectra are subject to significant noise, baselines, and other undesirable factors. However, due to the lack of sufficient prior information and incomplete knowledge of the raw data, NIR spectra preprocessing in multivariate calibration is still trial and error. How to select a proper method depends largely on both the nature of the data and the expertise and experience of the practitioners. This might limit the applications of multivariate calibration in many fields where researchers are not very familiar with the characteristics of the many preprocessing methods unique to chemometrics and have difficulty selecting the most suitable ones. Another problem is that many preprocessing methods, when used alone, might degrade the data in certain aspects or lose some useful information while improving certain qualities of the data. To tackle these problems, this paper proposes a new concept of data preprocessing, the ensemble preprocessing method, in which partial least squares (PLS) models built on differently preprocessed data are combined by Monte Carlo cross validation (MCCV) stacked regression. Little or no prior information about the data and little expertise are required. Moreover, fusion of the complementary information obtained by different preprocessing methods often leads to a more stable and accurate calibration model. The investigation of two real data sets has demonstrated the advantages of the proposed method.
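
    A simplified sketch of ensemble preprocessing: PLS models fitted to differently preprocessed copies of the data are stacked with non-negative weights learned from out-of-sample predictions. Plain K-fold predictions stand in for the paper's MCCV here, and the preprocessing set is illustrative.

        import numpy as np
        from scipy.optimize import nnls
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import cross_val_predict

        def snv(X):   # one candidate preprocessing
            return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

        PREPROCESSORS = [lambda X: X, snv, lambda X: np.gradient(X, axis=1)]

        def fit_stacked(X, y, n_components=5):
            models, oos = [], []
            for prep in PREPROCESSORS:
                Xp = prep(X)
                models.append(PLSRegression(n_components=n_components).fit(Xp, y))
                oos.append(cross_val_predict(PLSRegression(n_components=n_components),
                                             Xp, y, cv=5).ravel())
            w, _ = nnls(np.column_stack(oos), y)   # non-negative stacking weights
            return models, w / w.sum()

        def predict_stacked(models, w, X):
            cols = [m.predict(prep(X)).ravel()
                    for m, prep in zip(models, PREPROCESSORS)]
            return np.column_stack(cols) @ w

        rng = np.random.default_rng(6)
        X = rng.normal(size=(50, 80)).cumsum(axis=1)
        y = X[:, 40] + rng.normal(0, 0.1, size=50)
        models, w = fit_stacked(X, y)
        print(w, predict_stacked(models, w, X[:3]))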

  11. Workflows for microarray data processing in the Kepler environment

    PubMed Central

    2012-01-01

    Background Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. Results We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or

  12. Automatic selection of preprocessing methods for improving predictions on mass spectrometry protein profiles.

    PubMed

    Pelikan, Richard C; Hauskrecht, Milos

    2010-11-13

    Mass spectrometry proteomic profiling has potential to be a useful clinical screening tool. One obstacle is providing a standardized method for preprocessing the noisy raw data. We have developed a system for automatically determining a set of preprocessing methods among several candidates. Our system's automated nature relieves the analyst of the need to be knowledgeable about which methods to use on any given dataset. Each stage of preprocessing is approached with many competing methods. We introduce metrics which are used to balance each method's attempts to correct noise versus preserving valuable discriminative information. We demonstrate the benefit of our preprocessing system on several SELDI and MALDI mass spectrometry datasets. Downstream classification is improved when using our system to preprocess the data.

  13. Algorithm Reveals Sinusoidal Component Of Noisy Signal

    NASA Technical Reports Server (NTRS)

    Kwok, Lloyd C.

    1991-01-01

    Algorithm performs simple statistical analysis of noisy signal to yield preliminary indication of whether or not signal contains sinusoidal component. Suitable for preprocessing or preliminary analysis of vibrations, fluctuations in pressure, and other signals that include large random components. Implemented on personal computer by easy-to-use program.
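
    The NASA algorithm itself is not spelled out here; in the same spirit, a simple periodogram check can flag a sinusoidal component when the largest FFT peak stands far above the median spectral level of the noise (the ratio threshold is an assumption).

        import numpy as np

        def has_sinusoid(x, ratio=20.0):
            power = np.abs(np.fft.rfft(x - x.mean())) ** 2
            return power.max() > ratio * np.median(power[1:])

        rng = np.random.default_rng(7)
        t = np.arange(2048) / 2048.0
        noise = rng.normal(0, 1, t.size)
        print(has_sinusoid(noise))                                     # False
        print(has_sinusoid(noise + 0.8 * np.sin(2 * np.pi * 60 * t)))  # True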

  14. Integrated Amplification Microarrays for Infectious Disease Diagnostics

    PubMed Central

    Chandler, Darrell P.; Bryant, Lexi; Griesemer, Sara B.; Gu, Rui; Knickerbocker, Christopher; Kukhtin, Alexander; Parker, Jennifer; Zimmerman, Cynthia; George, Kirsten St.; Cooney, Christopher G.

    2012-01-01

    This overview describes microarray-based tests that combine solution-phase amplification chemistry and microarray hybridization within a single microfluidic chamber. The integrated biochemical approach improves microarray workflow for diagnostic applications by reducing the number of steps and minimizing the potential for sample or amplicon cross-contamination. Examples described herein illustrate a basic, integrated approach for DNA and RNA genomes, and a simple consumable architecture for incorporating wash steps while retaining an entirely closed system. It is anticipated that integrated microarray biochemistry will provide an opportunity to significantly reduce the complexity and cost of microarray consumables, equipment, and workflow, which in turn will enable a broader spectrum of users to exploit the intrinsic multiplexing power of microarrays for infectious disease diagnostics.

  15. Classifying temporal microarray data by selecting informative genes.

    PubMed

    Lou, Qiang; Obradovic, Zoran

    2013-06-01

    In order to more accurately predict an individual's health status, in clinical applications it is often important to perform analysis of high-dimensional gene expression data that varies with time. A major challenge in predicting from such temporal microarray data is that the number of biomarkers used as features is typically much larger than the number of labeled subjects. One way to address this challenge is to perform feature selection as a preprocessing step and then apply a classification method on selected features. However, traditional feature selection methods cannot handle multivariate temporal data without applying techniques that flatten temporal data into a single matrix in advance. In this study, a feature selection filter that can directly select informative features from temporal gene expression data is proposed. In our approach, we measure the distance between multivariate temporal data from two subjects. Based on this distance, we define the objective function of temporal margin based feature selection to maximize each subject's temporal margin in its own relevant subspace. The experimental results on synthetic and two real flu data sets provide evidence that our method outperforms the alternatives, which flatten the temporal data in advance.
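
    As a hedged sketch of the margin idea (a Relief-style stand-in, not the authors' exact objective), each feature can be scored by how much it separates a subject's multivariate time series from the nearest subject of another class ("miss") versus the nearest of the same class ("hit").

        import numpy as np

        def temporal_margin_scores(X, y):
            """X: (subjects, features, timepoints); y: class labels."""
            n, f, t = X.shape
            scores = np.zeros(f)
            for i in range(n):
                d = np.linalg.norm(X - X[i], axis=2)  # per-feature series distances
                total = d.sum(axis=1)                 # overall distance to subject i
                total[i] = np.inf
                same = (y == y[i])
                same[i] = False
                hit = d[np.argmin(np.where(same, total, np.inf))]
                miss = d[np.argmin(np.where(y != y[i], total, np.inf))]
                scores += miss - hit     # larger margin = more informative feature
            return scores / n

        rng = np.random.default_rng(8)
        X = rng.normal(size=(20, 30, 10))
        y = np.repeat([0, 1], 10)
        X[y == 1, 0] += 2.0    # feature 0 separates the classes across time
        print(np.argsort(temporal_margin_scores(X, y))[::-1][:3])  # 0 ranks first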

  16. Image preprocessing method for particle image velocimetry (PIV) image interrogation near a fluid-solid surface

    NASA Astrophysics Data System (ADS)

    Zhu, Yiding; Jia, Lichao; Bai, Ye; Yuan, Huijing; Lee, Cunbiao

    2014-11-01

    Accurate particle image velocimetry (PIV) measurements near a moving wall are a great challenge. The problem is compounded by the very large in-plane displacements on PIV images commonly encountered in measurements of high-speed flows. An improved image preprocessing method is presented in this paper. A wall detection technique is used first to determine the wall position and the movement of the solid body. Virtual particle images are imposed in the solid region, and their displacements are evaluated from the body movement. The estimation near the wall is then smoothed using data from both sides of the shear layer to reduce the large random uncertainties. Interrogations in the following iterative steps then converge to the correct results and provide accurate predictions for particle tracking velocimetry (PTV). Significant improvement is seen in Monte Carlo simulations and experimental tests, such as measurements near a flapping flag or compressor plates. The algorithm also successfully extracted the small flow structures of the second-mode wave in the hypersonic boundary layer from PIV images with low signal-to-noise ratios (SNR), where the traditional method was not successful.

  17. Classifying human voices by using hybrid SFX time-series preprocessing and ensemble feature selection.

    PubMed

    Fong, Simon; Lan, Kun; Wong, Raymond

    2013-01-01

    Voice is a physiological characteristic that differs for each individual person. Owing to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features for training a classification model, based on piecewise transformation treating an audio waveform as a time series. Using SFX we can faithfully remodel the statistical characteristics of the time series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both the time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684

  19. Multisensor data fusion algorithm development

    SciTech Connect

    Yocky, D.A.; Chadwick, M.D.; Goudy, S.P.; Johnson, D.K.

    1995-12-01

    This report presents a two-year LDRD research effort into multisensor data fusion. We approached the problem by addressing the available types of data, preprocessing that data, and developing fusion algorithms using that data. The report reflects these three distinct areas. First, the possible data sets for fusion are identified. Second, automated registration techniques for imagery data are analyzed. Third, two fusion techniques are presented. The first fusion algorithm is based on the two-dimensional discrete wavelet transform. Using test images, the wavelet algorithm is compared against intensity modulation and intensity-hue-saturation image fusion algorithms that are available in commercial software. The wavelet approach outperforms the other two fusion techniques by preserving spectral/spatial information more precisely. The wavelet fusion algorithm was also applied to Landsat Thematic Mapper and SPOT panchromatic imagery data. The second algorithm is based on a linear-regression technique. We analyzed the technique using the same Landsat and SPOT data.
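
    Wavelet image fusion of the kind described can be sketched with PyWavelets: decompose both images with a 2-D DWT, keep the stronger detail coefficient at each location, average the coarse approximations, and invert the transform. The wavelet, decomposition level, and fusion rule here are assumptions.

        import numpy as np
        import pywt

        def wavelet_fuse(a, b, wavelet="db2", level=2):
            ca = pywt.wavedec2(a, wavelet, level=level)
            cb = pywt.wavedec2(b, wavelet, level=level)
            fused = [(ca[0] + cb[0]) / 2.0]          # approximations: average
            for da, db in zip(ca[1:], cb[1:]):       # details: max-abs rule
                fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)
                                   for x, y in zip(da, db)))
            return pywt.waverec2(fused, wavelet)

        rng = np.random.default_rng(9)
        pan = rng.normal(size=(128, 128))   # stand-in for a panchromatic band
        ms = rng.normal(size=(128, 128))    # stand-in for a resampled MS band
        print(wavelet_fuse(pan, ms).shape)  # (128, 128)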

  20. Surface free energy and microarray deposition technology.

    PubMed

    McHale, Glen

    2007-03-01

    Microarray techniques use a combinatorial approach to assess complex biochemical interactions. The fundamental goal is simultaneous, large-scale experimentation analogous to the automation achieved in the semiconductor industry. However, microarray deposition inherently involves liquids contacting solid substrates. Liquid droplet shapes are determined by surface and interfacial tension forces, and flows during drying. This article looks at how surface free energy and wetting considerations may influence the accuracy and reliability of spotted microarray experiments.

  1. Localization of spatially distributed brain sources after a tensor-based preprocessing of interictal epileptic EEG data.

    PubMed

    Albera, L; Becker, H; Karfoul, A; Gribonval, R; Kachenoura, A; Bensaid, S; Senhadji, L; Hernandez, A; Merlet, I

    2015-01-01

    This paper addresses the localization of spatially distributed sources from interictal epileptic electroencephalographic (EEG) data after a tensor-based preprocessing. Justifying the Canonical Polyadic (CP) model of the space-time-frequency and space-time-wave-vector tensors is not an easy task when two or more extended sources have to be localized. On the other hand, the occurrence of several amplitude-modulated spikes originating from the same epileptic region can be used to build a space-time-spike tensor from the EEG data. While the CP model of this tensor appears better justified, the exact computation of its loading matrices can be limited by the presence of highly correlated sources and/or a strong background noise. An efficient extended-source localization scheme therefore has to be set up after the tensor-based preprocessing. Different strategies are investigated and compared on realistic simulated data: the "disk algorithm" using a precomputed dictionary of circular patches, a standardized Tikhonov regularization, and a fused LASSO scheme.
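
    Fitting the CP model to a space-time-spike tensor can be sketched with the TensorLy package (a library choice assumed here, not prescribed by the paper); the resulting loading matrices would then feed the localization schemes compared above.

        import numpy as np
        import tensorly as tl
        from tensorly.decomposition import parafac

        rng = np.random.default_rng(10)
        # synthetic rank-2 tensor: 32 electrodes x 200 samples x 15 spikes
        spatial = rng.normal(size=(32, 2))
        temporal = rng.normal(size=(200, 2))
        spikes = rng.normal(size=(15, 2))
        tensor = tl.cp_to_tensor((None, [spatial, temporal, spikes]))
        tensor = tensor + 0.1 * rng.normal(size=tensor.shape)  # background noise

        weights, factors = parafac(tl.tensor(tensor), rank=2, n_iter_max=200)
        spatial_loadings = tl.to_numpy(factors[0])  # (32, 2): one map per source
        print(spatial_loadings.shape)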

  2. Automatic design of decision-tree algorithms with evolutionary algorithms.

    PubMed

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.

  3. CNV-ROC: A cost effective, computer-aided analytical performance evaluator of chromosomal microarrays.

    PubMed

    Goodman, Corey W; Major, Heather J; Walls, William D; Sheffield, Val C; Casavant, Thomas L; Darbro, Benjamin W

    2015-04-01

    Chromosomal microarrays (CMAs) are routinely used in both research and clinical laboratories; yet, little attention has been given to the estimation of genome-wide true and false negatives during the assessment of these assays and how such information could be used to calibrate various algorithmic metrics to improve performance. Low-throughput, locus-specific methods such as fluorescence in situ hybridization (FISH), quantitative PCR (qPCR), or multiplex ligation-dependent probe amplification (MLPA) preclude rigorous calibration of various metrics used by copy number variant (CNV) detection algorithms. To aid this task, we have established a comparative methodology, CNV-ROC, which is capable of performing a high throughput, low cost, analysis of CMAs that takes into consideration genome-wide true and false negatives. CNV-ROC uses a higher resolution microarray to confirm calls from a lower resolution microarray and provides for a true measure of genome-wide performance metrics at the resolution offered by microarray testing. CNV-ROC also provides for a very precise comparison of CNV calls between two microarray platforms without the need to establish an arbitrary degree of overlap. Comparison of CNVs across microarrays is done on a per-probe basis and receiver operator characteristic (ROC) analysis is used to calibrate algorithmic metrics, such as log2 ratio threshold, to enhance CNV calling performance. CNV-ROC addresses a critical and consistently overlooked aspect of analytical assessments of genome-wide techniques like CMAs which is the measurement and use of genome-wide true and false negative data for the calculation of performance metrics and comparison of CNV profiles between different microarray experiments.
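
    The ROC-based calibration step can be sketched as follows: treat per-probe calls from the higher-resolution array as truth and pick the log2-ratio threshold that maximizes Youden's J. The simulated probe data below are illustrative only.

        import numpy as np
        from sklearn.metrics import roc_curve

        def calibrate_log2_threshold(truth, log2_ratio):
            """truth: 1 where the higher-resolution platform calls a CNV."""
            fpr, tpr, thr = roc_curve(truth, np.abs(log2_ratio))
            return thr[np.argmax(tpr - fpr)]   # threshold maximizing Youden's J

        rng = np.random.default_rng(11)
        truth = (rng.random(5000) < 0.05).astype(int)  # ~5% of probes inside CNVs
        log2 = np.where(truth == 1, rng.normal(0.6, 0.2, 5000),
                        rng.normal(0.0, 0.2, 5000))
        print(calibrate_log2_threshold(truth, log2))   # ~0.3 for this toy mixture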

  5. THE ABRF MARG MICROARRAY SURVEY 2005: TAKING THE PULSE ON THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years microarray technology has evolved into a critical component of any discovery based program. Since 1999, the Association of Biomolecular Resource Facilities (ABRF) Microarray Research Group (MARG) has conducted biennial surveys designed to generate a pr...

  6. Living Cell Microarrays: An Overview of Concepts.

    PubMed

    Jonczyk, Rebecca; Kurth, Tracy; Lavrentieva, Antonina; Walter, Johanna-Gabriela; Scheper, Thomas; Stahl, Frank

    2016-01-01

    Living cell microarrays are highly efficient cellular screening systems. Due to the low number of cells required per spot, cell microarrays enable the use of primary and stem cells and provide resolution close to the single-cell level. Apart from a variety of conventional static designs, microfluidic microarray systems have also been established. An alternative format is a microarray consisting of three-dimensional cell constructs ranging from cell spheroids to cells encapsulated in hydrogel. These systems provide an in vivo-like microenvironment and are preferably used for the investigation of cellular physiology, cytotoxicity, and drug screening. Thus, many different high-tech microarray platforms are currently available. Disadvantages of many systems include their high cost, the requirement of specialized equipment for their manufacture, and the poor comparability of results between different platforms. In this article, we provide an overview of static, microfluidic, and 3D cell microarrays. In addition, we describe a simple method for the printing of living cell microarrays on modified microscope glass slides using standard DNA microarray equipment available in most laboratories. Applications in research and diagnostics are discussed, e.g., the selective and sensitive detection of biomarkers. Finally, we highlight current limitations and the future prospects of living cell microarrays. PMID:27600077

  7. Living Cell Microarrays: An Overview of Concepts

    PubMed Central

    Jonczyk, Rebecca; Kurth, Tracy; Lavrentieva, Antonina; Walter, Johanna-Gabriela; Scheper, Thomas; Stahl, Frank

    2016-01-01

    Living cell microarrays are highly efficient cellular screening systems. Due to the low number of cells required per spot, cell microarrays enable the use of primary and stem cells and provide resolution close to the single-cell level. Apart from a variety of conventional static designs, microfluidic microarray systems have also been established. An alternative format is a microarray consisting of three-dimensional cell constructs ranging from cell spheroids to cells encapsulated in hydrogel. These systems provide an in vivo-like microenvironment and are preferably used for the investigation of cellular physiology, cytotoxicity, and drug screening. Thus, many different high-tech microarray platforms are currently available. Disadvantages of many systems include their high cost, the requirement of specialized equipment for their manufacture, and the poor comparability of results between different platforms. In this article, we provide an overview of static, microfluidic, and 3D cell microarrays. In addition, we describe a simple method for the printing of living cell microarrays on modified microscope glass slides using standard DNA microarray equipment available in most laboratories. Applications in research and diagnostics are discussed, e.g., the selective and sensitive detection of biomarkers. Finally, we highlight current limitations and the future prospects of living cell microarrays. PMID:27600077

  8. Highly parallel microbial diagnostics using oligonucleotide microarrays.

    PubMed

    Loy, Alexander; Bodrossy, Levente

    2006-01-01

    Oligonucleotide microarrays are highly parallel hybridization platforms, allowing rapid and simultaneous identification of many different microorganisms and viruses in a single assay. In the past few years, researchers have been confronted with a dramatic increase in the number of studies reporting development and/or improvement of oligonucleotide microarrays for microbial diagnostics, but use of the technology in routine diagnostics is still constrained by a variety of factors. Careful development of microarray essentials (such as oligonucleotide probes, protocols for target preparation and hybridization, etc.), combined with extensive performance testing, is thus a mandatory requirement for the maturation of diagnostic microarrays from fancy technological gimmicks to robust and routinely applicable tools.

  9. AMDA 2.13: A major update for automated cross-platform microarray data analysis.

    PubMed

    Kapetis, Dimos; Clarelli, Ferdinando; Vitulli, Federico; de Rosbo, Nicole Kerlero; Beretta, Ottavio; Foti, Maria; Ricciardi-Castagnoli, Paola; Zolezzi, Francesca

    2012-07-01

    Microarray platforms require analytical pipelines with modules for data pre-processing including data normalization, statistical analysis for identification of differentially expressed genes, cluster analysis, and functional annotation. We previously developed the Automated Microarray Data Analysis (AMDA, version 2.3.5) pipeline to process Affymetrix 3' IVT GeneChips. The availability of newer technologies that demand open-source tools for microarray data analysis has impelled us to develop an updated multi-platform version, AMDA 2.13. It includes additional quality control metrics, annotation-driven (annotation grade of Affymetrix NetAffx) and signal-driven (Inter-Quartile Range) gene filtering, and approaches to experimental design. To enhance understanding of biological data, differentially expressed genes have been mapped into KEGG pathways. Finally, a more stable and user-friendly interface was designed to integrate the requirements for different platforms. AMDA 2.13 allows the analysis of Affymetrix (cartridges and plates) and whole transcript probe design (Gene 1.0/1.1 ST and Exon 1.0 ST GeneChips), Illumina Bead Arrays, and one-channel Agilent 4×44 arrays. Relative to early versions, it supports various experimental designs and delivers more insightful biological understanding and up-to-date annotations.
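
    The signal-driven filtering mentioned in this record can be sketched in a few lines; this is an illustrative reimplementation, not AMDA's code, and the cutoff value is an assumption:

```python
# Hedged sketch of Inter-Quartile Range (IQR) gene filtering.
# `expr` is a hypothetical genes x samples matrix of normalized log2 values.
import numpy as np

def iqr_filter(expr, min_iqr=0.5):
    """Keep genes whose across-sample IQR exceeds min_iqr."""
    q75, q25 = np.percentile(expr, [75, 25], axis=1)
    return expr[(q75 - q25) > min_iqr]

expr = np.random.default_rng(1).normal(8.0, 1.0, size=(5000, 12))
print(expr.shape, "->", iqr_filter(expr).shape)
```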

  10. Breast image pre-processing for mammographic tissue segmentation.

    PubMed

    He, Wenda; Hogg, Peter; Juette, Arne; Denton, Erika R E; Zwiggelaar, Reyer

    2015-12-01

    During mammographic image acquisition, a compression paddle is used to even the breast thickness in order to obtain optimal image quality. Clinical observation has indicated that some mammograms may exhibit abrupt intensity change and low visibility of tissue structures in the breast peripheral areas. Such appearance discrepancies can affect image interpretation and may not be desirable for computer aided mammography, leading to incorrect diagnosis and/or detection which can have a negative impact on sensitivity and specificity of screening mammography. This paper describes a novel mammographic image pre-processing method to improve image quality for analysis. An image selection process is incorporated to better target problematic images. The processed images show improved mammographic appearances not only in the breast periphery but also across the mammograms. Mammographic segmentation and risk/density classification were performed to facilitate a quantitative and qualitative evaluation. When using the processed images, the results indicated more anatomically correct segmentation in tissue specific areas, and subsequently better classification accuracies were achieved. Visual assessments were conducted in a clinical environment to determine the quality of the processed images and the resultant segmentation. The developed method has shown promising results. It is expected to be useful in early breast cancer detection, risk-stratified screening, and aiding radiologists in the process of decision making prior to surgery and/or treatment.

  11. Software for Preprocessing Data From Rocket-Engine Tests

    NASA Technical Reports Server (NTRS)

    Cheng, Chiu-Fu

    2002-01-01

    Three computer programs have been written to preprocess digitized outputs of sensors during rocket-engine tests at Stennis Space Center (SSC). The programs apply exclusively to the SSC "E" test-stand complex and utilize the SSC file format. The programs are the following: 1) Engineering Units Generator (EUGEN) converts sensor-output-measurement data to engineering units. The inputs to EUGEN are raw binary test-data files, which include the voltage data, a list identifying the data channels, and time codes. EUGEN effects conversion by use of a file that contains calibration coefficients for each channel; 2) QUICKLOOK enables immediate viewing of a few selected channels of data, in contradistinction to viewing only after post-test processing (which can take 30 minutes to several hours depending on the number of channels and other test parameters) of data from all channels. QUICKLOOK converts the selected data into a form in which they can be plotted in engineering units by use of Winplot (a free graphing program written by Rick Paris); and 3) EUPLOT provides a quick means for looking at data files generated by EUGEN without the necessity of relying on the PV-WAVE-based plotting software.
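
    The conversion step EUGEN performs can be sketched as follows; the polynomial calibration form and the channel names are illustrative assumptions, not the SSC file format:

```python
# Hedged sketch of voltage-to-engineering-units conversion driven by
# per-channel calibration coefficients (hypothetical channels and values).
import numpy as np

CALIBRATION = {
    # channel name: polynomial coefficients, highest order first
    "chamber_pressure": [0.0, 250.0, 14.7],   # psi = 250*V + 14.7
    "fuel_flow":        [1.2, 80.0, 0.0],     # quadratic fit
}

def to_engineering_units(channel, volts):
    return np.polyval(CALIBRATION[channel], np.asarray(volts, dtype=float))

print(to_engineering_units("chamber_pressure", [0.0, 2.0, 4.5]))
```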

  12. Software for Preprocessing Data from Rocket-Engine Tests

    NASA Technical Reports Server (NTRS)

    Cheng, Chiu-Fu

    2004-01-01

    Three computer programs have been written to preprocess digitized outputs of sensors during rocket-engine tests at Stennis Space Center (SSC). The programs apply exclusively to the SSC E test-stand complex and utilize the SSC file format. The programs are the following: Engineering Units Generator (EUGEN) converts sensor-output-measurement data to engineering units. The inputs to EUGEN are raw binary test-data files, which include the voltage data, a list identifying the data channels, and time codes. EUGEN effects conversion by use of a file that contains calibration coefficients for each channel. QUICKLOOK enables immediate viewing of a few selected channels of data, in contradistinction to viewing only after post-test processing (which can take 30 minutes to several hours depending on the number of channels and other test parameters) of data from all channels. QUICKLOOK converts the selected data into a form in which they can be plotted in engineering units by use of Winplot (a free graphing program written by Rick Paris). EUPLOT provides a quick means for looking at data files generated by EUGEN without the necessity of relying on the PV-WAVE based plotting software.

  13. Software for Preprocessing Data From Rocket-Engine Tests

    NASA Technical Reports Server (NTRS)

    Cheng, Chiu-Fu

    2003-01-01

    Three computer programs have been written to preprocess digitized outputs of sensors during rocket-engine tests at Stennis Space Center (SSC). The programs apply exclusively to the SSC E test-stand complex and utilize the SSC file format. The programs are the following: (1) Engineering Units Generator (EUGEN) converts sensor-output-measurement data to engineering units. The inputs to EUGEN are raw binary test-data files, which include the voltage data, a list identifying the data channels, and time codes. EUGEN effects conversion by use of a file that contains calibration coefficients for each channel. (2) QUICKLOOK enables immediate viewing of a few selected channels of data, in contradistinction to viewing only after post-test processing (which can take 30 minutes to several hours depending on the number of channels and other test parameters) of data from all channels. QUICKLOOK converts the selected data into a form in which they can be plotted in engineering units by use of Winplot. (3) EUPLOT provides a quick means for looking at data files generated by EUGEN without the necessity of relying on the PV-WAVE-based plotting software.

  14. Multimodal image fusion with SIMS: Preprocessing with image registration.

    PubMed

    Tarolli, Jay Gage; Bloom, Anna; Winograd, Nicholas

    2016-06-14

    In order to utilize complementary imaging techniques to supply higher resolution data for fusion with secondary ion mass spectrometry (SIMS) chemical images, there are a number of aspects that, if not given proper consideration, could produce results that are easy to misinterpret. One of the most critical aspects is that the two input images must be of the exact same analysis area. As new, higher resolution data sources outside of the mass spectrometer are explored, this requirement becomes even more important. To ensure that two input images are of the same region, an implementation of the Insight Segmentation and Registration Toolkit (ITK) was developed to act as a preprocessing step before performing image fusion. This implementation of ITK allows several degrees of movement between two input images to be accounted for, including translation, rotation, and scale transforms. First, the implementation was confirmed to accurately register two multimodal images by supplying a known transform. Once validated, two model systems, a copper mesh grid and a group of RAW 264.7 cells, were used to demonstrate the use of the ITK implementation to register a SIMS image with a microscopy image for the purpose of performing image fusion.
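
    A registration step in this spirit can be sketched with SimpleITK, the Python wrapping of ITK; the similarity transform covers the translation, rotation, and scale degrees of freedom mentioned above, while file names and parameter values are placeholders rather than the authors' settings:

```python
# Hedged sketch: register a SIMS image to a microscopy image of the same
# analysis area before image fusion (inputs are hypothetical file names).
import SimpleITK as sitk

fixed = sitk.ReadImage("microscopy.tif", sitk.sitkFloat32)
moving = sitk.ReadImage("sims_total_ion.tif", sitk.sitkFloat32)

registration = sitk.ImageRegistrationMethod()
registration.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
registration.SetOptimizerAsRegularStepGradientDescent(
    learningRate=1.0, minStep=1e-4, numberOfIterations=200)
registration.SetInterpolator(sitk.sitkLinear)
# Similarity2DTransform = rotation + isotropic scale + translation
initial = sitk.CenteredTransformInitializer(
    fixed, moving, sitk.Similarity2DTransform(),
    sitk.CenteredTransformInitializerFilter.GEOMETRY)
registration.SetInitialTransform(initial)

transform = registration.Execute(fixed, moving)
aligned = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0)
sitk.WriteImage(aligned, "sims_registered.tif")
```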

  15. Macular Preprocessing of Linear Acceleratory Stimuli: Implications for the Clinic

    NASA Technical Reports Server (NTRS)

    Ross, M. D.; Hargens, Alan R. (Technical Monitor)

    1996-01-01

    Three-dimensional reconstructions of innervation patterns in rat maculae were carried out using serial section images sent to a Silicon Graphics workstation from a transmission electron microscope. Contours were extracted from mosaicked sections, then registered and visualized using Biocomputation Center software. Purposes were to determine innervation patterns of type II cells and areas encompassed by vestibular afferent receptive fields. Terminals on type II cells typically are elongated and compartmentalized into parts varying in vesicular content; reciprocal and serial synapses are common. The terminals originate as processes of nearby calyces or from nerve fibers passing to calyces outside the immediate vicinity. Thus, receptive fields of the afferents overlap in unique ways. Multiple processes are frequent; from 4 to 6 afferents supply 12-16 terminals on a type II cell. Processes commonly communicate with two type II cells. The morphology indicates that extensive preprocessing of linear acceleratory stimuli occurs peripherally, as is true also of visual and olfactory systems. Clinically, this means that loss of individual nerve fibers may not be noticed behaviorally, due to redundancy (receptive field overlap). However, peripheral processing implies the presence of neuroactive agents whose loss can acutely or chronically alter normal peripheral function and cause balance disorders.

  16. Estimating Gene Signals From Noisy Microarray Images

    PubMed Central

    Sarder, Pinaki; Davis, Paul H.; Stanley, Samuel L.

    2016-01-01

    In oligonucleotide microarray experiments, noise is a challenging problem, as biologists now are studying their organisms not in isolation but in the context of a natural environment. In low photomultiplier tube (PMT) voltage images, weak gene signals and their interactions with the background fluorescence noise are most problematic. In addition, nonspecific sequences bind to array spots intermittently, causing inaccurate measurements. Conventional techniques cannot precisely separate the foreground and the background signals. In this paper, we propose an analytically based estimation technique. We assume a priori spot-shape information using a circular outer periphery with an elliptical center hole. We assume Gaussian statistics for modeling both the foreground and background signals. The mean of the foreground signal quantifies the weak gene signal corresponding to the spot, and the variance gives the measure of the undesired binding that causes fluctuation in the measurement. We propose a foreground-signal and shape-estimation algorithm using the Gibbs sampling method. We compare our algorithm with the existing Mann–Whitney (MW)- and expectation maximization (EM)/iterated conditional modes (ICM)-based methods. Our method outperforms the existing methods with considerably smaller mean-square error (MSE) for all signal-to-noise ratios (SNRs) in computer-generated images and gives better qualitative results in low-SNR real-data images. Our method is computationally relatively slow because of its inherent sampling operation and is hence only applicable to very noisy spot images. In a realistic example, we show that gene-signal fluctuations on the estimated foreground are better observed for noisy input images with relatively higher levels of undesired binding. PMID:18556262
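
    A deliberately simplified sketch of the estimation idea follows: a two-component Gaussian separation with Gibbs-style label sampling. The paper's spot-shape prior (circular outer periphery with an elliptical center hole) and full posterior updates are omitted, and `pixels` is a hypothetical intensity vector for one spot region:

```python
# Hedged, simplified sketch of foreground/background separation for one
# microarray spot; not the authors' algorithm.
import numpy as np

def gibbs_spot(pixels, iters=500, seed=0):
    """Estimate foreground/background Gaussian parameters for one spot."""
    rng = np.random.default_rng(seed)
    z = pixels > np.median(pixels)            # initial labels: True = foreground
    for _ in range(iters):
        if z.sum() in (0, z.size):            # degenerate labeling; restart
            z = rng.random(z.size) < 0.5
        mu_f, sd_f = pixels[z].mean(), pixels[z].std() + 1e-6
        mu_b, sd_b = pixels[~z].mean(), pixels[~z].std() + 1e-6
        # log-likelihood of each pixel under the two Gaussian components
        lf = -0.5 * ((pixels - mu_f) / sd_f) ** 2 - np.log(sd_f)
        lb = -0.5 * ((pixels - mu_b) / sd_b) ** 2 - np.log(sd_b)
        p_f = 1.0 / (1.0 + np.exp(lb - lf))   # posterior P(foreground)
        z = rng.random(pixels.size) < p_f     # sample new labels
    # the mean quantifies the gene signal, the variance the undesired binding
    return mu_f, sd_f ** 2, mu_b, sd_b ** 2

rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(300, 40, 200),   # synthetic foreground
                         rng.normal(80, 15, 600)])   # synthetic background
print(gibbs_spot(pixels))
```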

  17. 2008 Microarray Research Group (MARG Survey): Sensing the State of Microarray Technology

    EPA Science Inventory

    Over the past several years, the field of microarrays has grown and evolved drastically. In its continued efforts to track this evolution and transformation, the ABRF-MARG has once again conducted a survey of international microarray facilities and individual microarray users. Th...

  18. THE ABRF-MARG MICROARRAY SURVEY 2004: TAKING THE PULSE OF THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years, the field of microarrays has grown and evolved drastically. In its continued efforts to track this evolution, the ABRF-MARG has once again conducted a survey of international microarray facilities and individual microarray users. The goal of the surve...

  19. Data screening and preprocessing for Landsat MSS data

    NASA Technical Reports Server (NTRS)

    Lambeck, P. F.; Kauth, R.; Thomas, G. S.

    1978-01-01

    Two computer algorithms are presented. The first, called SCREEN, is used to automatically identify pixels representing clouds, cloud shadows, snow, water, or anomalous signals in Landsat-2 data. The second, called XSTAR, compensates Landsat-2 data for the effects of atmospheric haze, without requiring ground measurements or ground references. The presentation of these algorithms includes their theoretical background, algebraic details, and performance characteristics. Verification of the algorithms has for the present been limited to Landsat agricultural data. Plans for further development of the XSTAR technique are also presented.

  1. MAGMA: analysis of two-channel microarrays made easy.

    PubMed

    Rehrauer, Hubert; Zoller, Stefan; Schlapbach, Ralph

    2007-07-01

    The web application MAGMA provides a simple and intuitive interface to identify differentially expressed genes from two-channel microarray data. While the underlying algorithms are not superior to those of similar web applications, MAGMA is particularly user-friendly and can be used without prior training. The user interface guides the novice user through the most typical microarray analysis workflow, consisting of data upload, annotation, normalization and statistical analysis. It automatically generates R scripts that document MAGMA's entire data processing steps, thereby allowing the user to regenerate all results in a local R installation. The implementation of MAGMA follows the model-view-controller design pattern that strictly separates the R-based statistical data processing, the web representation and the application logic. This modular design makes the application flexible and easily extendible by experts in one of the fields: statistical microarray analysis, web design or software development. State-of-the-art JavaServer Faces technology was used to generate the web interface and to perform user input processing. MAGMA's object-oriented modular framework makes it easily extendible and applicable to other fields and demonstrates that modern Java technology is also suitable for rather small and concise academic projects. MAGMA is freely available at www.magma-fgcz.uzh.ch. PMID:17517778

  2. Coupled two-way clustering analysis of gene microarray data

    NASA Astrophysics Data System (ADS)

    Getz, Gad; Levine, Erel; Domany, Eytan

    2000-10-01

    We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.
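
    The coupled two-way idea can be sketched as below, with k-means standing in for the authors' clustering engine and the stability criteria omitted; `X` is a hypothetical genes-by-samples matrix:

```python
# Hedged sketch of one coupled two-way clustering round: cluster genes and
# samples, then re-cluster samples inside each gene cluster's submatrix.
import numpy as np
from sklearn.cluster import KMeans

def two_way_round(X, n_gene_clusters=3, n_sample_clusters=2, seed=0):
    gene_labels = KMeans(n_gene_clusters, n_init=10,
                         random_state=seed).fit_predict(X)
    sample_labels = KMeans(n_sample_clusters, n_init=10,
                           random_state=seed).fit_predict(X.T)
    refined = {}
    for g in range(n_gene_clusters):
        sub = X[gene_labels == g].T           # samples x genes-of-cluster-g
        refined[g] = KMeans(n_sample_clusters, n_init=10,
                            random_state=seed).fit_predict(sub)
    return gene_labels, sample_labels, refined

X = np.random.default_rng(2).normal(size=(1000, 40))
genes, samples, refined = two_way_round(X)
print(len(genes), len(samples), {g: r.shape for g, r in refined.items()})
```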

  3. Comparing Binaural Pre-processing Strategies I: Instrumental Evaluation.

    PubMed

    Baumgärtel, Regina M; Krawczyk-Becker, Martin; Marquardt, Daniel; Völker, Christoph; Hu, Hongmei; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Ernst, Stephan M A; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

    2015-01-01

    In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios.

  4. Effect of data normalization on fuzzy clustering of DNA microarray data

    PubMed Central

    Kim, Seo Young; Lee, Jae Won; Bae, Jong Sung

    2006-01-01

    Background Microarray technology has made it possible to simultaneously measure the expression levels of large numbers of genes in a short time. Gene expression data is information-rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. Clustering is an important tool for finding groups of genes with similar expression patterns in microarray data analysis. However, hard clustering methods, which assign each gene exactly to one cluster, are poorly suited to the analysis of microarray datasets because in such datasets the clusters of genes frequently overlap. Results In this study we applied the fuzzy partitional clustering method known as Fuzzy C-Means (FCM) to overcome the limitations of hard clustering. To identify the effect of data normalization, we used three normalization methods, the two common scale and location transformations and the Lowess normalization method, to normalize three microarray datasets and three simulated datasets. First, we determined the optimal parameters for FCM clustering. We found that the optimal fuzzification parameter in the FCM analysis of a microarray dataset depended on the normalization method applied to the dataset during preprocessing. We additionally evaluated the effect of normalization of noisy datasets on the results obtained when hard clustering or FCM clustering was applied to those datasets. The effects of normalization were evaluated using both simulated datasets and microarray datasets. A comparative analysis showed that the clustering results depended on the normalization method used and the noisiness of the data. In particular, the selection of the fuzzification parameter value for the FCM method was sensitive to the normalization method used for datasets with large variations across samples. Conclusion Lowess normalization is more robust for clustering of genes from general microarray data than the two common scale and location adjustment methods.
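
    For reference, a minimal fuzzy C-means sketch in the setting described above; rows of `X` are assumed to be already normalized gene profiles, and `m` is the fuzzification parameter whose optimal value the study found to depend on the normalization method:

```python
# Hedged sketch of fuzzy C-means (FCM); a textbook implementation, not the
# study's code.
import numpy as np

def fuzzy_c_means(X, c=4, m=1.5, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)               # fuzzy memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1.0))                 # membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.random.default_rng(3).normal(size=(500, 10))
centers, U = fuzzy_c_means(X)
print(centers.shape, U.sum(axis=1)[:3])             # memberships sum to 1
```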

  5. Studying bovine early embryo transcriptome by microarray.

    PubMed

    Dufort, Isabelle; Robert, Claude; Sirard, Marc-André

    2015-01-01

    Microarrays represent a significant advantage when studying gene expression in early embryo because they allow for a speedy study of a large number of genes even if the sample of interest contains small quantities of genetic material. Here we describe the protocols developed by the EmbryoGENE Network to study the bovine transcriptome in early embryo using a microarray experimental design.

  6. Microarrays Made Simple: "DNA Chips" Paper Activity

    ERIC Educational Resources Information Center

    Barnard, Betsy

    2006-01-01

    DNA microarray technology is revolutionizing biological science. DNA microarrays (also called DNA chips) allow simultaneous screening of many genes for changes in expression between different cells. Now researchers can obtain information about genes in days or weeks that used to take months or years. The paper activity described in this article…

  7. Application of microarray technology in pulmonary diseases

    PubMed Central

    Tzouvelekis, Argyris; Patlakas, George; Bouros, Demosthenes

    2004-01-01

    Microarrays are a powerful tool that have multiple applications both in clinical and cell biology arenas of common lung diseases. To exemplify how this tool can be useful, in this review, we will provide an overview of the application of microarray technology in research relevant to common lung diseases and present some of the future perspectives. PMID:15585067

  8. Sensing immune responses with customized peptide microarrays.

    PubMed

    Schirwitz, Christopher; Loeffler, Felix F; Felgenhauer, Thomas; Stadler, Volker; Breitling, Frank; Bischoff, F Ralf

    2012-12-01

    The intent to solve biological and biomedical questions in high throughput has led to an immense interest in microarray technologies. Nowadays, DNA microarrays are routinely used to screen for oligonucleotide interactions within a large variety of potential interaction partners. To study interactions on the protein level with the same efficiency, protein and peptide microarrays offer similar advantages, but their production is more demanding. A new technology to produce peptide microarrays with a laser printer provides access to affordable and highly complex peptide microarrays. Such a peptide microarray can contain up to 775 peptide spots per cm², whereby the position of each peptide spot, and thus the amino acid sequence of the corresponding peptide, is exactly known. Compared to other techniques, such as the SPOT synthesis, more features per cm² can be synthesized at lower costs, which paves the way for laser-printed peptide microarrays to take on roles as efficient and affordable biomedical sensors. Here, we describe the laser printer-based synthesis of peptide microarrays and focus on an application involving the blood sera of tetanus-immunized individuals, indicating the potential of peptide arrays to sense immune responses.

  9. Automated Pre-processing for NMR Assignments with Reduced Tedium

    2004-05-11

    An important rate-limiting step in the resonance assignment process is accurate identification of resonance peaks in NMR spectra. NMR spectra are noisy. Hence, automatic peak-picking programs must navigate between the Scylla of reliable but incomplete picking, and the Charybdis of noisy but complete picking. Each of these extremes complicates the assignment process: incomplete peak-picking results in the loss of essential connectivities, while noisy picking conceals the true connectivities under a combinatorial explosion of false positives. Intermediate processing can simplify the assignment process by preferentially removing false peaks from noisy peak lists. This is accomplished by requiring consensus between multiple NMR experiments, exploiting a priori information about NMR spectra, and drawing on empirical statistical distributions of chemical shift extracted from the BioMagResBank. Experienced NMR practitioners currently apply many of these techniques "by hand", which is tedious, and may appear arbitrary to the novice. To increase efficiency, we have created a systematic and automated approach to this process, known as APART. Automated pre-processing has three main advantages: reduced tedium, standardization, and pedagogy. In the hands of experienced spectroscopists, the main advantage is reduced tedium (a rapid increase in the ratio of true peaks to false peaks with minimal effort). When a project is passed from hand to hand, the main advantage is standardization. APART automatically documents the peak filtering process by archiving its original recommendations, the accompanying justifications, and whether a user accepted or overrode a given filtering recommendation. In the hands of a novice, this tool can reduce the stumbling block of learning to differentiate between real peaks and noise, by providing real-time examples of how such decisions are made.
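
    One of the filters described above, consensus between multiple NMR experiments, can be sketched as follows; the peak lists and the ppm tolerance are hypothetical:

```python
# Hedged sketch of a consensus filter over peak lists from two experiments.
def consensus_filter(peaks, reference_peaks, tol=0.03):
    """Keep peaks (chemical shifts in ppm) matched in the reference list."""
    return [p for p in peaks
            if any(abs(p - r) <= tol for r in reference_peaks)]

hsqc = [8.21, 8.47, 7.95, 9.02]           # noisy picking, one false positive
hncacb = [8.20, 8.46, 9.03]
print(consensus_filter(hsqc, hncacb))     # 7.95 is dropped
```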

  10. Performance of Multi-User Transmitter Pre-Processing Assisted Multi-Cell IDMA System for Downlink Transmission

    NASA Astrophysics Data System (ADS)

    Partibane, B.; Nagarajan, V.; Vishvaksenan, K. S.; Kalidoss, R.

    2015-06-01

    In this paper, we present the performance of multi-user transmitter pre-processing (MUTP) assisted coded interleave-division multiple access (IDMA) systems over correlated frequency-selective channels for downlink communication. We realize MUTP using the singular value decomposition (SVD) technique, which exploits the channel state information (CSI) of all the active users, acquired via feedback channels. We consider the MUTP technique to alleviate the effects of co-channel interference (CCI) and multiple access interference (MAI). To be specific, we estimate the CSI using the least square error (LSE) algorithm at each of the mobile stations (MSs), perform vector quantization using Lloyd's algorithm, and feed back the bits that represent the quantized magnitudes and phases to the base station (BS) through a dedicated low-rate noisy channel. Finally, we recover the quantized bits at the BS to formulate the pre-processing matrix. The performance of MUTP-aided IDMA systems is evaluated for five types of delay spread distributions pertaining to long-term evolution (LTE) and Stanford University Interim (SUI) channel models. We also compare the performance of MUTP with a minimum mean square error (MMSE) detector for the coded IDMA system. The considered MUTP scheme alleviates the effects of CCI with less complex signal detection at the MSs when compared to the MMSE detector. Further, our simulation results reveal that the SVD-based MUTP assisted coded IDMA system outperforms the MMSE detector in terms of achievable bit error rate (BER) with a low signal-to-noise ratio (SNR) requirement by mitigating the effects of CCI and MAI.
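
    The SVD step at the heart of such transmitter pre-processing can be sketched for a single user over an ideal noiseless channel; the IDMA coding, interleaving, and quantized CSI feedback are omitted:

```python
# Hedged sketch: precoding with the right singular vectors of the channel
# turns a MIMO channel into parallel subchannels with gains from Sigma.
import numpy as np

rng = np.random.default_rng(0)
n_tx = n_rx = 4
H = (rng.normal(size=(n_rx, n_tx)) +
     1j * rng.normal(size=(n_rx, n_tx))) / np.sqrt(2)

U, s, Vh = np.linalg.svd(H)                   # H = U @ diag(s) @ Vh
x = rng.choice([-1.0, 1.0], size=n_tx) + 0j   # BPSK-like symbols
tx = Vh.conj().T @ x                          # pre-process with V
y = U.conj().T @ (H @ tx)                     # receiver shaping
print(np.allclose(y, s * x))                  # True: parallel subchannels
```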

  11. Microarray Applications in Microbial Ecology Research.

    SciTech Connect

    Gentry, T.; Schadt, C.; Zhou, J.

    2006-04-06

    Microarray technology has the unparalleled potential to simultaneously determine the dynamics and/or activities of most, if not all, of the microbial populations in complex environments such as soils and sediments. Researchers have developed several types of arrays that characterize the microbial populations in these samples based on their phylogenetic relatedness or functional genomic content. Several recent studies have used these microarrays to investigate ecological issues; however, most have only analyzed a limited number of samples, with relatively few experiments utilizing the full high-throughput potential of microarray analysis. This is due in part to the unique analytical challenges that these samples present with regard to sensitivity, specificity, quantitation, and data analysis. This review discusses specific applications of microarrays to microbial ecology research along with some of the latest studies addressing the difficulties encountered during analysis of complex microbial communities within environmental samples. With continued development, microarray technology may ultimately achieve its potential for comprehensive, high-throughput characterization of microbial populations in near real-time.

  12. Real-time DNA microarray analysis

    PubMed Central

    Hassibi, Arjang; Vikalo, Haris; Riechmann, José Luis; Hassibi, Babak

    2009-01-01

    We present a quantification method for affinity-based DNA microarrays which is based on the real-time measurements of hybridization kinetics. This method, i.e. real-time DNA microarrays, enhances the detection dynamic range of conventional systems by being impervious to probe saturation in the capturing spots, washing artifacts, microarray spot-to-spot variations, and other signal amplitude-affecting non-idealities. We demonstrate in both theory and practice that the time-constant of target capturing in microarrays, similar to all affinity-based biosensors, is inversely proportional to the concentration of the target analyte, which we subsequently use as the fundamental parameter to estimate the concentration of the analytes. Furthermore, to empirically validate the capabilities of this method in practical applications, we present a FRET-based assay which enables the real-time detection in gene expression DNA microarrays. PMID:19723688
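
    The kinetic readout can be illustrated with a minimal fit of a first-order capture curve, using 1/τ as the concentration proxy; the data below are simulated, not from the paper:

```python
# Hedged sketch: fit y(t) = A * (1 - exp(-t/tau)); the capture time constant
# tau is inversely proportional to the target concentration.
import numpy as np
from scipy.optimize import curve_fit

def capture(t, amplitude, tau):
    return amplitude * (1.0 - np.exp(-t / tau))

t = np.linspace(0.0, 600.0, 120)              # seconds
y = capture(t, 1.0, 150.0) \
    + np.random.default_rng(0).normal(0.0, 0.02, t.size)

(amplitude, tau), _ = curve_fit(capture, t, y, p0=(1.0, 100.0))
print(f"tau = {tau:.1f} s -> concentration proxy 1/tau = {1/tau:.4f}")
```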

  13. Real-time DNA microarray analysis.

    PubMed

    Hassibi, Arjang; Vikalo, Haris; Riechmann, José Luis; Hassibi, Babak

    2009-11-01

    We present a quantification method for affinity-based DNA microarrays which is based on the real-time measurements of hybridization kinetics. This method, i.e. real-time DNA microarrays, enhances the detection dynamic range of conventional systems by being impervious to probe saturation in the capturing spots, washing artifacts, microarray spot-to-spot variations, and other signal amplitude-affecting non-idealities. We demonstrate in both theory and practice that the time-constant of target capturing in microarrays, similar to all affinity-based biosensors, is inversely proportional to the concentration of the target analyte, which we subsequently use as the fundamental parameter to estimate the concentration of the analytes. Furthermore, to empirically validate the capabilities of this method in practical applications, we present a FRET-based assay which enables the real-time detection in gene expression DNA microarrays. PMID:19723688

  14. Tissue Microarrays in Clinical Oncology

    PubMed Central

    Voduc, David; Kenney, Challayne; Nielsen, Torsten O.

    2008-01-01

    The tissue microarray (TMA) is a recently implemented, high-throughput technology for the analysis of molecular markers in oncology. This research tool permits the rapid assessment of a biomarker in thousands of tumor samples, using commonly available laboratory assays such as immunohistochemistry and in-situ hybridization. Although introduced less than a decade ago, the TMA has proven to be invaluable in the study of tumor biology, the development of diagnostic tests, and the investigation of oncological biomarkers. This review describes the impact of TMA-based research in clinical oncology and its potential future applications. Technical aspects of TMA construction, and the advantages and disadvantages inherent to this technology, are also discussed. PMID:18314063

  15. DNA Microarrays for Identifying Fishes

    PubMed Central

    Nölte, M.; Weber, H.; Silkenbeumer, N.; Hjörleifsdottir, S.; Hreggvidsson, G. O.; Marteinsson, V.; Kappel, K.; Planes, S.; Tinti, F.; Magoulas, A.; Garcia Vazquez, E.; Turan, C.; Hervet, C.; Campo Falgueras, D.; Antoniou, A.; Landi, M.; Blohm, D.

    2008-01-01

    In many cases marine organisms, and especially their diverse developmental stages, are difficult to identify by morphological characters. DNA-based identification methods offer an analytically powerful addition or even an alternative. In this study, a DNA microarray has been developed to investigate its potential as a tool for the identification of fish species from European seas based on mitochondrial 16S rDNA sequences. Eleven commercially important fish species were selected for a first prototype. Oligonucleotide probes were designed based on the 16S rDNA sequences obtained from 230 individuals of 27 fish species. In addition, more than 1200 sequences of 380 species served as a sequence background against which the specificity of the probes was tested in silico. Single target hybridisations with Cy5-labelled, PCR-amplified 16S rDNA fragments from each of the 11 species on microarrays containing the complete set of probes confirmed their suitability. True-positive fluorescence signals obtained were at least one order of magnitude stronger than false-positive cross-hybridisations. Single nontarget hybridisations resulted in cross-hybridisation signals in approximately 27% of the cases tested, but all of them were at least one order of magnitude lower than true-positive signals. This study demonstrates that the 16S rDNA gene is suitable for designing oligonucleotide probes which can be used to differentiate 11 fish species. These data are a solid basis for the second step, creating a “Fish Chip” for approximately 50 fish species relevant in marine environmental and fisheries research, as well as control of fisheries products. PMID:18270778

  16. Generation of attributes for learning algorithms

    SciTech Connect

    Hu, Yuh-Jyh; Kibler, D.

    1996-12-31

    Inductive algorithms rely strongly on their representational biases. Constructive induction can mitigate representational inadequacies. This paper introduces the notion of a relative gain measure and describes a new constructive induction algorithm (GALA) which is independent of the learning algorithm. Unlike most previous research on constructive induction, our methods are designed as a preprocessing step before standard machine learning algorithms are applied. We present results which demonstrate the effectiveness of GALA on artificial and real domains for several learners: C4.5, CN2, perceptron and backpropagation.

  17. Algorithms and Algorithmic Languages.

    ERIC Educational Resources Information Center

    Veselov, V. M.; Koprov, V. M.

    This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and…

  18. Probe Region Expression Estimation for RNA-Seq Data for Improved Microarray Comparability.

    PubMed

    Uziela, Karolis; Honkela, Antti

    2015-01-01

    Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms that make integrating it all difficult. Although RNA-sequencing (RNA-seq) is attracting the most attention, at present, the rate of new microarray studies submitted to public databases far exceeds the rate of new RNA-seq studies. There is clearly a need for methods that make it easier to combine data from different technologies. In this paper, we propose a new method for processing RNA-seq data that yields gene expression estimates that are much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method we call PREBS is based on estimating the expression from RNA-seq reads overlapping the microarray probe regions, and processing these estimates with standard microarray summarisation algorithms. Using paired microarray and RNA-seq samples from TCGA LAML data set we show that PREBS expression estimates derived from RNA-seq are more similar to microarray-based expression estimates than those from other RNA-seq processing methods. In an experiment to retrieve paired microarray samples from a database using an RNA-seq query sample, gene signatures defined based on PREBS expression estimates were found to be much more accurate than those from other methods. PREBS also allows new ways of using RNA-seq data, such as expression estimation for microarray probe sets. An implementation of the proposed method is available in the Bioconductor package "prebs."
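
    The core idea, estimating probe-level values from reads overlapping microarray probe regions, can be sketched as follows; the intervals and reads are hypothetical, whereas the actual package works from BAM alignments and platform probe annotations before applying standard summarisation such as RMA:

```python
# Hedged sketch of probe-region read counting for PREBS-style estimates.
import numpy as np

probes = {"gene1_probe1": (100, 125),      # (start, end), hypothetical
          "gene1_probe2": (300, 325)}
read_starts = np.array([90, 110, 118, 250, 310, 500])
read_len = 50

def probe_counts(probes, read_starts, read_len):
    counts = {}
    for name, (start, end) in probes.items():
        # a read [s, s + read_len) overlaps the probe interval [start, end)
        overlaps = (read_starts < end) & (read_starts + read_len > start)
        counts[name] = int(overlaps.sum())
    return counts

print(probe_counts(probes, read_starts, read_len))
```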

  19. EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

    PubMed Central

    Forment, Javier; Gilabert, Francisco; Robles, Antonio; Conejero, Vicente; Nuez, Fernando; Blanca, Jose M

    2008-01-01

    Background Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation. Results We have created EST2uni, an integrated, highly-configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses standard EST analysis tools and the software has a modular design to facilitate the addition of new analytical methods and their configuration. Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. It can be run in parallel in a PC cluster in order to reduce the time necessary for the analysis. It also creates a web site linked to the database, showing collection statistics, with complex query capabilities and tools for data mining and retrieval. Conclusion The software package presented here provides an efficient and complete bioinformatics tool for the management of EST collections which is very easy to adapt to the local needs of different EST projects. The code is freely available under the GPL license and can be obtained at . This site also provides detailed instructions for

  20. Comparison of contamination of femoral heads and pre-processed bone chips during hip revision arthroplasty.

    PubMed

    Mathijssen, N M C; Sturm, P D; Pilot, P; Bloem, R M; Buma, P; Petit, P L; Schreurs, B W

    2013-12-01

    With bone impaction grafting, cancellous bone chips made from allograft femoral heads are impacted in a bone defect, which introduces an additional source of infection. The potential benefit of the use of pre-processed bone chips was investigated by comparing the bacterial contamination of bone chips prepared intraoperatively with the bacterial contamination of pre-processed bone chips at different stages in the surgical procedure. To investigate baseline contamination of the bone grafts, specimens were collected during 88 procedures before actual use or preparation of the bone chips: in 44 procedures intraoperatively prepared chips were used (Group A), and in the other 44 procedures pre-processed bone chips were used (Group B). In 64 of these procedures (32 using locally prepared bone chips and 32 using pre-processed bone chips), specimens were also collected later in the procedure to investigate contamination after use and preparation of the bone chips. In total, 8 procedures had one or more positive specimens (12.5%). Contamination rates were not significantly different between bone chips prepared at the operating theatre and pre-processed bone chips. In conclusion, there was no difference in bacterial contamination between bone chips prepared from whole femoral heads in the operating room and pre-processed bone chips; therefore, both types of bone allografts are comparable with respect to risk of infection.

  1. Genetic programming based ensemble system for microarray data classification.

    PubMed

    Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To

    2015-01-01

    Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a new genetic programming (GP) based ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and these individuals become more and more accurate over the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity of each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved.
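
    The three combination operators named in this record can be illustrated on class-probability outputs of decision-tree base classifiers; the GP search that evolves the ensemble structure, the feature selection, and the balanced subsampling are omitted:

```python
# Hedged sketch of Min/Max/Average combination of base-classifier outputs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=30, random_state=0)
trees = [DecisionTreeClassifier(max_depth=3, random_state=s).fit(X, y)
         for s in range(5)]
probs = np.stack([t.predict_proba(X) for t in trees])  # (trees, samples, classes)

for name, combined in {"Min": probs.min(axis=0),
                       "Max": probs.max(axis=0),
                       "Average": probs.mean(axis=0)}.items():
    accuracy = (combined.argmax(axis=1) == y).mean()
    print(name, "training accuracy:", round(float(accuracy), 3))
```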

  2. Microarray-integrated optoelectrofluidic immunoassay system.

    PubMed

    Han, Dongsik; Park, Je-Kyun

    2016-05-01

    A microarray-based analytical platform has been utilized as a powerful tool in biological assay fields. However, an analyte depletion problem due to slow, diffusion-limited mass transport causes low reaction efficiency, resulting in a limitation for practical applications. This paper presents a novel method to improve the efficiency of microarray-based immunoassays via an optically induced electrokinetic phenomenon, by integrating an optoelectrofluidic device with a conventional glass slide-based microarray format. A sample droplet was loaded between the microarray slide and the optoelectrofluidic device, on which a photoconductive layer was deposited. Under the application of an AC voltage, optically induced AC electroosmotic flows caused by a microarray-patterned light actively enhanced the mass transport of target molecules at the multiple assay spots of the microarray simultaneously, which reduced the tedious reaction time from more than 30 min to 10 min. Based on this enhancing effect, a heterogeneous immunoassay with a tiny volume of sample (5 μl) was successfully performed in the microarray-integrated optoelectrofluidic system using immunoglobulin G (IgG) and anti-IgG, resulting in improved efficiency compared to the static environment. Furthermore, the application of multiplex assays was also demonstrated by multiple protein detection.

  3. Progress in the application of DNA microarrays.

    PubMed Central

    Lobenhofer, E K; Bushel, P R; Afshari, C A; Hamadeh, H K

    2001-01-01

    Microarray technology has been applied to a variety of different fields to address fundamental research questions. The use of microarrays, or DNA chips, to study the gene expression profiles of biologic samples began in 1995. Since that time, the fundamental concepts behind the chip, the technology required for making and using these chips, and the multitude of statistical tools for analyzing the data have been extensively reviewed. For this reason, the focus of this review will be not on the technology itself but on the application of microarrays as a research tool and the future challenges of the field. PMID:11673116

  4. DNA Microarrays in Herbal Drug Research

    PubMed Central

    Chavan, Preeti; Joshi, Kalpana; Patwardhan, Bhushan

    2006-01-01

    Natural products are gaining increased applications in drug discovery and development. Being chemically diverse they are able to modulate several targets simultaneously in a complex system. Analysis of gene expression becomes necessary for better understanding of molecular mechanisms. Conventional strategies for expression profiling are optimized for single gene analysis. DNA microarrays serve as suitable high throughput tool for simultaneous analysis of multiple genes. Major practical applicability of DNA microarrays remains in DNA mutation and polymorphism analysis. This review highlights applications of DNA microarrays in pharmacodynamics, pharmacogenomics, toxicogenomics and quality control of herbal drugs and extracts. PMID:17173108

  5. Protein Microarrays: Novel Developments and Applications

    PubMed Central

    Berrade, Luis; Garcia, Angie E.

    2011-01-01

    Protein microarray technology possesses some of the greatest potential for providing direct information on protein function and potential drug targets. For example, functional protein microarrays are ideal tools suited for the mapping of biological pathways. They can be used to study most major types of interactions and enzymatic activities that take place in biochemical pathways and have been used for the analysis of simultaneous multiple biomolecular interactions involving protein-protein, protein-lipid, protein-DNA and protein-small molecule interactions. Because of this unique ability to analyze many kinds of molecular interactions en masse, the requirement of very small sample amount and the potential to be miniaturized and automated, protein microarrays are extremely well suited for protein profiling, drug discovery, drug target identification and clinical prognosis and diagnosis. The aim of this review is to summarize the most recent developments in the production, applications and analysis of protein microarrays. PMID:21116694

  6. Quality Visualization of Microarray Datasets Using Circos

    PubMed Central

    Koch, Martin; Wiese, Michael

    2012-01-01

    Quality control and normalization are considered the most important steps in the analysis of microarray data. At present there are various methods available for quality assessments of microarray datasets. However, there seems to be no standard visualization routine which also depicts individual microarray quality. Here we present a convenient method for visualizing the results of standard quality control tests using Circos plots. In these plots, various quality measurements are drawn in a circular fashion, thus allowing for visualization of the quality and all outliers of each distinct array within a microarray dataset. The proposed method is intended for use with the Affymetrix Human Genome platform (i.e., GPL96, GPL570 and GPL571). Circos quality measurement plots are a convenient way to obtain an initial quality estimate of Affymetrix datasets that are stored in publicly available databases.

  7. SemBiosphere: a semantic web approach to recommending microarray clustering services.

    PubMed

    Yip, Kevin Y; Qi, Peishen; Schultz, Martin; Cheung, David W; Cheung, Kei-Hoi

    2006-01-01

    Clustering is a popular method for analyzing microarray data. Given the large number of clustering algorithms available, it is difficult to identify the most suitable ones for a particular task. It is also difficult to locate, download, install and run the algorithms. This paper describes a matchmaking system, SemBiosphere, which solves both problems. It recommends clustering algorithms based on some minimal user requirement inputs and the data properties. An ontology was developed in OWL, an expressive ontological language, for describing what the algorithms are and how they perform, in addition to how they can be invoked. This allows machines to "understand" the algorithms and make the recommendations. The algorithms can be implemented by different groups and in different languages, and run on different platforms at geographically distributed sites. Through the use of XML-based web services, they can all be invoked in the same standard way. The current clustering services were transformed from the non-semantic web services of the Biosphere system, which includes a variety of algorithms that have been applied to microarray gene expression data analysis. New algorithms can be incorporated into the system without much effort. The SemBiosphere system and the complete clustering ontology can be accessed at http://yeasthub2.gersteinlab.org/sembiosphere/.

  8. Contributions to Statistical Problems Related to Microarray Data

    ERIC Educational Resources Information Center

    Hong, Feng

    2009-01-01

    Microarrays are a high-throughput technology for measuring gene expression. Analysis of microarray data brings many interesting and challenging problems. This thesis consists of three studies related to microarray data. First, we propose a Bayesian model for microarray data and use Bayes Factors to identify differentially expressed genes. Second, we…

  9. The Impact of Photobleaching on Microarray Analysis.

    PubMed

    von der Haar, Marcel; Preuß, John-Alexander; von der Haar, Kathrin; Lindner, Patrick; Scheper, Thomas; Stahl, Frank

    2015-01-01

    DNA-Microarrays have become a potent technology for high-throughput analysis of genetic regulation. However, the wide dynamic range of signal intensities of fluorophore-based microarrays exceeds the dynamic range of a single array scan by far, thus limiting the key benefit of microarray technology: parallelization. The implementation of multi-scan techniques represents a promising approach to overcome these limitations. These techniques are, in turn, limited by the fluorophores' susceptibility to photobleaching when exposed to the scanner's laser light. In this paper the photobleaching characteristics of cyanine-3 and cyanine-5 as part of solid state DNA microarrays are studied. The effects of initial fluorophore intensity as well as laser scanner dependent variables such as the photomultiplier tube's voltage on bleaching and imaging are investigated. The resulting data is used to develop a model capable of simulating the expected degree of signal intensity reduction caused by photobleaching for each fluorophore individually, allowing for the removal of photobleaching-induced, systematic bias in multi-scan procedures. Single-scan applications also benefit as they rely on pre-scans to determine the optimal scanner settings. These findings constitute a step towards standardization of microarray experiments and analysis and may help to increase the lab-to-lab comparability of microarray experiment results. PMID:26378589
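
    A model of this kind supports a simple per-scan correction; the exponential form and the rate constants below are illustrative assumptions, not the paper's fitted parameters:

```python
# Hedged sketch: assume per-scan exponential bleaching, I_n = I_0 * exp(-k*n),
# and rescale multi-scan measurements back to their pre-bleaching level.
import numpy as np

BLEACH_RATE = {"cy3": 0.010, "cy5": 0.035}     # per scan (assumed values)

def correct_for_bleaching(intensity, scan_index, dye):
    return intensity * np.exp(BLEACH_RATE[dye] * scan_index)

scans = np.array([1000.0, 990.1, 980.3])       # a cy3 spot over scans 0..2
print(correct_for_bleaching(scans, np.arange(3), "cy3"))
```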

  10. Automated analytical microarrays: a critical review.

    PubMed

    Seidel, Michael; Niessner, Reinhard

    2008-07-01

    Microarrays provide a powerful analytical tool for the simultaneous detection of multiple analytes in a single experiment. The specific affinity reaction of nucleic acids (hybridization) and antibodies towards antigens is the most common bioanalytical method for generating multiplexed quantitative results. Nucleic acid-based analysis is restricted to the detection of cells and viruses. Antibodies are more universal biomolecular receptors that selectively bind small molecules such as pesticides, small toxins, and pharmaceuticals, as well as biopolymers (e.g. toxins, allergens) and complex biological structures like bacterial cells and viruses. By producing an appropriate antibody, the corresponding antigenic analyte can be detected on a multiplexed immunoanalytical microarray. Food and water analysis along with clinical diagnostics constitute potential application fields for multiplexed analysis. Diverse fluorescence, chemiluminescence, electrochemical, and label-free microarray readout systems have been developed in the last decade. Some of them are constructed as flow-through microarrays by combination with a fluidic system. Microarrays have the potential to become widely accepted as a system for analytical applications, provided that robust and validated results on fully automated platforms are successfully generated. This review gives an overview of the current research on microarrays with the focus on automated systems and quantitative multiplexed applications.

  11. Evaluation of Surface Chemistries for Antibody Microarrays

    SciTech Connect

    Seurynck-Servoss, Shannon L.; White, Amanda M.; Baird, Cheryl L.; Rodland, Karin D.; Zangar, Richard C.

    2007-12-01

    Antibody microarrays are an emerging technology that promises to be a powerful tool for the detection of disease biomarkers. The current technology for protein microarrays has been primarily derived from DNA microarrays and is not fully characterized for use with proteins. For example, there are a myriad of surface chemistries that are commercially available for antibody microarrays, but no rigorous studies that compare these different surfaces. Therefore, we have used an enzyme-linked immunosorbent assay (ELISA) microarray platform to analyze 16 different commercially available slide types. Full standard curves were generated for 24 different assays. We found that this approach provides a rigorous and quantitative system for comparing the different slide types based on spot size and morphology, slide noise, spot background, lower limit of detection, and reproducibility. These studies demonstrate that the properties of the slide surface affect the activity of immobilized antibodies and the quality of data produced. Although many slide types can produce useful data, glass slides coated with poly-L-lysine or aminosilane, with or without activation with a crosslinker, consistently produce superior results in the ELISA microarray analyses we performed.

  12. Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning

    PubMed Central

    Maulik, Ujjwal

    2014-01-01

    Microarrays have now gone from obscurity to being almost ubiquitous in biological research. At the same time, the statistical methodology for microarray analysis has progressed from simple visual assessments of results to novel algorithms for analyzing changes in expression profiles. In a micro-RNA (miRNA) or gene-expression profiling experiment, the expression levels of thousands of genes/miRNAs are simultaneously monitored to study the effects of certain treatments, diseases, and developmental stages on their expressions. Microarray-based gene expression profiling can be used to identify genes whose expression changes in response to pathogens or other organisms by comparing gene expression in infected to that in uninfected cells or tissues. Recent studies have revealed that patterns of altered microarray expression profiles in cancer can serve as molecular biomarkers for tumor diagnosis, prognosis of disease-specific outcomes, and prediction of therapeutic responses. Microarray data sets containing expression profiles of a number of miRNAs or genes are used to identify biomarkers, which are dysregulated between normal and malignant tissues. However, small sample size remains a bottleneck to design successful classification methods. On the other hand, an adequate amount of microarray data without clinical annotation can be employed as an additional source of information. In this paper, a combination of kernelized fuzzy rough set (KFRS) and semisupervised support vector machine (S3VM) is proposed for predicting cancer biomarkers from one miRNA and three gene expression data sets. Biomarkers are discovered employing three feature selection methods, including KFRS. The effectiveness of the proposed KFRS and S3VM combination on the microarray data sets is demonstrated, and the cancer biomarkers identified from miRNA data are reported. Furthermore, biological significance tests are conducted for miRNA cancer biomarkers. PMID:27170887
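
    The authors' KFRS and S3VM are not reproduced here; the following Python sketch only mirrors the shape of the workflow (feature selection on the labeled subset, then a semisupervised wrapper around an SVM) using generic scikit-learn components as stand-ins:

        import numpy as np
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.semi_supervised import SelfTrainingClassifier
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = rng.normal(size=(120, 500))            # 120 samples x 500 genes/miRNAs
        y = (X[:, 0] + X[:, 1] > 0).astype(int)    # two truly informative features
        y[40:] = -1                                # most samples lack clinical labels

        labeled = y != -1                          # select features on labeled data only
        selector = SelectKBest(f_classif, k=20).fit(X[labeled], y[labeled])
        X_sel = selector.transform(X)

        # Self-training iteratively pseudo-labels confident unlabeled samples
        clf = SelfTrainingClassifier(SVC(probability=True)).fit(X_sel, y)
        print(selector.get_support(indices=True)[:5])   # features 0 and 1 should appear
        print(clf.predict(X_sel[:5]))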

  13. Identifying genes relevant to specific biological conditions in time course microarray experiments.

    PubMed

    Singh, Nitesh Kumar; Repsilber, Dirk; Liebscher, Volkmar; Taher, Leila; Fuellen, Georg

    2013-01-01

    Microarrays have been useful in understanding various biological processes by allowing the simultaneous study of the expression of thousands of genes. However, the analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. Specifically, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms. Furthermore, the skewed ratio of samples to variables poses a risk of overfitting. Thus, in this context, feature selection methods become crucial to select relevant genes and, hence, improve classification accuracy. In this study, we investigated feature selection methods based on gene expression profiles and protein interactions. We found that in our setup, the addition of protein interaction information did not contribute to any significant improvement of the classification results. Furthermore, we developed a novel feature selection method that relies exclusively on observed gene expression changes in microarray experiments, which we call "relative Signal-to-Noise ratio" (rSNR). More precisely, the rSNR ranks genes based on their specificity to an experimental condition, by comparing intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within an experimental condition of interest and high variation across experimental conditions are ranked higher, and help in improving classification accuracy. We compared different feature selection methods on two time-series microarray datasets and one static microarray dataset. We found that the rSNR performed generally better than the other methods.
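
    A minimal Python sketch of the intrinsic-versus-extrinsic idea, assuming a simple variance-ratio form of the score (the paper's exact rSNR formula may differ); rsnr_like_score is a hypothetical name:

        import numpy as np

        def rsnr_like_score(expr, cond_labels, target):
            """Rank genes for one condition: variation of the condition means
            (extrinsic) divided by variation within the target condition
            (intrinsic). Genes specific to `target` score high."""
            cond_labels = np.asarray(cond_labels)
            conds = np.unique(cond_labels)
            cond_means = np.stack(
                [expr[:, cond_labels == c].mean(axis=1) for c in conds], axis=1)
            extrinsic = cond_means.var(axis=1)
            intrinsic = expr[:, cond_labels == target].var(axis=1)
            return extrinsic / (intrinsic + 1e-12)

        rng = np.random.default_rng(1)
        expr = rng.normal(size=(1000, 12))               # 1000 genes, 12 arrays
        labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
        expr[:25, :4] += 3.0                             # genes specific to condition A
        scores = rsnr_like_score(expr, labels, target="A")
        print(np.argsort(scores)[::-1][:10])             # mostly among the first 25 genes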

  14. Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning.

    PubMed

    Chakraborty, Debasis; Maulik, Ujjwal

    2014-01-01

    Microarrays have now gone from obscurity to being almost ubiquitous in biological research. At the same time, the statistical methodology for microarray analysis has progressed from simple visual assessments of results to novel algorithms for analyzing changes in expression profiles. In a micro-RNA (miRNA) or gene-expression profiling experiment, the expression levels of thousands of genes/miRNAs are simultaneously monitored to study the effects of certain treatments, diseases, and developmental stages on their expressions. Microarray-based gene expression profiling can be used to identify genes whose expression changes in response to pathogens or other organisms by comparing gene expression in infected to that in uninfected cells or tissues. Recent studies have revealed that patterns of altered microarray expression profiles in cancer can serve as molecular biomarkers for tumor diagnosis, prognosis of disease-specific outcomes, and prediction of therapeutic responses. Microarray data sets containing expression profiles of a number of miRNAs or genes are used to identify biomarkers, which are dysregulated between normal and malignant tissues. However, small sample size remains a bottleneck to design successful classification methods. On the other hand, an adequate amount of microarray data without clinical annotation can be employed as an additional source of information. In this paper, a combination of kernelized fuzzy rough set (KFRS) and semisupervised support vector machine (S(3)VM) is proposed for predicting cancer biomarkers from one miRNA and three gene expression data sets. Biomarkers are discovered employing three feature selection methods, including KFRS. The effectiveness of the proposed KFRS and S(3)VM combination on the microarray data sets is demonstrated, and the cancer biomarkers identified from miRNA data are reported. Furthermore, biological significance tests are conducted for miRNA cancer biomarkers.

  15. On the Development of Parafoveal Preprocessing: Evidence from the Incremental Boundary Paradigm

    PubMed Central

    Marx, Christina; Hutzler, Florian; Schuster, Sarah; Hawelka, Stefan

    2016-01-01

    Parafoveal preprocessing of upcoming words and the resultant preview benefit are key aspects of fluent reading. Evidence regarding the development of parafoveal preprocessing during reading acquisition, however, is scarce. The present developmental (cross-sectional) eye tracking study estimated the magnitude of parafoveal preprocessing of beginning readers with a novel variant of the classical boundary paradigm. Additionally, we assessed the association of parafoveal preprocessing with several reading-related psychometric measures. The participants were children learning to read the regular German orthography with about 1, 3, and 5 years of formal reading instruction (Grade 2, 4, and 6, respectively). We found evidence of parafoveal preprocessing in each Grade. However, an effective use of parafoveal information was related to the individual reading fluency of the participants (i.e., the reading rate expressed as words-per-minute) which substantially overlapped between the Grades. The size of the preview benefit was furthermore associated with the children’s performance in rapid naming tasks and with their performance in a pseudoword reading task. The latter task assessed the children’s efficiency in phonological decoding and our findings show that the best decoders exhibited the largest preview benefit. PMID:27148123

  16. On the Development of Parafoveal Preprocessing: Evidence from the Incremental Boundary Paradigm.

    PubMed

    Marx, Christina; Hutzler, Florian; Schuster, Sarah; Hawelka, Stefan

    2016-01-01

    Parafoveal preprocessing of upcoming words and the resultant preview benefit are key aspects of fluent reading. Evidence regarding the development of parafoveal preprocessing during reading acquisition, however, is scarce. The present developmental (cross-sectional) eye tracking study estimated the magnitude of parafoveal preprocessing of beginning readers with a novel variant of the classical boundary paradigm. Additionally, we assessed the association of parafoveal preprocessing with several reading-related psychometric measures. The participants were children learning to read the regular German orthography with about 1, 3, and 5 years of formal reading instruction (Grade 2, 4, and 6, respectively). We found evidence of parafoveal preprocessing in each Grade. However, an effective use of parafoveal information was related to the individual reading fluency of the participants (i.e., the reading rate expressed as words-per-minute) which substantially overlapped between the Grades. The size of the preview benefit was furthermore associated with the children's performance in rapid naming tasks and with their performance in a pseudoword reading task. The latter task assessed the children's efficiency in phonological decoding and our findings show that the best decoders exhibited the largest preview benefit. PMID:27148123

  17. Algorithms for optimal dyadic decision trees

    SciTech Connect

    Hush, Don; Porter, Reid

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  18. poolMC: Smart pooling of mRNA samples in microarray experiments

    PubMed Central

    2010-01-01

    Background Typically, pooling of mRNA samples in microarray experiments implies mixing mRNA from several biological-replicate samples before hybridization onto a microarray chip. Here we describe an alternative smart pooling strategy in which different samples, not necessarily biological replicates, are pooled in an information-theoretically efficient way. Further, each sample is tested on multiple chips, but always in pools made up of different samples. The end goal is to exploit the compressibility of microarray data to reduce the number of chips used and increase the robustness to noise in measurements. Results A theoretical framework to perform smart pooling of mRNA samples in microarray experiments was established and the software implementation of the pooling and decoding algorithms was developed in MATLAB. A proof-of-concept smart pooled experiment was performed using validated biological samples on commercially available gene chips. Differential-expression analysis of the smart pooled data was performed and compared against the unpooled control experiment. Conclusions The theoretical developments and experimental demonstration in this paper provide a useful starting point to investigate smart pooling of mRNA samples in microarray experiments. Although the smart pooled experiment did not compare favorably with the control, the experiment highlighted important conditions for the successful implementation of smart pooling - linearity of measurements, sparsity in data, and large experiment size. PMID:20525223
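
    The poolMC MATLAB code is not reproduced here; the Python sketch below only illustrates the underlying idea, assuming linear pooled measurements of a sparse signal and decoding with plain iterative soft-thresholding (ISTA):

        import numpy as np

        rng = np.random.default_rng(2)
        n_samples, n_pools = 40, 16
        A = (rng.random((n_pools, n_samples)) < 0.25).astype(float)  # pooling design
        x_true = np.zeros(n_samples)
        x_true[rng.choice(n_samples, 4, replace=False)] = 3.0        # sparse signal
        y = A @ x_true + rng.normal(0.0, 0.05, n_pools)              # pooled chip readings

        # ISTA: recover sparse x from the underdetermined system y = Ax by
        # alternating a gradient step with soft-thresholding.
        lam = 0.1
        step = 1.0 / np.linalg.norm(A, 2) ** 2
        x = np.zeros(n_samples)
        for _ in range(1000):
            x = x + step * (A.T @ (y - A @ x))
            x = np.sign(x) * np.maximum(np.abs(x) - lam * step, 0.0)

        print(sorted(np.nonzero(x_true)[0]))        # true support
        print(sorted(np.argsort(np.abs(x))[-4:]))   # recovered support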

  19. Preprocessing Inconsistent Linear System for a Meaningful Least Squares Solution

    NASA Technical Reports Server (NTRS)

    Sen, Syamal K.; Shaykhian, Gholam Ali

    2011-01-01

    Mathematical models of many physical/statistical problems are systems of linear equations. Due to measurement and possible human errors/mistakes in modeling/data, as well as due to certain assumptions to reduce complexity, inconsistency (contradiction) is injected into the model, viz. the linear system. While any inconsistent system, irrespective of the degree of inconsistency, always has a least-squares solution, one needs to check whether an equation is too inconsistent or, equivalently, too contradictory. Such an equation will affect/distort the least-squares solution to such an extent that renders it unacceptable/unfit to be used in a real-world application. We propose an algorithm which (i) prunes numerically redundant linear equations from the system, as these do not add any new information to the model, (ii) detects contradictory linear equations along with their degree of contradiction (inconsistency index), (iii) removes those equations presumed to be too contradictory, and then (iv) obtains the minimum norm least-squares solution of the acceptably inconsistent reduced linear system. The algorithm, presented in Matlab, reduces the computational and storage complexities and also improves the accuracy of the solution. It also provides the necessary warning about the existence of too much contradiction in the model. In addition, we suggest a thorough relook into the mathematical modeling to determine the reason why unacceptable contradiction has occurred, thus prompting us to make necessary corrections/modifications to the models - both mathematical and, if necessary, physical.
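
    A minimal numpy sketch of steps (ii)-(iv), with a robust residual test standing in for the paper's inconsistency index; the threshold tau and the MAD-based scale are assumptions:

        import numpy as np

        def prune_and_solve(A, b, tau=2.5):
            """Flag equations whose least-squares residual is a robust outlier
            (deviation from the median residual above tau * 1.4826 * MAD), drop
            them, and return the minimum-norm least-squares solution of the
            reduced system (numpy's SVD-based lstsq yields the min-norm x)."""
            x0, *_ = np.linalg.lstsq(A, b, rcond=None)
            r = b - A @ x0
            dev = np.abs(r - np.median(r))
            scale = 1.4826 * np.median(dev) + 1e-12
            keep = dev <= tau * scale
            x, *_ = np.linalg.lstsq(A[keep], b[keep], rcond=None)
            return x, np.where(~keep)[0]

        A = np.array([[1., 0.], [0., 1.], [1., 1.], [1., -1.], [2., 1.], [1., 1.]])
        b = np.array([1., 2., 3., -1., 4., 30.])  # last equation contradicts the rest
        x, bad = prune_and_solve(A, b)
        print(x, bad)                             # ~[1, 2], contradictory row index 5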

  20. Adaptive filtering image preprocessing for smart FPA technology

    NASA Astrophysics Data System (ADS)

    Brooks, Geoffrey W.

    1995-05-01

    This paper discusses two applications of adaptive filters for image processing on parallel architectures. The first, based on the results of previously accomplished work, summarizes the analyses of various adaptive filters implemented for pixel-level image prediction. FIR filters, fixed and adaptive IIR filters, and various variable step size algorithms were compared with a focus on algorithm complexity against the ability to predict future pixel values. A Gaussian smoothing operation with varying spatial and temporal constants was also applied for comparison of random noise reduction. The second application is a suggestion to use memory-adaptive IIR filters for detecting and tracking motion within an image. Objects within an image are made of edges, or segments, with varying degrees of motion. A previously published application describes FIR filters connecting pixels and using correlations to determine motion and direction. That implementation seems limited to detecting motion coinciding with the FIR filter operation rate and the associated harmonics. Upgrading the FIR structures to adaptive IIR structures can eliminate these limitations. These and any other pixel-level adaptive filtering applications require data memory for filter parameters and some basic computational capability. Tradeoffs have to be made between chip real estate and these desired features. System tradeoffs will also have to be made as to where it makes the most sense to do which level of processing. Although smart pixels may not be ready to implement adaptive filters, applications such as these should give the smart pixel designer some long range goals.
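
    As a minimal illustration of the class of filters compared, the following Python sketch applies a textbook least-mean-squares (LMS) adaptive FIR predictor to a single pixel's time series; it is not the paper's implementation:

        import numpy as np

        def lms_predict(signal, order=4, mu=0.01):
            """LMS adaptive FIR predictor: predict each sample from the previous
            `order` samples, then nudge the weights along the instantaneous
            error gradient. Returns the per-step prediction errors."""
            w = np.zeros(order)
            errs = []
            for n in range(order, len(signal)):
                x = signal[n - order:n][::-1]     # most recent sample first
                e = signal[n] - w @ x             # prediction error
                w += 2.0 * mu * e * x             # LMS weight update
                errs.append(e)
            return np.array(errs)

        rng = np.random.default_rng(3)
        pixel = np.sin(np.linspace(0, 20, 400)) + 0.05 * rng.normal(size=400)
        errs = lms_predict(pixel)
        print(float(np.mean(errs[:50] ** 2)), float(np.mean(errs[-50:] ** 2)))  # error shrinks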

  1. Image analysis and data normalization procedures are crucial for microarray analyses.

    PubMed

    Kadanga, Ali Kpatcha; Leroux, Christine; Bonnet, Muriel; Chauvet, Stéphanie; Meunier, Bruno; Cassar-Malek, Isabelle; Hocquette, Jean-François

    2008-03-17

    This study was conducted with the aim of optimizing the experimental design of array experiments. We compared two image analysis and normalization procedures prior to data analysis using two experimental designs. For this, RNA samples from Charolais steers Longissimus thoracis muscle and subcutaneous adipose tissues were labeled and hybridized to a bovine 8,400 oligochip either in triplicate or in a dye-swap design. Image analysis and normalization were processed by either GenePix/MadScan or ImaGene/GeneSight. Statistical data analysis was then run using either the SAM method or a Student's t-test with multiple-test correction, in R 2.1. Our results show that the image analysis and normalization procedure had an impact on the outcome of differentially expressed genes, whereas the statistical method had much less influence. Image analysis and data normalization are thus an important aspect of microarray experiments, having a potentially significant impact on downstream analyses such as the identification of differentially expressed genes. This study provides guidance on the choice of raw data preprocessing in microarray technology.

  2. Image Analysis and Data Normalization Procedures are Crucial for Microarray Analyses

    PubMed Central

    Kadanga, Ali Kpatcha; Leroux, Christine; Bonnet, Muriel; Chauvet, Stéphanie; Meunier, Bruno; Cassar-Malek, Isabelle; Hocquette, Jean-François

    2008-01-01

    This study was conducted with the aim of optimizing the experimental design of array experiments. We compared two image analysis and normalization procedures prior to data analysis using two experimental designs. For this, RNA samples from Charolais steers Longissimus thoracis muscle and subcutaneous adipose tissues were labeled and hybridized to a bovine 8,400 oligochip either in triplicate or in a dye-swap design. Image analysis and normalization were processed by either GenePix/MadScan or ImaGene/GeneSight. Statistical data analysis was then run using either the SAM method or a Student’s t-test with multiple-test correction, in R 2.1. Our results show that the image analysis and normalization procedure had an impact on the outcome of differentially expressed genes, whereas the statistical method had much less influence. Image analysis and data normalization are thus an important aspect of microarray experiments, having a potentially significant impact on downstream analyses such as the identification of differentially expressed genes. This study provides guidance on the choice of raw data preprocessing in microarray technology. PMID:19787079

  3. Chromosomal Microarray versus Karyotyping for Prenatal Diagnosis

    PubMed Central

    Wapner, Ronald J.; Martin, Christa Lese; Levy, Brynn; Ballif, Blake C.; Eng, Christine M.; Zachary, Julia M.; Savage, Melissa; Platt, Lawrence D.; Saltzman, Daniel; Grobman, William A.; Klugman, Susan; Scholl, Thomas; Simpson, Joe Leigh; McCall, Kimberly; Aggarwal, Vimla S.; Bunke, Brian; Nahum, Odelia; Patel, Ankita; Lamb, Allen N.; Thom, Elizabeth A.; Beaudet, Arthur L.; Ledbetter, David H.; Shaffer, Lisa G.; Jackson, Laird

    2013-01-01

    Background Chromosomal microarray analysis has emerged as a primary diagnostic tool for the evaluation of developmental delay and structural malformations in children. We aimed to evaluate the accuracy, efficacy, and incremental yield of chromosomal microarray analysis as compared with karyotyping for routine prenatal diagnosis. Methods Samples from women undergoing prenatal diagnosis at 29 centers were sent to a central karyotyping laboratory. Each sample was split in two; standard karyotyping was performed on one portion and the other was sent to one of four laboratories for chromosomal microarray. Results We enrolled a total of 4406 women. Indications for prenatal diagnosis were advanced maternal age (46.6%), abnormal result on Down’s syndrome screening (18.8%), structural anomalies on ultrasonography (25.2%), and other indications (9.4%). In 4340 (98.8%) of the fetal samples, microarray analysis was successful; 87.9% of samples could be used without tissue culture. Microarray analysis of the 4282 nonmosaic samples identified all the aneuploidies and unbalanced rearrangements identified on karyotyping but did not identify balanced translocations and fetal triploidy. In samples with a normal karyotype, microarray analysis revealed clinically relevant deletions or duplications in 6.0% with a structural anomaly and in 1.7% of those whose indications were advanced maternal age or positive screening results. Conclusions In the context of prenatal diagnostic testing, chromosomal microarray analysis identified additional, clinically significant cytogenetic information as compared with karyotyping and was equally efficacious in identifying aneuploidies and unbalanced rearrangements but did not identify balanced translocations and triploidies. (Funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development and others; ClinicalTrials.gov number, NCT01279733.) PMID:23215555

  4. Preprocessed barley, rye, and triticale as a feedstock for an integrated fuel ethanol-feedlot plant

    SciTech Connect

    Sosulski, K.; Wang, Sunmin; Ingledew, W.M.

    1997-12-31

    Rye, triticale, and barley were evaluated as starch feedstock to replace wheat for ethanol production. Preprocessing of grain by abrasion on a Satake mill reduced fiber and increased starch concentrations in feedstock for fermentations. Higher concentrations of starch in flours from preprocessed cereal grains would increase plant throughput by 8-23% since more starch is processed in the same weight of feedstock. Increased concentrations of starch for fermentation resulted in higher concentrations of ethanol in beer. Energy requirements to produce one L of ethanol from preprocessed grains were reduced: natural gas consumption by 3.5-11.4% and power consumption by 5.2-15.6%. 7 refs., 7 figs., 4 tabs.

  5. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

    PubMed Central

    2010-01-01

    Background The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245

  6. Optimization of Preprocessing and Densification of Sorghum Stover at Full-scale Operation

    SciTech Connect

    Neal A. Yancey; Jaya Shankar Tumuluru; Craig C. Conner; Christopher T. Wright

    2011-08-01

    Transportation costs can be a prohibitive step in bringing biomass to a preprocessing location or biofuel refinery. One alternative to transporting biomass in baled or loose format to a preprocessing location is to utilize a mobile preprocessing system that can be relocated to various locations where biomass is stored, preprocess and densify the biomass, then ship it to the refinery as needed. The Idaho National Laboratory has a full-scale 'Process Demonstration Unit' (PDU), which includes a stage 1 grinder, hammer mill, drier, pellet mill, and cooler with the associated conveyance system components. Testing at bench and pilot scale has been conducted to determine the effects of moisture and crop variety on preprocessing efficiency and product quality. The INL's PDU provides an opportunity to test the conclusions made at the bench and pilot scale on full industrial-scale systems. Each component of the PDU is operated from a central operating station where data are collected to determine power consumption rates for each step in the process. The power for each electrical motor in the system is monitored from the control station to watch for problems and determine optimal conditions for system performance. The data can then be viewed to observe how changes in biomass input parameters (moisture and crop type, for example), mechanical changes (screen size, biomass drying, pellet size, grinding speed, etc.), or other variations affect the power consumption of the system. Sorghum in four-foot round bales was tested in the system using a series of 6 different screen sizes: 3/16 in., 1 in., 2 in., 3 in., 4 in., and 6 in. The effects on power consumption, product quality, and production rate were measured to determine optimal conditions.

  7. Boosting model performance and interpretation by entangling preprocessing selection and variable selection.

    PubMed

    Gerretzen, Jan; Szymańska, Ewa; Bart, Jacob; Davies, Antony N; van Manen, Henk-Jan; van den Heuvel, Edwin R; Jansen, Jeroen J; Buydens, Lutgarde M C

    2016-09-28

    The aim of data preprocessing is to remove data artifacts-such as a baseline, scatter effects or noise-and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but which method or combination of methods should be used for the specific data being analyzed is difficult to select. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames. In that approach, the focus was solely on improving the predictive performance of the chemometric model. This is, however, only one of the two relevant criteria in modeling: interpretation of the model results can be just as important. Variable selection is often used to achieve such interpretation. Data artifacts, however, may hamper proper variable selection by masking the true relevant variables. The choice of preprocessing therefore has a huge impact on the outcome of variable selection methods and may thus hamper an objective interpretation of the final model. To enhance such objective interpretation, we here integrate variable selection into the preprocessing selection approach that is based on DoE. We show that the entanglement of preprocessing selection and variable selection not only improves the interpretation, but also the predictive performance of the model. This is achieved by analyzing several experimental data sets of which the true relevant variables are available as prior knowledge. We show that a selection of variables is provided that complies more with the true informative variables compared to individual optimization of both model aspects. Importantly, the approach presented in this work is generic. Different types of models (e.g. PCR, PLS, …) can be incorporated into it, as well as different variable selection methods and different preprocessing methods, according to the taste and experience of the user.

  8. Data acquisition, preprocessing and analysis for the Virginia Tech OLYMPUS experiment

    NASA Technical Reports Server (NTRS)

    Remaklus, P. Will

    1991-01-01

    Virginia Tech is conducting a slant path propagation experiment using the 12, 20, and 30 GHz OLYMPUS beacons. Beacon signal measurements are made using separate terminals for each frequency. In addition, short baseline diversity measurements are collected through a mobile 20 GHz terminal. Data collection is performed with a custom data acquisition and control system. Raw data are preprocessed to remove equipment biases and discontinuities prior to analysis. Preprocessed data are then statistically analyzed to investigate parameters such as frequency scaling, fade slope and duration, and scintillation intensity.

  9. ACTS (Advanced Communications Technology Satellite) Propagation Experiment: Preprocessing Software User's Manual

    NASA Technical Reports Server (NTRS)

    Crane, Robert K.; Wang, Xuhe; Westenhaver, David

    1996-01-01

    The preprocessing software manual describes the Actspp program originally developed to observe and diagnose Advanced Communications Technology Satellite (ACTS) propagation terminal/receiver problems. However, it has been quite useful for automating the preprocessing functions needed to convert the terminal output to useful attenuation estimates. Prior to having data acceptable for archival functions, the individual receiver system must be calibrated and the power level shifts caused by ranging tone modulation must be removed. Actspp provides three output files: the daylog, the diurnal coefficient file, and the file that contains calibration information.

  11. Influence of Hemp Fibers Pre-processing on Low Density Polyethylene Matrix Composites Properties

    NASA Astrophysics Data System (ADS)

    Kukle, S.; Vidzickis, R.; Zelca, Z.; Belakova, D.; Kajaks, J.

    2016-04-01

    In the present research, short hemp fibre reinforced LLDPE matrix composites with fibre contents ranging from 30 to 50 wt%, prepared using four different fibre pre-processing technologies, were produced, and properties such as tensile strength and elongation at break, tensile modulus, melt flow index, micro hardness and water absorption dynamics were investigated. Capillary viscosimetry was used for fluidity evaluation, and the melt flow index (MFI) was evaluated for all variants. The MFI of the fibres from two of the pre-processing variants was high enough to allow the hemp fibre content to be increased from 30 to 50 wt% with only a moderate increase in water sorption capability.

  12. DNA microarray analyses in higher plants.

    PubMed

    Galbraith, David W

    2006-01-01

    DNA microarrays were originally devised and described as a convenient technology for the global analysis of plant gene expression. Over the past decade, their use has expanded enormously to cover all kingdoms of living organisms. At the same time, the scope of applications of microarrays has increased beyond expression analyses, with plant genomics playing a leadership role in the on-going development of this technology. As the field has matured, the rate-limiting step has moved from that of the technical process of data generation to that of data analysis. We currently face major problems in dealing with the accumulating datasets, not simply with respect to how to archive, access, and process the huge amounts of data that have been and are being produced, but also in determining the relative quality of the different datasets. A major recognized concern is the appropriate use of statistical design in microarray experiments, without which the datasets are rendered useless. A vigorous area of current research involves the development of novel statistical tools specifically for microarray experiments. This article describes, in a necessarily selective manner, the types of platforms currently employed in microarray research and provides an overview of recent activities using these platforms in plant biology.

  13. Oligonucleotide microarrays in constitutional genetic diagnosis.

    PubMed

    Keren, Boris; Le Caignec, Cedric

    2011-06-01

    Oligonucleotide microarrays such as comparative genomic hybridization arrays and SNP microarrays enable the identification of genomic imbalances - also termed copy-number variants - with increasing resolution. This article will focus on the most significant applications of high-throughput oligonucleotide microarrays, both in genetic diagnosis and research. In genetic diagnosis, the method is becoming a standard tool for investigating patients with unexplained developmental delay/intellectual disability, autism spectrum disorders and/or with multiple congenital anomalies. Oligonucleotide microarrays have also recently been applied to the detection of genomic imbalances in prenatal diagnosis, either to characterize a chromosomal rearrangement that has previously been identified by standard prenatal karyotyping or to detect a cryptic genomic imbalance in a fetus with ultrasound abnormalities and a normal standard prenatal karyotype. In research, oligonucleotide microarrays have been used for a wide range of applications, such as the identification of new genes responsible for monogenic disorders and the association of a copy-number variant as a predisposing factor to a common disease. Despite its widespread use, the interpretation of results is not always straightforward. We will discuss several unexpected results and ethical issues raised by these new methods.

  14. Advancing Microarray Assembly with Acoustic Dispensing Technology

    PubMed Central

    Wong, E. Y.; Diamond, S. L.

    2011-01-01

    In the assembly of microarrays and microarray-based chemical assays and enzymatic bioassays, most approaches use pins for contact spotting. Acoustic dispensing is a technology capable of nanoliter transfers by using acoustic energy to eject liquid sample from an open source well. Although typically used for well plate transfers, when applied to microarraying it avoids drawbacks of undesired physical contact with sample, difficulty in assembling multicomponent reactions on a chip by readdressing, a rigid mode of printing that lacks patterning capabilities, and time-consuming wash steps. We demonstrated the utility of acoustic dispensing by delivering human cathepsin L in a drop-on-drop fashion into individual 50-nanoliter, pre-spotted reaction volumes to activate enzyme reactions at targeted positions on a microarray. We generated variable-sized spots ranging from 200 to 750 μm (and higher), and handled the transfer of fluorescent bead suspensions with increasing source well concentrations of 0.1 to 10 × 10^8 beads/mL in a linear fashion. There are no tips that can clog and liquid dispensing CVs are generally below 5%. This platform expands the toolbox for generating analytical arrays and meets needs associated with spatially-addressed assembly of multicomponent microarrays on the nanoliter scale. PMID:19035650

  15. A Synthetic Kinome Microarray Data Generator

    PubMed Central

    Maleki, Farhad; Kusalik, Anthony

    2015-01-01

    Cellular pathways involve the phosphorylation and dephosphorylation of proteins. Peptide microarrays called kinome arrays facilitate the measurement of the phosphorylation activity of hundreds of proteins in a single experiment. Analyzing the data from kinome microarrays is a multi-step process. Typically, various techniques are possible for a particular step, and it is necessary to compare and evaluate them. Such evaluations require data for which correct analysis results are known. Unfortunately, such kinome data is not readily available in the community. Further, there are no established techniques for creating artificial kinome datasets with known results and with the same characteristics as real kinome datasets. In this paper, a methodology for generating synthetic kinome array data is proposed. The methodology relies on actual intensity measurements from kinome microarray experiments and preserves their subtle characteristics. The utility of the methodology is demonstrated by evaluating methods for eliminating heterogeneous variance in kinome microarray data. Phosphorylation intensities from kinome microarrays often exhibit such heterogeneous variance and its presence can negatively impact downstream statistical techniques that rely on homogeneity of variance. It is shown that using the output from the proposed synthetic data generator, it is possible to critically compare two variance stabilization methods. PMID:27600233
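
    The generator's actual model, built from real kinome measurements, is not reproduced here; the Python sketch below merely assumes noise whose standard deviation grows with the mean, and shows how such heterogeneous variance can be detected and tamed by a log transform:

        import numpy as np

        rng = np.random.default_rng(4)
        n_peptides, n_reps = 300, 9
        mu = rng.uniform(200.0, 20000.0, size=n_peptides)     # true spot intensities
        # Heterogeneous variance: noise standard deviation proportional to the mean
        data = mu[:, None] * (1.0 + 0.15 * rng.normal(size=(n_peptides, n_reps)))

        def sd_spread(x):
            """Ratio of the largest to smallest per-peptide replicate sd."""
            sds = x.std(axis=1)
            return float(sds.max() / sds.min())

        print(sd_spread(data))            # large: variance grows with intensity
        print(sd_spread(np.log2(data)))   # far smaller: the log stabilizes the variance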

  16. Immune-Signatures for Lung Cancer Diagnostics: Evaluation of Protein Microarray Data Normalization Strategies

    PubMed Central

    Brezina, Stefanie; Soldo, Regina; Kreuzhuber, Roman; Hofer, Philipp; Gsur, Andrea; Weinhaeusel, Andreas

    2015-01-01

    New minimal invasive diagnostic methods for early detection of lung cancer are urgently needed. It is known that the immune system responds to tumors with production of tumor-autoantibodies. Protein microarrays are a suitable highly multiplexed platform for identification of autoantibody signatures against tumor-associated antigens (TAA). These microarrays can be probed using 0.1 mg immunoglobulin G (IgG), purified from 10 µL of plasma. We used a microarray comprising recombinant proteins derived from 15,417 cDNA clones for the screening of 100 lung cancer samples, including 25 samples of each main histological entity of lung cancer, and 100 controls. Since this number of samples cannot be processed at once, the resulting data showed non-biological variances due to “batch effects”. Our aim was to evaluate quantile normalization, “distance-weighted discrimination” (DWD), and “ComBat” for their effectiveness in data pre-processing for elucidating diagnostic immune-signatures. “ComBat” data adjustment outperformed the other methods and allowed us to identify classifiers for all lung cancer cases versus controls and small-cell, squamous cell, large-cell, and adenocarcinoma of the lung with an accuracy of 85%, 94%, 96%, 92%, and 83% (sensitivity of 0.85, 0.92, 0.96, 0.88, 0.83; specificity of 0.85, 0.96, 0.96, 0.96, 0.83), respectively. These promising data would be the basis for further validation using targeted autoantibody tests.
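
    ComBat's empirical Bayes adjustment is not reimplemented here; as a self-contained illustration of the simplest of the compared strategies, the Python sketch below applies quantile normalization, assuming all arrays should share one intensity distribution:

        import numpy as np

        def quantile_normalize(X):
            """Force every array (column) to share the same intensity
            distribution: the mean of the per-array sorted distributions."""
            ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # per-array ranks
            mean_dist = np.sort(X, axis=0).mean(axis=1)        # reference distribution
            return mean_dist[ranks]

        rng = np.random.default_rng(5)
        X = rng.lognormal(mean=6.0, sigma=1.0, size=(1000, 8))  # 1000 features, 8 arrays
        X[:, 4:] *= 3.0                       # crude batch effect on arrays 4-7
        Xn = quantile_normalize(X)
        print(X.mean(axis=0).round(0))        # batch means differ threefold
        print(Xn.mean(axis=0).round(0))       # identical after normalization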

  17. t-Test at the Probe Level: An Alternative Method to Identify Statistically Significant Genes for Microarray Data

    PubMed Central

    Boareto, Marcelo; Caticha, Nestor

    2014-01-01

    Microarray data analysis typically consists in identifying a list of differentially expressed genes (DEG), i.e., the genes that are differentially expressed between two experimental conditions. Variance shrinkage methods have been considered a better choice than the standard t-test for selecting the DEG because they correct the dependence of the error with the expression level. This dependence is mainly caused by errors in background correction, which more severely affects genes with low expression values. Here, we propose a new method for identifying the DEG that overcomes this issue and does not require background correction or variance shrinkage. Unlike current methods, our methodology is easy to understand and implement. It consists of applying the standard t-test directly on the normalized intensity data, which is possible because the probe intensity is proportional to the gene expression level and because the t-test is scale- and location-invariant. This methodology considerably improves the sensitivity and robustness of the list of DEG when compared with the t-test applied to preprocessed data and to the most widely used shrinkage methods, Significance Analysis of Microarrays (SAM) and Linear Models for Microarray Data (LIMMA). Our approach is useful especially when the genes of interest have small differences in expression and therefore get ignored by standard variance shrinkage methods.
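
    A minimal Python sketch of the idea, applying a standard t-test directly to normalized probe-level intensities; how probes are pooled into observations below is an illustrative assumption:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(6)
        n_probes, n_arrays = 11, 6            # one probe set, 3 arrays per condition
        base = rng.normal(10.0, 1.0, size=(n_probes, 1))       # probe affinities
        expr = base + 0.3 * rng.normal(size=(n_probes, n_arrays))
        expr[:, 3:] += 1.0                    # condition B: clear up-regulation

        # Because probe intensity is proportional to expression and the t-test is
        # scale- and location-invariant, test the normalized probe intensities
        # directly, treating each (probe, array) value as an observation.
        t, p = stats.ttest_ind(expr[:, :3].ravel(), expr[:, 3:].ravel())
        print(round(float(t), 2), float(p))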

  18. An image-data-compression algorithm

    NASA Technical Reports Server (NTRS)

    Hilbert, E. E.; Rice, R. F.

    1981-01-01

    The Cluster Compression Algorithm (CCA) preprocesses Landsat image data immediately after the satellite data sensor (receiver). Data are reduced by extracting pertinent image features and compressing this result into a concise format for transmission to the ground station. This results in narrower transmission bandwidth, increased data-communication efficiency, and reduced computer time in reconstructing and analyzing the image. A similar technique could be applied to other types of recorded data to cut the costs of transmitting, storing, distributing, and interpreting complex information.

  19. Protein microarrays for parasite antigen discovery.

    PubMed

    Driguez, Patrick; Doolan, Denise L; Molina, Douglas M; Loukas, Alex; Trieu, Angela; Felgner, Phil L; McManus, Donald P

    2015-01-01

    The host serological profile to a parasitic infection, such as schistosomiasis, can be used to define potential vaccine and diagnostic targets. Determining the host antibody response using traditional approaches is hindered by the large number of putative antigens in any parasite proteome. Parasite protein microarrays offer the potential for a high-throughput host antibody screen to simplify this task. In order to construct the array, parasite proteins are selected from available genomic sequence and protein databases using bioinformatic tools. Selected open reading frames are PCR amplified, incorporated into a vector for cell-free protein expression, and printed robotically onto glass slides. The protein microarrays can be probed with antisera from infected/immune animals or humans and the antibody reactivity measured with fluorophore labeled antibodies on a confocal laser microarray scanner to identify potential targets for diagnosis or therapeutic or prophylactic intervention. PMID:25388117

  20. Preprocessing of SAR interferometric data using anisotropic diffusion filter

    NASA Astrophysics Data System (ADS)

    Sartor, Kenneth; Allen, Josef De Vaughn; Ganthier, Emile; Tenali, Gnana Bhaskar

    2007-04-01

    The most commonly used smoothing algorithms for complex data processing are blurring functions (i.e., Hanning, Taylor weighting, Gaussian, etc.). Unfortunately, the filters so designed blur the edges in a Synthetic Aperture Radar (SAR) scene, reduce the accuracy of features, and blur the fringe lines in an interferogram. For the Digital Surface Map (DSM) extraction, the blurring of these fringe lines causes inaccuracies in the height of the unwrapped terrain surface. Our goal here is to perform spatially non-uniform smoothing to overcome the above-mentioned disadvantages. This is achieved by using a Complex Anisotropic Non-Linear Diffuser (CANDI) filter that is spatially varying. In particular, an appropriate choice of the convection function in the CANDI filter is able to accomplish the non-uniform smoothing. This boundary-sharpening, intra-region smoothing filter acts on interferometric SAR (IFSAR) data with noise to produce an interferogram with significantly reduced noise content and desirable local smoothing. Results of CANDI filtering will be discussed and compared with those obtained by using the standard filters on simulated data.
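
    The CANDI filter itself is not specified here; the Python sketch below implements the classic Perona-Malik anisotropic diffusion, the family of edge-preserving, spatially non-uniform smoothers to which CANDI belongs, on a real-valued image (parameters are illustrative):

        import numpy as np

        def perona_malik(img, n_iter=25, kappa=0.3, dt=0.2):
            """Classic Perona-Malik diffusion: smooth within regions while the
            conduction coefficient g = exp(-(grad/kappa)^2) shuts diffusion
            off across strong edges, keeping them sharp."""
            u = img.astype(float).copy()
            g = lambda d: np.exp(-((d / kappa) ** 2))
            for _ in range(n_iter):
                dn = np.roll(u, -1, axis=0) - u   # differences to the 4 neighbours
                ds = np.roll(u, 1, axis=0) - u
                de = np.roll(u, -1, axis=1) - u
                dw = np.roll(u, 1, axis=1) - u
                u = u + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
            return u

        rng = np.random.default_rng(7)
        img = np.zeros((64, 64)); img[:, 32:] = 1.0          # one sharp edge
        noisy = img + 0.1 * rng.normal(size=img.shape)
        out = perona_malik(noisy)
        print(float(noisy[:, :30].std()), float(out[:, :30].std()))  # noise shrinks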

  1. Comparison and improvement of algorithms for computing minimal cut sets

    PubMed Central

    2013-01-01

    Background Constrained minimal cut sets (cMCSs) have recently been introduced as a framework to enumerate minimal genetic intervention strategies for targeted optimization of metabolic networks. Two different algorithmic schemes (adapted Berge algorithm and binary integer programming) have been proposed to compute cMCSs from elementary modes. However, in their original formulation both algorithms are not fully comparable. Results Here we show that by a small extension to the integer program both methods become equivalent. Furthermore, based on well-known preprocessing procedures for integer programming we present efficient preprocessing steps which can be used for both algorithms. We then benchmark the numerical performance of the algorithms in several realistic medium-scale metabolic models. The benchmark calculations reveal (i) that these preprocessing steps can lead to an enormous speed-up under both algorithms, and (ii) that the adapted Berge algorithm outperforms the binary integer approach. Conclusions Generally, both of our new implementations are at least one order of magnitude faster than other currently available implementations. PMID:24191903

  2. Hybridization and Selective Release of DNA Microarrays

    SciTech Connect

    Beer, N R; Baker, B; Piggott, T; Maberry, S; Hara, C M; DeOtte, J; Benett, W; Mukerjee, E; Dzenitis, J; Wheeler, E K

    2011-11-29

    DNA microarrays contain sequence specific probes arrayed in distinct spots numbering from 10,000 to over 1,000,000, depending on the platform. This tremendous degree of multiplexing gives microarrays great potential for environmental background sampling, broad-spectrum clinical monitoring, and continuous biological threat detection. In practice, their use in these applications is not common due to limited information content, long processing times, and high cost. The work focused on characterizing the phenomena of microarray hybridization and selective release that will allow these limitations to be addressed. This will revolutionize the ways that microarrays can be used for LLNL's Global Security missions. The goals of this project were two-fold: automated, faster hybridizations and selective release of hybridized features. The first study area involves hybridization kinetics and mass-transfer effects. The standard hybridization protocol uses an overnight incubation to achieve the best possible signal for any sample type, as well as for convenience in manual processing. There is potential to significantly shorten this time based on better understanding and control of the rate-limiting processes and knowledge of the progress of the hybridization. In the hybridization work, a custom microarray flow cell was used to manipulate the chemical and thermal environment of the array and autonomously image the changes over time during hybridization. The second study area is selective release. Microarrays easily generate hybridization patterns and signatures, but there is still an unmet need for methodologies enabling rapid and selective analysis of these patterns and signatures. Detailed analysis of individual spots by subsequent sequencing could potentially yield significant information for rapidly mutating and emerging (or deliberately engineered) pathogens. In the selective release work, optical energy deposition with coherent light quickly provides the thermal energy to

  3. Photo-Generation of Carbohydrate Microarrays

    NASA Astrophysics Data System (ADS)

    Carroll, Gregory T.; Wang, Denong; Turro, Nicholas J.; Koberstein, Jeffrey T.

    The unparalleled structural diversity of carbohydrates among biological molecules has been recognized for decades. Recent studies have highlighted carbohydrate signaling roles in many important biological processes, such as fertilization, embryonic development, cell differentiation and cell-cell communication, blood coagulation, inflammation, chemotaxis, as well as host recognition and immune responses to microbial pathogens. In this chapter, we summarize recent progress in the establishment of carbohydrate-based microarrays and the application of these technologies in exploring the biological information content in carbohydrates. A newly established photochemical platform of carbohydrate microarrays serves as a model for a focused discussion.

  4. Protein Microarrays for the Detection of Biothreats

    NASA Astrophysics Data System (ADS)

    Herr, Amy E.

    Although protein microarrays have proven to be an important tool in proteomics research, the technology is emerging as useful for public health and defense applications. Recent progress in the measurement and characterization of biothreat agents is reviewed in this chapter. Details concerning validation of various protein microarray formats, from contact-printed sandwich assays to supported lipid bilayers, are presented. The reviewed technologies have important implications for in vitro characterization of toxin-ligand interactions, serotyping of bacteria, screening of potential biothreat inhibitors, and as core components of biosensors, among other research and engineering applications.

  5. Pineal Function: Impact of Microarray Analysis

    PubMed Central

    Klein, David C.; Bailey, Michael J.; Carter, David A.; Kim, Jong-so; Shi, Qiong; Ho, Anthony; Chik, Constance; Gaildrat, Pascaline; Morin, Fabrice; Ganguly, Surajit; Rath, Martin F.; Møller, Morten; Sugden, David; Rangel, Zoila G.; Munson, Peter J.; Weller, Joan L.; Coon, Steven L.

    2009-01-01

    Microarray analysis has provided a new understanding of pineal function by identifying genes that are highly expressed in this tissue relative to other tissues and also by identifying over 600 genes that are expressed on a 24-hour schedule. This effort has highlighted surprising similarity to the retina and has provided reason to explore new avenues of study including intracellular signaling, signal transduction, transcriptional cascades, thyroid/retinoic acid hormone signaling, metal biology, RNA splicing, and the role the pineal gland plays in the immune/inflammation response. The new foundation that microarray analysis has provided will broadly support future research on pineal function. PMID:19622385

  6. The use of microarrays in microbial ecology

    SciTech Connect

    Andersen, G.L.; He, Z.; DeSantis, T.Z.; Brodie, E.L.; Zhou, J.

    2009-09-15

    Microarrays have proven to be a useful and high-throughput method to provide targeted DNA sequence information for up to many thousands of specific genetic regions in a single test. A microarray consists of multiple DNA oligonucleotide probes that, under high stringency conditions, hybridize only to specific complementary nucleic acid sequences (targets). A fluorescent signal indicates the presence and, in many cases, the abundance of genetic regions of interest. In this chapter we will look at how microarrays are used in microbial ecology, especially with the recent increase in microbial community DNA sequence data. Of particular interest to microbial ecologists, phylogenetic microarrays are used for the analysis of phylotypes in a community and functional gene arrays are used for the analysis of functional genes, and, by inference, phylotypes in environmental samples. A phylogenetic microarray that has been developed by the Andersen laboratory, the PhyloChip, will be discussed as an example of a microarray that targets the known diversity within the 16S rRNA gene to determine microbial community composition. Using multiple, confirmatory probes to increase the confidence of detection and a mismatch probe for every perfect match probe to minimize the effect of cross-hybridization by non-target regions, the PhyloChip is able to simultaneously identify any of thousands of taxa present in an environmental sample. The PhyloChip is shown to reveal greater diversity within a community than rRNA gene sequencing due to the placement of the entire gene product on the microarray compared with the analysis of up to thousands of individual molecules by traditional sequencing methods. A functional gene array that has been developed by the Zhou laboratory, the GeoChip, will be discussed as an example of a microarray that dynamically identifies functional activities of multiple members within a community. The recent version of GeoChip contains more than 24,000 50mer

  7. MicroRNA expression profiling using microarrays.

    PubMed

    Love, Cassandra; Dave, Sandeep

    2013-01-01

    MicroRNAs are small noncoding RNAs which are able to regulate gene expression at both the transcriptional and translational levels. There is a growing recognition of the role of microRNAs in nearly every tissue type and cellular process. Thus there is an increasing need for accurate quantitation of microRNA expression in a variety of tissues. Microarrays provide a robust method for the examination of microRNA expression. In this chapter, we describe detailed methods for the use of microarrays to measure microRNA expression and discuss methods for the analysis of microRNA expression data. PMID:23666707

  8. SNOMAD (Standardization and NOrmalization of MicroArray Data): web-accessible gene expression data analysis.

    PubMed

    Colantuoni, Carlo; Henry, George; Zeger, Scott; Pevsner, Jonathan

    2002-11-01

    SNOMAD is a collection of algorithms for the normalization and standardization of gene expression datasets derived from diverse biological and technological sources. In addition to conventional transformations and visualization tools, SNOMAD includes two non-linear transformations that correct for bias and variance that are non-uniformly distributed across the range of microarray element signal intensities: (1) local mean normalization; and (2) local variance correction (Z-score generation using a locally calculated standard deviation).
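
    Both transformations are straightforward to express in code. Below is a minimal sketch (not SNOMAD's actual implementation, whose exact windowing is not described here): probes are ordered by mean log intensity, and a sliding window supplies the local mean for normalization and the local standard deviation for Z-score generation. The function name and window size are illustrative.

    ```python
    import numpy as np

    def local_normalize(mean_log_intensity, log_ratio, window=101):
        """Sliding-window local mean normalization and local variance
        correction (Z-scores), with probes ordered by intensity."""
        order = np.argsort(mean_log_intensity)
        m = log_ratio[order]
        half = window // 2
        centered = np.empty_like(m)
        zscores = np.empty_like(m)
        for i in range(len(m)):
            lo, hi = max(0, i - half), min(len(m), i + half + 1)
            local = m[lo:hi]
            centered[i] = m[i] - local.mean()             # local mean normalization
            zscores[i] = centered[i] / local.std(ddof=1)  # local Z-score
        out_c, out_z = np.empty_like(m), np.empty_like(m)
        out_c[order], out_z[order] = centered, zscores    # restore input order
        return out_c, out_z
    ```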

  9. Pre-processing SAR image stream to facilitate compression for transport on bandwidth-limited-link

    SciTech Connect

    Rush, Bobby G.; Riley, Robert

    2015-09-29

    Pre-processing is applied to a raw VideoSAR (or similar near-video-rate) product to transform the image frame sequence into a product that more closely resembles the type of product for which conventional video codecs are designed, while sufficiently maintaining the utility and visual quality of the product delivered by the codec.

  10. Integrated Multi-Strategic Web Document Pre-Processing for Sentence and Word Boundary Detection.

    ERIC Educational Resources Information Center

    Shim, Junhyeok; Kim, Dongseok; Cha, Jeongwon; Lee, Gary Geunbae; Seo, Jungyun

    2002-01-01

    Discussion of natural language processing focuses on a multi-strategic integrated text preprocessing method for difficult problems of sentence boundary disambiguation and word boundary disambiguation of Web texts. Describes an evaluation of the method using Korean Web document collections. (Author/LRW)

  11. Parafoveal Preprocessing in Reading Revisited: Evidence from a Novel Preview Manipulation

    ERIC Educational Resources Information Center

    Gagl, Benjamin; Hawelka, Stefan; Richlan, Fabio; Schuster, Sarah; Hutzler, Florian

    2014-01-01

    The study investigated parafoveal preprocessing by means of the classical invisible boundary paradigm and a novel manipulation of the parafoveal previews (i.e., visual degradation). Eye movements were investigated on 5-letter target words with constraining (i.e., highly informative) initial letters or similarly constraining final letters.…

  12. Microarray data mining: A novel optimization-based approach to uncover biologically coherent structures

    PubMed Central

    Tan, Meng P; Smith, Erin N; Broach, James R; Floudas, Christodoulos A

    2008-01-01

    Background DNA microarray technology allows for the measurement of genome-wide expression patterns. Within the resultant mass of data lies the problem of analyzing and presenting information on this genomic scale, and a first step towards the rapid and comprehensive interpretation of this data is gene clustering with respect to the expression patterns. Classifying genes into clusters can lead to interesting biological insights. In this study, we describe an iterative clustering approach to uncover biologically coherent structures from DNA microarray data based on a novel clustering algorithm EP_GOS_Clust. Results We apply our proposed iterative algorithm to three sets of experimental DNA microarray data from experiments with the yeast Saccharomyces cerevisiae and show that the proposed iterative approach improves biological coherence. Comparison with other clustering techniques suggests that our iterative algorithm provides superior performance with regard to biological coherence. An important consequence of our approach is that an increasing proportion of genes find membership in clusters of high biological coherence and that the average cluster specificity improves. Conclusion The results from these clustering experiments provide a robust basis for extracting motifs and trans-acting factors that determine particular patterns of expression. In addition, the biological coherence of the clusters is iteratively assessed independently of the clustering. Thus, this method will not be severely impacted by functional annotations that are missing, inaccurate, or sparse. PMID:18538024

  13. A Combinational Clustering Based Method for cDNA Microarray Image Segmentation.

    PubMed

    Shao, Guifang; Li, Tiejun; Zuo, Wangda; Wu, Shunxiang; Liu, Tundong

    2015-01-01

    Microarray technology plays an important role in drawing useful biological conclusions by analyzing thousands of gene expressions simultaneously. Image analysis in particular is a key step in microarray analysis, and its accuracy strongly depends on segmentation. Pioneering work on clustering-based segmentation has shown that the k-means and moving k-means clustering algorithms are two commonly used methods in microarray image processing. However, they often produce unsatisfactory results because real microarray images contain noise, artifacts, and spots that vary in size, shape, and contrast. To improve segmentation accuracy, in this article we present a combinational clustering-based segmentation approach that may be more reliable and able to segment spots automatically. First, the new method starts with a very simple but effective contrast enhancement operation to improve image quality. Then, automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, within each spot region, moving k-means clustering is first conducted to separate the spot from the background, and the k-means clustering algorithms are then combined for those spots failing to obtain the entire boundary. Finally, a refinement step replaces false segmentations and recovers missing or inseparable spots. In addition, quantitative comparisons between the improved method and four other segmentation algorithms--edge detection, thresholding, k-means clustering and moving k-means clustering--are carried out on cDNA microarray images from six different data sets: 1) Stanford Microarray Database (SMD), 2) Gene Expression Omnibus (GEO), 3) Baylor College of Medicine (BCM), 4) Swiss Institute of Bioinformatics (SIB), 5) Joe DeRisi's individual tiff files (DeRisi), and 6) University of California, San Francisco (UCSF). The experiments indicate that the improved approach is
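
    As a concrete illustration of the clustering step at the heart of such pipelines, the sketch below applies plain two-class k-means to the pixel intensities of a single gridded spot window; the paper's contrast enhancement, moving k-means, and refinement stages are omitted. All names are illustrative, and scikit-learn is assumed to be available.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def segment_spot(window):
        """Cluster pixel intensities of one gridded spot window into
        foreground (spot) and background with 2-class k-means."""
        pix = window.reshape(-1, 1).astype(float)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pix)
        # call the brighter cluster the spot
        fg = int(pix[labels == 1].mean() > pix[labels == 0].mean())
        return (labels == fg).reshape(window.shape)

    # toy example: a bright disc on a noisy background
    yy, xx = np.mgrid[:32, :32]
    img = 50 + 10 * np.random.rand(32, 32)
    img[(yy - 16) ** 2 + (xx - 16) ** 2 < 64] += 200
    mask = segment_spot(img)
    print(mask.sum(), "spot pixels")
    ```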

  16. Value of Distributed Preprocessing of Biomass Feedstocks to a Bioenergy Industry

    SciTech Connect

    Christopher T Wright

    2006-07-01

    Biomass preprocessing is one of the primary operations in the feedstock assembly system and the front-end of a biorefinery. Its purpose is to chop, grind, or otherwise format the biomass into a suitable feedstock for conversion to ethanol and other bioproducts. Many variables such as equipment cost and efficiency, and feedstock moisture content, particle size, bulk density, compressibility, and flowability affect the location and implementation of this unit operation. Previous conceptual designs show this operation to be located at the front-end of the biorefinery. However, data are presented that show distributed preprocessing at the field-side or in a fixed preprocessing facility can provide significant cost benefits by producing a higher-value feedstock with improved handling, transporting, and merchandising potential. In addition, data supporting the preferential deconstruction of feedstock materials due to their bio-composite structure identify the potential for significant improvements in equipment efficiencies and compositional quality upgrades. These data were collected from full-scale low- and high-capacity hammermill grinders with various screen sizes. Multiple feedstock varieties with a range of moisture values were used in the preprocessing tests. The comparative values of the different grinding configurations, feedstock varieties, and moisture levels are assessed through post-grinding analysis of the different particle fractions separated with a medium-scale forage particle separator and a Rototap separator. The results show that distributed preprocessing produces a material that has bulk flowable properties and fractionation benefits that can improve the ease of transporting, handling and conveying the material to the biorefinery and improve the biochemical and thermochemical conversion processes.

  17. Preprocessing strategy influences graph-based exploration of altered functional networks in major depression.

    PubMed

    Borchardt, Viola; Lord, Anton Richard; Li, Meng; van der Meer, Johan; Heinze, Hans-Jochen; Bogerts, Bernhard; Breakspear, Michael; Walter, Martin

    2016-04-01

    Resting-state fMRI studies have gained widespread use in exploratory studies of neuropsychiatric disorders. Graph metrics derived from whole-brain functional connectivity have been used to reveal disease-related variations in many neuropsychiatric disorders, including major depression (MDD). These techniques show promise for developing diagnostics for these often difficult-to-identify disorders. However, the analysis of resting-state datasets is increasingly beset by a myriad of approaches and methods, each with underlying assumptions. Choosing the most appropriate preprocessing parameters a priori is difficult. Nevertheless, the specific methodological choice influences graph-theoretical network topologies as well as regional metrics. The aim of this study was to systematically compare different preprocessing strategies by evaluating their influence on group differences between healthy participants (HC) and depressive patients. We thus investigated the effects of common preprocessing variants, including global mean-signal regression (GMR), temporal filtering, detrending, and network sparsity, on group differences between brain networks of HC and MDD patients measured by global and nodal graph-theoretical metrics. Group differences in global metrics were absent in the majority of tested preprocessing variants; in local graph metrics they were sparse, variable, and highly dependent on the combination of preprocessing variant and sparsity threshold. Sparsity thresholds between 16 and 22% were shown to have the greatest potential to reveal differences between HC and MDD patients in global and local network metrics. Our study offers an overview of the consequences of these methodological decisions and of which neurobiological characteristics of MDD they implicate, adding further caution to this rapidly growing field.
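
    To make the role of the sparsity threshold concrete, the sketch below (illustrative names; networkx assumed) binarizes a functional connectivity matrix at a given edge density and reads off two common global metrics, which could then be compared across the 16-22% range highlighted above.

    ```python
    import numpy as np
    import networkx as nx

    def graph_at_sparsity(conn, sparsity):
        """Keep the strongest `sparsity` fraction of off-diagonal
        connections and return the resulting binary graph."""
        n = conn.shape[0]
        iu = np.triu_indices(n, k=1)
        weights = conn[iu]
        k = int(sparsity * len(weights))       # number of edges kept
        cutoff = np.sort(weights)[-k]
        G = nx.Graph()
        G.add_nodes_from(range(n))
        for i, j, w in zip(iu[0], iu[1], weights):
            if w >= cutoff:
                G.add_edge(i, j)
        return G

    # e.g. compare global metrics across the 16-22% range reported above
    conn = np.abs(np.corrcoef(np.random.rand(30, 200)))
    for s in (0.16, 0.19, 0.22):
        G = graph_at_sparsity(conn, s)
        print(s, nx.global_efficiency(G), nx.average_clustering(G))
    ```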

  18. A comprehensive analysis about the influence of low-level preprocessing techniques on mass spectrometry data for sample classification.

    PubMed

    López-Fernández, Hugo; Reboiro-Jato, Miguel; Glez-Peña, Daniel; Fernández-Riverola, Florentino

    2014-01-01

    Matrix-Assisted Laser Desorption Ionisation Time-of-Flight (MALDI-TOF) is one of the high-throughput mass spectrometry technologies that produce data requiring extensive preprocessing before subsequent analyses. In this context, several low-level preprocessing techniques have been successfully developed for different tasks, including baseline correction, smoothing, normalisation, peak detection and peak alignment. In this work, we present a systematic comparison of software packages that aid in the necessary preprocessing of MALDI-TOF data. In order to guarantee the validity of our study, we test multiple configurations of each preprocessing technique, which are subsequently used to train a set of classifiers whose performance (kappa and accuracy) provides accurate information for the final comparison. Results from the experiments show the real impact of preprocessing techniques on classification, evidencing that MassSpecWavelet provides the best performance and Support Vector Machines (SVM) are among the most accurate classifiers.
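
    For orientation, the sketch below implements three of the low-level steps named above (smoothing, baseline correction, peak detection) with generic SciPy routines; it stands in for, and is much cruder than, the benchmarked packages such as MassSpecWavelet. Window sizes are illustrative.

    ```python
    import numpy as np
    from scipy.signal import savgol_filter, find_peaks

    def preprocess_spectrum(intensity, smooth_window=21, base_window=200):
        """Smoothing, crude baseline subtraction, and peak detection
        for one MALDI-TOF spectrum (illustrative parameters)."""
        smoothed = savgol_filter(intensity, smooth_window, polyorder=3)
        # rolling-minimum baseline estimate
        pad = base_window // 2
        padded = np.pad(smoothed, pad, mode="edge")
        baseline = np.array([padded[i:i + base_window].min()
                             for i in range(len(smoothed))])
        corrected = smoothed - baseline
        peaks, _ = find_peaks(corrected, prominence=corrected.std())
        return corrected, peaks
    ```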

  20. Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

    PubMed

    Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar

    2016-04-01

    Microarray-based gene expression profiling has emerged as an efficient technique for the classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfy both the veracity and velocity properties of big data, as they keep changing with time. Therefore, analyzing microarray datasets in a small amount of time is essential. They often contain a large number of expression values, but only a fraction of these correspond to significantly expressed genes. The precise identification of the genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process: feature selection/extraction followed by classification. In this paper, various MapReduce-based statistical methods (tests) are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest-neighbor (mrKNN) classifier is employed to classify microarray data. These algorithms are implemented in a Hadoop framework, and a comparative analysis is performed on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data. PMID:26975600
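
    The two-stage structure of such a classifier is easy to mimic outside Hadoop. The sketch below (illustrative names; plain Python in place of MapReduce jobs) maps over data partitions to collect per-partition candidate neighbors and reduces them to the global k nearest before voting.

    ```python
    import numpy as np
    from collections import Counter

    def map_partition(part_X, part_y, query, k):
        """Map step: k nearest candidates within one data partition."""
        d = np.linalg.norm(part_X - query, axis=1)
        idx = np.argsort(d)[:k]
        return list(zip(d[idx], part_y[idx]))

    def reduce_neighbors(candidates, k):
        """Reduce step: merge per-partition candidates, vote on the label."""
        top = sorted(candidates)[:k]
        return Counter(label for _, label in top).most_common(1)[0][0]

    def mr_knn_predict(partitions, query, k=5):
        candidates = []
        for part_X, part_y in partitions:   # would run in parallel mappers
            candidates.extend(map_partition(part_X, part_y, query, k))
        return reduce_neighbors(candidates, k)
    ```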

  1. Diagnostic Oligonucleotide Microarray Fingerprinting of Bacillus Isolates

    SciTech Connect

    Chandler, Darrell P.; Alferov, Oleg; Chernov, Boris; Daly, Don S.; Golova, Julia; Perov, Alexander N.; Protic, Miroslava; Robison, Richard; Shipma, Matthew; White, Amanda M.; Willse, Alan R.

    2006-01-01

    A diagnostic, genome-independent microbial fingerprinting method using DNA oligonucleotide microarrays was used for high-resolution differentiation between closely related Bacillus strains, including two strains of Bacillus anthracis that are monomorphic (indistinguishable) via amplified fragment length polymorphism fingerprinting techniques. Replicated hybridizations on 391-probe nonamer arrays were used to construct a prototype fingerprint library for quantitative comparisons. Descriptive analysis of the fingerprints, including phylogenetic reconstruction, is consistent with previous taxonomic organization of the genus. Newly developed statistical analysis methods were used to quantitatively compare and objectively confirm apparent differences in microarray fingerprints with the statistical rigor required for microbial forensics and clinical diagnostics. These data suggest that a relatively simple fingerprinting microarray and statistical analysis method can differentiate between species in the Bacillus cereus complex, and between strains of B. anthracis. A synthetic DNA standard was used to understand underlying microarray and process-level variability, leading to specific recommendations for the development of a standard operating procedure and/or continued technology enhancements for microbial forensics and diagnostics.

  2. Shrinkage covariance matrix approach for microarray data

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Aripin, Rasimah

    2013-04-01

    Microarray technology was developed for the purpose of monitoring the expression levels of thousands of genes. A microarray data set typically consists of tens of thousands of genes (variables) measured on just dozens of samples, due to various constraints including the high cost of producing microarray chips. As a result, the widely used standard covariance estimator is not appropriate. For example, Hotelling's T2 statistic, a multivariate test statistic for comparing means between two groups, requires that the number of observations (n) exceed the number of genes (p), but in microarray studies it is common that n < p, which leads to a biased estimate of the covariance matrix. In this study, Hotelling's T2 statistic with a shrinkage approach to estimating the covariance matrix is proposed for testing differential gene expression. The performance of this approach is then compared with other commonly used multivariate tests using a widely analysed diabetes data set as an illustration. The results across the methods are consistent, implying that this approach provides an alternative to existing techniques.
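
    A minimal sketch of the idea, using the Ledoit-Wolf estimator from scikit-learn as one well-known shrinkage covariance estimator (the paper's exact shrinkage target and pooling may differ):

    ```python
    import numpy as np
    from sklearn.covariance import LedoitWolf

    def shrinkage_hotelling_t2(X1, X2):
        """Hotelling's T2 with a shrunk pooled covariance, usable when
        the gene count exceeds the sample count (n < p)."""
        n1, n2 = len(X1), len(X2)
        diff = X1.mean(axis=0) - X2.mean(axis=0)
        pooled = np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])
        S = LedoitWolf(assume_centered=True).fit(pooled).covariance_
        return (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S, diff)
    ```

    Because the shrunk estimate is well-conditioned even when n < p, the linear solve succeeds where the raw sample covariance would be singular.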

  3. Microarrays (DNA Chips) for the Classroom Laboratory

    ERIC Educational Resources Information Center

    Barnard, Betsy; Sussman, Michael; BonDurant, Sandra Splinter; Nienhuis, James; Krysan, Patrick

    2006-01-01

    We have developed and optimized the necessary laboratory materials to make DNA microarray technology accessible to all high school students at a fraction of both cost and data size. The primary component is a DNA chip/array that students "print" by hand and then analyze using research tools that have been adapted for classroom use. The primary…

  4. DISC-BASED IMMUNOASSAY MICROARRAYS. (R825433)

    EPA Science Inventory

    Microarray technology as applied to areas that include genomics, diagnostics, environmental, and drug discovery, is an interesting research topic for which different chip-based devices have been developed. As an alternative, we have explored the principle of compact disc-based...

  5. MICROARRAY DATA ANALYSIS USING MULTIPLE STATISTICAL MODELS

    EPA Science Inventory

    Microarray Data Analysis Using Multiple Statistical Models

    Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2,Andrew I. Brooks3,4, ChiYi Chu3,Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...

  6. Raman-based microarray readout: a review.

    PubMed

    Haisch, Christoph

    2016-07-01

    For a quarter of a century, microarrays have been part of the routine analytical toolbox. Label-based fluorescence detection is still the commonest optical readout strategy. Since the 1990s, a continuously increasing number of label-based as well as label-free experiments on Raman-based microarray readout concepts have been reported. This review summarizes the possible concepts and methods and their advantages and challenges. A common label-based strategy is based on the binding of selective receptors as well as Raman reporter molecules to plasmonic nanoparticles in a sandwich immunoassay, which results in surface-enhanced Raman scattering signals of the reporter molecule. Alternatively, capture of the analytes can be performed by receptors on a microarray surface. Addition of plasmonic nanoparticles again leads to a surface-enhanced Raman scattering signal, not of a label but directly of the analyte. This approach is mostly proposed for bacteria and cell detection. However, although many promising readout strategies have been discussed in numerous publications, rarely have any of them made the step from proof of concept to a practical application, let alone routine use. Graphical Abstract: possible realization of a SERS (surface-enhanced Raman scattering) system for microarray readout. PMID:26973235

  7. PRACTICAL STRATEGIES FOR PROCESSING AND ANALYZING SPOTTED OLIGONUCLEOTIDE MICROARRAY DATA

    EPA Science Inventory

    Thoughtful data analysis is as important as experimental design, biological sample quality, and appropriate experimental procedures for making microarrays a useful supplement to traditional toxicology. In the present study, spotted oligonucleotide microarrays were used to profile...

  8. Examining microarray slide quality for the EPA using SNL's hyperspectral microarray scanner.

    SciTech Connect

    Rohde, Rachel M.; Timlin, Jerilyn Ann

    2005-11-01

    This report summarizes research performed at Sandia National Laboratories (SNL) in collaboration with the Environmental Protection Agency (EPA) to assess microarray quality on arrays from two platforms of interest to the EPA. Custom microarrays from two novel, commercially produced array platforms were imaged with SNL's unique hyperspectral imaging technology, and multivariate data analysis was performed to investigate sources of emission on the arrays. No extraneous sources of emission were evident in any of the array areas scanned. This led to the conclusion that either of these array platforms could produce high-quality, reliable microarray data for the EPA toxicology programs. Hyperspectral imaging results are presented and recommendations for microarray analyses using these platforms are detailed within the report.

  9. Data Mining for Tectonic Tremor in a Large Global Seismogram Database using Preprocessed Data Quality Measurements

    NASA Astrophysics Data System (ADS)

    Rasor, B. A.; Brudzinski, M. R.

    2013-12-01

    The collision of plates at subduction zones yields the potential for disastrous earthquakes, yet the processes that lead up to these events are still largely unclear and make them difficult to forecast. Recent advancements in seismic monitoring have revealed subtle ground vibrations termed tectonic tremor that occur as long-lived swarms of narrow-bandwidth activity, unlike local earthquakes of comparable amplitude, which create brief signals of broader, higher frequency. The close proximity of detected tremor events to the lower edge of the seismogenic zone along the subduction interface suggests a potential triggering relationship between tremor and megathrust earthquakes. Most tremor catalogs are constructed with detection methods that involve an exhaustive download of years of high-sample-rate seismic data, as well as substantial computational power to process the large data volume and identify temporal patterns of tremor activity. We have developed a tremor detection method that employs the underutilized Quality Analysis Control Kit (QuACK), originally built to analyze station performance and identify instrument problems across the many seismic networks that contribute data to one of the largest seismogram databases in the world (IRIS DMC). The QuACK dataset stores seismogram amplitudes at a wide range of frequencies calculated every hour since 2005 for most stations archived in the IRIS DMC. Such a preprocessed dataset is advantageous considering several tremor detection techniques use hourly seismic amplitudes in the frequency band where tremor is most active (2-5 Hz) to characterize the time history of tremor. Yet these previous detection techniques have relied on downloading years of 40-100 sample-per-second data to make the calculations, which typically takes several days on a 36-node high-performance cluster to calculate the amplitude variations for a single station. Processing times are even longer for a recently developed detection algorithm that utilize

  10. Identifying Fishes through DNA Barcodes and Microarrays

    PubMed Central

    Kochzius, Marc; Seidel, Christian; Antoniou, Aglaia; Botla, Sandeep Kumar; Campo, Daniel; Cariani, Alessia; Vazquez, Eva Garcia; Hauschild, Janet; Hervet, Caroline; Hjörleifsdottir, Sigridur; Hreggvidsson, Gudmundur; Kappel, Kristina; Landi, Monica; Magoulas, Antonios; Marteinsson, Viggo; Nölte, Manfred; Planes, Serge; Tinti, Fausto; Turan, Cemal; Venugopal, Moleyur N.; Weber, Hannes; Blohm, Dietmar

    2010-01-01

    Background International fish trade reached an import value of 62.8 billion Euro in 2006, of which 44.6% is covered by the European Union. Species identification is a key problem throughout the life cycle of fishes: from eggs and larvae to adults in fisheries research and control, as well as processed fish products in consumer protection. Methodology/Principal Findings This study aims to evaluate the applicability of the three mitochondrial genes 16S rRNA (16S), cytochrome b (cyt b), and cytochrome oxidase subunit I (COI) for the identification of 50 European marine fish species by combining techniques of “DNA barcoding” and microarrays. In a DNA barcoding approach, neighbour-joining (NJ) phylogenetic trees of 369 16S, 212 cyt b, and 447 COI sequences indicated that cyt b and COI are suitable for unambiguous identification, whereas 16S failed to discriminate closely related flatfish and gurnard species. In the course of probe design for DNA microarray development, each of the markers yielded a high number of potentially species-specific probes in silico, although many of them were rejected based on microarray hybridisation experiments. None of the markers provided probes to discriminate the sibling flatfish and gurnard species. However, since 16S probes were less negatively influenced by the “position of label” effect and showed the lowest rejection rate and the highest mean signal intensity, 16S is more suitable for DNA microarray probe design than cyt b and COI. The large portion of rejected COI probes after hybridisation experiments (>90%) renders this DNA barcoding marker rather unsuitable for this high-throughput technology. Conclusions/Significance Based on these data, a DNA microarray containing 64 functional oligonucleotide probes for the identification of 30 out of the 50 fish species investigated was developed. It represents the next step towards an automated and easy-to-handle method to identify fish, ichthyoplankton, and fish products. PMID

  11. Facilitating functional annotation of chicken microarray data

    PubMed Central

    2009-01-01

    Background Modeling results from chicken microarray studies is challenging for researchers due to the limited functional annotation associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays and a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix are incomplete; for example, they do not show references linked to manually annotated functions. In addition, there is no tool that lets microarray researchers directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers considerable time searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets we developed an Array GO Mapper (AGOM) tool to help researchers quickly retrieve the corresponding functional information for their datasets. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray datasets into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. The GO annotation data generated will be available for public use via the AgBase website and will be updated on regular

  12. Microarray analysis at single molecule resolution

    PubMed Central

    Mureşan, Leila; Jacak, Jarosław; Klement, Erich Peter; Hesse, Jan; Schütz, Gerhard J.

    2010-01-01

    Bioanalytical chip-based assays have been enormously improved in sensitivity in recent years; detection of trace amounts of substances down to the level of individual fluorescent molecules has become state-of-the-art technology. The impact of such detection methods, however, has not yet been fully exploited, mainly due to a lack of appropriate mathematical tools for robust data analysis. One particular example relates to the analysis of microarray data. While classical microarray analysis works at resolutions of two to 20 micrometers and quantifies the abundance of target molecules by determining average pixel intensities, a novel high resolution approach [1] directly visualizes individual bound molecules as diffraction-limited peaks. The now possible quantification via counting is less susceptible to labeling artifacts and background noise. We have developed an approach for the analysis of high-resolution microarray images. It consists first of a single molecule detection step, based on undecimated wavelet transforms, and second, of a spot identification step via a spatial statistics approach (corresponding to the segmentation step in classical microarray analysis). The detection method was tested on simulated images with a concentration range of 0.001 to 0.5 molecules per square micron and signal-to-noise ratio (SNR) between 0.9 and 31.6. For SNR above 15 the relative false-negative error was below 15%. Separation of foreground from background proved reliable provided the foreground density exceeded the background density by a factor of 2. The method has also been applied to real data from high-resolution microarray measurements. PMID:20123580

  13. Advanced Recording and Preprocessing of Physiological Signals. [data processing equipment for flow measurement of blood flow by ultrasonics

    NASA Technical Reports Server (NTRS)

    Bentley, P. B.

    1975-01-01

    The measurement of the volume flow-rate of blood in an artery or vein requires both an estimate of the flow velocity and its spatial distribution and the corresponding cross-sectional area. Transcutaneous measurements of these parameters can be performed using ultrasonic techniques that are analogous to the measurement of moving objects by use of a radar. Modern digital data recording and preprocessing methods were applied to the measurement of blood-flow velocity by means of the CW Doppler ultrasonic technique. Only the average flow velocity was measured and no distribution or size information was obtained. Evaluations of current flowmeter design and performance, ultrasonic transducer fabrication methods, and other related items are given. The main thrust was the development of effective data-handling and processing methods by application of modern digital techniques. The evaluation resulted in useful improvements in both the flowmeter instrumentation and the ultrasonic transducers. Effective digital processing algorithms that provided enhanced blood-flow measurement accuracy and sensitivity were developed. Block diagrams illustrative of the equipment setup are included.

  14. Pre-processing of data coming from a laser-EMAT system for non-destructive testing of steel slabs.

    PubMed

    Sgarbi, Mirko; Colla, Valentina; Cateni, Sivia; Higson, Stuart

    2012-01-01

    Non-destructive test systems are increasingly applied in industrial contexts for their strong potential to improve and standardize quality control. Especially in intermediate manufacturing stages, early detection of defects on semi-finished products allows them to be directed towards later production processes according to their quality, with considerable savings in time, energy, materials and work. However, the raw data coming from non-destructive test systems are not always immediately suitable for sophisticated defect detection algorithms, due to unavoidable noise and disturbances, especially in harsh operating conditions such as those typical of the steelmaking cycle. The paper describes some pre-processing operations which are required in order to exploit the data coming from a non-destructive test system. Such a system is based on the joint exploitation of Laser and Electro-Magnetic Acoustic Transducer technologies and is applied to the detection of surface and sub-surface cracks in cold and hot steel slabs. PMID:21855062

  15. Prenatal alcohol exposure alters gene expression in the rat brain: Experimental design and bioinformatic analysis of microarray data.

    PubMed

    Lussier, Alexandre A; Stepien, Katarzyna A; Weinberg, Joanne; Kobor, Michael S

    2015-09-01

    We previously identified gene expression changes in the prefrontal cortex and hippocampus of rats prenatally exposed to alcohol under both steady-state and challenge conditions (Lussier et al., 2015, Alcohol.: Clin. Exp. Res., 39, 251-261). In this study, adult female rats from three prenatal treatment groups (ad libitum-fed control, pair-fed, and ethanol-fed) were injected with physiological saline solution or complete Freund's adjuvant (CFA) to induce arthritis (adjuvant-induced arthritis, AA). The prefrontal cortex and hippocampus were collected 16 days (peak of arthritis) or 39 days (during recovery) following injection, and whole genome gene expression was assayed using Illumina's RatRef-12 expression microarray. Here, we provide additional metadata, detailed explanations of data pre-processing steps and quality control, as well as a basic framework for the bioinformatic analyses performed. The datasets from this study are publicly available on the GEO repository (accession number GSE63561). PMID:26217797

  17. Segmentation of complementary DNA microarray images by wavelet-based Markov random field model.

    PubMed

    Athanasiadis, Emmanouil I; Cavouras, Dionisis A; Glotsos, Dimitris Th; Georgiadis, Pantelis V; Kalatzis, Ioannis K; Nikiforidis, George C

    2009-11-01

    A wavelet-based modification of the Markov random field (WMRF) model is proposed for segmenting complementary DNA (cDNA) microarray images. For evaluation purposes, five simulated and a set of five real microarray images were used. The one-level stationary wavelet transform (SWT) of each microarray image was used to form two images, a denoised image, using a hard-thresholding filter, and a magnitude image, from the amplitudes of the horizontal and vertical components of SWT. Elements from these two images were suitably combined to form the WMRF model for segmenting spots from their background. The WMRF was compared against the conventional MRF and the fuzzy c-means (FCM) algorithms on simulated and real microarray images and their performances were evaluated by means of the segmentation matching factor (SMF) and the coefficient of determination (r2). Additionally, the WMRF was compared against the SPOT and SCANALYZE, and performances were evaluated by the mean absolute error (MAE) and the coefficient of variation (CV). The WMRF performed more accurately than the MRF and FCM (SMF: 92.66, 92.15, and 89.22, r2: 0.92, 0.90, and 0.84, respectively) and achieved higher reproducibility than the MRF, SPOT, and SCANALYZE (MAE: 497, 1215, 1180, and 503, CV: 0.88, 1.15, 0.93, and 0.90, respectively).

  18. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information.

    PubMed

    Mortazavi, Atiyeh; Moattar, Mohammad Hossein

    2016-01-01

    High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to high dimension of microarray data sets, the features are reduced using one of the two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches. PMID:27127506
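
    The Shapley-index phase can be approximated by Monte-Carlo permutation sampling. The sketch below uses cross-validated accuracy of a small kNN classifier as a generic value function in place of the paper's Qualitative Mutual Information; all names and parameters are illustrative, and because the procedure is expensive it is only practical after the filter-based reduction of the first phase.

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def shapley_feature_power(X, y, n_perm=50, seed=0):
        """Monte-Carlo Shapley estimate of each feature's contribution,
        scoring coalitions by cross-validated accuracy (illustrative)."""
        rng = np.random.default_rng(seed)
        p = X.shape[1]
        phi = np.zeros(p)

        def value(features):
            if not features:
                return 0.0
            clf = KNeighborsClassifier(3)
            return cross_val_score(clf, X[:, features], y, cv=3).mean()

        for _ in range(n_perm):
            perm = rng.permutation(p)
            coalition, prev = [], 0.0
            for f in perm:
                coalition.append(f)
                v = value(coalition)
                phi[f] += v - prev     # marginal contribution of feature f
                prev = v
        return phi / n_perm
    ```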

  19. Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification

    PubMed Central

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of the analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses the fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. A genetic algorithm (GA) is incorporated between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional, low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection over the single ranking methods. Furthermore, the combination of AHP and FSAM shows great accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice. PMID:25823003
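
    The sketch below illustrates only the first ingredient, combining several filter rankings into one ordering by averaging ranks; the modified AHP's pairwise-comparison weighting and the FSAM/GA classifier are beyond a short snippet. The entropy criterion is omitted, a binary class vector y in {0, 1} is assumed, and all names are illustrative.

    ```python
    import numpy as np
    from scipy.stats import ttest_ind, ranksums
    from sklearn.metrics import roc_auc_score

    def aggregate_gene_ranks(X, y):
        """Average the per-criterion ranks of each gene (lower = better)."""
        a, b = X[y == 0], X[y == 1]
        p = X.shape[1]
        scores = [
            np.abs(ttest_ind(a, b).statistic),                             # t-test
            np.abs([ranksums(a[:, j], b[:, j]).statistic for j in range(p)]),
            np.abs([roc_auc_score(y, X[:, j]) - 0.5 for j in range(p)]),   # ROC
            np.abs(a.mean(0) - b.mean(0)) / (a.std(0) + b.std(0)),         # SNR
        ]
        ranks = np.mean([np.argsort(np.argsort(-np.asarray(s))) for s in scores],
                        axis=0)
        return np.argsort(ranks)   # gene indices, best first
    ```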

  20. Measuring information flow in cellular networks by the systems biology method through microarray data.

    PubMed

    Chen, Bor-Sen; Li, Cheng-Wei

    2015-01-01

    In general, it is very difficult to measure the information flow in a cellular network directly. In this study, based on an information flow model and microarray data, we measured the information flow in cellular networks indirectly by using a systems biology method. First, we used a recursive least square parameter estimation algorithm to identify the system parameters of coupling signal transduction pathways and the cellular gene regulatory network (GRN). Then, based on the identified parameters and systems theory, we estimated the signal transductivities of the coupling signal transduction pathways from the extracellular signals to each downstream protein and the information transductivities of the GRN between transcription factors in response to environmental events. According to the proposed method, the information flow, which is characterized by signal transductivity in coupling signaling pathways and information transductivity in the GRN, can be estimated by microarray temporal data or microarray sample data. It can also be estimated by other high-throughput data such as next-generation sequencing or proteomic data. Finally, the information flows of the signal transduction pathways and the GRN in leukemia cancer cells and non-leukemia normal cells were also measured to analyze the systematic dysfunction in this cancer from microarray sample data. The results show that the signal transductivities of signal transduction pathways change substantially from normal cells to leukemia cancer cells.
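
    The first identification step rests on a standard recursive least squares (RLS) update. Below is a generic sketch for a linear-in-parameters model y[t] = Phi[t] . theta + noise with a forgetting factor; the regressor construction for the actual pathway and GRN models is not reproduced here, and all names are illustrative.

    ```python
    import numpy as np

    def rls_identify(Phi, y, lam=0.98, delta=100.0):
        """Recursive least squares with forgetting factor `lam`:
        y[t] ~ Phi[t] @ theta, updating theta one sample at a time."""
        n = Phi.shape[1]
        theta = np.zeros(n)
        P = delta * np.eye(n)                     # large initial covariance
        for phi, yt in zip(Phi, y):
            k = P @ phi / (lam + phi @ P @ phi)   # gain vector
            theta += k * (yt - phi @ theta)       # prediction-error update
            P = (P - np.outer(k, phi) @ P) / lam  # covariance update
        return theta
    ```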

  1. Using attribute behavior diversity to build accurate decision tree committees for microarray data.

    PubMed

    Han, Qian; Dong, Guozhu

    2012-08-01

    DNA microarrays (gene chips), frequently used in biological and medical studies, measure the expressions of thousands of genes per sample. Using microarray data to build accurate classifiers for diseases is an important task. This paper introduces an algorithm, called Committee of Decision Trees by Attribute Behavior Diversity (CABD), to build highly accurate ensembles of decision trees for such data. Since a committee's accuracy is greatly influenced by the diversity among its member classifiers, CABD uses two new ideas to "optimize" that diversity, namely (1) the concept of attribute behavior-based similarity between attributes, and (2) the concept of attribute usage diversity among trees. The ideas are effective for microarray data, since such data have many features and behavior similarity between genes can be high. Experiments on microarray data for six cancers show that CABD outperforms previous ensemble methods significantly and outperforms SVM, and show that the diversified features used by CABD's decision tree committee can be used to improve performance of other classifiers such as SVM. CABD has potential for other high-dimensional data, and its ideas may apply to ensembles of other classifier types. PMID:22809418

  2. Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

    PubMed

    Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

    2013-01-01

    DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they merely focus on the binary-class problem. In this paper, we dealt with the multiclass imbalanced classification problem, as encountered in cancer DNA microarrays, by using ensemble learning. We utilized a one-against-all coding strategy to transform the multiclass problem into multiple binary classes, each of them carrying out feature-subspace generation, an evolving version of the random subspace method that generates multiple diverse training subsets. Next, we introduced one of two different correction technologies, namely, decision threshold adjustment or random undersampling, into each training subset to alleviate the damage caused by class imbalance. Specifically, a support vector machine was used as the base classifier, and a novel voting rule called counter voting was presented for making a final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that unlike many traditional classification approaches, our methods are insensitive to class imbalance. PMID:24078908
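
    A stripped-down sketch of the decomposition-plus-undersampling scheme follows (scikit-learn assumed); the feature-subspace generation and the counter-voting rule are simplified here to a max-score decision, so this conveys the structure rather than the paper's exact method.

    ```python
    import numpy as np
    from sklearn.svm import SVC

    def train_ova_undersampled(X, y, seed=0):
        """One-against-all SVMs, each trained on a class-balanced
        random undersample of the 'rest' class."""
        rng = np.random.default_rng(seed)
        models = {}
        for c in np.unique(y):
            pos, neg = np.where(y == c)[0], np.where(y != c)[0]
            neg = rng.choice(neg, size=min(len(pos), len(neg)), replace=False)
            idx = np.concatenate([pos, neg])
            models[c] = SVC(kernel="linear").fit(X[idx], (y[idx] == c).astype(int))
        return models

    def predict_ova(models, X):
        """Assign each sample to the class whose SVM scores it highest."""
        scores = {c: m.decision_function(X) for c, m in models.items()}
        classes = list(scores)
        return np.array(classes)[np.argmax(np.vstack(list(scores.values())), axis=0)]
    ```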

  4. A robust measure of correlation between two genes on a microarray

    PubMed Central

    Hardin, Johanna; Mitani, Aya; Hicks, Leanne; VanKoten, Brian

    2007-01-01

    Background The underlying goal of microarray experiments is to identify gene expression patterns across different experimental conditions. Genes that are contained in a particular pathway or that respond similarly to experimental conditions could be co-expressed and show similar patterns of expression on a microarray. Using any of a variety of clustering methods or gene network analyses we can partition genes of interest into groups, clusters, or modules based on measures of similarity. Typically, Pearson correlation is used to measure distance (or similarity) before implementing a clustering algorithm. Pearson correlation is, however, quite susceptible to outliers, an unfortunate characteristic when dealing with microarray data (well known to be typically quite noisy). Results We propose a resistant similarity metric based on Tukey's biweight estimate of multivariate scale and location. The resistant metric is simply the correlation obtained from a resistant covariance matrix of scale. We give results which demonstrate that our correlation metric is much more resistant than the Pearson correlation while being more efficient than other nonparametric measures of correlation (e.g., Spearman correlation). Additionally, our method gives a systematic gene flagging procedure which is useful when dealing with large amounts of noisy data. Conclusion When dealing with microarray data, which are known to be quite noisy, robust methods should be used. Specifically, robust distances, including the biweight correlation, should be used in clustering and gene network analysis. PMID:17592643
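
    A closely related resistant metric, the biweight midcorrelation, conveys the flavor in a few lines. Note that the paper derives its correlation from a multivariate biweight estimate of scale and location, which this univariate-weight sketch only approximates.

    ```python
    import numpy as np

    def biweight_midcorrelation(x, y, c=9.0):
        """Resistant correlation between two expression profiles,
        downweighting points far from the median (Tukey biweight)."""
        def weighted_deviation(v):
            med = np.median(v)
            mad = np.median(np.abs(v - med)) or 1e-12  # guard against MAD = 0
            u = (v - med) / (c * mad)
            w = (1 - u ** 2) ** 2
            w[np.abs(u) >= 1] = 0.0                    # outliers get zero weight
            return (v - med) * w
        a, b = weighted_deviation(x), weighted_deviation(y)
        return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())
    ```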

  5. Multivariate curve resolution for hyperspectral image analysis: applications to microarray technology.

    SciTech Connect

    Van Benthem, Mark Hilary; Sinclair, Michael B.; Haaland, David Michael; Martinez, M. Juanita (University of New Mexico, Albuquerque, NM); Timlin, Jerilyn Ann; Werner-Washburne, Margaret C. (University of New Mexico, Albuquerque, NM); Aragon, Anthony D. (University of New Mexico, Albuquerque, NM)

    2003-01-01

    Multivariate curve resolution (MCR) using constrained alternating least squares algorithms represents a powerful analysis capability for a quantitative analysis of hyperspectral image data. We will demonstrate the application of MCR using data from a new hyperspectral fluorescence imaging microarray scanner for monitoring gene expression in cells from thousands of genes on the array. The new scanner collects the entire fluorescence spectrum from each pixel of the scanned microarray. Application of MCR with nonnegativity and equality constraints reveals several sources of undesired fluorescence that emit in the same wavelength range as the reporter fluorophores. MCR analysis of the hyperspectral images confirms that one of the sources of fluorescence is contaminant fluorescence under the printed DNA spots that is spot-localized. Thus, traditional background subtraction methods used with data collected from the current commercial microarray scanners will lead to errors in determining the relative expression of low-expressed genes. With the new scanner and MCR analysis, we generate relative concentration maps of the background, impurity, and fluorescent labels over the entire image. Since the concentration maps of the fluorescent labels are relatively unaffected by the presence of background and impurity emissions, the accuracy and useful dynamic range of the gene expression data are both greatly improved over those obtained by commercial microarray scanners.
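
    The constrained alternating least squares at the core of MCR can be sketched compactly: factor the pixels-by-wavelengths matrix D as C S^T, alternately solving for each factor and clipping negatives. The equality constraints and hyperspectral bookkeeping used in the study are omitted, and all names are illustrative.

    ```python
    import numpy as np

    def mcr_als(D, n_components, n_iter=100, seed=0):
        """Nonnegativity-constrained alternating least squares:
        factor D (pixels x wavelengths) as D ~ C @ S.T, where C holds
        component concentrations and S the component spectra."""
        rng = np.random.default_rng(seed)
        S = rng.random((D.shape[1], n_components))
        for _ in range(n_iter):
            # solve D ~ C S^T for C, then for S, clipping negatives
            C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0, None)
            S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0, None)
        return C, S
    ```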

  6. Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering

    PubMed Central

    de Brevern, Alexandre G; Hazout, Serge; Malpertuy, Alain

    2004-01-01

    Background Microarray technologies produce large amounts of data. Hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray datasets often contain missing values (MVs), a major obstacle to the use of clustering methods. Usually the MVs are left untreated, replaced by zero, or estimated by the k-nearest neighbor (kNN) approach. The topic of this paper is the stability of gene clusters, defined by various hierarchical clustering algorithms, in microarray experiments with and without MVs. Results We show that MVs have important effects on the stability of gene clusters. Moreover, the magnitude of the gene misallocations depends on the aggregation algorithm. The most appropriate aggregation methods (e.g., complete-linkage and Ward) are highly sensitive to MVs, surprisingly even for a very tiny proportion of MVs (e.g., 1%). In most cases, the MVs must be replaced by expected values. Replacing MVs by the kNN approach clearly improves the identification of co-expressed gene clusters, although the kNN approach is less suitable for extreme values of gene expression. Conclusion The presence of MVs (even at a low rate) is a major factor in gene cluster instability, and the impact depends on the hierarchical clustering algorithm used; some methods should be used carefully. Nevertheless, the kNN approach constitutes an efficient method for restoring missing gene expression values, with a low error level. Our study highlights the need for statistical treatment of microarray data to avoid misinterpretation. PMID:15324460
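
    The kNN replacement step evaluated here is available off the shelf. A minimal sketch with scikit-learn's KNNImputer (assumed available), with genes as rows and NaN marking an MV:

    ```python
    import numpy as np
    from sklearn.impute import KNNImputer

    # expression matrix: genes x arrays, NaN marks a missing value
    X = np.array([[1.0, 2.0, np.nan],
                  [0.9, 2.1, 3.2],
                  [1.1, np.nan, 3.0],
                  [5.0, 5.2, 5.1]])

    # each MV is replaced by the mean over the k most similar genes
    X_filled = KNNImputer(n_neighbors=2).fit_transform(X)
    print(X_filled.round(2))
    ```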

  7. Microarray missing data imputation based on a set theoretic framework and biological knowledge.

    PubMed

    Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong

    2006-01-01

    Gene expressions measured using microarrays usually suffer from the missing value problem. However, many data analysis methods require a complete data matrix. Although existing missing value imputation algorithms have shown good performance in dealing with missing values, they also have their limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimates when the data are dominated by global structure. In addition, these algorithms do not take any biological constraints into account in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge into a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets that take into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments; in cyclic systems, synchronization loss is a common phenomenon, and we construct a series of sets based on it for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods.
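
    The POCS iteration itself is easy to picture: start from any point and repeatedly project onto each convex set in turn. The sketch below uses two deliberately simple sets, data consistency on observed entries and a box constraint on all entries, as stand-ins for the paper's correlation-based sets; all names and bounds are illustrative.

```python
import numpy as np

def pocs_impute(X, observed, lo, hi, n_iter=100):
    """Toy POCS: alternate projections onto two convex sets.
    C1: matrices agreeing with X on the observed entries (a boolean mask).
    C2: matrices whose entries all lie in the box [lo, hi]."""
    Y = np.where(observed, X, (lo + hi) / 2.0)   # any starting point works
    for _ in range(n_iter):
        Y = np.clip(Y, lo, hi)     # project onto C2 (box constraint)
        Y[observed] = X[observed]  # project onto C1 (data consistency)
    return Y
```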

  8. Influence of data preprocessing on the quantitative determination of nutrient content in poultry manure by near infrared spectroscopy.

    PubMed

    Chen, L J; Xing, L; Han, L J

    2010-01-01

    With increasing concern over potential pollution from farm wastes, there is a need for rapid and robust methods that can analyze livestock manure nutrient content. The near infrared spectroscopy (NIRS) method was used to determine nutrient content in diverse poultry manure samples (n=91). Various standard preprocessing methods (derivatives, multiplicative scatter correction, Savitzky-Golay smoothing, and standard normal variate) were applied to reduce systematic noise in the data. In addition, a new preprocessing method known as direct orthogonal signal correction (DOSC) was tested. Calibration models for ammonium nitrogen, total potassium, total nitrogen, and total phosphorus were developed with the partial least squares (PLS) method. The results showed that all preprocessing methods improved prediction results compared with using no preprocessing, with DOSC giving the best results. The DOSC method achieved moderately successful predictions for ammonium nitrogen, total nitrogen, and total phosphorus; however, no preprocessing method provided reliable predictions for total potassium. This indicates that the DOSC method, especially combined with other preprocessing methods, needs further study to allow a more complete predictive analysis of manure nutrient content.
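
    The general preprocessing-plus-PLS workflow (though not DOSC itself) can be illustrated with standard tools; the spectra below are synthetic stand-ins for the manure NIR data, and all parameter choices are arbitrary.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(91, 700)).cumsum(axis=1)          # 91 smooth synthetic "spectra"
y = 0.01 * X[:, 300] + rng.normal(scale=0.1, size=91)  # synthetic nutrient content

# Savitzky-Golay first derivative to suppress baseline effects before PLS.
Xp = savgol_filter(X, window_length=11, polyorder=2, deriv=1, axis=1)

pls = PLSRegression(n_components=5)
print(cross_val_score(pls, Xp, y, scoring="neg_root_mean_squared_error").mean())
```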

  9. Gene Expression Signature in Endemic Osteoarthritis by Microarray Analysis

    PubMed Central

    Wang, Xi; Ning, Yujie; Zhang, Feng; Yu, Fangfang; Tan, Wuhong; Lei, Yanxia; Wu, Cuiyan; Zheng, Jingjing; Wang, Sen; Yu, Hanjie; Li, Zheng; Lammi, Mikko J.; Guo, Xiong

    2015-01-01

    Kashin-Beck Disease (KBD) is an endemic osteochondropathy with an unknown pathogenesis. Diagnosis of KBD is effective only in advanced cases, which eliminates the possibility of early treatment and leads to an inevitable exacerbation of symptoms. Therefore, we aim to identify an accurate blood-based gene signature for the detection of KBD. Previously published gene expression profile data on cartilage and peripheral blood mononuclear cells (PBMCs) from adults with KBD were compared to select potential target genes. Microarray analysis was conducted to evaluate the expression of the target genes in a cohort of 100 KBD patients and 100 healthy controls. A gene expression signature was identified using a training set, which was subsequently validated using an independent test set with a minimum redundancy maximum relevance (mRMR) algorithm and support vector machine (SVM) algorithm. Fifty unique genes were differentially expressed between KBD patients and healthy controls. A 20-gene signature was identified that distinguished between KBD patients and controls with 90% accuracy, 85% sensitivity, and 95% specificity. This study identified a 20-gene signature that accurately distinguishes between patients with KBD and controls using peripheral blood samples. These results promote the further development of blood-based genetic biomarkers for detection of KBD. PMID:25997002
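
    A hedged sketch of this classification workflow follows. scikit-learn has no built-in mRMR, so a univariate mutual-information ranking is substituted here, and the data are entirely synthetic stand-ins for the KBD cohort.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 500))    # 200 samples x 500 genes (synthetic)
y = rng.integers(0, 2, size=200)   # case/control labels (synthetic)
X[y == 1, :20] += 1.0              # plant 20 informative genes

# Select a 20-gene signature, then classify with a linear SVM.
clf = make_pipeline(SelectKBest(mutual_info_classif, k=20), SVC(kernel="linear"))
print(cross_val_score(clf, X, y, cv=5).mean())
```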

  10. Viral diagnosis in Indian livestock using customized microarray chips.

    PubMed

    Yadav, Brijesh S; Pokhriyal, Mayank; Ratta, Barkha; Kumar, Ajay; Saxena, Meeta; Sharma, Bhaskar

    2015-01-01

    Viral diagnosis in Indian livestock using customized microarray chips has gained momentum in recent years, and it is now possible to design customized microarray chips for the viruses infecting livestock in India. Customized microarray chips identified Bovine herpes virus-1 (BHV-1), Canine Adeno Virus-1 (CAV-1), and Canine Parvo Virus-2 (CPV-2) in clinical samples. The microarray-identified specific probes were further confirmed using RT-PCR in all clinical and known samples. The application of microarray chips during viral disease outbreaks in Indian livestock is therefore possible where conventional methods are unsuitable, although it should be noted that customized application requires a detailed cost-efficiency calculation.

  11. Advancing translational research with next-generation protein microarrays.

    PubMed

    Yu, Xiaobo; Petritis, Brianne; LaBaer, Joshua

    2016-04-01

    Protein microarrays are a high-throughput technology used increasingly in translational research, seeking to apply basic science findings to enhance human health. In addition to assessing protein levels, posttranslational modifications, and signaling pathways in patient samples, protein microarrays have aided in the identification of potential protein biomarkers of disease and infection. In this perspective, the different types of full-length protein microarrays that are used in translational research are reviewed. Specific studies employing these microarrays are presented to highlight their potential in finding solutions to real clinical problems. Finally, the criteria that should be considered when developing next-generation protein microarrays are provided. PMID:26749402

  12. AKITA: Application Knowledge Interface to Algorithms

    NASA Astrophysics Data System (ADS)

    Barros, Paul; Mathis, Allison; Newman, Kevin; Wilder, Steven

    2013-05-01

    We propose a methodology for using sensor metadata and targeted preprocessing to determine which algorithms from a large suite are most appropriate for a given data set. Rather than applying several general-purpose algorithms or requiring a human operator to oversee the analysis of the data, our method allows the most effective algorithm to be chosen automatically, conserving computational, network, and human resources. For example, the amount of video data being produced daily is far greater than can ever be analyzed. Computer vision algorithms can help sift for the relevant data, but not every algorithm is suited to every data type, nor is it efficient to run them all. A full-body detector won't work well when the camera is zoomed in or when it is raining and all the people are occluded by foul-weather gear. However, leveraging metadata knowledge of the camera settings and the conditions under which the data was collected (generated by automatic preprocessing), face or umbrella detectors could be applied instead, increasing the likelihood of a correct reading. The Lockheed Martin AKITA™ system is a modular knowledge layer which uses knowledge of the system and environment to determine how to process whatever data it is given most efficiently and usefully.
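
    The selection logic described can be pictured as a metadata-driven dispatch. The sketch below is purely hypothetical; the detector names and metadata keys are invented for illustration and bear no relation to AKITA's internals.

```python
# Hypothetical metadata-driven algorithm selection in the spirit described above.
def choose_detector(meta: dict) -> str:
    if meta.get("weather") == "rain":
        return "umbrella_detector"   # people occluded by foul-weather gear
    if meta.get("zoom_factor", 1.0) > 3.0:
        return "face_detector"       # full bodies unlikely to fit in the frame
    return "full_body_detector"

print(choose_detector({"weather": "rain", "zoom_factor": 1.0}))  # umbrella_detector
```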

  13. A fast meteor detection algorithm

    NASA Astrophysics Data System (ADS)

    Gural, P.

    2016-01-01

    A low-latency meteor detection algorithm for use with fast steering mirrors was previously developed to track and telescopically follow meteors in real time (Gural, 2007). It has since been rewritten as a generic clustering and tracking software module for meteor detection that meets the demanding throughput requirements of a Raspberry Pi while maintaining a high probability of detection. The software interface is generalized to work with various forms of front-end video pre-processing and provides a rich product set of parameterized line-detection metrics. The discussion includes the Maximum Temporal Pixel (MTP) compression technique as a fast thresholding option for feeding the detection module, the detection algorithm trade for maximum processing throughput, details of the clustering and tracking methodology, processing products, performance metrics, and a general interface description.
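
    As we read it, MTP compression collapses a stack of video frames into one image by keeping each pixel's maximum over time, so a moving meteor leaves a bright streak that downstream line detection can threshold. A minimal numpy rendition of that idea (our interpretation, not Gural's code):

```python
import numpy as np

def max_temporal_pixel(frames):
    """Collapse a clip of shape (T, H, W) into a single (H, W) image that
    keeps, for every pixel, the maximum value observed over time."""
    return frames.max(axis=0)

# 64 synthetic 480x640 frames of sensor noise; a real meteor would add a streak.
clip = np.random.default_rng(2).integers(0, 50, size=(64, 480, 640), dtype=np.uint8)
mtp = max_temporal_pixel(clip)
```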

  14. Data preprocessing and preliminary results of the moon-based ultraviolet telescope on CE-3 lander

    NASA Astrophysics Data System (ADS)

    Wang, F.

    2015-10-01

    The moon-based ultraviolet telescope (MUVT) is one of the payloads on the Chang'e-3 (CE-3) lunar lander. Because of the advantages of having no atmospheric disturbances and the slow rotation of the Moon, we can make long-term continuous observations of a series of important remote celestial objects in the near-ultraviolet band, and perform a sky survey of selected areas. We can find characteristic changes in celestial brightness with time by analyzing image data from the MUVT, and deduce the radiation mechanism and physical properties of these celestial objects after comparison with a physical model. In order to explain the scientific purposes of the MUVT, this article analyzes the preprocessing of MUVT image data and makes a preliminary evaluation of data quality. The results demonstrate that the methods used for data collection and preprocessing are effective, and that the Level 2A and 2B image data satisfy the requirements of follow-up scientific research.

  15. Radar signal pre-processing to suppress surface bounce and multipath

    DOEpatents

    Paglieroni, David W; Mast, Jeffrey E; Beer, N. Reginald

    2013-12-31

    A method and system for detecting the presence of subsurface objects within a medium is provided. In some embodiments, the imaging and detection system operates in a multistatic mode to collect radar return signals generated by an array of transceiver antenna pairs that is positioned across the surface and travels down the surface. The imaging and detection system pre-processes the return signals to suppress certain undesirable effects, such as surface bounce and multipath. It then generates synthetic aperture radar images from real aperture radar images formed from the pre-processed return signals, and post-processes the synthetic aperture radar images to improve the detection of subsurface objects. The imaging and detection system identifies peaks in the energy levels of the post-processed image frame, which indicate the presence of a subsurface object.

  16. KONFIG and REKONFIG: Two interactive preprocessing to the Navy/NASA Engine Program (NNEP)

    NASA Technical Reports Server (NTRS)

    Fishbach, L. H.

    1981-01-01

    The NNEP is a computer program that is currently being used to simulate the thermodynamic cycle performance of almost all types of turbine engines by many government, industry, and university personnel. The NNEP uses arrays of input data to set up the engine simulation and component matching method, as well as to describe the characteristics of the components. A preprocessing program (KONFIG) is described with which the user, at a terminal on a time-shared computer, can interactively prepare the required arrays of data. It is intended to make it easier for the occasional or new user to operate NNEP. Another preprocessing program (REKONFIG), with which the user can modify the component specifications of a previously configured NNEP dataset, is also described. It is intended to aid in preparing data for parametric studies and/or studies of similar engines such as mixed-flow turbofans, turboshafts, etc.

  17. The impact of data preprocessing in traumatic brain injury detection using functional magnetic resonance imaging.

    PubMed

    Vergara, Victor M; Damaraju, Eswar; Mayer, Andrew B; Miller, Robyn; Cetin, Mustafa S; Calhoun, Vince

    2015-01-01

    Traumatic brain injury (TBI) can adversely affect a person's thinking, memory, personality and behavior. For this reason, new and better biomarkers are being investigated. Resting-state functional network connectivity (rsFNC) derived from functional magnetic resonance imaging (fMRI) is emerging as a possible biomarker. One of the main concerns with this technique is the appropriateness of the methods used to correct for subject movement. In this work we used 50 mild TBI patients and matched healthy controls to explore the outcomes obtained from different fMRI data preprocessing choices. Results suggest that correcting for motion variance before spatial smoothing is the best alternative. Following this preprocessing option, a significant group difference was found between the cerebellum and the supplementary motor area/paracentral lobule, with the mTBI group exhibiting an increase in rsFNC.

  18. The Role of GRAIL Orbit Determination in Preprocessing of Gravity Science Measurements

    NASA Technical Reports Server (NTRS)

    Kruizinga, Gerhard; Asmar, Sami; Fahnestock, Eugene; Harvey, Nate; Kahan, Daniel; Konopliv, Alex; Oudrhiri, Kamal; Paik, Meegyeong; Park, Ryan; Strekalov, Dmitry; Watkins, Michael; Yuan, Dah-Ning

    2013-01-01

    The Gravity Recovery And Interior Laboratory (GRAIL) mission has constructed a lunar gravity field with unprecedented uniform accuracy on the farside and nearside of the Moon. GRAIL lunar gravity field determination begins with preprocessing of the gravity science measurements by applying corrections for time tag error, general relativity, measurement noise and biases. Gravity field determination requires the generation of spacecraft ephemerides of an accuracy not attainable with the pre-GRAIL lunar gravity fields. Therefore, a bootstrapping strategy was developed, iterating between science data preprocessing and lunar gravity field estimation in order to construct sufficiently accurate orbit ephemerides.This paper describes the GRAIL measurements, their dependence on the spacecraft ephemerides and the role of orbit determination in the bootstrapping strategy. Simulation results will be presented that validate the bootstrapping strategy followed by bootstrapping results for flight data, which have led to the latest GRAIL lunar gravity fields.

  20. PMD: A Resource for Archiving and Analyzing Protein Microarray data.

    PubMed

    Xu, Zhaowei; Huang, Likun; Zhang, Hainan; Li, Yang; Guo, Shujuan; Wang, Nan; Wang, Shi-Hua; Chen, Ziqing; Wang, Jingfang; Tao, Sheng-Ce

    2016-01-27

    Protein microarray is a powerful technology for both basic research and clinical study. However, because there is no database specifically tailored for protein microarray, the majority of the valuable original protein microarray data is still not publicly accessible. To address this issue, we constructed the Protein Microarray Database (PMD), which is specifically designed for archiving and analyzing protein microarray data. In PMD, users can easily browse and search the entire database by experiment name, protein microarray type, and sample information. Additionally, PMD integrates several data analysis tools and provides an automated data analysis pipeline for users. With just one click, users can obtain a comprehensive analysis report for their protein microarray data. The report includes preliminary data analysis, such as data normalization and candidate identification, and an in-depth bioinformatics analysis of the candidates, which includes functional annotation, pathway analysis, and protein-protein interaction network analysis. PMD is now freely available at www.proteinmicroarray.cn.

  2. Statistical Considerations for Analysis of Microarray Experiments

    PubMed Central

    Owzar, Kouros; Barry, William T.; Jung, Sin-Ho

    2014-01-01

    Microarray technologies enable the simultaneous interrogation of the expression of thousands of genes from a biospecimen sample taken from a patient. This large set of expression values generates a genetic profile of the patient that may be used to identify potential prognostic or predictive genes or genetic models for clinical outcomes. The aim of this article is to provide a broad overview of some of the major statistical considerations for the design and analysis of microarray experiments conducted as correlative science studies in clinical trials. Emphasis is placed on how a lack of understanding and the improper use of statistical concepts and methods can lead to noise discovery and the misinterpretation of experimental results. PMID:22212230

  3. Plasmonically amplified fluorescence bioassay with microarray format

    NASA Astrophysics Data System (ADS)

    Gogalic, S.; Hageneder, S.; Ctortecka, C.; Bauch, M.; Khan, I.; Preininger, Claudia; Sauer, U.; Dostalek, J.

    2015-05-01

    Plasmonic amplification of the fluorescence signal in bioassays with a microarray detection format is reported. A crossed relief diffraction grating was designed to couple an excitation laser beam to surface plasmons at a wavelength overlapping with the absorption and emission bands of the fluorophore Dy647, which was used as a label. The surface of the periodically corrugated sensor chip was coated with a surface plasmon-supporting gold layer and a thin SU8 polymer film carrying epoxy groups. These groups were employed for the covalent immobilization of capture antibodies at arrays of spots. The plasmonic amplification of the fluorescence signal on the developed microarray chip was tested using an interleukin 8 sandwich immunoassay. The readout was performed ex situ, after drying the chip, using a commercial scanner with a high-numerical-aperture collecting lens. The obtained results reveal an enhancement of the fluorescence signal by a factor of 5 when compared to a regular glass chip.

  4. Microarrays: how many do you need?

    PubMed

    Zien, Alexander; Fluck, Juliane; Zimmer, Ralf; Lengauer, Thomas

    2003-01-01

    We estimate the number of microarrays that is required in order to gain reliable results from a common type of study: the pairwise comparison of different classes of samples. We show that current knowledge allows for the construction of models that look realistic with respect to searches for individual differentially expressed genes and derive prototypical parameters from real data sets. Such models allow investigation of the dependence of the required number of samples on the relevant parameters: the biological variability of the samples within each class, the fold changes in expression that are desired to be detected, the detection sensitivity of the microarrays, and the acceptable error rates of the results. We supply experimentalists with general conclusions as well as a freely accessible Java applet at www.scai.fhg.de/special/bio/howmanyarrays/ for fine tuning simulations to their particular settings. PMID:12935350
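
    The flavor of such a calculation can be reproduced with standard power-analysis tools. The sketch below is not the authors' model or applet: it uses a plain two-sample t-test with a Bonferroni-style alpha for 10,000 genes, and every number is illustrative.

```python
from statsmodels.stats.power import TTestIndPower

# Arrays per class needed to detect a one-standard-deviation expression shift
# (effect size d = 1.0) at 80% power, with alpha corrected for 10,000 tests.
n = TTestIndPower().solve_power(effect_size=1.0, alpha=0.05 / 10000, power=0.8)
print(round(n), "arrays per class under these assumptions")
```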

  5. A Flexible Microarray Data Simulation Model

    PubMed Central

    Dembélé, Doulaye

    2013-01-01

    Microarray technology allows the monitoring of gene expression profiles at the genome level, which is useful in the search for genes involved in a disease. The performance of the methods used to select interesting genes is most often judged after other analyses (qPCR validation, searches in databases, ...), which are also subject to error. A good evaluation of gene selection methods is possible with data whose characteristics are known, that is to say, synthetic data. We propose a model to simulate microarray data with characteristics similar to the data commonly produced by current platforms. The parameters used in this model are described to allow the user to generate data with varying characteristics. In order to show the flexibility of the proposed model, a commented example is given and illustrated. An R package is available for immediate use.
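
    A stripped-down version of such a simulation, written in Python rather than the authors' R package and with arbitrary parameter choices, might look like the following.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_per_class, n_de = 10000, 5, 300

base = rng.normal(loc=8.0, scale=2.0, size=n_genes)   # baseline log2 expression
sd = rng.uniform(0.2, 0.8, size=n_genes)              # gene-specific variability
effect = np.zeros(n_genes)
effect[:n_de] = rng.choice([-1.5, 1.5], size=n_de)    # differentially expressed genes

ctrl = rng.normal(base[:, None], sd[:, None], size=(n_genes, n_per_class))
trt = rng.normal((base + effect)[:, None], sd[:, None], size=(n_genes, n_per_class))
data = np.hstack([ctrl, trt])   # genes x arrays matrix of log2 intensities
```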

  6. Profiling protein function with small molecule microarrays

    PubMed Central

    Winssinger, Nicolas; Ficarro, Scott; Schultz, Peter G.; Harris, Jennifer L.

    2002-01-01

    The regulation of protein function through posttranslational modification, local environment, and protein–protein interaction is critical to cellular function. The ability to analyze protein functional activity on a genome-wide scale, rather than changes in protein abundance or structure, would provide important new insights into complex biological processes. Herein, we report the application of a spatially addressable small molecule microarray to the activity-based profiling of proteases in crude cell lysates. The potential of this small molecule-based profiling technology is demonstrated by the detection of caspase activation upon induction of apoptosis, characterization of the activated caspase, and inhibition of the caspase-executed apoptotic phenotype using a small molecule inhibitor identified in the microarray-based profile. PMID:12167675

  7. Application of DNA Microarray to Clinical Diagnostics.

    PubMed

    Patel, Ankita; Cheung, Sau W

    2016-01-01

    Microarray-based technology to conduct array comparative genomic hybridization (aCGH) has made a significant impact on the diagnosis of human genetic diseases. Such diagnoses, previously undetectable by traditional G-banding chromosome analysis, are now achieved by identifying genomic copy number variants (CNVs) using the microarray. Not only can hundreds of well-characterized genetic syndromes be detected in a single assay, but new genomic disorders and disease-causing genes can also be discovered through the utilization of aCGH technology. Although other platforms such as single nucleotide polymorphism (SNP) arrays can be used for detecting CNVs, in this chapter we focus on describing the methods for performing aCGH using Agilent oligonucleotide arrays for both prenatal (e.g., amniotic fluid and chorionic villus sample) and postnatal samples (e.g., blood).

  8. Hyperspectral imaging in medicine: image pre-processing problems and solutions in Matlab.

    PubMed

    Koprowski, Robert

    2015-11-01

    The paper presents problems and solutions related to hyperspectral image pre-processing. New methods of preliminary image analysis are proposed. The paper shows problems that occur in Matlab when trying to analyse this type of image. Moreover, new methods are discussed for which the Matlab source code is provided and can be used in practice without any licensing restrictions. A proposed application and sample results of hyperspectral image analysis are also presented.

  9. Improving the accuracy of volumetric segmentation using pre-processing boundary detection and image reconstruction.

    PubMed

    Archibald, Rick; Hu, Jiuxiang; Gelb, Anne; Farin, Gerald

    2004-04-01

    The concentration edge-detection and Gegenbauer image-reconstruction methods were previously shown to improve the quality of segmentation in magnetic resonance imaging. In this study, these methods are utilized as a pre-processing step to the Weibull E-SD field segmentation. It is demonstrated that the combination of the concentration edge detection and Gegenbauer reconstruction method improves the accuracy of segmentation for the simulated test data and real magnetic resonance images used in this study. PMID:15376580

  11. Data preprocessing for a vehicle-based localization system used in road traffic applications

    NASA Astrophysics Data System (ADS)

    Patelczyk, Timo; Löffler, Andreas; Biebl, Erwin

    2016-09-01

    This paper presents a fixed-point implementation, on a field programmable gate array (FPGA), of the preprocessing required for multipath joint angle and delay estimation (JADE) in road traffic applications, and thereby lays the foundation for many model-based parameter estimation methods. Here, a simulation of a vehicle-based localization system for protecting vulnerable road users, who are equipped with appropriate transponders, is considered. For such safety-critical applications, the robustness and real-time capability of the localization is particularly important. An additional motivation for a fixed-point implementation of the data preprocessing is the limited computing power of a vehicle's head unit. This study aims to process the raw data provided by the localization system considered in this paper. The data preprocessing applied includes a wideband calibration of the physical localization system, separation of the relevant information from the received sampled signal, and preparation of the incoming data via further processing. Furthermore, a channel matrix estimation was implemented to complete the data preprocessing, which contains information on channel parameters, e.g., the positions of the objects to be located. In the presented vehicle-based localization application we assume an urban environment, in which multipath propagation occurs. Since most localization methods are based on uncorrelated signals, this fact must be addressed; hence, a decorrelation of the incoming data stream is required for further localization. This decorrelation was accomplished by considering several snapshots in different time slots. As a final aspect of the use of fixed-point arithmetic, quantization errors are considered. In addition, the resources and runtime of the presented implementation are discussed; these factors are strongly linked to a practical implementation.

  12. Performance evaluation of preprocessing techniques utilizing expert information in multivariate calibration.

    PubMed

    Sharma, Sandeep; Goodarzi, Mohammad; Ramon, Herman; Saeys, Wouter

    2014-04-01

    Partial Least Squares (PLS) regression is one of the most used methods for extracting chemical information from near infrared (NIR) spectroscopic measurements. The success of a PLS calibration relies largely on the representativeness of the calibration data set. This is not trivial, because not only the expected variation in the analyte of interest, but also the variation of other contributing factors (interferents), should be included in the calibration data. This also implies that changes in interferent concentrations not covered in the calibration step can deteriorate the prediction ability of the calibration model. Several researchers have suggested that PLS models can be robustified against changes in the interferent structure by incorporating expert knowledge in the preprocessing step, with the aim of efficiently filtering out the influence of the spectral interferents. However, these methods have not yet been compared against each other. Therefore, in the present study, various preprocessing techniques exploiting expert knowledge were compared on two experimental data sets. In both data sets, the calibration and test sets were designed to have different interferent concentration ranges. The performance of these techniques was compared to that of preprocessing techniques which do not use any expert knowledge. Using expert knowledge was found to improve the prediction performance for both data sets. For data set 1, the prediction error improved by nearly 32% when pure component spectra of the analyte and the interferents were used in the Extended Multiplicative Signal Correction framework. Similarly, for data set 2, a nearly 63% improvement in the prediction error was observed when the interferent information was utilized in Spectral Interferent Subtraction preprocessing.
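
    The simplest way to picture how pure-component interferent spectra can be exploited is to project them out of the measured spectra. The sketch below is a bare-bones stand-in for the EMSC and Spectral Interferent Subtraction ideas compared in the paper, not either exact method.

```python
import numpy as np

def subtract_interferents(X, K):
    """Remove the span of known interferent spectra from measured spectra.
    X: (samples x wavelengths) measurements; K: (wavelengths x m) pure spectra."""
    P = K @ np.linalg.pinv(K)   # orthogonal projector onto the interferent subspace
    return X - X @ P.T          # keep only the part orthogonal to the interferents
```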

  13. Preprocessing techniques to reduce atmospheric and sensor variability in multispectral scanner data.

    NASA Technical Reports Server (NTRS)

    Crane, R. B.

    1971-01-01

    Multispectral scanner data are potentially useful in a variety of remote sensing applications. Large-area surveys of earth resources carried out by automated recognition processing of these data are particularly important. However, the practical realization of such surveys is limited by a variability in the scanner signals that results in improper recognition of the data. This paper discusses ways by which some of this variability can be removed from the data by preprocessing with resultant improvements in recognition results.

  14. Design of a combinatorial DNA microarray for protein-DNA interaction studies

    SciTech Connect

    Mintseris, Julian; Eisen, Michael B.

    2006-07-07

    Background: Discovery of the precise specificity of transcription factors is an important step on the way to understanding the complex mechanisms of gene regulation in eukaryotes. Recently, double-stranded protein-binding microarrays were developed as a potentially scalable approach to tackle transcription factor binding site identification. Results: Here we present an algorithmic approach to experimental design of a microarray that allows for testing the full specificity of a transcription factor binding to all possible DNA binding sites of a given length, with optimally efficient use of the array. This design is universal, works for any factor that binds a sequence motif and is not species-specific. Furthermore, simulation results show that data produced with the designed arrays is easier to analyze and would result in more precise identification of binding sites. Conclusion: In this study, we present a design of a double-stranded DNA microarray for protein-DNA interaction studies and show that our algorithm allows optimally efficient use of the arrays for this purpose. We believe such a design will prove useful for transcription factor binding site identification and other biological problems.
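
    Although the actual design must handle double-stranded probes and array-specific constraints, the underlying combinatorial problem, covering all length-n binding sites with optimal efficiency, is classically solved by a de Bruijn sequence. The standard Lyndon-word construction is sketched below as a generic illustration, not as the authors' algorithm.

```python
def de_bruijn(alphabet, n):
    """Cyclic sequence of length len(alphabet)**n containing every
    length-n word exactly once (standard Lyndon-word construction)."""
    k = len(alphabet)
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)

s = de_bruijn("ACGT", 4)   # 256 bases covering all 256 possible 4-mers
probe = s + s[:3]          # linearize by appending the first n-1 bases
```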

  15. Epitope Identification from Fixed-complexity Random-sequence Peptide Microarrays

    PubMed Central

    Richer, Josh; Johnston, Stephen Albert; Stafford, Phillip

    2015-01-01

    Antibodies play an important role in modern science and medicine. They are essential in many biological assays and have emerged as an important class of therapeutics. Unfortunately, current methods for mapping antibody epitopes require costly synthesis or enrichment steps, and no low-cost universal platform exists. In order to address this, we tested a random-sequence peptide microarray consisting of over 330,000 unique peptide sequences sampling 83% of all possible tetramers and 27% of pentamers. It is a single, unbiased platform that can be used in many different types of tests, it does not rely on informatic selection of peptides for a particular proteome, and it does not require iterative rounds of selection. In order to optimize the platform, we developed an algorithm that considers the significance of k-length peptide subsequences (k-mers) within selected peptides that come from the microarray. We tested eight monoclonal antibodies and seven infectious disease cohorts. The method correctly identified five of the eight monoclonal epitopes and identified both reported and unreported epitope candidates in the infectious disease cohorts. This algorithm could greatly enhance the utility of random-sequence peptide microarrays by enabling rapid epitope mapping and antigen identification. PMID:25368412

  16. Automated identification of multiple micro-organisms from resequencing DNA microarrays.

    PubMed

    Malanoski, Anthony P; Lin, Baochuan; Wang, Zheng; Schnur, Joel M; Stenger, David A

    2006-01-01

    There is an increasing recognition that detailed nucleic acid sequence information will be useful and even required in the diagnosis, treatment and surveillance of many significant pathogens. Because generating detailed information about pathogens leads to significantly larger amounts of data, it is necessary to develop automated analysis methods to reduce analysis time and to standardize identification criteria. This is especially important for multiple pathogen assays designed to reduce assay time and costs. In this paper, we present a successful algorithm for detecting pathogens and reporting the maximum level of detail possible using multi-pathogen resequencing microarrays. The algorithm filters the sequence of base calls from the microarray and finds entries in genetic databases that most closely match. Taxonomic databases are then used to relate these entries to each other so that the microorganism can be identified. Although developed using a resequencing microarray, the approach is applicable to any assay method that produces base call sequence information. The success and continued development of this approach means that a non-expert can now perform unassisted analysis of the results obtained from partial sequence data.

  17. Weighted analysis of general microarray experiments

    PubMed Central

    Sjögren, Anders; Kristiansson, Erik; Rudemo, Mats; Nerman, Olle

    2007-01-01

    Background In DNA microarray experiments, measurements from different biological samples are often assumed to be independent and to have identical variance. For many datasets these assumptions have been shown to be invalid and typically lead to too optimistic p-values. A method called WAME has been proposed where a variance is estimated for each sample and a covariance is estimated for each pair of samples. The current version of WAME is, however, limited to experiments with paired design, e.g. two-channel microarrays. Results The WAME procedure is extended to general microarray experiments, making it capable of handling both one- and two-channel datasets. Two public one-channel datasets are analysed and WAME detects both unequal variances and correlations. WAME is compared to other common methods: fold-change ranking, ordinary linear model with t-tests, LIMMA and weighted LIMMA. The p-value distributions are shown to differ greatly between the examined methods. In a resampling-based simulation study, the p-values generated by WAME are found to be substantially more correct than the alternatives when a relatively small proportion of the genes is regulated. WAME is also shown to have higher power than the other methods. WAME is available as an R-package. Conclusion The WAME procedure is generalized and the limitation to paired-design microarray datasets is removed. The examined other methods produce invalid p-values in many cases, while WAME is shown to produce essentially valid p-values when a relatively small proportion of genes is regulated. WAME is also shown to have higher power than the examined alternative methods. PMID:17937807
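
    The per-sample weighting idea (though not WAME's covariance modeling or its moderated inference) can be illustrated for a single gene with an ordinary weighted least squares fit; the design, weights, and data below are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = np.repeat([0.0, 1.0], 4)                               # two groups, four arrays each
var = np.array([0.2, 0.2, 0.8, 0.8, 0.3, 0.3, 0.6, 0.6])   # per-array variances
y = 0.5 * x + rng.normal(scale=np.sqrt(var))               # one gene's expression values

# Weight each array by its precision (inverse variance), as a rough analogue.
fit = sm.WLS(y, sm.add_constant(x), weights=1.0 / var).fit()
print(fit.params[1], fit.pvalues[1])                       # group effect and its p-value
```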

  18. Undetected sex chromosome aneuploidy by chromosomal microarray.

    PubMed

    Markus-Bustani, Keren; Yaron, Yuval; Goldstein, Myriam; Orr-Urtreger, Avi; Ben-Shachar, Shay

    2012-11-01

    We report on a case of a female fetus found to be mosaic for Turner syndrome (45,X) and trisomy X (47,XXX). Chromosomal microarray analysis (CMA) failed to detect the aneuploidy because of a normal average dosage of the X chromosome. This case represents an unusual instance in which CMA may not detect chromosomal aberrations. Such a possibility should be taken into consideration in similar cases where CMA is used in a clinical setting.

  19. Learning-based image preprocessing for robust computer-aided detection

    NASA Astrophysics Data System (ADS)

    Raghupathi, Laks; Devarakota, Pandu R.; Wolf, Matthias

    2013-03-01

    Recent studies have shown that low-dose computed tomography (LDCT) can be an effective screening tool to reduce lung cancer mortality, and computer-aided detection (CAD) would be a beneficial second reader for radiologists in such cases. Studies demonstrate that while iterative reconstruction (IR) improves LDCT diagnostic quality, it degrades CAD performance significantly (increased false positives) when applied directly. For improving CAD performance, solutions such as retraining with newer data or applying a standard preprocessing technique may not suffice due to the high prevalence of CT scanners and non-uniform acquisition protocols. Here, we present a learning-based framework that can adaptively transform a wide variety of input data to boost the performance of an existing CAD system. This enhances not only its robustness but also its applicability in clinical workflows. Our solution consists of automatically applying a suitable pre-processing filter to a given image based on its characteristics. This requires the preparation of ground truth (GT) for the choice of an appropriate filter resulting in improved CAD performance. Accordingly, we propose an efficient consolidation process with a novel metric. Using key anatomical landmarks, we then derive consistent feature descriptors for the classification scheme, which uses a priority mechanism to automatically choose an optimal preprocessing filter. We demonstrate CAD prototype performance improvement using hospital-scale datasets acquired from North America, Europe and Asia. Though we demonstrated our results for a lung nodule CAD, this scheme is straightforward to extend to other post-processing tools dedicated to other organs and modalities.

  20. Reducing Uncertainties of Hydrologic Model Predictions Using a New Ensemble Pre-Processing Approach

    NASA Astrophysics Data System (ADS)

    Khajehei, S.; Moradkhani, H.

    2015-12-01

    Ensemble Streamflow Prediction (ESP) was developed to characterize the uncertainty in hydrologic predictions. However, ESP outputs are still prone to bias due to uncertainty in the forcing data, initial conditions, and model structure. Among these, uncertainty in the forcing data has the largest impact on the reliability of hydrologic simulations and forecasts, and major steps, such as Ensemble Pre-Processing (EPP), have been taken toward generating less uncertain precipitation forecasts. EPP is a statistical procedure based on the bivariate joint distribution between observations and forecasts that generates an ensemble climatological forecast from a single-value forecast. The purpose of this study is to evaluate the performance of pre-processed ensemble precipitation forecasts in generating ensemble streamflow predictions. The copula functions used in EPP model the multivariate joint distribution between univariate variables with any level of dependency. Accordingly, ESP is generated by employing both the raw ensemble precipitation forecast and the pre-processed ensemble precipitation. The ensemble precipitation forecast is taken from the Climate Forecast System (CFS) generated by the National Weather Service's (NWS) National Centers for Environmental Prediction (NCEP) models. The study is conducted using the Precipitation-Runoff Modeling System (PRMS) over two basins in the Pacific Northwest, USA, for the period 1979 to 2013. Results reveal that applying this new EPP leads to a reduction in uncertainty and an overall improvement in the ESP.
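
    EPP's copula-based conditioning is beyond a short sketch, but its marginal ingredient, mapping a raw single-value forecast into the observed climatology, can be pictured as quantile mapping. The function below is a simplified stand-in with hypothetical climatology arrays, not the procedure used in the study.

```python
import numpy as np

def quantile_map(fcst_value, fcst_clim, obs_clim):
    """Map one raw forecast through the forecast climatology's empirical CDF
    into the observed climatology (a marginal correction only)."""
    q = (np.asarray(fcst_clim) <= fcst_value).mean()
    return np.quantile(np.asarray(obs_clim), np.clip(q, 0.0, 1.0))
```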

  1. Hard-wired digital data preprocessing applied within a modular star and target tracker

    NASA Astrophysics Data System (ADS)

    Schmidt, Uwe; Wunder, Dietmar

    1997-10-01

    Star sensors developed in recent years can be enhanced in terms of mass reduction, lower power consumption, and operational flexibility by taking advantage of improvements in detector technology and electronic components. Jena-Optronik GmbH developed an intelligent modular star and target tracker named 'stellar and extended target intelligent sensor' (SETIS). Emphasis was placed on increasing the sensor's adaptability to meet specific mission requirements. The intelligent modular star and target tracker shall generate positional information regarding a number of celestial targets or shall act as a navigation camera. The targets will be either stars or extended objects like comets and planetary objects, or both simultaneously. Design drivers like simultaneous tracking of extended targets and stars, or searching for new objects while tracking already detected ones, require powerful hard-wired digital data preprocessing. An advanced rad-tolerant ASIC technology is used for the star tracker preprocessor electronics. All of the necessary star tracker preprocessing functions, such as pixel defect correction, filtering, on-line background estimation, thresholding, object detection and extraction, and pixel centroiding, are realized in the ASIC design. The technical approach for the intelligent modular star and target tracker is presented in detail, with emphasis on the powerful signal preprocessing capabilities.
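
    Of the hard-wired preprocessing functions listed, pixel centroiding is the easiest to sketch in software. The ASIC's actual implementation is not described here, so the following is simply the textbook center-of-mass centroid over a background-subtracted window.

```python
import numpy as np

def centroid(window, background):
    """Center-of-mass centroid (x, y) of a star image window; assumes at
    least one pixel remains positive after background subtraction."""
    img = np.clip(window.astype(float) - background, 0.0, None)
    ys, xs = np.indices(img.shape)
    total = img.sum()
    return (xs * img).sum() / total, (ys * img).sum() / total
```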

  2. Evaluating the validity of spectral calibration models for quantitative analysis following signal preprocessing.

    PubMed

    Chen, Da; Grant, Edward

    2012-11-01

    When paired with high-powered chemometric analysis, spectrometric methods offer great promise for the high-throughput analysis of complex systems. Effective classification or quantification often relies on signal preprocessing to reduce spectral interference and optimize the apparent performance of a calibration model. However, less frequently addressed by systematic research is the effect of preprocessing on the statistical accuracy of a calibration result. The present work demonstrates the effectiveness of two criteria for validating the performance of signal preprocessing in multivariate models in the important dimensions of bias and precision. To assess the extent of bias, we explore the applicability of the elliptic joint confidence region (EJCR) test, and we devise a new means to evaluate precision via a bias-corrected root mean square error of prediction. We show how these criteria can effectively gauge the success of signal pretreatments in suppressing spectral interference while providing a straightforward means to determine the optimal level of model complexity. This methodology offers a graphical diagnostic by which to visualize the consequences of pretreatment on complex multivariate models, enabling optimization with greater confidence. To demonstrate the application of the EJCR criterion in this context, we evaluate the validity of representative calibration models using standard pretreatment strategies on three spectral data sets. The results indicate that the proposed methodology facilitates the reliable optimization of a well-validated calibration model, thus improving the capability of spectrophotometric analysis.
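
    The precision measure mentioned above can be stated compactly: the total prediction error splits into a systematic bias and a bias-corrected scatter (often called SEP). The sketch below shows only that decomposition; the EJCR test itself is not reproduced.

```python
import numpy as np

def error_decomposition(y_true, y_pred):
    """Split prediction error into bias and bias-corrected RMSEP (SEP);
    with the 1/n convention used here, rmsep**2 == bias**2 + sep**2."""
    e = np.asarray(y_pred) - np.asarray(y_true)
    bias = e.mean()
    sep = np.sqrt(((e - bias) ** 2).mean())
    rmsep = np.sqrt((e ** 2).mean())
    return bias, sep, rmsep
```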

  3. Microarray analysis in gastric cancer: A review

    PubMed Central

    D’Angelo, Giovanna; Di Rienzo, Teresa; Ojetti, Veronica

    2014-01-01

    Gastric cancer is one of the most common tumors worldwide. Although several treatment options have been developed, the mortality rate is increasing. Lymph node involvement is considered the most reliable prognostic indicator in gastric cancer. Early diagnosis improves the survival rate of patients and increases the likelihood of successful treatment. The most reliable diagnostic method is endoscopic examination; however, it is expensive and not feasible in poorer countries. Therefore, many innovative techniques have been studied to develop a new non-invasive screening test and to identify specific serum biomarkers. DNA microarray analysis is one of the new technologies able to measure the expression levels of a large number of genes simultaneously. It is possible to define the gene expression profile of a tumor and to correlate it with prognosis and metastasis formation. Several studies have been published on the role of microarray analysis in gastric cancer and the mechanisms of proliferation and metastasis formation. The aim of this review is to analyze the importance of microarray analysis and its clinical applications to better define the genetic characteristics of gastric cancer and its possible implications for a more decisive treatment. PMID:25232233

  4. Repeatability of published microarray gene expression analyses.

    PubMed

    Ioannidis, John P A; Allison, David B; Ball, Catherine A; Coulibaly, Issa; Cui, Xiangqin; Culhane, Aedín C; Falchi, Mario; Furlanello, Cesare; Game, Laurence; Jurman, Giuseppe; Mangion, Jon; Mehta, Tapan; Nitzberg, Michael; Page, Grier P; Petretto, Enrico; van Noort, Vera

    2009-02-01

    Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition and several public databases exist. However, not all data are publicly available, and even when available, it is unknown whether the published results are reproducible by independent scientists. Here we evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005-2006. One table or figure from each article was independently evaluated by two teams of analysts. We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or specification of data processing and analysis. Repeatability of published microarray studies is apparently limited. More strict publication rules enforcing public data availability and explicit description of data processing and analysis should be considered.

  5. An imputation approach for oligonucleotide microarrays.

    PubMed

    Li, Ming; Wen, Yalu; Lu, Qing; Fu, Wenjiang J

    2013-01-01

    Oligonucleotide microarrays are commonly adopted for detecting and quantifying the abundance of molecules in biological samples. Analysis of microarray data starts with recording and interpreting hybridization signals from CEL images. However, many CEL images may be blemished by noise from various sources, observed as "bright spots", "dark clouds", "shadowy circles", etc. It is crucial that these image defects are correctly identified and properly processed. Existing approaches mainly focus on detecting defect areas and removing affected intensities. In this article, we propose to use a mixed effect model for imputing the affected intensities. The proposed imputation procedure is a single-array-based approach which does not require any biological replicate or between-array normalization. We further examine its performance by using Affymetrix high-density SNP arrays. The results show that this imputation procedure significantly reduces genotyping error rates. We also discuss the necessary adjustments for its potential extension to other oligonucleotide microarrays, such as gene expression profiling. The R source code for the implementation of the approach is freely available upon request.

  6. High-Throughput Enzyme Kinetics Using Microarrays

    SciTech Connect

    Guoxin Lu; Edward S. Yeung

    2007-11-01

    We report a microanalytical method to study enzyme kinetics. The technique involves immobilizing horseradish peroxidase on a poly-L-lysine (PLL)-coated glass slide in a microarray format, followed by applying substrate solution onto the enzyme microarray. Enzyme molecules are immobilized on the PLL-coated glass slide through electrostatic interactions, and no further modification of the enzyme or glass slide is needed. In situ detection of the products generated on the enzyme spots is made possible by monitoring the light intensity of each spot using a scientific-grade charge-coupled device (CCD). Reactions of substrate solutions of various types and concentrations can be carried out sequentially on one enzyme microarray. To account for the loss of enzyme from washing between runs, a standard substrate solution is used for calibration. Substantially reduced amounts of substrate solution are consumed for each reaction on each enzyme spot. The Michaelis constant K_m obtained by using this method is comparable to the result for homogeneous solutions. Absorbance detection allows universal monitoring, and no chemical modification of the substrate is needed. High-throughput studies of native enzyme kinetics for multiple enzymes are therefore possible in a simple, rapid, and low-cost manner.
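
    Extracting K_m from such measurements reduces to fitting the Michaelis-Menten equation, v = Vmax * S / (Km + S), to rate-versus-substrate data. A generic fit with invented numbers (not the paper's data) is shown below.

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(S, Vmax, Km):
    return Vmax * S / (Km + S)

S = np.array([0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0])        # substrate concentrations
v = np.array([0.08, 0.17, 0.28, 0.40, 0.52, 0.60, 0.65])  # measured initial rates

(Vmax, Km), _ = curve_fit(michaelis_menten, S, v, p0=(1.0, 1.0))
print(f"Vmax = {Vmax:.2f}, Km = {Km:.2f}")
```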

  7. Development and Applications of the Lectin Microarray.

    PubMed

    Hirabayashi, Jun; Kuno, Atsushi; Tateno, Hiroaki

    2015-01-01

    The lectin microarray is an emerging technology for glycomics. It has already found maximum use in diverse fields of glycobiology by providing simple procedures for differential glycan profiling in a rapid and high-throughput manner. Since its first appearance in the literature in 2005, many application methods have been developed essentially on the same platform, comprising a series of glycan-binding proteins immobilized on an appropriate substrate such as a glass slide. Because the lectin microarray strategy does not require prior liberation of glycans from the core protein in glycoprotein analysis, it should encourage researchers not familiar with glycotechnology to use glycan analysis in future work. This feasibility should provide a broader range of experimental scientists with good opportunities to investigate novel aspects of glycoscience. Applications of the technology include not only basic sciences but also the growing fields of bio-industry. This chapter describes first the essence of glycan profiling and the basic fabrication of the lectin microarray for this purpose. In the latter part the focus is on diverse applications to both structural and functional glycomics, with emphasis on the wide applicability now available with this new technology. Finally, the importance of developing advanced lectin engineering is discussed.

  8. Metadata management and semantics in microarray repositories.

    PubMed

    Kocabaş, F; Can, T; Baykal, N

    2011-12-01

    The number of microarray and other high-throughput experiments in primary repositories keeps increasing, as do the size and complexity of the results, in response to biomedical investigations. Initiatives have been started on the standardization of content, object models, exchange formats, and ontologies. However, there are backlogs and an inability to exchange data between microarray repositories, which indicate a great need for a standard format and data management. We have introduced a metadata framework that includes a metadata card and semantic nets that make experimental results visible, understandable and usable. These are encoded in syntax encoding schemes and represented in RDF (Resource Description Framework), can be integrated with other metadata cards and semantic nets, and can be exchanged, shared and queried. We demonstrated the performance and potential benefits through a case study on a selected microarray repository. We concluded that the backlogs can be reduced and that the exchange of information and the asking of knowledge discovery questions can become possible with the use of this metadata framework. PMID:24052712

  9. Chicken sperm transcriptome profiling by microarray analysis.

    PubMed

    Singh, R P; Shafeeque, C M; Sharma, S K; Singh, R; Mohan, J; Sastry, K V H; Saxena, V K; Azeez, P A

    2016-03-01

    It has been confirmed that mammalian sperm contain thousands of functional RNAs, some of which have vital roles in fertilization and early embryonic development. Therefore, we attempted to characterize the transcriptome of the sperm of fertile chickens using microarray analysis. Spermatozoal RNA was pooled from 10 fertile males and used for RNA preparation. Prior to performing the microarray, RNA quality was assessed using a bioanalyzer, and gDNA and somatic cell RNA contamination was assessed by CD4 and PTPRC gene amplification. The chicken sperm transcriptome was cross-examined by analysing sperm and testes RNA on a 4 × 44K chicken array, and the results were verified by RT-PCR. Microarray analysis identified 21,639 predominantly nuclear-encoded transcripts in chicken sperm. The majority (66.55%) of the sperm transcripts were shared with the testes, while surprisingly, 33.45% of transcripts were detected (raw signal intensity greater than 50) only in the sperm and not in the testes. The greatest proportion of up-regulated transcripts was related to signal transduction (63.20%), followed by embryonic development (56.76%) and cell structure (56.25%). Of the 20 most abundant transcripts, 18 remain uncharacterized, whereas the least abundant genes were mostly associated with the ribosome. These findings lay a foundation for more detailed investigations on sperm RNAs in chickens to identify sperm-based biomarkers for fertility.

  10. [Genomic medicine. Polymorphisms and microarray applications].

    PubMed

    Spalvieri, Mónica P; Rotenberg, Rosa G

    2004-01-01

    This update presents new concepts related to the significance of DNA variations among individuals, as well as their detection using a new technology. The sequencing of the human genome is only the beginning of what will enable us to understand genetic diversity. The unit of DNA variability is the single nucleotide polymorphism (SNP). At present, studies on SNPs are restricted to basic research, but the large number of papers on this subject makes their entrance into clinical practice feasible. We illustrate here the use of SNPs as molecular markers in ethnic genotyping, gene expression in some diseases, and as potential targets in pharmacological response, and also introduce the technology of arrays. Microarray experiments allow the quantification and comparison of gene expression on a large scale, at the same time, by using special chips and array designs. Conventional methods provide data from up to 20 genes, while a single microarray may provide information about thousands of them simultaneously, leading to more rapid and accurate genotyping. Biotechnology improvements will facilitate our knowledge of each gene sequence, the frequency and exact location of SNPs, and their influence on cellular behavior. Although the experimental efficiency and validity of results from microarrays are still controversial, the knowledge and characterization of a patient's genetic profile will undoubtedly lead to advances in the prevention, diagnosis, prognosis and treatment of human diseases. PMID:15637833

  11. A fast readout algorithm for Cluster Counting/Timing drift chambers on a FPGA board

    NASA Astrophysics Data System (ADS)

    Cappelli, L.; Creti, P.; Grancagnolo, F.; Pepino, A.; Tassielli, G.

    2013-08-01

    A fast readout algorithm for Cluster Counting and Timing purposes has been implemented and tested on a Virtex 6 core FPGA board. The algorithm analyses and stores data coming from a helium-based drift tube instrumented with a 1 GSPS fADC, and represents the outcome of a trade-off between cluster identification efficiency and high-speed performance. The algorithm can be implemented in electronics boards serving multiple fADC channels as an online preprocessing stage for drift chamber signals.
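
    The record gives no implementation details, but the core of a cluster-counting stage can be mimicked offline. Below is a minimal Python sketch of threshold-crossing cluster detection with a dead-time hold-off; the threshold, hold-off and pulse shape are illustrative assumptions, not parameters from the paper.

    import numpy as np

    def count_clusters(waveform, threshold, holdoff):
        """Count ionization clusters in a digitized drift-tube signal.

        A cluster is registered at each rising threshold crossing; after a
        detection, further crossings are ignored for `holdoff` samples to
        avoid double-counting a single cluster.
        """
        times = []
        i, n = 0, len(waveform)
        while i < n - 1:
            if waveform[i] < threshold <= waveform[i + 1]:
                times.append(i + 1)
                i += holdoff  # dead time after each detected cluster
            else:
                i += 1
        return times

    # Toy signal: baseline noise with three exponential pulses.
    rng = np.random.default_rng(0)
    sig = rng.normal(0, 0.05, 1000)
    for t0 in (200, 450, 700):
        sig[t0:t0 + 20] += np.exp(-np.arange(20) / 5.0)
    print(count_clusters(sig, threshold=0.5, holdoff=30))  # -> [201, 451, 701]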

  12. DNA Microarray for Detection of Gastrointestinal Viruses

    PubMed Central

    Martínez, Miguel A.; Soto-del Río, María de los Dolores; Gutiérrez, Rosa María; Chiu, Charles Y.; Greninger, Alexander L.; Contreras, Juan Francisco; López, Susana; Arias, Carlos F.

    2014-01-01

    Gastroenteritis is a clinical illness of humans and other animals that is characterized by vomiting and diarrhea and caused by a variety of pathogens, including viruses. An increasing number of viral species have been associated with gastroenteritis or have been found in stool samples as new molecular tools have been developed. In this work, a DNA microarray capable in theory of parallel detection of more than 100 viral species was developed and tested. Initial validation was done with 10 different virus species, and an additional 5 species were validated using clinical samples. Detection limits of 1 × 10³ virus particles of Human adenovirus C (HAdV), Human astrovirus (HAstV), and group A Rotavirus (RV-A) were established. Furthermore, when exogenous RNA was added, the limit for RV-A detection decreased by one log. In a small group of clinical samples from children with gastroenteritis (n = 76), the microarray detected at least one viral species in 92% of the samples. Single infection was identified in 63 samples (83%), and coinfection with more than one virus was identified in 7 samples (9%). The most abundant virus species were RV-A (58%), followed by Anellovirus (15.8%), HAstV (6.6%), HAdV (5.3%), Norwalk virus (6.6%), Human enterovirus (HEV) (9.2%), Human parechovirus (1.3%), Sapporo virus (1.3%), and Human bocavirus (1.3%). To further test the specificity and sensitivity of the microarray, the results were verified by reverse transcription-PCR (RT-PCR) detection of 5 gastrointestinal viruses. The RT-PCR assay detected a virus in 59 samples (78%). The microarray showed good performance for detection of RV-A, HAstV, and calicivirus, while the sensitivity for HAdV and HEV was low. Furthermore, some discrepancies in detection of mixed infections were observed and were addressed by reverse transcription-quantitative PCR (RT-qPCR) of the viruses involved. It was observed that differences in the amount of genetic material favored the detection of the most abundant

  13. DNA microarray for detection of gastrointestinal viruses.

    PubMed

    Martínez, Miguel A; Soto-Del Río, María de Los Dolores; Gutiérrez, Rosa María; Chiu, Charles Y; Greninger, Alexander L; Contreras, Juan Francisco; López, Susana; Arias, Carlos F; Isa, Pavel

    2015-01-01

    Gastroenteritis is a clinical illness of humans and other animals that is characterized by vomiting and diarrhea and caused by a variety of pathogens, including viruses. An increasing number of viral species have been associated with gastroenteritis or have been found in stool samples as new molecular tools have been developed. In this work, a DNA microarray capable in theory of parallel detection of more than 100 viral species was developed and tested. Initial validation was done with 10 different virus species, and an additional 5 species were validated using clinical samples. Detection limits of 1 × 10(3) virus particles of Human adenovirus C (HAdV), Human astrovirus (HAstV), and group A Rotavirus (RV-A) were established. Furthermore, when exogenous RNA was added, the limit for RV-A detection decreased by one log. In a small group of clinical samples from children with gastroenteritis (n = 76), the microarray detected at least one viral species in 92% of the samples. Single infection was identified in 63 samples (83%), and coinfection with more than one virus was identified in 7 samples (9%). The most abundant virus species were RV-A (58%), followed by Anellovirus (15.8%), HAstV (6.6%), HAdV (5.3%), Norwalk virus (6.6%), Human enterovirus (HEV) (9.2%), Human parechovirus (1.3%), Sapporo virus (1.3%), and Human bocavirus (1.3%). To further test the specificity and sensitivity of the microarray, the results were verified by reverse transcription-PCR (RT-PCR) detection of 5 gastrointestinal viruses. The RT-PCR assay detected a virus in 59 samples (78%). The microarray showed good performance for detection of RV-A, HAstV, and calicivirus, while the sensitivity for HAdV and HEV was low. Furthermore, some discrepancies in detection of mixed infections were observed and were addressed by reverse transcription-quantitative PCR (RT-qPCR) of the viruses involved. It was observed that differences in the amount of genetic material favored the detection of the most abundant

  14. An overview of innovations and industrial solutions in Protein Microarray Technology.

    PubMed

    Gupta, Shabarni; Manubhai, K P; Kulkarni, Vishwesh; Srivastava, Sanjeeva

    2016-04-01

    The complexity of protein array technology is reflected in the fact that instrumentation and data analysis are subject to change depending on the biological question, the technical compatibility of instruments and the software used in each experiment. Industry has played a pivotal role in establishing standards for future deliberations in the sustenance of these technologies, in the form of protein array chips, arrayers, scanning devices, and data analysis software. This has enhanced the outreach of protein microarray technology to researchers across the globe. These developments have encouraged a surge in the adoption of "nonclassical" approaches such as DNA-based protein arrays, micro-contact printing, label-free protein detection, and algorithms for data analysis. This review provides a unique overview of the industrial solutions available for protein microarray-based studies. It aims at assessing the developments in various commercial platforms, thus providing a holistic overview of the various modalities, options, and compatibility, and summarizing the journey of this powerful high-throughput technology. PMID:27089056

  15. Contour Error Map Algorithm

    NASA Technical Reports Server (NTRS)

    Merceret, Francis; Lane, John; Immer, Christopher; Case, Jonathan; Manobianco, John

    2005-01-01

    The contour error map (CEM) algorithm and the software that implements the algorithm are means of quantifying correlations between sets of time-varying data that are binarized and registered on spatial grids. The present version of the software is intended for use in evaluating numerical weather forecasts against observational sea-breeze data. In cases in which observational data come from off-grid stations, it is necessary to preprocess the observational data to transform them into gridded data. First, the wind direction is gridded and binarized so that D(i,j;n) is the input to CEM based on forecast data and d(i,j;n) is the input to CEM based on gridded observational data. Here, i and j are spatial indices representing 1.25-km intervals along the west-to-east and south-to-north directions, respectively; and n is a time index representing 5-minute intervals. A binary value of D or d = 0 corresponds to an offshore wind, whereas a value of D or d = 1 corresponds to an onshore wind. CEM includes two notable subalgorithms: One identifies and verifies sea-breeze boundaries; the other, which can be invoked optionally, performs an image-erosion function for the purpose of attempting to eliminate river-breeze contributions in the wind fields.
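
    To make the gridding step concrete, here is a small Python sketch of the binarization that produces d(i,j;n) from wind directions; the onshore sector used below is a hypothetical choice that in practice depends on the local coastline geometry.

    import numpy as np

    def binarize_wind(direction_deg, onshore_min=45.0, onshore_max=225.0):
        """Binarize wind direction: 1 = onshore, 0 = offshore.

        Assumes a roughly north-south coastline with the sea to the east,
        so winds from between 45 and 225 degrees count as onshore. The
        sector is illustrative only.
        """
        d = np.asarray(direction_deg) % 360.0
        return ((d >= onshore_min) & (d <= onshore_max)).astype(int)

    # d(i, j; n): two time steps on a 3x3 grid of interpolated directions.
    directions = np.array([[[90, 120, 60], [80, 100, 270], [300, 95, 110]],
                           [[85, 130, 50], [75, 260, 280], [310, 90, 105]]])
    d = binarize_wind(directions)
    print(d.shape)  # (n, i, j)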

  16. Statistically designing microarrays and microarray experiments to enhance sensitivity and specificity.

    PubMed

    Hsu, Jason C; Chang, Jane; Wang, Tao; Steingrímsson, Eiríkur; Magnússon, Magnús Karl; Bergsteinsdottir, Kristin

    2007-01-01

    Gene expression signatures from microarray experiments promise to provide important prognostic tools for predicting disease outcome or response to treatment. A number of microarray studies in various cancers have reported such gene signatures. However, the overlap of gene signatures in the same disease has been limited so far, and some reported signatures have not been reproduced in other populations. Clearly, the methods used for verifying novel gene signatures need improvement. In this article, we describe an experiment in which microarrays and sample hybridization are designed according to the statistical principles of randomization, replication and blocking. Our results show that such designs provide unbiased estimation of differential expression levels as well as powerful tests for them.

  17. Comments on selected fundamental aspects of microarray analysis.

    PubMed

    Riva, Alessandra; Carpentier, Anne-Sophie; Torrésani, Bruno; Hénaut, Alain

    2005-10-01

    Microarrays are becoming a ubiquitous tool of research in the life sciences. However, the working principles of microarray-based methodologies are often misunderstood or apparently ignored by the researchers who actually perform and interpret experiments. This in turn seems to lead to a common over-expectation regarding the explanatory and/or knowledge-generating power of microarray analyses. In this note we explain the basic principles of five major groups of analytical techniques used in studies of microarray data and their interpretation: principal component analysis (PCA), independent component analysis (ICA), the t-test, analysis of variance (ANOVA), and self-organizing maps (SOM). We discuss answers to selected practical questions related to the analysis of microarray data. We also take a closer look at the experimental setup and the rules which have to be observed in order to exploit microarrays efficiently. Finally, we discuss in detail the scope and limitations of microarray-based methods. We emphasize the fact that no amount of statistical analysis can compensate for (or replace) a well thought through experimental setup. We conclude that microarrays are indeed useful tools in the life sciences but by no means should they be expected to generate complete answers to complex biological questions. We argue that even well-posed questions, formulated within a microarray-specific terminology, cannot be completely answered with the use of microarray analyses alone.

  18. ARACNe-based inference, using curated microarray data, of Arabidopsis thaliana root transcriptional regulatory networks

    PubMed Central

    2014-01-01

    Background: Uncovering the complex transcriptional regulatory networks (TRNs) that underlie plant and animal development remains a challenge. However, a vast amount of data from public microarray experiments is available, which can be subjected to inference algorithms in order to recover reliable TRN architectures. Results: In this study we present a simple bioinformatics methodology that uses public, carefully curated microarray data and the mutual information algorithm ARACNe in order to obtain a database of transcriptional interactions. We used data from Arabidopsis thaliana root samples to show that the transcriptional regulatory networks derived from this database successfully recover previously identified root transcriptional modules and to propose new transcription factors for the SHORT ROOT/SCARECROW and PLETHORA pathways. We further show that these networks are a powerful tool to integrate and analyze high-throughput expression data, as exemplified by our analysis of a SHORT ROOT induction time-course microarray dataset, and are a reliable source for the prediction of novel root gene functions. In particular, we used our database to predict novel genes involved in root secondary cell-wall synthesis and identified the MADS-box TF XAL1/AGL12 as an unexpected participant in this process. Conclusions: This study demonstrates that network inference using carefully curated microarray data yields reliable TRN architectures. In contrast to previous efforts to obtain root TRNs, which have focused on particular functional modules or tissues, our root transcriptional interactions provide an overview of the transcriptional pathways present in Arabidopsis thaliana roots and will likely yield a plethora of novel hypotheses to be tested experimentally. PMID:24739361

  19. High-throughput allogeneic antibody detection using protein microarrays.

    PubMed

    Paul, Jed; Sahaf, Bita; Perloff, Spenser; Schoenrock, Kelsi; Wu, Fang; Nakasone, Hideki; Coller, John; Miklos, David

    2016-05-01

    Enzyme-linked immunosorbent assays (ELISAs) have traditionally been used to detect alloantibodies in patient plasma samples post hematopoietic cell transplantation (HCT); however, protein microarrays have the potential to be multiplexed, more sensitive, and higher throughput than ELISAs. Here, we describe the development of a novel and sensitive microarray method for detection of allogeneic antibodies against minor histocompatibility antigens encoded on the Y chromosome, called HY antigens. Six microarray surfaces were tested for their ability to bind recombinant protein and peptide HY antigens. Significant allogeneic immune responses were determined in male patients with female donors by considering normal male donor responses as baseline. HY microarray results were also compared with our previous ELISA results. Our overall goal was to maximize antibody detection for both recombinant protein and peptide epitopes. For detection of HY antigens, the Epoxy (Schott) protein microarray surface was both most sensitive and reliable and has become the standard surface in our microarray platform. PMID:26902899

  20. Formation and characterization of DNA microarrays at silicon nitride substrates.

    PubMed

    Manning, Mary; Redmond, Gareth

    2005-01-01

    A versatile method for direct, covalent attachment of DNA microarrays at silicon nitride layers, previously deposited by chemical vapor deposition at silicon wafer substrates, is reported. Each microarray fabrication process step, from silicon nitride substrate deposition, surface cleaning, amino-silanation, and attachment of a homobifunctional cross-linking molecule to covalent immobilization of probe oligonucleotides, is defined, characterized, and optimized to yield consistent probe microarray quality, homogeneity, and probe-target hybridization performance. The developed microarray fabrication methodology provides excellent (high signal-to-background ratio) and reproducible responsivity to target oligonucleotide hybridization with a rugged chemical stability that permits exposure of arrays to stringent pre- and posthybridization wash conditions through many sustained cycles of reuse. Overall, the achieved performance features compare very favorably with those of more mature glass based microarrays. It is proposed that this DNA microarray fabrication strategy has the potential to provide a viable route toward the successful realization of future integrated DNA biochips.

  1. Sequencing by hybridization with the generic 6-mer oligonucleotide microarray: an advanced scheme for data processing.

    SciTech Connect

    Chechetkin, V. R.; Turygin, A. Y.; Proudnikov, D. Y.; Prokopenko, D. V.; Kirillov, E. V.; Mirzabekov, A. D.; Biochip Technology Center; Russian Academy of Sciences

    2000-08-01

    DNA sequencing by hybridization was carried out with a microarray of all 4⁶ = 4,096 hexadeoxyribonucleotides (the generic microchip). The oligonucleotides immobilized in 100 × 100 × 20 μm polyacrylamide gel pads of the generic microchip were hybridized with fluorescently labeled ssDNA, forming perfect and mismatched duplexes. Melting curves were measured in parallel for all microchip duplexes with a fluorescence microscope equipped with a CCD camera. This allowed us to discriminate the perfect duplexes formed by the oligonucleotides that are complementary to the target DNA. The DNA sequence was reconstructed by overlapping the complementary oligonucleotide probes. We developed a data processing scheme to improve the discrimination of perfect duplexes from mismatched ones. The procedure was combined with reconstruction of the DNA sequence. The scheme includes the proper definition of a discriminant signal, preprocessing, and a variational principle for the sequence indicator function. The effectiveness of the procedure was confirmed by sequencing, proofreading, and nucleotide polymorphism (mutation) analysis of 13 DNA fragments from 31 to 70 nucleotides long.
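
    The reconstruction-by-overlap step lends itself to a compact illustration. The following Python sketch chains k-mers by their (k−1)-base overlaps, which is the idealized, error-free core of sequencing by hybridization; the paper's actual scheme additionally handles mismatch discrimination and a variational indicator function, which are not modeled here.

    def reconstruct(kmers, k=6):
        """Reconstruct a sequence from its set of k-mers by chaining
        (k-1)-base overlaps. Assumes the overlaps are unambiguous, which
        holds only for short targets without repeats."""
        prefixes = {m[:k - 1]: m for m in kmers}
        suffixes = {m[1:] for m in kmers}
        # The start k-mer is the one whose prefix is not any other's suffix.
        start = next(m for m in kmers if m[:k - 1] not in suffixes)
        seq, cur = start, start
        while cur[1:] in prefixes:
            cur = prefixes[cur[1:]]
            seq += cur[-1]
        return seq

    target = "ATGCGTACGTTAGC"
    kmers = {target[i:i + 6] for i in range(len(target) - 5)}
    print(reconstruct(kmers) == target)  # True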

  2. Acquisition, preprocessing, and reconstruction of ultralow dose volumetric CT scout for organ-based CT scan planning

    SciTech Connect

    Yin, Zhye De Man, Bruno; Yao, Yangyang; Wu, Mingye; Montillo, Albert; Edic, Peter M.; Kalra, Mannudeep

    2015-05-15

    Purpose: Traditionally, 2D radiographic preparatory scan images (scout scans) are used to plan diagnostic CT scans. However, a 3D CT volume with a full 3D organ segmentation map could provide superior information for customized scan planning and other purposes. A practical challenge is to design the volumetric scout acquisition and processing steps to provide good image quality (at least good enough to enable 3D organ segmentation) while delivering a radiation dose similar to that of the conventional 2D scout. Methods: The authors explored various acquisition methods, scan parameters, postprocessing methods, and reconstruction methods through simulation and cadaver data studies to achieve an ultralow dose 3D scout while simultaneously reducing the noise and maintaining the edge strength around the target organ. Results: In a simulation study, the 3D scout with the proposed acquisition, preprocessing, and reconstruction strategy provided a similar level of organ segmentation capability as a traditional 240 mAs diagnostic scan, based on noise and normalized edge strength metrics. At the same time, the proposed approach delivers only 1.25% of the dose of a traditional scan. In a cadaver study, the authors’ pictorial-structures based organ localization algorithm successfully located the major abdominal-thoracic organs from the ultralow dose 3D scout obtained with the proposed strategy. Conclusions: The authors demonstrated that images with a similar degree of segmentation capability (interpretability) as conventional dose CT scans can be achieved with an ultralow dose 3D scout acquisition and suitable postprocessing. Furthermore, the authors applied these techniques to real cadaver CT scans with a CTDI dose level of less than 0.1 mGy and successfully generated a 3D organ localization map.

  3. ProMAT: protein microarray analysis tool

    SciTech Connect

    White, Amanda M.; Daly, Don S.; Varnum, Susan M.; Anderson, Kevin K.; Bollinger, Nikki; Zangar, Richard C.

    2006-04-04

    Summary: ProMAT is a software tool for statistically analyzing data from ELISA microarray experiments. The software estimates standard curves, sample protein concentrations and their uncertainties for multiple assays. ProMAT generates a set of comprehensive figures for assessing results and diagnosing process quality. The tool is available for Windows or Mac, and is distributed as open-source Java and R code. Availability: ProMAT is available at http://www.pnl.gov/statistics/ProMAT. ProMAT requires Java version 1.5.0 and R version 1.9.1 (or more recent versions) which are distributed with the tool.
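
    ProMAT itself is distributed as Java and R code; purely as an illustration of the standard-curve estimation that such tools perform, here is a Python sketch that fits a four-parameter logistic (4PL) curve to a synthetic ELISA standard series and inverts it to estimate sample concentrations. The 4PL form and all numbers are assumptions, not ProMAT internals.

    import numpy as np
    from scipy.optimize import curve_fit

    def four_pl(x, bottom, top, ec50, hill):
        """Four-parameter logistic curve, a common ELISA standard model."""
        return bottom + (top - bottom) / (1.0 + (x / ec50) ** hill)

    # Synthetic standard series: known concentrations, noisy intensities.
    conc = np.array([0.1, 0.3, 1, 3, 10, 30, 100])
    signal = four_pl(conc, 50, 3000, 8.0, -1.2) \
        + np.random.default_rng(1).normal(0, 30, 7)

    params, cov = curve_fit(four_pl, conc, signal, p0=(50, 3000, 10, -1))
    bottom, top, ec50, hill = params

    def invert(y):
        """Map a measured intensity back to an estimated concentration."""
        return ec50 * (((top - bottom) / (y - bottom)) - 1.0) ** (1.0 / hill)

    print(invert(1500.0))  # concentration estimate for a mid-range signal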

  4. Protein Microarrays--Without a Trace

    SciTech Connect

    Camarero, J A

    2007-04-05

    Many experimental approaches in biology and biophysics, as well as applications in diagnosis and drug discovery, require proteins to be immobilized on solid supports. Protein microarrays, for example, provide a high-throughput format to study biomolecular interactions. The technique employed for protein immobilization is a key to the success of these applications. Recent biochemical developments are allowing, for the first time, the selective and traceless immobilization of proteins generated by cell-free systems without the need for purification and/or reconcentration prior to the immobilization step.

  5. Applications of Functional Protein Microarrays in Basic and Clinical Research

    PubMed Central

    Zhu, Heng; Qian, Jiang

    2013-01-01

    The protein microarray technology provides a versatile platform for characterization of hundreds of thousands of proteins in a highly parallel and high-throughput manner. It is viewed as a new tool that overcomes the limitations of DNA microarrays. On the basis of their application, protein microarrays fall into two major classes: analytical and functional protein microarrays. In addition, tissue or cell lysates can also be directly spotted on a slide to form the so-called “reverse-phase” protein microarray. In the last decade, applications of functional protein microarrays in particular have flourished in studying protein function and the construction of networks and pathways. In this chapter, we review the recent advances in protein microarray technology, followed by a series of examples to illustrate the power and versatility of protein microarrays in both basic and clinical research. As a powerful technology platform, it would not be surprising if protein microarrays become one of the leading technologies in the proteomic and diagnostic fields in the next decade. PMID:22989767

  6. Refractive index change detection based on porous silicon microarray

    NASA Astrophysics Data System (ADS)

    Chen, Weirong; Jia, Zhenhong; Li, Peng; Lv, Guodong; Lv, Xiaoyi

    2016-05-01

    By combining photolithography with the electrochemical anodization method, a microarray device of porous silicon (PS) photonic crystal was fabricated on a crystalline silicon substrate. The optical properties of the microarray were analyzed with the transfer matrix method. The relationship between the refractive index and the reflectivity of each array element of the microarray at 633 nm was also studied, and changes in array surface reflectivity were observed through digital imaging. By means of the reflectivity measurement method, reflectivity changes below 10⁻³ can be observed with the PS microarray. The results of this study can be applied to the detection of biosensor arrays.
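
    As a sketch of the transfer matrix analysis mentioned above, the following Python code computes the normal-incidence reflectivity of a multilayer stack from its characteristic matrices; the porous-silicon refractive indices and layer count are illustrative values, not those of the fabricated device.

    import numpy as np

    def reflectivity(n_layers, d_layers, wavelength, n_in=1.0, n_out=3.5):
        """Normal-incidence reflectivity of a multilayer stack via the
        characteristic (transfer) matrix method."""
        M = np.eye(2, dtype=complex)
        for n, d in zip(n_layers, d_layers):
            delta = 2 * np.pi * n * d / wavelength  # phase thickness
            M = M @ np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                              [1j * n * np.sin(delta), np.cos(delta)]])
        num = n_in * M[0, 0] + n_in * n_out * M[0, 1] - M[1, 0] - n_out * M[1, 1]
        den = n_in * M[0, 0] + n_in * n_out * M[0, 1] + M[1, 0] + n_out * M[1, 1]
        return abs(num / den) ** 2

    # Illustrative porous-silicon Bragg mirror: alternating porosities.
    n_hi, n_lo, lam = 2.0, 1.4, 633e-9   # indices and design wavelength (m)
    layers_n = [n_hi, n_lo] * 5
    layers_d = [lam / (4 * n) for n in layers_n]  # quarter-wave thicknesses
    print(reflectivity(layers_n, layers_d, lam))  # close to 1 at the stopband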

  7. Re-Annotator: Annotation Pipeline for Microarray Probe Sequences.

    PubMed

    Arloth, Janine; Bader, Daniel M; Röh, Simone; Altmann, Andre

    2015-01-01

    Microarray technologies are established approaches for high throughput gene expression, methylation and genotyping analysis. An accurate mapping of the array probes is essential to generate reliable biological findings. However, manufacturers of the microarray platforms typically provide incomplete and outdated annotation tables, which often rely on older genome and transcriptome versions that differ substantially from up-to-date sequence databases. Here, we present the Re-Annotator, a re-annotation pipeline for microarray probe sequences. It is primarily designed for gene expression microarrays but can also be adapted to other types of microarrays. The Re-Annotator uses a custom-built mRNA reference database to identify the positions of gene expression array probe sequences. We applied Re-Annotator to the Illumina Human-HT12 v4 microarray platform and found that about one quarter (25%) of the probes differed from the manufacturer's annotation. In further computational experiments on experimental gene expression data, we compared Re-Annotator to another probe re-annotation tool, ReMOAT, and found that Re-Annotator provided an improved re-annotation of microarray probes. A thorough re-annotation of probe information is crucial to any microarray analysis. The Re-Annotator pipeline is freely available at http://sourceforge.net/projects/reannotator along with re-annotated files for Illumina microarrays HumanHT-12 v3/v4 and MouseRef-8 v2.

  8. Chemiluminescence microarrays in analytical chemistry: a critical review.

    PubMed

    Seidel, Michael; Niessner, Reinhard

    2014-09-01

    Multi-analyte immunoassays on microarrays and on multiplex DNA microarrays have been described for quantitative analysis of small organic molecules (e.g., antibiotics, drugs of abuse, small molecule toxins), proteins (e.g., antibodies or protein toxins), and microorganisms, viruses, and eukaryotic cells. In analytical chemistry, multi-analyte detection by use of analytical microarrays has become an innovative research topic because of the possibility of generating several sets of quantitative data for different analyte classes in a short time. Chemiluminescence (CL) microarrays are powerful tools for rapid multiplex analysis of complex matrices. A wide range of applications for CL microarrays is described in the literature dealing with analytical microarrays. The motivation for this review is to summarize the current state of CL-based analytical microarrays. Combining analysis of different compound classes on CL microarrays reduces analysis time, cost of reagents, and use of laboratory space. Applications are discussed, with examples from food safety, water safety, environmental monitoring, diagnostics, forensics, toxicology, and biosecurity. The potential and limitations of research on multiplex analysis by use of CL microarrays are discussed in this review.

  9. Studying cellular processes and detecting disease with protein microarrays

    SciTech Connect

    Zangar, Richard C.; Varnum, Susan M.; Bollinger, Nikki

    2005-10-31

    Protein microarrays are a rapidly developing analytic tool with diverse applications in biomedical research. These applications include profiling of disease markers or autoimmune responses, understanding molecular pathways, protein modifications and protein activities. One factor that is driving this expanding usage is the wide variety of experimental formats that protein microarrays can take. In this review, we provide a short, conceptual overview of the different approaches for protein microarray. We then examine some of the most significant applications of these microarrays to date, with an emphasis on how global protein analyses can be used to facilitate biomedical research.

  10. The use of antigen microarrays in antibody profiling.

    PubMed

    Papp, Krisztián; Prechl, József

    2012-01-01

    Technological advances in the field of microarray production and analysis lead to the development of protein microarrays. Of these, antigen microarrays are one particular format that allows the study of antigen-antibody interactions in a miniaturized and highly multiplexed fashion. Here, we describe the parallel detection of antibodies with different specificities in human serum, a procedure also called antibody profiling. Autoantigens printed on microarray slides are reacted with test sera and the bound antibodies are identified by fluorescently labeled secondary reagents. Reactivity patterns generated this way characterize individuals and can help design novel diagnostic tools.

  11. Cloudy Solar Software - Enhanced Capabilities for Finding, Pre-processing, and Visualizing Solar Data

    NASA Astrophysics Data System (ADS)

    Istvan Etesi, Laszlo; Tolbert, K.; Schwartz, R.; Zarro, D.; Dennis, B.; Csillaghy, A.

    2010-05-01

    In our project “Extending the Virtual Solar Observatory (VSO)” we have combined some of the features available in Solar Software (SSW) to produce an integrated environment for data analysis, supporting the complete workflow from data location, retrieval, preparation, and analysis to creating publication-quality figures. Our goal is an integrated analysis experience in IDL, easy-to-use but flexible enough to allow more sophisticated procedures such as multi-instrument analysis. To that end, we have made the transition from a locally oriented setting where all the analysis is done on the user's computer, to an extended analysis environment where IDL has access to services available on the Internet. We have implemented a form of Cloud Computing that uses the VSO search and a new data retrieval and pre-processing server (PrepServer) that provides remote execution of instrument-specific data preparation. We have incorporated the interfaces to the VSO search and the PrepServer into an IDL widget (SHOW_SYNOP) that provides user-friendly searching and downloading of raw solar data and optionally sends search results for pre-processing to the PrepServer prior to downloading the data. The raw and pre-processed data can be displayed with our plotting suite, PLOTMAN, which can handle different data types (light curves, images, and spectra) and perform basic data operations such as zooming, image overlays, solar rotation, etc. PLOTMAN is highly configurable and suited for visual data analysis and for creating publishable figures. PLOTMAN and SHOW_SYNOP work hand-in-hand for a convenient working environment. Our environment supports a growing number of solar instruments that currently includes RHESSI, SOHO/EIT, TRACE, SECCHI/EUVI, HINODE/XRT, and HINODE/EIS.

  12. An enhanced TIMESAT algorithm for estimating vegetation phenology metrics from MODIS data

    USGS Publications Warehouse

    Tan, B.; Morisette, J.T.; Wolfe, R.E.; Gao, F.; Ederer, G.A.; Nightingale, J.; Pedelty, J.A.

    2011-01-01

    An enhanced TIMESAT algorithm was developed for retrieving vegetation phenology metrics from 250 m and 500 m spatial resolution Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation indexes (VI) over North America. MODIS VI data were pre-processed using snow-cover and land surface temperature data, and temporally smoothed with the enhanced TIMESAT algorithm. An objective third derivative test was applied to define key phenology dates and retrieve a set of phenology metrics. This algorithm has been applied to two MODIS VIs: Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI). In this paper, we describe the algorithm and use EVI as an example to compare three sets of TIMESAT algorithm/MODIS VI combinations: a) original TIMESAT algorithm with original MODIS VI, b) original TIMESAT algorithm with pre-processed MODIS VI, and c) enhanced TIMESAT and pre-processed MODIS VI. All retrievals were compared with ground phenology observations, some made available through the National Phenology Network. Our results show that for MODIS data in middle to high latitude regions, snow and land surface temperature information is critical in retrieving phenology metrics from satellite observations. The results also show that the enhanced TIMESAT algorithm can better accommodate growing season start and end dates that vary significantly from year to year. The TIMESAT algorithm improvements contribute to more spatial coverage and more accurate retrievals of the phenology metrics. Among three sets of TIMESAT/MODIS VI combinations, the start of the growing season metric predicted by the enhanced TIMESAT algorithm using pre-processed MODIS VIs has the best associations with ground observed vegetation greenup dates. © 2010 IEEE.

  13. An Enhanced TIMESAT Algorithm for Estimating Vegetation Phenology Metrics from MODIS Data

    NASA Technical Reports Server (NTRS)

    Tan, Bin; Morisette, Jeffrey T.; Wolfe, Robert E.; Gao, Feng; Ederer, Gregory A.; Nightingale, Joanne; Pedelty, Jeffrey A.

    2012-01-01

    An enhanced TIMESAT algorithm was developed for retrieving vegetation phenology metrics from 250 m and 500 m spatial resolution Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation indexes (VI) over North America. MODIS VI data were pre-processed using snow-cover and land surface temperature data, and temporally smoothed with the enhanced TIMESAT algorithm. An objective third derivative test was applied to define key phenology dates and retrieve a set of phenology metrics. This algorithm has been applied to two MODIS VIs: Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI). In this paper, we describe the algorithm and use EVI as an example to compare three sets of TIMESAT algorithm/MODIS VI combinations: a) original TIMESAT algorithm with original MODIS VI, b) original TIMESAT algorithm with pre-processed MODIS VI, and c) enhanced TIMESAT and pre-processed MODIS VI. All retrievals were compared with ground phenology observations, some made available through the National Phenology Network. Our results show that for MODIS data in middle to high latitude regions, snow and land surface temperature information is critical in retrieving phenology metrics from satellite observations. The results also show that the enhanced TIMESAT algorithm can better accommodate growing season start and end dates that vary significantly from year to year. The TIMESAT algorithm improvements contribute to more spatial coverage and more accurate retrievals of the phenology metrics. Among three sets of TIMESAT/MODIS VI combinations, the start of the growing season metric predicted by the enhanced TIMESAT algorithm using pre-processed MODIS VIs has the best associations with ground observed vegetation greenup dates.
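
    A rough Python sketch of the third-derivative idea follows: smooth the vegetation-index series, then take the strongest extrema of its third derivative as candidate phenological transition dates. This reproduces only the spirit of the enhanced TIMESAT test; the Gaussian smoother, its width, and the synthetic EVI curve are assumptions, and which extremum marks season start versus end depends on the curve shape and on bookkeeping that TIMESAT performs and this sketch omits.

    import numpy as np

    def transition_dates(vi, sigma=3.0):
        """Return the time indices of the strongest positive and negative
        extrema of the third derivative of a smoothed VI series, as
        candidate phenological transition dates."""
        half = int(3 * sigma)
        x = np.arange(-half, half + 1)
        kernel = np.exp(-0.5 * (x / sigma) ** 2)
        kernel /= kernel.sum()
        smooth = np.convolve(vi, kernel, mode="same")
        d3 = np.gradient(np.gradient(np.gradient(smooth)))
        return int(np.argmax(d3)), int(np.argmin(d3))

    # Synthetic annual EVI curve, one value per 8-day composite (46/year).
    t = np.arange(46)
    evi = 0.2 + 0.5 * np.exp(-0.5 * ((t - 23) / 6.0) ** 2)
    print(transition_dates(evi))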

  14. Comparative Evaluation of Preprocessing Freeware on Chromatography/Mass Spectrometry Data for Signature Discovery

    SciTech Connect

    Coble, Jamie B.; Fraga, Carlos G.

    2014-07-07

    Preprocessing software is crucial for the discovery of chemical signatures in metabolomics, chemical forensics, and other signature-focused disciplines that involve analyzing large data sets from chemical instruments. Here, four freely available and published preprocessing tools known as metAlign, MZmine, SpectConnect, and XCMS were evaluated for impurity profiling using nominal mass GC/MS data and accurate mass LC/MS data. Both data sets were previously collected from the analysis of replicate samples from multiple stocks of a nerve-agent precursor. Each of the four tools had their parameters set for the untargeted detection of chromatographic peaks from impurities present in the stocks. The peak table generated by each preprocessing tool was analyzed to determine the number of impurity components detected in all replicate samples per stock. A cumulative set of impurity components was then generated using all available peak tables and used as a reference to calculate the percent of component detections for each tool, in which 100% indicated the detection of every component. For the nominal mass GC/MS data, metAlign performed the best followed by MZmine, SpectConnect, and XCMS with detection percentages of 83, 60, 47, and 42%, respectively. For the accurate mass LC/MS data, the order was metAlign, XCMS, and MZmine with detection percentages of 80, 45, and 35%, respectively. SpectConnect did not function for the accurate mass LC/MS data. Larger detection percentages were obtained by combining the top performer with at least one of the other tools such as 96% by combining metAlign with MZmine for the GC/MS data and 93% by combining metAlign with XCMS for the LC/MS data. In terms of quantitative performance, the reported peak intensities had average absolute biases of 41, 4.4, 1.3 and 1.3% for SpectConnect, metAlign, XCMS, and MZmine, respectively, for the GC/MS data. For the LC/MS data, the average absolute biases were 22, 4.5, and 3.1% for metAlign, MZmine, and XCMS

  15. Are hybrid models integrated with data preprocessing techniques suitable for monthly streamflow forecasting? Some experiment evidences

    NASA Astrophysics Data System (ADS)

    Zhang, Xiaoli; Peng, Yong; Zhang, Chi; Wang, Bende

    2015-11-01

    A number of hydrological studies have proven the superior prediction performance of hybrid models coupled with data preprocessing techniques. However, many studies first decompose the entire data series into components and later divide each component into calibration and validation datasets to establish models, which sends some amount of future information into the decomposition and reconstruction processes. As a consequence, the resulting components used to forecast the value of a particular moment are computed using information from future values, which are not available at that particular moment in a forecasting exercise. Since most papers don't present their model framework in detail, it is difficult to identify whether they are performing a real forecast or not. Even though several other papers have explicitly stated which experiment they are performing, a comparison between results in the hindcast and forecast experiments is still missing. Therefore, it is necessary to investigate and compare the performance of these hybrid models in the two experiments in order to estimate whether they are suitable for real forecasting. With the combination of three preprocessing techniques, such as wavelet analysis (WA), empirical mode decomposition (EMD) and singular spectrum analysis (SSA), and two modeling methods (i.e. ANN model and ARMA model), six hybrid models are developed in this study, including WA-ANN, WA-ARMA, EMD-ANN, EMD-ARMA, SSA-ANN and SSA-ARMA. Preprocessing techniques are used to decompose the data series into sub-series, and then these sub-series are modeled using ANN and ARMA models. These models are examined in hindcasting and forecasting of the monthly streamflow of two sites in the Yangtze River of China. The results of this study indicate that the six hybrid models perform better in the hindcast experiment compared with the original ANN and ARMA models, while the hybrid models in the forecast experiment perform worse than the original models and the
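
    The leakage problem the authors describe is easy to state in code. The sketch below contrasts a hindcast-style decomposition, which smooths the whole series at once and so lets future samples influence past components, with a leakage-free version that decomposes only the data available at each forecast origin; a trailing moving average stands in for WA/EMD/SSA purely for illustration.

    import numpy as np

    def hindcast_components(series, window=12):
        """WRONG for forecasting: smooths the entire series at once with a
        centered (zero-padded) window, so the component at time t uses
        values after t."""
        kernel = np.ones(window) / window
        trend = np.convolve(series, kernel, mode="same")
        return trend, series - trend

    def forecast_components(series, t, window=12):
        """Leakage-free: at forecast origin t, decompose only series[:t+1],
        exactly as would be possible in real time."""
        past = series[:t + 1]
        trend = np.array([past[max(0, i - window + 1):i + 1].mean()
                          for i in range(len(past))])
        return trend, past - trend

    rng = np.random.default_rng(0)
    flow = 100 + 30 * np.sin(np.arange(240) * 2 * np.pi / 12) \
        + rng.normal(0, 5, 240)
    trend_hind, _ = hindcast_components(flow)
    trend_fore, _ = forecast_components(flow, t=120)
    # trend_hind[120] used samples after t = 120; trend_fore[120] did not.
    print(trend_hind[120], trend_fore[120])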

  16. Reservoir computing with a slowly modulated mask signal for preprocessing using a mutually coupled optoelectronic system

    NASA Astrophysics Data System (ADS)

    Tezuka, Miwa; Kanno, Kazutaka; Bunsen, Masatoshi

    2016-08-01

    Reservoir computing is a machine-learning paradigm based on information processing in the human brain. We numerically demonstrate reservoir computing with a slowly modulated mask signal for preprocessing by using a mutually coupled optoelectronic system. The performance of our system is quantitatively evaluated by a chaotic time series prediction task. Our system can produce performance comparable to that of reservoir computing with a single feedback system and a fast modulated mask signal. We show that it is possible to slow down the modulation speed of the mask signal by using the mutually coupled system in reservoir computing.
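
    The mask preprocessing itself is simple to sketch. In delay-based reservoir computing, each input sample is time-multiplexed by a fixed mask over the virtual nodes; holding each mask value for several steps, as below, is one plausible way to slow the mask modulation. The mask values and hold factor are illustrative assumptions, not the paper's parameters.

    import numpy as np

    def apply_mask(u, mask, hold=1):
        """Time-multiplexed masking for delay-based reservoir computing:
        each input sample u[k] is modulated by a fixed mask over the
        virtual nodes; hold > 1 keeps each mask value for several steps,
        slowing the mask modulation."""
        stream = []
        for sample in u:
            for m in mask:
                stream.extend([sample * m] * hold)
        return np.asarray(stream)

    rng = np.random.default_rng(0)
    u = rng.uniform(-1, 1, 5)                  # input samples
    mask = rng.choice([-0.1, 0.1], size=50)    # fixed random binary mask
    fast = apply_mask(u, mask, hold=1)         # conventional fast mask
    slow = apply_mask(u, mask, hold=4)         # slowly modulated mask
    print(fast.shape, slow.shape)              # (250,) (1000,)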

  17. Comparative evaluation of preprocessing freeware on chromatography/mass spectrometry data for signature discovery.

    PubMed

    Coble, Jamie B; Fraga, Carlos G

    2014-09-01

    Preprocessing software, which converts large instrumental data sets into a manageable format for data analysis, is crucial for the discovery of chemical signatures in metabolomics, chemical forensics, and other signature-focused disciplines. Here, four freely available and published preprocessing tools known as MetAlign, MZmine, SpectConnect, and XCMS were evaluated for impurity profiling using nominal mass GC/MS data and accurate mass LC/MS data. Both data sets were previously collected from the analysis of replicate samples from multiple stocks of a nerve-agent precursor and method blanks. Parameters were optimized for each of the four tools for the untargeted detection, matching, and cataloging of chromatographic peaks from impurities present in the stock samples. The peak table generated by each preprocessing tool was analyzed to determine the number of impurity components detected in all replicate samples per stock and absent in the method blanks. A cumulative set of impurity components was then generated using all available peak tables and used as a reference to calculate the percent of component detections for each tool, in which 100% indicated the detection of every known component present in a stock. For the nominal mass GC/MS data, MetAlign had the most component detections followed by MZmine, SpectConnect, and XCMS with detection percentages of 83, 60, 47, and 41%, respectively. For the accurate mass LC/MS data, the order was MetAlign, XCMS, and MZmine with detection percentages of 80, 45, and 35%, respectively. SpectConnect did not function for the accurate mass LC/MS data. Larger detection percentages were obtained by combining the top performer with at least one of the other tools such as 96% by combining MetAlign with MZmine for the GC/MS data and 93% by combining MetAlign with XCMS for the LC/MS data. In terms of quantitative performance, the reported peak intensities from each tool had averaged absolute biases (relative to peak intensities obtained

  18. Video pre-processing with JND-based Gaussian filtering of superpixels

    NASA Astrophysics Data System (ADS)

    Ding, Lei; Li, Ge; Wang, Ronggang; Wang, Wenmin

    2015-03-01

    In this paper, an innovative method of HEVC video pre-processing is proposed. The method applies simple linear iterative clustering (SLIC), which adapts k-means clustering to group pixels into perceptually meaningful atomic regions, or superpixels. By averaging, over each superpixel, the weighted average of luminance differences around each pixel, a suitable Gaussian filter parameter for that superpixel is determined. Experimental results show that the bit rate can be reduced by up to 29% without loss in visual quality.
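
    A rough Python analogue of the pipeline, using scikit-image for SLIC, is sketched below. The just-noticeable-difference (JND) model is replaced by a simple gradient-activity proxy, and the activity-to-sigma mapping is invented for illustration; only the overall structure (segment, estimate a per-superpixel tolerance, filter) follows the abstract.

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from skimage.color import rgb2gray
    from skimage.data import astronaut
    from skimage.segmentation import slic

    img = astronaut() / 255.0
    gray = rgb2gray(img)
    labels = slic(img, n_segments=100, compactness=10)

    # Local luminance activity: gradient magnitude, averaged per superpixel.
    gy, gx = np.gradient(gray)
    activity = np.hypot(gx, gy)

    out = img.copy()
    for lab in np.unique(labels):
        mask = labels == lab
        # Flat (low-activity) regions tolerate stronger smoothing; this
        # mapping is an illustrative stand-in for a JND model.
        sigma = float(np.clip(0.02 / (activity[mask].mean() + 1e-6), 0.0, 3.0))
        for c in range(3):
            blurred = gaussian_filter(img[..., c], sigma) if sigma > 0 \
                else img[..., c]
            out[mask, c] = blurred[mask]

    Smoothing flat superpixels more aggressively removes detail the viewer cannot perceive, which is what lets the encoder spend fewer bits without visible quality loss.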

  19. Interest rate prediction: a neuro-hybrid approach with data preprocessing

    NASA Astrophysics Data System (ADS)

    Mehdiyev, Nijat; Enke, David

    2014-07-01

    The following research implements a differential evolution-based fuzzy-type clustering method with a fuzzy inference neural network, after input preprocessing with regression analysis, in order to predict future interest rates, particularly 3-month T-bill rates. The empirical results of the proposed model are compared against those of nonparametric models, such as locally weighted regression and least squares support vector machines, along with two linear benchmark models, the autoregressive model and the random walk model. The root mean square error is reported for comparison.

  20. Fractional Fourier transform pre-processing for neural networks and its application to object recognition.

    PubMed

    Barshan, Billur; Ayrulu, Birsel

    2002-01-01

    This study investigates fractional Fourier transform pre-processing of input signals to neural networks. The fractional Fourier transform is a generalization of the ordinary Fourier transform with an order parameter a. Judicious choice of this parameter can lead to overall improvement of the neural network performance. As an illustrative example, we consider recognition and position estimation of different types of objects based on their sonar returns. Raw amplitude and time-of-flight patterns acquired from a real sonar system are processed, demonstrating reduced error in both recognition and position estimation of objects.

  1. Automated prostate cancer diagnosis and Gleason grading of tissue microarrays

    NASA Astrophysics Data System (ADS)

    Tabesh, Ali; Kumar, Vinay P.; Pang, Ho-Yuen; Verbel, David; Kotsianti, Angeliki; Teverovskiy, Mikhail; Saidi, Olivier

    2005-04-01

    We present the results on the development of an automated system for prostate cancer diagnosis and Gleason grading. Images of representative areas of the original Hematoxylin-and-Eosin (H&E)-stained tissue retrieved from each patient, either from a tissue microarray (TMA) core or whole section, were captured and analyzed. The image sets consisted of 367 and 268 color images for the diagnosis and Gleason grading problems, respectively. In diagnosis, the goal is to classify a tissue image into tumor versus non-tumor classes. In Gleason grading, which characterizes tumor aggressiveness, the objective is to classify a tissue image as being from either a low- or high-grade tumor. Several feature sets were computed from the image. The feature sets considered were: (i) color channel histograms, (ii) fractal dimension features, (iii) fractal code features, (iv) wavelet features, and (v) color, shape and texture features computed using Aureon Biosciences' MAGIC system. The linear and quadratic Gaussian classifiers together with a greedy search feature selection algorithm were used. For cancer diagnosis, a classification accuracy of 94.5% was obtained on an independent test set. For Gleason grading, the achieved accuracy of classification into low- and high-grade classes of an independent test set was 77.6%.
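
    As a self-contained illustration of the classifier stage, the sketch below runs greedy forward feature selection around a linear Gaussian (LDA) classifier on synthetic feature vectors; scikit-learn's LDA stands in for the paper's linear/quadratic Gaussian classifiers, and the data are fabricated stand-ins for the image features listed above.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    def greedy_select(X, y, clf, n_features=5, cv=5):
        """Greedy forward selection: repeatedly add the feature that most
        improves the cross-validated accuracy of the given classifier."""
        selected, remaining = [], list(range(X.shape[1]))
        while remaining and len(selected) < n_features:
            best_score, best_j = max(
                (cross_val_score(clf, X[:, selected + [j]], y, cv=cv).mean(), j)
                for j in remaining)
            selected.append(best_j)
            remaining.remove(best_j)
        return selected

    # Fabricated stand-in for tissue-image feature vectors (color histogram,
    # fractal, wavelet features, ...) with binary tumor/non-tumor labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))
    y = (X[:, 3] + 0.8 * X[:, 7] + rng.normal(0, 0.5, 200) > 0).astype(int)

    print(greedy_select(X, y, LinearDiscriminantAnalysis()))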

  2. Discovering Pair-wise Synergies in Microarray Data

    PubMed Central

    Chen, Yuan; Cao, Dan; Gao, Jun; Yuan, Zheming

    2016-01-01

    Informative gene selection can have important implications for the improvement of cancer diagnosis and the identification of new drug targets. Individual-gene-ranking methods ignore interactions between genes, and popular pair-wise gene evaluation methods, e.g. TSP and TSG, are unable to discover pair-wise interactions. Several efforts to discover pair-wise synergy have been made based on the information approach, such as EMBP and FeatKNN. However, the methods employed to estimate mutual information, e.g. binarization, histogram-based and KNN estimators, depend on known data or domain characteristics. Recently, Reshef et al. proposed a novel maximal information coefficient (MIC) measure to capture a wide range of associations between two variables, which has the property of generality. An extension from MIC(X; Y) to MIC(X1; X2; Y) is therefore desired. We developed an approximation algorithm for estimating MIC(X1; X2; Y) where Y is a discrete variable. MIC(X1; X2; Y) is employed to detect pair-wise synergy in simulated and cancer microarray data. The results indicate that MIC(X1; X2; Y) also has the property of generality. It can discover synergistic genes that are undetectable by reference feature selection methods such as MIC(X; Y) and TSG. Synergistic genes can distinguish different phenotypes. Finally, the biological relevance of these synergistic genes is validated with GO annotation and the OUgene database. PMID:27470995

  3. Fine-scaled human genetic structure revealed by SNP microarrays.

    PubMed

    Xing, Jinchuan; Watkins, W Scott; Witherspoon, David J; Zhang, Yuhua; Guthery, Stephen L; Thara, Rangaswamy; Mowry, Bryan J; Bulayeva, Kazima; Weiss, Robert B; Jorde, Lynn B

    2009-05-01

    We report an analysis of more than 240,000 loci genotyped using the Affymetrix SNP microarray in 554 individuals from 27 worldwide populations in Africa, Asia, and Europe. To provide a more extensive and complete sampling of human genetic variation, we have included caste and tribal samples from two states in South India, Daghestanis from eastern Europe, and the Iban from Malaysia. Consistent with observations made by Charles Darwin, our results highlight shared variation among human populations and demonstrate that much genetic variation is geographically continuous. At the same time, principal components analyses reveal discernible genetic differentiation among almost all identified populations in our sample, and in most cases, individuals can be clearly assigned to defined populations on the basis of SNP genotypes. All individuals are accurately classified into continental groups using a model-based clustering algorithm, but between closely related populations, genetic and self-classifications conflict for some individuals. The 250K data permitted high-level resolution of genetic variation among Indian caste and tribal populations and between highland and lowland Daghestani populations. In particular, upper-caste individuals from Tamil Nadu and Andhra Pradesh form one defined group, lower-caste individuals from these two states form another, and the tribal Irula samples form a third. Our results emphasize the correlation of genetic and geographic distances and highlight other elements, including social factors that have contributed to population structure. PMID:19411602
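
    A toy version of the PCA step can be written in a few lines. The sketch below projects a synthetic 0/1/2-coded genotype matrix onto its leading principal components with scikit-learn; the population sizes, SNP count and allele-frequency drift are invented for illustration.

    import numpy as np
    from sklearn.decomposition import PCA

    # Synthetic genotype matrix: individuals x SNPs, coded as 0/1/2
    # minor-allele counts, with two populations whose allele frequencies
    # differ slightly.
    rng = np.random.default_rng(0)
    n, m = 100, 5000
    freq_a = rng.uniform(0.1, 0.9, m)
    freq_b = np.clip(freq_a + rng.normal(0, 0.05, m), 0.01, 0.99)
    G = np.vstack([rng.binomial(2, freq_a, (n // 2, m)),
                   rng.binomial(2, freq_b, (n // 2, m))])

    # Standardize each SNP, then project individuals onto the leading PCs;
    # the first components typically separate the two populations.
    Gs = (G - G.mean(0)) / (G.std(0) + 1e-9)
    pcs = PCA(n_components=2).fit_transform(Gs)
    print(pcs[:3])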

  4. Discovering Pair-wise Synergies in Microarray Data.

    PubMed

    Chen, Yuan; Cao, Dan; Gao, Jun; Yuan, Zheming

    2016-01-01

    Informative gene selection can have important implications for the improvement of cancer diagnosis and the identification of new drug targets. Individual-gene-ranking methods ignore interactions between genes, and popular pair-wise gene evaluation methods, e.g. TSP and TSG, are unable to discover pair-wise interactions. Several efforts to discover pair-wise synergy have been made based on the information approach, such as EMBP and FeatKNN. However, the methods employed to estimate mutual information, e.g. binarization, histogram-based and KNN estimators, depend on known data or domain characteristics. Recently, Reshef et al. proposed a novel maximal information coefficient (MIC) measure to capture a wide range of associations between two variables, which has the property of generality. An extension from MIC(X; Y) to MIC(X1; X2; Y) is therefore desired. We developed an approximation algorithm for estimating MIC(X1; X2; Y) where Y is a discrete variable. MIC(X1; X2; Y) is employed to detect pair-wise synergy in simulated and cancer microarray data. The results indicate that MIC(X1; X2; Y) also has the property of generality. It can discover synergistic genes that are undetectable by reference feature selection methods such as MIC(X; Y) and TSG. Synergistic genes can distinguish different phenotypes. Finally, the biological relevance of these synergistic genes is validated with GO annotation and the OUgene database. PMID:27470995

  5. Multiplex planar microarrays for disease prognosis, diagnosis and theranosis

    PubMed Central

    Lea, Peter

    2015-01-01

    Advanced diagnostic methods and algorithms for immune disorders provide qualitative and quantitative multiplex measurement for pre-clinical prognostic and clinical diagnostic biomarkers specific for diseases. Choice of therapy is confirmed by modulating diagnostic efficacy of companion, theranostic drug concentrations. Assay methods identify, monitor and manage autoimmune diseases, or risk thereof, in subjects who have, or who are related to individuals with, autoimmune disease. These same diagnostic protocols also integrate qualitative and quantitative assay test protocol designs for responder patient assessment, risk analysis and management of disease when integrating multiplex planar microarray diagnostic tests, patient theranostic companion diagnostic methods and test panels for simultaneous assessment and management of dysimmune and inflammatory disorders, autoimmunity, allergy and cancer. Proprietary assay methods are provided to identify, monitor and manage dysimmune conditions, or risk thereof, in subjects with pathological alterations in the immune system, or who are related to individuals with these conditions. The protocols can be used for confirmatory testing of subjects who exhibit symptoms of dysimmunity, as well as subjects who are apparently healthy and do not exhibit symptoms of altered immune function. The protocols also provide for methods of determining whether a subject has, is at risk for, or is a candidate for disease therapy, guided by companion diagnosis and immunosuppressive therapy, as well as therapeutic drug monitoring and theranostic testing of disease biomarkers in response to immuno-absorption therapy. The multiplex test panels provide the components that are integral for performing the methods to recognized clinical standards. PMID:26309820

  6. Fine-scaled human genetic structure revealed by SNP microarrays.

    PubMed

    Xing, Jinchuan; Watkins, W Scott; Witherspoon, David J; Zhang, Yuhua; Guthery, Stephen L; Thara, Rangaswamy; Mowry, Bryan J; Bulayeva, Kazima; Weiss, Robert B; Jorde, Lynn B

    2009-05-01

    We report an analysis of more than 240,000 loci genotyped using the Affymetrix SNP microarray in 554 individuals from 27 worldwide populations in Africa, Asia, and Europe. To provide a more extensive and complete sampling of human genetic variation, we have included caste and tribal samples from two states in South India, Daghestanis from eastern Europe, and the Iban from Malaysia. Consistent with observations made by Charles Darwin, our results highlight shared variation among human populations and demonstrate that much genetic variation is geographically continuous. At the same time, principal components analyses reveal discernible genetic differentiation among almost all identified populations in our sample, and in most cases, individuals can be clearly assigned to defined populations on the basis of SNP genotypes. All individuals are accurately classified into continental groups using a model-based clustering algorithm, but between closely related populations, genetic and self-classifications conflict for some individuals. The 250K data permitted high-level resolution of genetic variation among Indian caste and tribal populations and between highland and lowland Daghestani populations. In particular, upper-caste individuals from Tamil Nadu and Andhra Pradesh form one defined group, lower-caste individuals from these two states form another, and the tribal Irula samples form a third. Our results emphasize the correlation of genetic and geographic distances and highlight other elements, including social factors that have contributed to population structure.

  7. Intensity-based segmentation of microarray images.

    PubMed

    Nagarajan, Radhakrishnan

    2003-07-01

    The underlying principle in microarray image analysis is that the spot intensity is a measure of the gene expression. This implicitly assumes the gene expression of a spot to be governed entirely by the distribution of the pixel intensities. Thus, a segmentation technique based on the distribution of the pixel intensities is appropriate for the current problem. In this paper, clustering-based segmentation is described to extract the target intensity of the spots. The approximate boundaries of the spots in the microarray are determined by manual adjustment of rectilinear grids. The distribution of the pixel intensity in a grid containing a spot is assumed to be the superposition of the foreground and the local background. The k-means clustering technique and the partitioning around medoids (PAM) were used to generate a binary partition of the pixel intensity distribution. The median (k-means) and the medoid (PAM) of the cluster members are chosen as the cluster representatives. The effectiveness of the clustering-based segmentation techniques was tested on publicly available arrays generated in a lipid metabolism experiment (Callow et al., 2000). The results are compared against those obtained using the region-growing approach (SPOT) (Yang et al., 2001). The effect of additive white Gaussian noise is also investigated. PMID:12906242
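
    The clustering-based segmentation described above reduces, per spot grid, to two-class clustering of the pixel intensities. Here is a minimal Python sketch using scikit-learn's k-means that returns the median of the foreground cluster as the spot intensity; the toy grid and noise model are assumptions for illustration, and PAM would be used analogously with a medoid in place of the median.

    import numpy as np
    from sklearn.cluster import KMeans

    def spot_intensity(grid_pixels):
        """Segment one spot grid into foreground/background by 2-means
        clustering of pixel intensities and return the foreground median."""
        x = grid_pixels.reshape(-1, 1).astype(float)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x)
        # Foreground is the cluster with the higher mean intensity.
        fg = labels == np.argmax([x[labels == k].mean() for k in (0, 1)])
        return float(np.median(x[fg]))

    # Toy grid: dim noisy background with a bright circular spot.
    yy, xx = np.mgrid[0:20, 0:20]
    spot = 200.0 * ((yy - 10) ** 2 + (xx - 10) ** 2 < 36) \
        + np.random.default_rng(0).normal(50, 5, (20, 20))
    print(spot_intensity(spot))  # close to 250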

  8. Microarray analysis of the developing cortex.

    PubMed

    Semeralul, Mawahib O; Boutros, Paul C; Likhodi, Olga; Okey, Allan B; Van Tol, Hubert H M; Wong, Albert H C

    2006-12-01

    Abnormal development of the prefrontal cortex (PFC) is associated with a number of neuropsychiatric disorders that have an onset in childhood or adolescence. Although the basic laminar structure of the PFC is established in utero, extensive remodeling continues into adolescence. To map the overall pattern of changes in cortical gene transcripts during postnatal development, we made serial measurements of mRNA levels in mouse PFC using oligonucleotide microarrays. We observed changes in mRNA transcripts consistent with known postnatal morphological and biochemical events. Overall, most transcripts that changed significantly showed a progressive decrease in abundance after birth, with the majority of change between postnatal weeks 2 and 4. Genes with cell proliferative, cytoskeletal, extracellular matrix, plasma membrane lipid/transport, protein folding, and regulatory functions had decreases in mRNA levels. Quantitative PCR verified the microarray results for six selected genes: DNA methyltransferase 3A (Dnmt3a), procollagen, type III, alpha 1 (Col3a1), solute carrier family 16 (monocarboxylic acid transporters), member 1 (Slc16a1), MARCKS-like 1 (Marcksl1), nidogen 1 (Nid1) and 3-hydroxybutyrate dehydrogenase (heart, mitochondrial) (Bdh).

  9. Enzyme Microarrays Assembled by Acoustic Dispensing Technology

    PubMed Central

    Wong, E. Y.; Diamond, S. L.

    2008-01-01

    Miniaturizing bioassays to the nanoliter scale for high-throughput screening reduces the consumption of reagents that are expensive or difficult to handle. Utilizing acoustic dispensing technology, nanodroplets containing 10 µM ATP (3 µCi/µL 32P) and reaction buffer in 10% glycerol were positionally dispensed to the surface of glass slides to form 40 nL compartments (100 droplets/slide) for Pim1 (Proviral integration site 1) kinase reactions. The reactions were activated by dispensing 4 nL of various levels of a pyridocarbazolo-cyclopentadienyl ruthenium-complex Pim1 inhibitor, followed by dispensing 4 nL of a Pim1 kinase and peptide substrate solution to achieve final concentrations of 150 nM enzyme and 10 µM substrate. The microarray was incubated at 30°C (97% Rh) for 1.5 hr. The spots were then blotted to phosphocellulose membranes to capture phosphorylated substrate. Using phosphor imaging to quantify the washed membranes, the assay showed that, for doses of inhibitor from 0.75 µM to 3 µM, Pim1 was increasingly inhibited. Signal-to-background ratios were as high as 165 and average coefficients of variation (CVs) for the assay were ~20%. CVs for dispensing typical working buffers were under 5%. Thus, microarrays assembled by acoustic dispensing are promising as cost-effective tools that can be used in protein assay development. PMID:18616925

  10. Laser direct writing of biomolecule microarrays

    NASA Astrophysics Data System (ADS)

    Serra, P.; Fernández-Pradas, J. M.; Berthet, F. X.; Colina, M.; Elvira, J.; Morenza, J. L.

    Protein-based biosensors are highly efficient tools for protein detection and identification. The production of these devices requires the manipulation of tiny amounts of protein solutions in conditions preserving their biological properties. In this work, laser-induced forward transfer (LIFT) was used for spotting an array of a purified bacterial antigen in order to check the viability of this technique for the production of protein microarrays. A pulsed Nd:YAG laser beam (355 nm wavelength, 10 ns pulse duration) was used to transfer droplets of a solution containing the Treponema pallidum 17 kDa protein antigen onto a glass slide. Optical microscopy showed that a regular array of micrometric droplets could be precisely and uniformly spotted onto a solid substrate. Subsequently, it was proved that LIFT deposition of the T. pallidum 17 kDa antigen onto nylon-coated glass slides preserves its antigenic reactivity and diagnostic properties. These results indicate that LIFT is suitable for the production of protein microarrays and pave the way for future diagnostic applications.

  11. TAMEE: data management and analysis for tissue microarrays

    PubMed Central

    Thallinger, Gerhard G; Baumgartner, Kerstin; Pirklbauer, Martin; Uray, Martina; Pauritsch, Elke; Mehes, Gabor; Buck, Charles R; Zatloukal, Kurt; Trajanoski, Zlatko

    2007-01-01

    Background With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceability and reproducibility of experiments and provision of results in a timely and reliable fashion. Robust and scalable applications have to be utilized, which allow secure data access, manipulation and evaluation for researchers from different laboratories. Results TAMEE (Tissue Array Management and Evaluation Environment) is a web-based database application for the management and analysis of data resulting from the production and application of TMAs. It facilitates storage of production and experimental parameters, of images generated throughout the TMA workflow, and of results from core evaluation. Database content consistency is achieved using structured classifications of parameters. This allows the extraction of high quality results for subsequent biologically-relevant data analyses. Tissue cores in the images of stained tissue sections are automatically located and extracted and can be evaluated using a set of predefined analysis algorithms. Additional evaluation algorithms can be easily integrated into the application via a plug-in interface. Downstream analysis of results is facilitated via a flexible query generator. Conclusion We have developed an integrated system tailored to the specific needs of research projects using high density TMAs. It covers the complete workflow of TMA production, experimental use and subsequent analysis. The system is freely available for academic and non-profit institutions. PMID:17343750

  12. Label-free detection repeatability of protein microarrays by oblique-incidence reflectivity difference method

    NASA Astrophysics Data System (ADS)

    Dai, Jun; Li, Lin; Wang, JingYi; He, LiPing; Lu, HuiBin; Ruan, KangCheng; Jin, KuiJuan; Yang, GuoZhen

    2012-12-01

    We examine the repeatability of the oblique-incidence reflectivity difference (OIRD) method for label-free detection of biological molecular interactions using protein microarrays. The experimental results show that the repeatability is consistent both within a given microarray and from microarray to microarray, indicating that OIRD is a promising label-free detection technique for biological microarrays.

  13. Differentiation of whole bacterial cells based on high-throughput microarray chip printing and infrared microspectroscopic readout.

    PubMed

    Al-Khaldi, Sufian F; Mossoba, Magdi M; Burke, Tara L; Fry, Frederick S

    2009-10-01

    Using robotic automation, a microarray printing protocol for whole bacterial cells was developed for subsequent label-free and nondestructive infrared microspectroscopic detection. Using this contact microspotting system, 24 microorganisms were printed on zinc selenide slides; these were 6 species of Listeria, 10 species of Vibrio, 2 strains of Photobacterium damselae, Yersinia enterocolitica 289, Bacillus cereus ATCC 14529, Staphylococcus aureus, ATCC 19075 (serotype 104 B), Shigella sonnei 20143, Klebsiella pneumoniae KP73, Enterobacter cloacae, Citrobacter freundii 200, and Escherichia coli. Microarrays consisting of separate spots of bacterial deposits gave consistent and reproducible infrared spectra, which were differentiated by unsupervised pattern recognition algorithms. Two multivariate analysis algorithms, principal component analysis and hierarchical cluster analysis, successfully separated most, but not all, the bacteria investigated down to the species level. PMID:19630511

  15. Genetic algorithms

    NASA Technical Reports Server (NTRS)

    Wang, Lui; Bayer, Steven E.

    1991-01-01

    Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithm concepts and applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
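
    As a concrete illustration of the basic concepts mentioned above, the following toy Python sketch runs the canonical GA loop (fitness-proportional selection, one-point crossover, bit-flip mutation) on the standard OneMax problem; the problem, population size, and rates are arbitrary choices, not taken from the report.

      # Sketch: minimal genetic algorithm on the OneMax toy problem.
      import random

      def fitness(bits):                 # count of 1-bits: the "OneMax" objective
          return sum(bits)

      def evolve(n_bits=32, pop_size=40, generations=60, p_mut=0.01):
          pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
          for _ in range(generations):
              # fitness-proportional (roulette-wheel) selection of parents
              weights = [fitness(ind) + 1e-9 for ind in pop]
              parents = random.choices(pop, weights=weights, k=pop_size)
              nxt = []
              for a, b in zip(parents[::2], parents[1::2]):
                  cut = random.randrange(1, n_bits)      # one-point crossover
                  for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                      # bit-flip mutation with small probability per gene
                      nxt.append([g ^ 1 if random.random() < p_mut else g for g in child])
              pop = nxt
          return max(pop, key=fitness)

      best = evolve()
      print(fitness(best), "of 32 bits set")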

  16. An automated blood vessel segmentation algorithm using histogram equalization and automatic threshold selection.

    PubMed

    Saleh, Marwan D; Eswaran, C; Mueen, Ahmed

    2011-08-01

    This paper focuses on the detection of retinal blood vessels, which plays a vital role in reducing proliferative diabetic retinopathy and preventing the loss of visual capability. The proposed algorithm, which takes advantage of powerful preprocessing techniques such as contrast enhancement and thresholding, offers an automated segmentation procedure for retinal blood vessels. To evaluate the performance of the new algorithm, experiments are conducted on 40 images collected from the DRIVE database. The results show that the proposed algorithm performs better than the other known algorithms in terms of accuracy. Furthermore, being simple and easy to implement, the proposed algorithm is well suited for fast processing applications.
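
    The following hedged Python sketch shows the two preprocessing ingredients named in the abstract, contrast enhancement by histogram equalization and automatic (Otsu-style) threshold selection, on an invented toy image; the published algorithm's actual pipeline and parameters will differ.

      # Sketch: histogram equalization followed by Otsu threshold selection.
      import numpy as np

      def equalize(img):                 # img: uint8 grayscale image
          hist = np.bincount(img.ravel(), minlength=256)
          cdf = hist.cumsum() / img.size
          return (255 * cdf[img]).astype(np.uint8)

      def otsu_threshold(img):
          hist = np.bincount(img.ravel(), minlength=256).astype(float)
          p = hist / hist.sum()
          omega = p.cumsum()                       # class probabilities
          mu = (p * np.arange(256)).cumsum()       # cumulative class means
          mu_t = mu[-1]
          # between-class variance for every candidate threshold
          sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega) + 1e-12)
          return sigma_b.argmax()

      rng = np.random.default_rng(1)
      img = rng.integers(60, 120, (64, 64), dtype=np.uint8)   # stand-in retinal patch
      img[30:34, :] = 200                                     # a bright "vessel"
      enhanced = equalize(img)
      t = otsu_threshold(enhanced)
      vessels = enhanced > t
      print("threshold:", t, "vessel pixels:", vessels.sum())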

  17. Data Pre-Processing Method to Remove Interference of Gas Bubbles and Cell Clusters During Anaerobic and Aerobic Yeast Fermentations in a Stirred Tank Bioreactor

    NASA Astrophysics Data System (ADS)

    Princz, S.; Wenzel, U.; Miller, R.; Hessling, M.

    2014-11-01

    One aerobic and four anaerobic batch fermentations of the yeast Saccharomyces cerevisiae were conducted in a stirred bioreactor and monitored inline by NIR spectroscopy and a transflectance dip probe. From the acquired NIR spectra, chemometric partial least squares regression (PLSR) models for predicting biomass, glucose and ethanol were constructed. The spectra were directly measured in the fermentation broth and successfully inspected for adulteration using our novel data pre-processing method. These adulterations manifested as strong fluctuations in the shape and offset of the absorption spectra. They resulted from cells, cell clusters, or gas bubbles intercepting the optical path of the dip probe. In the proposed data pre-processing method, adulterated signals are removed by passing the time-scanned non-averaged spectra through two filter algorithms with a 5% quantile cutoff. The filtered spectra containing meaningful data are then averaged. A second step checks whether the whole time scan is analyzable. If true, the average is calculated and used to prepare the PLSR models. This new method distinctly improved the prediction results. To dissociate possible correlations between analyte concentrations, such as glucose and ethanol, the feeding analytes were alternately supplied at different concentrations (spiking) at the end of the four anaerobic fermentations. This procedure yielded low-error (anaerobic) PLSR models, with prediction errors of 0.31 g/l for biomass, 3.41 g/l for glucose, and 2.17 g/l for ethanol. The maximum concentrations were 14 g/l biomass, 167 g/l glucose, and 80 g/l ethanol. Data from the aerobic fermentation, carried out under high agitation and high aeration, were incorporated to realize combined PLSR models, which have not been previously reported to our knowledge.
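
    A simplified Python sketch of the rejection-and-averaging idea follows: spectra whose mean absorbance falls into the extreme 5% quantile tails of a time scan are treated as adulterated (bubble or cell events) and excluded before averaging. The statistic, the synthetic data, and the scan-rejection criterion are illustrative assumptions, not the authors' exact filter algorithms.

      # Sketch: quantile-based rejection of adulterated spectra, then averaging.
      import numpy as np

      def filter_and_average(scans, q=0.05):
          """scans: (n_spectra, n_wavelengths) array from one time scan."""
          offset = scans.mean(axis=1)              # per-spectrum baseline offset
          lo, hi = np.quantile(offset, [q, 1 - q])
          keep = (offset >= lo) & (offset <= hi)
          if keep.sum() < 0.5 * len(scans):        # whole scan deemed unusable
              return None
          return scans[keep].mean(axis=0)

      rng = np.random.default_rng(2)
      scans = rng.normal(1.0, 0.01, (100, 256))
      scans[::25] += 0.8                           # 4 simulated bubble artefacts
      avg = filter_and_average(scans)
      print("averaged spectrum ready" if avg is not None else "scan rejected")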

  18. An ANN-GA model based promoter prediction in Arabidopsis thaliana using tiling microarray data

    PubMed Central

    Mishra, Hrishikesh; Singh, Nitya; Misra, Krishna; Lahiri, Tapobrata

    2011-01-01

    Identification of the promoter region is an important part of gene annotation. Identification of promoters in eukaryotes is important as promoters modulate various metabolic functions and cellular stress responses. In this work, a novel approach utilizing intensity values of tiling microarray data for the model eukaryotic plant Arabidopsis thaliana was used to distinguish promoter regions from non-promoter regions. A feed-forward back-propagation neural network model supported by a genetic algorithm was employed to predict the class of data with a window size of 41. A dataset comprising 2992 data vectors representing both promoter and non-promoter regions, chosen randomly from probe intensity vectors for the whole genome of Arabidopsis thaliana generated through the tiling microarray technique, was used. The classifier model shows prediction accuracies of 69.73% and 65.36% on the training and validation sets, respectively. Further, a concept of distance-based class membership was used to validate the reliability of the classifier, which showed promising results. The study shows the usability of microarray probe intensities to predict promoter regions in eukaryotic genomes. PMID:21887014

  19. GEPAS, a web-based tool for microarray data analysis and interpretation

    PubMed Central

    Tárraga, Joaquín; Medina, Ignacio; Carbonell, José; Huerta-Cepas, Jaime; Minguez, Pablo; Alloza, Eva; Al-Shahrour, Fátima; Vegas-Azcárate, Susana; Goetz, Stefan; Escobar, Pablo; Garcia-Garcia, Francisco; Conesa, Ana; Montaner, David; Dopazo, Joaquín

    2008-01-01

    Gene Expression Profile Analysis Suite (GEPAS) is one of the most complete and extensively used web-based packages for microarray data analysis. During its more than 5 years of activity it has continuously been updated to keep pace with the state of the art in the changing microarray data analysis arena. GEPAS offers diverse analysis options that include well-established as well as novel algorithms for normalization, gene selection, class prediction, clustering and functional profiling of the experiment. New options for time-course (or dose-response) experiments, microarray-based class prediction, new clustering methods and new tests for differential expression have been included. The new pipeliner module allows automating the execution of sequential analysis steps by means of a simple but powerful graphic interface. An extensive re-engineering of GEPAS has been carried out, which includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. GEPAS is nowadays the most cited web tool in its field; it is extensively used by researchers in many countries, and its records indicate an average usage rate of 500 experiments per day. GEPAS is available at http://www.gepas.org. PMID:18508806

  20. ZODET: Software for the Identification, Analysis and Visualisation of Outlier Genes in Microarray Expression Data

    PubMed Central

    Roden, Daniel L.; Sewell, Gavin W.; Lobley, Anna; Levine, Adam P.; Smith, Andrew M.; Segal, Anthony W.

    2014-01-01

    Summary Complex human diseases can show significant heterogeneity between patients with the same phenotypic disorder. An outlier detection strategy was developed to identify variants at the level of gene transcription that are of potential biological and phenotypic importance. Here we describe a graphical software package (z-score outlier detection (ZODET)) that enables identification and visualisation of gross abnormalities in gene expression (outliers) in individuals, using whole genome microarray data. The mean and standard deviation of expression in a healthy control cohort are used to detect both over- and under-expressed probes in individual test subjects. We compared the potential of ZODET to detect outlier genes in gene expression datasets with a previously described statistical method, gene tissue index (GTI), using a simulated expression dataset and a publicly available monocyte-derived macrophage microarray dataset. Taken together, these results support ZODET as a novel approach to identify outlier genes of potential pathogenic relevance in complex human diseases. The algorithm is implemented using R packages and Java. Availability The software is freely available from http://www.ucl.ac.uk/medicine/molecular-medicine/publications/microarray-outlier-analysis. PMID:24416128
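
    ZODET itself is implemented in R and Java; purely to illustrate the underlying z-score idea, the following Python sketch flags probes in one test subject whose expression deviates from a healthy control cohort by more than an (arbitrarily chosen) |z| of 3.

      # Sketch: z-score outlier detection of probes against a control cohort.
      import numpy as np

      def outlier_probes(controls, test, z_cut=3.0):
          """controls: (n_controls, n_probes); test: (n_probes,)"""
          mu = controls.mean(axis=0)
          sd = controls.std(axis=0, ddof=1)
          z = (test - mu) / sd
          over = np.where(z > z_cut)[0]    # grossly over-expressed probes
          under = np.where(z < -z_cut)[0]  # grossly under-expressed probes
          return over, under

      rng = np.random.default_rng(3)
      controls = rng.normal(8, 0.5, (40, 1000))   # log2 expression values
      test = rng.normal(8, 0.5, 1000)
      test[42] = 12.0                             # one aberrant gene
      over, under = outlier_probes(controls, test)
      print("over-expressed probe indices:", over)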

  1. Gene selection and classification for cancer microarray data based on machine learning and similarity measures

    PubMed Central

    2011-01-01

    Background Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. Results To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF. PMID:22369383
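
    The following Python sketch shows a plain greedy forward-addition loop in the spirit of Recursive Feature Addition; the published method additionally breaks ties with statistical similarity measures and determines the final set with Lagging Prediction Peephole Optimization, both omitted here, and the classifier and data are invented stand-ins.

      # Sketch: greedy forward feature (gene) addition scored by cross-validation.
      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB

      def recursive_feature_addition(X, y, n_features=10):
          selected, remaining = [], list(range(X.shape[1]))
          while len(selected) < n_features:
              # add the candidate gene that most improves cross-validated accuracy
              scores = {
                  j: cross_val_score(GaussianNB(), X[:, selected + [j]], y, cv=3).mean()
                  for j in remaining
              }
              best = max(scores, key=scores.get)
              selected.append(best)
              remaining.remove(best)
          return selected

      rng = np.random.default_rng(8)
      X = rng.normal(size=(60, 200))           # 60 samples x 200 genes
      y = rng.integers(0, 2, 60)
      X[:, 5] += y * 2.0                       # make gene 5 informative
      print(recursive_feature_addition(X, y, n_features=3))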

  2. Unsupervised assessment of microarray data quality using a Gaussian mixture model

    PubMed Central

    Howard, Brian E; Sick, Beate; Heber, Steffen

    2009-01-01

    Background Quality assessment of microarray data is an important and often challenging aspect of gene expression analysis. This task frequently involves the examination of a variety of summary statistics and diagnostic plots. The interpretation of these diagnostics is often subjective, and generally requires careful expert scrutiny. Results We show how an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and the naïve Bayes model can be used to automate microarray quality assessment. The method is flexible and can be easily adapted to accommodate alternate quality statistics and platforms. We evaluate our approach using Affymetrix 3' gene expression and exon arrays and compare the performance of this method to a similar supervised approach. Conclusion This research illustrates the efficacy of an unsupervised classification approach for the purpose of automated microarray data quality assessment. Since our approach requires only unannotated training data, it is easy to customize and to keep up-to-date as technology evolves. In contrast to other "black box" classification systems, this method also allows for intuitive explanations. PMID:19545436
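
    As a hedged illustration of the underlying idea, the following Python sketch fits a two-component Gaussian mixture to unannotated per-array quality statistics and reads the minority component as "problematic" arrays; the paper's EM/naïve Bayes model and quality features are richer than this stand-in.

      # Sketch: unsupervised array quality flagging with a 2-component GMM.
      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(7)
      # rows = arrays, columns = quality summary statistics (e.g. scale factor,
      # percent-present, 3'/5' ratio); synthetic stand-ins here
      good = rng.normal([1.0, 45.0, 1.1], [0.1, 3.0, 0.1], size=(80, 3))
      bad = rng.normal([2.5, 30.0, 2.0], [0.4, 5.0, 0.4], size=(12, 3))
      X = np.vstack([good, bad])

      gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
      labels = gmm.predict(X)
      # the smaller component is taken to be the problematic one
      flagged = labels == np.bincount(labels).argmin()
      print("arrays flagged as low quality:", flagged.sum())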

  3. Determination of B-Cell Epitopes in Patients with Celiac Disease: Peptide Microarrays

    PubMed Central

    Choung, Rok Seon; Marietta, Eric V.; Van Dyke, Carol T.; Brantner, Tricia L.; Rajasekaran, John; Pasricha, Pankaj J.; Wang, Tianhao; Bei, Kang; Krishna, Karthik; Krishnamurthy, Hari K.; Snyder, Melissa R.; Jayaraman, Vasanth; Murray, Joseph A.

    2016-01-01

    Background Most antibodies recognize conformational or discontinuous epitopes that have a specific 3-dimensional shape; however, determination of discontinuous B-cell epitopes is a major challenge in bioscience. Moreover, the current methods for identifying peptide epitopes often involve laborious, high-cost peptide screening programs. Here, we present a novel microarray method for identifying discontinuous B-cell epitopes in celiac disease (CD) by using a silicon-based peptide array and computational methods. Methods Using a novel silicon-based microarray platform with a multi-pillar chip, overlapping 12-mer peptide sequences of all native and deamidated gliadins, which are known to trigger CD, were synthesized in situ and used to identify peptide epitopes. Results Using a computational algorithm that considered disease specificity of peptide sequences, 2 distinct epitope sets were identified. Further, by combining the most discriminative 3-mer gliadin sequences with randomly interpolated 3- or 6-mer peptide sequences, novel discontinuous epitopes were identified and further optimized to maximize disease discrimination. The final discontinuous epitope sets were tested in a confirmatory cohort of CD patients and controls, yielding 99% sensitivity and 100% specificity. Conclusions These novel sets of epitopes derived from gliadin have a high degree of accuracy in differentiating CD from controls, compared with standard serologic tests. The method of ultra-high-density peptide microarray described here would be broadly useful to develop high-fidelity diagnostic tests and explore pathogenesis. PMID:26824466

  4. Hough transform algorithm for real-time pattern recognition using an artificial retina camera

    NASA Astrophysics Data System (ADS)

    Lin, Xin; Otobe, Kazunori

    2001-04-01

    An artificial retina camera (ARC) is employed for real-time preprocessing of images, and a Hough transform algorithm is developed to detect biological images with approximately circular edge information in two-dimensional space. The method also works in parallel when processing multiple or partial input patterns.
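
    For reference, a bare-bones circle Hough transform in Python is sketched below for a single known radius: each edge pixel votes for all candidate centres at that radius, and the accumulator maximum gives the detected centre. This only illustrates the voting scheme; the paper's ARC-based, hardware-level implementation is not reproduced.

      # Sketch: circle Hough transform for a fixed, known radius.
      import numpy as np

      def hough_circle(edges, radius):
          """edges: binary edge map; returns the accumulator and the best centre."""
          h, w = edges.shape
          acc = np.zeros((h, w), dtype=int)
          thetas = np.linspace(0, 2 * np.pi, 90, endpoint=False)
          ys, xs = np.nonzero(edges)
          for y, x in zip(ys, xs):
              # each edge pixel votes for all centres at distance `radius`
              a = (y - radius * np.sin(thetas)).round().astype(int)
              b = (x - radius * np.cos(thetas)).round().astype(int)
              ok = (a >= 0) & (a < h) & (b >= 0) & (b < w)
              np.add.at(acc, (a[ok], b[ok]), 1)
          cy, cx = np.unravel_index(acc.argmax(), acc.shape)
          return acc, (cy, cx)

      # synthetic test: a circle of radius 20 centred at (32, 40)
      edges = np.zeros((64, 80), dtype=bool)
      t = np.linspace(0, 2 * np.pi, 200)
      edges[(32 + 20 * np.sin(t)).astype(int), (40 + 20 * np.cos(t)).astype(int)] = True
      _, centre = hough_circle(edges, radius=20)
      print("detected centre:", centre)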

  5. Applying Enhancement Filters in the Pre-processing of Images of Lymphoma

    NASA Astrophysics Data System (ADS)

    Henrique Silva, Sérgio; Zanchetta do Nascimento, Marcelo; Alves Neves, Leandro; Ramos Batista, Valério

    2015-01-01

    Lymphoma is a type of cancer that affects the immune system and is classified as Hodgkin or non-Hodgkin. It is one of the ten most common cancers worldwide, accounting for three to four percent of all malignant neoplasms diagnosed. Our work presents a study of some filters devoted to enhancing images of lymphoma at the pre-processing step. Here the enhancement is useful for removing noise from the digital images. We analysed the noise caused by different sources, such as room vibration, scraps and defocusing, in the following classes of lymphoma: follicular, mantle cell and B-cell chronic lymphocytic leukemia. The Gaussian, Median and Mean-Shift filters were applied to different colour models (RGB, Lab and HSV). Afterwards, we performed a quantitative analysis of the images by means of the Structural Similarity Index, in order to evaluate the similarity between the images. In all cases we obtained a similarity of at least 75%, which rises to 99% if one considers only HSV. We conclude that HSV is an important choice of colour model for pre-processing histological images of lymphoma, because in this case the resulting image attains the best enhancement.
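
    The evaluation loop described above can be sketched as follows in Python (scipy and scikit-image assumed available): denoise with Gaussian and median filters and score each result against a reference with the Structural Similarity Index. The images here are synthetic stand-ins, and the Mean-Shift filter and colour-model conversions are omitted for brevity.

      # Sketch: filter candidates scored by the Structural Similarity Index.
      import numpy as np
      from scipy import ndimage
      from skimage.metrics import structural_similarity

      rng = np.random.default_rng(4)
      clean = rng.uniform(0, 1, (128, 128))        # stand-in histology channel
      noisy = np.clip(clean + rng.normal(0, 0.08, clean.shape), 0, 1)

      candidates = {
          "gaussian": ndimage.gaussian_filter(noisy, sigma=1.0),
          "median": ndimage.median_filter(noisy, size=3),
      }
      for name, filtered in candidates.items():
          score = structural_similarity(clean, filtered, data_range=1.0)
          print(f"{name:8s} SSIM = {score:.3f}")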

  6. Preprocessing of A-scan GPR data based on energy features

    NASA Astrophysics Data System (ADS)

    Dogan, Mesut; Turhan-Sayan, Gonul

    2016-05-01

    There is an increasing demand for noninvasive real-time detection and classification of buried objects in various civil and military applications. The problem of detection and annihilation of landmines is particularly important due to strong safety concerns. The requirement for a fast real-time decision process is as important as the requirements for high detection rates and low false alarm rates. In this paper, we introduce and demonstrate a computationally simple, time-efficient, energy-based preprocessing approach that can be used in ground penetrating radar (GPR) applications to eliminate reflections from the air-ground boundary and to locate the buried objects, simultaneously, in one easy step. The instantaneous power signals, the total energy values and the cumulative energy curves are extracted from the A-scan GPR data. The cumulative energy curves, in particular, are shown to be useful to detect the presence and location of buried objects in a fast and simple way while preserving the spectral content of the original A-scan data for further steps of physics-based target classification. The proposed method is demonstrated using the GPR data collected at the facilities of IPA Defense, Ankara, on outdoor test lanes. Cylindrically shaped plastic containers were buried in fine-medium sand to simulate buried landmines. These plastic containers were half-filled with ammonium nitrate including metal pins. Results of this pilot study are demonstrated to be highly promising to motivate further research for the use of energy-based preprocessing features in the landmine detection problem.
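
    An illustrative Python computation of the energy features described above for a single synthetic A-scan follows: instantaneous power, total energy, and the normalised cumulative energy curve, whose late upturn indicates a buried reflector. The trace, time gate, and detection rule are invented for the example.

      # Sketch: energy features of one A-scan (synthetic data, arbitrary units).
      import numpy as np

      rng = np.random.default_rng(5)
      t = np.linspace(0, 20e-9, 1024)                       # 20 ns trace
      ascan = 0.02 * rng.normal(size=t.size)
      ascan += np.exp(-((t - 3e-9) / 0.4e-9) ** 2)          # strong air-ground reflection
      ascan += 0.4 * np.exp(-((t - 12e-9) / 0.4e-9) ** 2)   # weaker buried-object echo

      power = ascan ** 2                                    # instantaneous power
      energy = power.sum()                                  # total energy
      cumulative = np.cumsum(power) / energy                # normalised cumulative curve

      # gate out the early air-ground part, then flag where the curve jumps again
      late = cumulative[t > 6e-9]
      jump = np.gradient(late).argmax()
      print(f"object echo near t = {t[t > 6e-9][jump] * 1e9:.1f} ns")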

  7. Pre-Processing and Cross-Correlation Techniques for Time-Distance Helioseismology

    NASA Astrophysics Data System (ADS)

    Wang, N.; de Ridder, S.; Zhao, J.

    2014-12-01

    In chaotic wave fields excited by a random distribution of noise sources, the cross-correlation of the recordings made at two stations yields the interstation wave-field response. After early successes in helioseismology, laboratory studies and earth seismology, this technique found broad application in global and regional seismology. This development came with an increasing understanding of pre-processing and cross-correlation workflows to yield an optimal signal-to-noise ratio (SNR). Helioseismologists rely heavily on stacking to increase the SNR and, until now, have not studied different spectral-whitening and cross-correlation workflows. The recordings vary considerably between sunspots and regular portions of the sun. Within the sunspot, the periodic effects of the observation satellite orbit are difficult to remove. We remove a running alpha-mean from the data and apply a soft clip to deal with data glitches. The recordings contain energy of both flows and waves; a frequency-domain filter selects the wave energy. The data are then input to several pre-processing and cross-correlation techniques common to earth seismology. We anticipate that spectral whitening will flatten the energy spectrum of the cross-correlations. We also expect the cross-correlations to converge faster to their expected value when the data are processed over overlapping windows. The results of this study are expected to aid in decreasing the stacking while maintaining good SNR.
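
    One candidate workflow mentioned above, spectral whitening followed by frequency-domain cross-correlation, can be sketched in Python as below on synthetic traces; the whitening regularisation and the simulated lag are arbitrary choices.

      # Sketch: spectral whitening and cross-correlation of two noise recordings.
      import numpy as np

      def whiten(x, eps=1e-8):
          X = np.fft.rfft(x)
          return np.fft.irfft(X / (np.abs(X) + eps), n=len(x))  # flatten the spectrum

      rng = np.random.default_rng(6)
      source = rng.normal(size=4096)
      lag_true = 37
      rec_a = source + 0.5 * rng.normal(size=4096)
      rec_b = np.roll(source, lag_true) + 0.5 * rng.normal(size=4096)

      a, b = whiten(rec_a), whiten(rec_b)
      # circular cross-correlation via the frequency domain
      xcorr = np.fft.irfft(np.fft.rfft(b) * np.conj(np.fft.rfft(a)))
      print("estimated lag:", xcorr.argmax())                  # ~ lag_true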

  8. A Technical Review on Biomass Processing: Densification, Preprocessing, Modeling and Optimization

    SciTech Connect

    Jaya Shankar Tumuluru; Christopher T. Wright

    2010-06-01

    It is now a well-acclaimed fact that burning fossil fuels and deforestation are major contributors to climate change. Biomass from plants can serve as an alternative renewable and carbon-neutral raw material for the production of bioenergy. Low densities of 40–60 kg/m3 for lignocellulosic and 200–400 kg/m3 for woody biomass limit their application for energy purposes. Prior to use in energy applications these materials need to be densified. The densified biomass can have bulk densities over 10 times that of the raw material, helping to significantly reduce technical limitations associated with storage, loading and transportation. Pelleting, briquetting, or extrusion processing are commonly used methods for densification. The aim of the present research is to develop a comprehensive review of biomass processing that includes densification, preprocessing, modeling and optimization. The specific objectives include carrying out a technical review on (a) mechanisms of particle bonding during densification; (b) methods of densification including extrusion, briquetting, pelleting, and agglomeration; (c) effects of process and feedstock variables and biomass biochemical composition on densification; (d) effects of preprocessing such as grinding, preheating, steam explosion, and torrefaction on biomass quality and binding characteristics; (e) models for understanding the compression characteristics; and (f) procedures for response surface modeling and optimization.

  9. Star sensor image acquisition and preprocessing hardware system based on CMOS image sensor and FPGA

    NASA Astrophysics Data System (ADS)

    Hao, Xuetao; Jiang, Jie; Zhang, Guangjun

    2003-09-01

    A star sensor is an avionics instrument used to provide the absolute 3-axis attitude of a spacecraft utilizing star observations. It consists of an electronic camera and associated processing electronics. As an outcome of the advancing state of the art, the new generation of star sensors features higher speed and lower cost, power dissipation, and size than the first generation. This paper describes a star sensor front-end image acquisition and pre-processing hardware system based on CMOS image-sensor and FPGA technology. In practice, star images are produced by a simple simulator on a PC, acquired by the CMOS image sensor, pre-processed by the FPGA, saved in SRAM, read out via the EPP protocol, and validated by image processing software on a PC. The hardware part of the system acquires images through the CMOS image sensor under FPGA control, processes the image data with an FPGA circuit module, and saves the images to SRAM for testing. It provides the basic image data for star recognition and spacecraft attitude determination. As an important reference for the development of a star sensor prototype, the system validates the performance advantages of the new generation of star sensors.

  10. CMS Preprocessing Subsystem user's guide. Software version 1.2

    SciTech Connect

    Didier, B.T.; Gash, J.D.; Greitzer, F.L.; Havre, S.L.; Ramsdell, J.V.; Turney, C.R.

    1993-10-01

    The Common Mapping Standard (CMS) Data Production System (CDPS) produces and distributes CMS data in compliance with the Common Mapping Standard Interface Control Document, Revision 2.2. Historically, tactical mission planning systems have been the primary clients of CMS data. CDPS is composed of two subsystems, the CMS Preprocessing Subsystem (CPS) and the CMS Distribution Subsystem (CDS). This guide describes the operation of CPS, which is responsible for the management of source data and the production of CMS data from source data. The CPS system was developed for use on a workstation running Ultrix 4.2, X Window System Version X11R4, and Motif Version 1.1. This subsystem is organized into four major functional groups: CPS Executive; Manage Source Data; Manage CMS Data Preprocessing; and CPS System Utilities. CPS supports the production of CMS data from the following source chart, image, and elevation data products: Global Navigation Chart; Jet Navigation Chart; Operational Navigation Chart; Tactical Pilotage Chart; Joint Operations Graphics-Air; Topographic Line Map; ARC Digital Raster Imagery; Digital Terrain Elevation Data (Level 1); and Low Flying Chart.

  11. A data preprocessing strategy for metabolomics to reduce the mask effect in data analysis.

    PubMed

    Yang, Jun; Zhao, Xinjie; Lu, Xin; Lin, Xiaohui; Xu, Guowang

    2015-01-01

    Highlights: (i) a data preprocessing strategy was developed to cope with missing values and with mask effects in data analysis arising from the high variation of abundant metabolites; (ii) a new method, 'x-VAST', was developed to amend the measurement deviation enlargement; (iii) applying the above strategy, several low-abundant masked differential metabolites were rescued. Metabolomics is a booming research field. Its success highly relies on the discovery of differential metabolites by comparing different data sets (for example, patients vs. controls). One of the challenges is that differences in the low-abundant metabolites between groups are often masked by the high variation of abundant metabolites. To solve this challenge, a novel data preprocessing strategy consisting of three steps was proposed in this study. In step 1, a 'modified 80% rule' was used to reduce the effect of missing values; in step 2, unit-variance and Pareto scaling methods were used to reduce the mask effect from the abundant metabolites; in step 3, in order to fix the adverse effect of scaling, stability information of the variables, deduced from intensity information and the class information, was used to assign suitable weights to the variables. When applied to an LC/MS-based metabolomics dataset from a chronic hepatitis B patient study and two simulated datasets, the mask effect was found to be partially eliminated and several new low-abundant differential metabolites were rescued.
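
    Steps 1 and 2 of the strategy can be sketched in Python as follows, under the usual reading of the 'modified 80% rule' (keep a metabolite if it is measured in at least 80% of the samples of at least one group) and of Pareto scaling (divide mean-centred values by the square root of the standard deviation); step 3, the stability-based 'x-VAST' weighting, is not reproduced here, and the data are simulated.

      # Sketch: modified 80% rule for missing values, then Pareto scaling.
      import numpy as np

      def modified_80_rule(X, groups, threshold=0.8):
          """X: samples x metabolites with np.nan for missing; keep a column if
          any group reports it in >= threshold of that group's samples."""
          keep = np.zeros(X.shape[1], dtype=bool)
          for g in np.unique(groups):
              frac = np.mean(~np.isnan(X[groups == g]), axis=0)
              keep |= frac >= threshold
          return X[:, keep]

      def pareto_scale(X):
          mu = np.nanmean(X, axis=0)
          sd = np.nanstd(X, axis=0, ddof=1)
          return (X - mu) / np.sqrt(sd)      # damps dominance of abundant metabolites

      rng = np.random.default_rng(9)
      X = rng.lognormal(2, 1, (20, 50))
      X[rng.random(X.shape) < 0.3] = np.nan  # 30% missing values
      groups = np.array([0] * 10 + [1] * 10)
      Xs = pareto_scale(modified_80_rule(X, groups))
      print("metabolites retained:", Xs.shape[1])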

  12. Effective Preprocessing Procedures Virtually Eliminate Distance-Dependent Motion Artifacts in Resting State FMRI.

    PubMed

    Jo, Hang Joon; Gotts, Stephen J; Reynolds, Richard C; Bandettini, Peter A; Martin, Alex; Cox, Robert W; Saad, Ziad S

    2013-05-21

    Artifactual sources of resting-state (RS) FMRI can originate from head motion, physiology, and hardware. Of these sources, motion has received considerable attention and was found to induce corrupting effects by differentially biasing correlations between regions depending on their distance. Numerous corrective approaches have relied on the identification and censoring of high-motion time points and the use of the brain-wide average time series as a nuisance regressor to which the data are orthogonalized (Global Signal Regression, GSReg). We first replicate the previously reported head-motion bias on correlation coefficients using data generously contributed by Power et al. (2012). We then show that while motion can be the source of artifact in correlations, the distance-dependent bias, taken to be a manifestation of the motion effect on correlation, is exacerbated by the use of GSReg. Put differently, correlation estimates obtained after GSReg are more susceptible to the presence of motion and, by extension, to the levels of censoring. More generally, the effect of motion on correlation estimates depends on the preprocessing steps leading to the correlation estimate, with certain approaches performing markedly worse than others. For this purpose, we consider various models for RS FMRI preprocessing and show that WMeLOCAL, a subset of the ANATICOR denoising approach discussed by Jo et al. (2010), results in minimal sensitivity to motion and, by extension, reduces the dependence of correlation results on censoring.

  13. Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?

    PubMed

    Zhu, Yeyi; Hernandez, Ladia M; Mueller, Peter; Dong, Yongquan; Forman, Michele R

    2013-01-01

    The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study. PMID:24511148

  14. Fast generation of digitally reconstructed radiograph through an efficient preprocessing of ray attenuation values

    NASA Astrophysics Data System (ADS)

    Ghafurian, Soheil; Metaxas, Dimitris N.; Tan, Virak; Li, Kang

    2016-03-01

    Digitally reconstructed radiographs (DRR) are a simulation of radiographic images produced through a perspective projection of the three-dimensional (3D) image (volume) onto a two-dimensional (2D) image plane. The traditional method for the generation of DRRs, namely ray-casting, is a computationally intensive process and accounts for most of the solution time in 3D/2D medical image registration frameworks, where a large number of DRRs is required. A few alternate methods for faster DRR generation have been proposed, the most successful of which are based on the idea of pre-calculating the attenuation values of possible rays. Despite achieving good quality, these methods support a limited range of motion for the volume and entail long pre-calculation times. In this paper, we propose a new preprocessing procedure and data structure for the calculation of the ray attenuation values. This method supports all possible volume positions with practically small memory requirements, in addition to reducing the complexity of the problem from O(n³) to O(n²). In our experiments, we generated DRRs of high quality in 63 milliseconds, with a preprocessing time of 99.48 seconds and a memory size of 7.45 megabytes.

  16. Study of characteristic point identification and preprocessing method for pulse wave signals.

    PubMed

    Sun, Wei; Tang, Ning; Jiang, Guiping

    2015-02-01

    Characteristics in pulse wave signals (PWSs) contain information on the physiology and pathology of the human cardiovascular system. Therefore, identification of characteristic points in PWSs plays a significant role in analyzing the human cardiovascular system. However, the characteristic points show person-dependent features and are easily affected by noise. Acquiring a signal with a high signal-to-noise ratio (SNR) and integrity is fundamentally important to precisely identify the characteristic points. Based on mathematical morphology theory, we design a combined filter, which can effectively suppress baseline drift and remove high-frequency noise simultaneously, to preprocess the PWSs. Furthermore, the characteristic points of the preprocessed signal are extracted according to their position relations with the zero-crossing points of the wavelet coefficients of the signal. In addition, the differential method is adopted to calibrate the position offset of characteristic points caused by the wavelet transform. We investigated four typical PWSs reconstructed by three Gaussian functions with tunable parameters. The numerical results suggest that the proposed method can identify the characteristic points of PWSs accurately. PMID:25997292
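
    A hedged Python sketch of the morphological idea follows: a grey-scale opening followed by a closing with a flat structuring element wider than a single pulse estimates the slow baseline drift, which is then subtracted. The structuring-element size and the synthetic signal are illustrative choices, not the paper's combined-filter design.

      # Sketch: morphological baseline-drift estimation and removal.
      import numpy as np
      from scipy import ndimage

      rng = np.random.default_rng(10)
      t = np.linspace(0, 10, 2000)
      pulse = np.clip(np.sin(2 * np.pi * 1.2 * t), 0, None) ** 3   # pulse-like peaks
      drift = 0.5 * np.sin(2 * np.pi * 0.1 * t)                    # baseline wander
      signal = pulse + drift + 0.02 * rng.normal(size=t.size)

      size = 201                                                   # wider than one pulse
      baseline = ndimage.grey_closing(ndimage.grey_opening(signal, size=size), size=size)
      corrected = signal - baseline
      print("mean residual after correction:", np.abs(corrected - pulse).mean())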

  17. Operational pre-processing of MERIS, (A)ATSR and VEGETATION data for the ESA-CCI project Fire-Disturbance

    NASA Astrophysics Data System (ADS)

    Guenther, K. P.; Krauss, T.; Richter, R.; Mueller, R.; Fichtelmann, B.; Borg, E.; Bachmann, M.; Wurm, M.; Gsteiger, V.; Mueller, A.

    2012-04-01

    In 2010 ESA announced the Earthwatch Programme Element, Global Monitoring of Essential Climate Variables (known as the 'ESA Climate Change Initiative'), to support climate modellers with highly stable, long-term satellite-based products, called Essential Climate Variables (ECV). The primary ECV of the "Fire-Disturbance" project is the Burnt Area (BA). In order to derive the BA with an accuracy fulfilling the GCOS requirements, improvements in data pre-processing are required for the generation of consistent time series; that is, consistency in the time series of a single sensor as well as between different sensors shall be achieved, including an assessment of the related error budgets. For our improved pre-processing chain we developed generic algorithms for image matching, resulting in precise geolocation using the global Landsat Mosaic GLS2000 as an accurate reference. Additionally, a global DEM is used (W42 database including SRTM and other sources). Land-water masking is performed using a learning algorithm that combines external static reference data (e.g. the water body mask from SRTM radar data and the GSHHS) with two different pre-classification algorithms. Regions that are consistent across these three different water masks are assumed to be water with high probability and are therefore used as training data. On the basis of this result, the remaining water pixels of the static mask are checked, and finally the remaining pre-classified pixels are tested with a stronger classification algorithm. Cloud and snow/ice detection is performed using generic parameters such as brightness or flatness together with the Normalized Difference Snow Index (NDSI). When thermal bands are available, e.g. for (A)ATSR, temperature information is used to discriminate clouds and snow/ice. Furthermore, confidence levels for all masks are generated on a per-pixel level for every scene. Finally, atmospheric correction is performed using the newly

  18. The efficient algorithms for achieving Euclidean distance transformation.

    PubMed

    Shih, Frank Y; Wu, Yi-Ta

    2004-08-01

    Euclidean distance transformation (EDT) is used to convert a digital binary image consisting of object (foreground) and nonobject (background) pixels into another image where each pixel has a value of the minimum Euclidean distance from nonobject pixels. In this paper, the improved iterative erosion algorithm is proposed to avoid the redundant calculations in the iterative erosion algorithm. Furthermore, to avoid the iterative operations, a two-scan-based algorithm using a derivation approach is developed to achieve the EDT correctly and efficiently in constant time. In addition, we show that when obstacles appear in the image, many algorithms cannot achieve the correct EDT, whereas our two-scan-based algorithm can. Moreover, the two-scan-based algorithm does not require the additional cost of preprocessing or relative-coordinate recording.
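
    For intuition only, the classic two-scan (forward/backward raster) distance transform is sketched below in Python in the city-block metric; the paper's contribution is an exact Euclidean two-scan variant, which is substantially more involved than this simplification.

      # Sketch: classic two-scan distance transform (city-block metric).
      import numpy as np

      def two_scan_cityblock(binary):
          """binary: True for object pixels; returns distance to nearest background."""
          h, w = binary.shape
          INF = h + w
          d = np.where(binary, INF, 0).astype(int)
          for y in range(h):                   # forward raster scan
              for x in range(w):
                  if y > 0: d[y, x] = min(d[y, x], d[y - 1, x] + 1)
                  if x > 0: d[y, x] = min(d[y, x], d[y, x - 1] + 1)
          for y in range(h - 1, -1, -1):       # backward raster scan
              for x in range(w - 1, -1, -1):
                  if y < h - 1: d[y, x] = min(d[y, x], d[y + 1, x] + 1)
                  if x < w - 1: d[y, x] = min(d[y, x], d[y, x + 1] + 1)
          return d

      obj = np.zeros((7, 7), dtype=bool)
      obj[2:5, 2:5] = True
      print(two_scan_cityblock(obj))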

  19. Evaluating the reliability of different preprocessing steps to estimate graph theoretical measures in resting state fMRI data.

    PubMed

    Aurich, Nathassia K; Alves Filho, José O; Marques da Silva, Ana M; Franco, Alexandre R

    2015-01-01

    With resting-state functional MRI (rs-fMRI) there are a variety of post-processing methods that can be used to quantify the human brain connectome. However, there is also a choice of which preprocessing steps will be used prior to calculating the functional connectivity of the brain. In this manuscript, we have tested seven different preprocessing schemes and assessed the reliability between and reproducibility within the various strategies by means of graph theoretical measures. The different preprocessing schemes were tested on a publicly available dataset, which includes rs-fMRI data of healthy controls. The brain was parcellated into 190 nodes and four graph theoretical (GT) measures were calculated: global efficiency (GEFF), characteristic path length (CPL), average clustering coefficient (ACC), and average local efficiency (ALE). Our findings indicate that results can differ significantly based on which preprocessing steps are selected. We also found a dependence between motion and GT measurements in most preprocessing strategies. We conclude that using censoring based on outliers within the functional time series as a preprocessing step increases the reliability of GT measurements and reduces their dependency on head motion.
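
    The measurement stage can be sketched in Python with networkx: threshold a functional connectivity matrix into a binary graph and compute the four GT measures named above. The random correlation matrix and the 10% edge density used here are illustrative assumptions, not the study's processing choices.

      # Sketch: graph theoretical measures from a thresholded connectivity matrix.
      import numpy as np
      import networkx as nx

      rng = np.random.default_rng(11)
      n = 190                                          # nodes from the parcellation
      C = np.corrcoef(rng.normal(size=(n, 120)))       # stand-in correlation matrix
      np.fill_diagonal(C, 0)

      thr = np.quantile(np.abs(C), 0.90)               # keep the strongest 10% of edges
      G = nx.from_numpy_array((np.abs(C) > thr).astype(int))

      geff = nx.global_efficiency(G)
      acc = nx.average_clustering(G)
      ale = nx.local_efficiency(G)
      # characteristic path length is defined on the largest connected component
      giant = G.subgraph(max(nx.connected_components(G), key=len))
      cpl = nx.average_shortest_path_length(giant)
      print(f"GEFF={geff:.3f} CPL={cpl:.3f} ACC={acc:.3f} ALE={ale:.3f}")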

  20. Increasing conclusiveness of metabonomic studies by chem-informatic preprocessing of capillary electrophoretic data on urinary nucleoside profiles.

    PubMed

    Szymańska, E; Markuszewski, M J; Capron, X; van Nederkassel, A-M; Heyden, Y Vander; Markuszewski, M; Krajka, K; Kaliszan, R

    2007-01-17

    Nowadays, bioinformatics offers advanced tools and procedures of data mining aimed at finding consistent patterns or systematic relationships between variables. Numerous metabolite concentrations can readily be determined in a given biological system by high-throughput analytical methods. However, such raw analytical data comprise noninformative components due to many disturbances normally occurring in the analysis of biological samples. To eliminate those unwanted original analytical data components, advanced chemometric data preprocessing methods might be of help. Here, such methods are applied to electrophoretic nucleoside profiles in urine samples of cancer patients and healthy volunteers. The electrophoretic nucleoside profiles were obtained under the following conditions: 100 mM borate, 72.5 mM phosphate, 160 mM SDS, pH 6.7; 25 kV voltage, 30 degrees C temperature; untreated fused silica capillary of 70 cm effective length, 50 microm I.D. Several of the most advanced preprocessing tools were applied for baseline correction, denoising and alignment of the electrophoretic data. That approach was compared to the standard procedure of electrophoretic peak integration. The best preprocessing results were obtained after application of the so-called correlation optimized warping (COW) to align the data. The principal component analysis (PCA) of the preprocessed data provides a clearly better consistency of the nucleoside electrophoretic profiles with the health status of subjects than PCA of peak areas of the original data (without preprocessing).

  1. Outcome prediction based on microarray analysis: a critical perspective on methods

    PubMed Central

    Zervakis, Michalis; Blazadonakis, Michalis E; Tsiliki, Georgia; Danilatou, Vasiliki; Tsiknakis, Manolis; Kafetzopoulos, Dimitris

    2009-01-01

    Background Information extraction from microarrays has not yet been widely used in diagnostic or prognostic decision-support systems, due to the diversity of results produced by the available techniques, their instability on different data sets and the inability to relate statistical significance with biological relevance. Thus, there is an urgent need to address the statistical framework of microarray analysis and identify its drawbacks and limitations, which will enable us to thoroughly compare methodologies under the same experimental set-up and associate results with confidence intervals meaningful to clinicians. In this study we consider gene-selection algorithms with the aim to reveal inefficiencies in performance evaluation and address aspects that can reduce uncertainty in algorithmic validation. Results A computational study is performed related to the performance of several gene selection methodologies on publicly available microarray data. Three basic types of experimental scenarios are evaluated, i.e. the independent test-set and the 10-fold cross-validation (CV) using maximum and average performance measures. Feature selection methods behave differently under different validation strategies. The performance results from CV do not match well with those from the independent test-set, except for the support vector machines (SVM) and the least squares SVM methods. However, these wrapper methods achieve variable (often low) performance, whereas the hybrid methods attain consistently higher accuracies. The use of an independent test-set within CV is important for the evaluation of the predictive power of algorithms. The optimal size of the selected gene-set also appears to be dependent on the evaluation scheme. The consistency of selected genes over variation of the training-set is another aspect important in reducing uncertainty in the evaluation of the derived gene signature. In all cases the presence of outlier samples can seriously affect algorithmic

  2. Experimental Approaches to Microarray Analysis of Tumor Samples

    ERIC Educational Resources Information Center

    Furge, Laura Lowe; Winter, Michael B.; Meyers, Jacob I.; Furge, Kyle A.

    2008-01-01

    Comprehensive measurement of gene expression using high-density nucleic acid arrays (i.e. microarrays) has become an important tool for investigating the molecular differences in clinical and research samples. Consequently, inclusion of discussion in biochemistry, molecular biology, or other appropriate courses of microarray technologies has…

  3. Demonstrating a Multi-drug Resistant Mycobacterium tuberculosis Amplification Microarray

    PubMed Central

    Linger, Yvonne; Kukhtin, Alexander; Golova, Julia; Perov, Alexander; Qu, Peter; Knickerbocker, Christopher; Cooney, Christopher G.; Chandler, Darrell P.

    2014-01-01

    Simplifying microarray workflow is a necessary first step for creating MDR-TB microarray-based diagnostics that can be routinely used in lower-resource environments. An amplification microarray combines asymmetric PCR amplification, target size selection, target labeling, and microarray hybridization within a single solution and into a single microfluidic chamber. A batch processing method is demonstrated with a 9-plex asymmetric master mix and low-density gel element microarray for genotyping multi-drug resistant Mycobacterium tuberculosis (MDR-TB). The protocol described here can be completed in 6 hr and provide correct genotyping with at least 1,000 cell equivalents of genomic DNA. Incorporating on-chip wash steps is feasible, which will result in an entirely closed amplicon method and system. The extent of multiplexing with an amplification microarray is ultimately constrained by the number of primer pairs that can be combined into a single master mix and still achieve desired sensitivity and specificity performance metrics, rather than the number of probes that are immobilized on the array. Likewise, the total analysis time can be shortened or lengthened depending on the specific intended use, research question, and desired limits of detection. Nevertheless, the general approach significantly streamlines microarray workflow for the end user by reducing the number of manually intensive and time-consuming processing steps, and provides a simplified biochemical and microfluidic path for translating microarray-based diagnostics into routine clinical practice. PMID:24796567

  4. The Importance of Normalization on Large and Heterogeneous Microarray Datasets

    EPA Science Inventory

    DNA microarray technology is a powerful functional genomics tool increasingly used for investigating global gene expression in environmental studies. Microarrays can also be used in identifying biological networks, as they give insight on the complex gene-to-gene interactions, ne...

  6. Genomic-Wide Analysis with Microarrays in Human Oncology

    PubMed Central

    Inaoka, Kenichi; Inokawa, Yoshikuni; Nomoto, Shuji

    2015-01-01

    DNA microarray technologies have advanced rapidly and had a profound impact on examining gene expression on a genomic scale in research. This review discusses the history and development of microarray and DNA chip devices, and specific microarrays are described along with their methods and applications. In particular, microarrays have detected many novel cancer-related genes by comparing cancer tissues and non-cancerous tissues in oncological research. Recently, new methods have been in development, such as the double-combination array and triple-combination array, which allow more effective analysis of gene expression and epigenetic changes. Analysis of gene expression alterations in precancerous regions compared with normal regions and array analysis in drug-resistant cancer tissues have also been successfully performed. Although next-generation sequencing is a similar method of genome analysis, several important differences distinguish these techniques and their applications. Development of novel microarray technologies is expected to contribute to further cancer research.

  7. An ultralow background substrate for protein microarray technology.

    PubMed

    Feng, Hui; Zhang, Qingyang; Ma, Hongwei; Zheng, Bo

    2015-08-21

    We herein report an ultralow background substrate for protein microarrays. Conventional protein microarray substrates often suffer from non-specific protein adsorption and inhomogeneous spot morphology. Consequently, surface treatment and a suitable printing solution are required to improve the microarray performance. In the current work, we improved the situation by developing a new microarray substrate based on a fluorinated ethylene propylene (FEP) membrane. A polydopamine microspot array was fabricated on the FEP membrane, with proteins conjugated to the FEP surface through polydopamine. Uniform microspots were obtained on FEP without the application of a special printing solution. The modified FEP membrane demonstrated ultralow background signal and was applied in protein and peptide microarray analysis. PMID:26134063

  8. cDNA microarray screening in food safety.

    PubMed

    Roy, Sashwati; Sen, Chandan K

    2006-04-01

    The cDNA microarray technology and related bioinformatics tools present a wide range of novel application opportunities. The technology may be productively applied to address food safety. In this mini-review article, we present an update highlighting the late-breaking discoveries that demonstrate the vitality of cDNA microarray technology as a tool to analyze food safety with reference to microbial pathogens and genetically modified foods. In order to bring the microarray technology to mainstream food safety, it is important to develop robust user-friendly tools that may be applied in a field setting. In addition, there needs to be a standardized process for regulatory agencies to interpret and act upon microarray-based data. The cDNA microarray approach is an emergent technology in diagnostics. Its value lies in being able to provide complementary molecular insight when employed in addition to traditional tests for food safety, as part of a more comprehensive battery of tests.

  9. Finding dominant sets in microarray data.

    PubMed

    Fu, Xuping; Teng, Li; Li, Yao; Chen, Wenbin; Mao, Yumin; Shen, I-Fan; Xie, Yi

    2005-01-01

    Clustering allows us to extract groups of genes that are tightly coexpressed from microarray data. In this paper, a new method, DSF_Clust, is developed to find dominant sets (clusters). We have performed DSF_Clust on several gene expression datasets and evaluated it against several criteria. The results showed that this approach can cluster dominant sets of good quality compared to the k-means method. DSF_Clust deals with three issues that have bedeviled clustering: dominant sets are determined statistically at a given significance level, no cluster structure needs to be predefined, and the quality of each dominant set is ensured. We have also applied this approach to analyze published data on yeast cell cycle gene expression and uncovered some biologically meaningful gene groups. Furthermore, DSF_Clust is a potentially good tool to search for putative regulatory signals.
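
    The DSF_Clust code itself is not reproduced in this record. As a minimal sketch of the underlying idea, the generic dominant-set extraction via replicator dynamics (Pavan and Pelillo), on which dominant-set clustering methods build, can be written as follows; the similarity construction and the support threshold are illustrative assumptions, not the authors' choices.

      import numpy as np

      def extract_dominant_set(sim, tol=1e-8, max_iter=2000):
          # Discrete replicator dynamics on the simplex: the support of the
          # converged weight vector is one dominant set (a tightly coupled cluster).
          n = sim.shape[0]
          x = np.full(n, 1.0 / n)
          for _ in range(max_iter):
              ax = sim @ x
              x_new = x * ax / (x @ ax)
              if np.abs(x_new - x).sum() < tol:
                  x = x_new
                  break
              x = x_new
          return np.flatnonzero(x > 1e-4)  # members with non-negligible weight

      # Toy usage: non-negative correlation similarity between gene profiles
      rng = np.random.default_rng(0)
      expr = rng.normal(size=(30, 40))            # 30 genes x 40 conditions
      sim = np.clip(np.corrcoef(expr), 0.0, None)
      np.fill_diagonal(sim, 0.0)                  # the formulation requires a zero diagonal
      print(extract_dominant_set(sim))

    To peel off several clusters, the returned genes can be removed from the similarity matrix and the extraction repeated on the remainder.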

  10. Digital microarray analysis for digital artifact genomics

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger; Handley, James; Williams, Deborah

    2013-06-01

    We implement a Spatial Voting (SV) based analogy of microarray analysis for digital gene marker identification in malware code sections. We examine a famous set of malware formally analyzed by Mandiant and code-named Advanced Persistent Threat 1 (APT1). APT1 is a Chinese organization formed with the specific intent to infiltrate and exploit US resources. Mandiant provided a detailed behavior and string analysis report for the 288 malware samples available. We performed an independent analysis using a new alternative to traditional dynamic analysis and static analysis that we call Spatial Analysis (SA). We perform unsupervised SA on the APT1-originating malware code sections and report our findings. We also show the results of SA performed on some members of the families associated by Mandiant. We conclude that SV-based SA is a practical, fast alternative to dynamic analysis and static analysis.

  11. SAMMD: Staphylococcus aureus Microarray Meta-Database

    PubMed Central

    Nagarajan, Vijayaraj; Elasri, Mohamed O

    2007-01-01

    Background Staphylococcus aureus is an important human pathogen, causing a wide variety of diseases ranging from superficial skin infections to severe life-threatening infections. S. aureus is one of the leading causes of nosocomial infections. Its ability to resist multiple antibiotics poses a growing public health problem. In order to understand the mechanism of pathogenesis of S. aureus, several global expression profiles have been developed. These transcriptional profiles included regulatory mutants of S. aureus and growth of the wild type under different conditions. The abundance of these profiles has generated a large amount of data without a uniform annotation system to comprehensively examine them. We report the development of the Staphylococcus aureus Microarray Meta-Database (SAMMD), which includes data from all the published transcriptional profiles. SAMMD is a web-accessible database that helps users to perform a variety of analyses against and within the existing transcriptional profiles. Description SAMMD is a relational database that uses MySQL as the back end and PHP/JavaScript/DHTML as the front end. The database is normalized and consists of five tables, which hold information about gene annotations, regulated gene lists, experimental details, references, and other details. SAMMD data are collected from peer-reviewed published articles. Data extraction and conversion were done using Perl scripts, while data entry was done through the phpMyAdmin tool. The database is accessible via a web interface that offers several features such as simple search by ORF ID, gene name, or gene product name, advanced search using gene lists, comparison among datasets, browsing, downloading, statistics, and help. The database is licensed under the General Public License (GPL). Conclusion SAMMD is hosted and available at . Currently there are over 9500 entries for regulated genes, from 67 microarray experiments. SAMMD will help staphylococcal scientists to analyze their
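
    The abstract names the five normalized tables but not their columns. Purely as an illustration of such a layout, a sketch might look like the following; all table and column names here are assumptions, and portable SQL via Python's sqlite3 stands in for the actual MySQL schema.

      import sqlite3

      # Hypothetical schema; SAMMD's real MySQL table definitions are not given in this record.
      conn = sqlite3.connect(":memory:")
      conn.executescript("""
      CREATE TABLE gene_annotation (orf_id TEXT PRIMARY KEY, gene_name TEXT, product TEXT);
      CREATE TABLE experiment      (exp_id INTEGER PRIMARY KEY, description TEXT, ref_id INTEGER);
      CREATE TABLE reference_list  (ref_id INTEGER PRIMARY KEY, pubmed_id TEXT, citation TEXT);
      CREATE TABLE regulated_gene  (exp_id INTEGER REFERENCES experiment(exp_id),
                                    orf_id TEXT REFERENCES gene_annotation(orf_id),
                                    direction TEXT);
      CREATE TABLE experiment_detail (exp_id INTEGER, item TEXT, value TEXT);
      """)

      # A typical within-database query: which experiments report a given ORF as regulated?
      rows = conn.execute("""
          SELECT e.exp_id, e.description, r.direction
          FROM regulated_gene AS r JOIN experiment AS e ON e.exp_id = r.exp_id
          WHERE r.orf_id = ?""", ("SA1844",)).fetchall()  # illustrative ORF ID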

  12. Analysis of environmental transcriptomes by DNA microarrays.

    PubMed

    Parro, Víctor; Moreno-Paz, Mercedes; González-Toril, Elena

    2007-02-01

    In this work we investigated the correlations between global gene expression patterns and environmental parameters in natural ecosystems. We studied the preferential gene expression by which the iron-oxidizing bacterium Leptospirillum ferrooxidans adapts its physiology to changes in the physicochemical parameters of its natural medium. Transcriptome analysis by DNA microarrays can provide an instant picture of the preferential gene expression between two different environmental samples. However, this type of analysis is very difficult and complex in natural ecosystems, mainly because of the broad biodiversity and the multiple environmental parameters that may affect gene expression. The need for high-quality RNA preparations and the complexity of data analysis are additional technological limitations. The low prokaryotic diversity of the extremely acidic and iron-rich waters of the Tinto River (Spain) ecosystem, where L. ferrooxidans is abundant, offers the opportunity to carry out global gene expression studies and to associate gene function with environmental parameters. We applied a total RNA amplification protocol validated previously for the amplification of the environmental transcriptome (meta-transcriptome). The meta-transcriptomes of two sites from the Tinto River, differing mainly in salt and oxygen content, were amplified and analysed with a L. ferrooxidans DNA microarray. The results showed a clear preferential induction of genes responsive to certain physicochemical parameters, such as high salinity (ectAB, otsAB), low oxygen concentration (cydAB), iron uptake (fecA-exbBD-tonB), oxidative stress (carotenoid synthesis, oxyR, recG), and potassium (kdpBAC) or phosphate (pstSCAB) concentrations. We conclude that specific gene expression patterns can be useful indicators of the physiological conditions in a defined ecosystem. Also, the upregulation of certain genes and operons reveals information about the environmental conditions (nutrient limitations, stresses

  13. Lipid Microarray Biosensor for Biotoxin Detection.

    SciTech Connect

    Singh, Anup K.; Throckmorton, Daniel J.; Moran-Mirabal, Jose C.; Edel, Joshua B.; Meyer, Grant D.; Craighead, Harold G.

    2006-05-01

    We present the use of micron-sized lipid domains, patterned onto planar substrates and within microfluidic channels, to assay the binding of bacterial toxins via total internal reflection fluorescence microscopy (TIRFM). The lipid domains were patterned using a polymer lift-off technique and consisted of ganglioside-populated DSPC:cholesterol supported lipid bilayers (SLBs). Lipid patterns were formed on the substrates by vesicle fusion followed by polymer lift-off, which revealed micron-sized SLBs containing either ganglioside GT1b or GM1. The ganglioside-populated SLB arrays were then exposed to either cholera toxin subunit B (CTB) or tetanus toxin fragment C (TTC). Binding was assayed on planar substrates by TIRFM down to a concentration of 1 nM for CTB and 100 nM for TTC. Apparent binding constants extracted from three different models applied to the binding curves suggest that binding of a protein to a lipid-based receptor is strongly affected by the lipid composition of the SLB and by the substrate on which the bilayer is formed. Patterning of SLBs inside microfluidic channels also allowed the preparation of lipid domains with different compositions on a single device. Arrays within microfluidic channels were used to achieve segregation and selective binding from a binary mixture of the toxin fragments in one device. The binding and segregation within the microfluidic channels were assayed with epifluorescence as proof of concept. We propose that the method used for patterning the lipid microarrays on planar substrates and within microfluidic channels can be easily adapted to proteins or nucleic acids and can be used for biosensor applications and cell stimulation assays under different flow conditions. Keywords: microarray, ganglioside, polymer lift-off, cholera toxin, tetanus toxin, TIRFM, binding constant.
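
    The abstract does not specify the three binding models; as a minimal sketch, assuming a generic one-site Langmuir isotherm and illustrative (not measured) data points, an apparent dissociation constant can be extracted from a binding curve like so:

      import numpy as np
      from scipy.optimize import curve_fit

      def langmuir(conc, f_max, kd):
          # One-site Langmuir isotherm: signal vs. free toxin concentration.
          return f_max * conc / (kd + conc)

      # Illustrative numbers standing in for TIRFM intensities at increasing CTB (nM)
      conc = np.array([0.5, 1, 2, 5, 10, 20, 50, 100.0])
      signal = np.array([0.09, 0.17, 0.28, 0.48, 0.64, 0.78, 0.90, 0.95])

      (f_max, kd), _ = curve_fit(langmuir, conc, signal, p0=(1.0, 10.0))
      print(f"apparent Kd = {kd:.1f} nM, saturation signal = {f_max:.2f}")

    Cooperative or two-site variants would swap in a Hill or double-hyperbola function while keeping the same fitting machinery.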

  14. Evaluation of pre-processing, thresholding and post-processing steps for very small target detection in infrared images

    NASA Astrophysics Data System (ADS)

    Yardımcı, Ozan; Ulusoy, İlkay

    2016-05-01

    Pre-processing, thresholding and post-processing stages are very important, especially for very small target detection in infrared images. The effects of these stages on the final detection performance are measured in this study. Various methods for each stage are compared based on the final detection performance, which is defined by precision and recall values. Among the various methods, the best method for each stage is selected and validated. For the pre-processing stage, local block-based methods perform the best for nearly all thresholding methods. The best thresholding method is chosen as the one that does not need any user-defined parameter. Finally, the post-processing method best suited to the best-performing pre-processing and thresholding methods is selected.
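
    The specific methods compared are not named in this record; a minimal sketch of such a three-stage pipeline, assuming local-mean background suppression as the pre-process, a simple global threshold, and size-based post-processing as stand-ins, together with the precision/recall scoring:

      import numpy as np
      from scipy import ndimage

      def detect_small_targets(image, block=15, k=4.0, min_px=1, max_px=25):
          # Pre-process: subtract a local block mean so small targets stand out.
          background = ndimage.uniform_filter(image, size=block)
          residual = image - background
          # Threshold: keep pixels far above the residual noise level.
          mask = residual > k * residual.std()
          # Post-process: keep only connected components of plausible target size.
          labels, n = ndimage.label(mask)
          sizes = ndimage.sum(mask, labels, range(1, n + 1))
          keep = [i + 1 for i, s in enumerate(sizes) if min_px <= s <= max_px]
          return np.isin(labels, keep)

      def precision_recall(detected, truth):
          # Pixel-level scores against a ground-truth target mask.
          tp = np.logical_and(detected, truth).sum()
          fp = np.logical_and(detected, ~truth).sum()
          fn = np.logical_and(~detected, truth).sum()
          return tp / max(tp + fp, 1), tp / max(tp + fn, 1)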

  15. An automatic method for producing robust regression models from hyperspectral data using multiple simple genetic algorithms

    NASA Astrophysics Data System (ADS)

    Sykas, Dimitris; Karathanassi, Vassilia

    2015-06-01

    This paper presents a new method for automatically determining the optimum regression model, which enables the estimation of a parameter of interest. The concept lies in the combination of k spectral pre-processing algorithms (SPPAs) that enhance spectral features correlated to the desired parameter. Initially, a pre-processing algorithm takes a single spectral signature as input and transforms it according to the SPPA function. A k-step combination of SPPAs applies k pre-processing algorithms serially: the result of each SPPA is used as input to the next, and so on, until the k desired pre-processed signatures are reached. These signatures are then used as input to three different regression methods: Normalized band Difference Regression (NDR), Multiple Linear Regression (MLR) and Partial Least Squares Regression (PLSR). Three Simple Genetic Algorithms (SGAs) are used, one for each regression method, to select the optimum combination of k SPPAs. The performance of the SGAs is evaluated based on the RMS error of the regression models. The evaluation indicates not only the optimum SPPA combination but also the regression method that produces the best prediction model. The proposed method was applied to soil spectral measurements in order to predict Soil Organic Matter (SOM). In this study, the maximum value assigned to k was 3. PLSR yielded the highest accuracy, while NDR's accuracy was satisfactory given its low complexity. The MLR method showed severe drawbacks due to noise-induced collinearity among the spectral bands. Most of the regression methods required a 3-step combination of SPPAs to achieve their highest performance. The selected pre-processing algorithms differed for each regression method, since each regression method handles the explanatory variables differently.
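
    As a minimal sketch of the search, assuming toy stand-ins for the SPPA pool, scikit-learn's PLSR as the regression stage, and a reduced GA (truncation selection plus point mutation, no crossover); the paper's actual SPPAs, operators and settings are not given in this record:

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import cross_val_score

      # Toy SPPA pool: each transforms a matrix of spectra (rows = signatures).
      SPPAS = [
          lambda s: s,                                      # identity
          lambda s: np.gradient(s, axis=1),                 # first derivative
          lambda s: (s - s.mean(1, keepdims=True)) / s.std(1, keepdims=True),  # SNV
          lambda s: np.log1p(np.abs(s)),                    # log transform
      ]

      def fitness(chrom, X, y):
          # Chromosome = sequence of k SPPA indices applied serially; fitness is
          # the (negated) cross-validated RMSE of PLSR on the transformed spectra.
          Xp = X.copy()
          for idx in chrom:
              Xp = SPPAS[idx](Xp)
          return cross_val_score(PLSRegression(n_components=3), Xp, y, cv=5,
                                 scoring="neg_root_mean_squared_error").mean()

      def simple_ga(X, y, k=3, pop=20, gens=30, seed=0):
          rng = np.random.default_rng(seed)
          population = rng.integers(0, len(SPPAS), size=(pop, k))
          for _ in range(gens):
              scores = np.array([fitness(c, X, y) for c in population])
              parents = population[np.argsort(scores)[-pop // 2:]]   # keep best half
              children = parents[rng.integers(0, len(parents), pop - len(parents))].copy()
              mutate = rng.random(children.shape) < 0.1              # point mutation
              children[mutate] = rng.integers(0, len(SPPAS), mutate.sum())
              population = np.vstack([parents, children])
          return population[np.argmax([fitness(c, X, y) for c in population])]

    Running one such GA per regression method, as the authors do, then amounts to swapping the estimator inside fitness.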

  16. Evaluation of the robustness of the preprocessing technique improving reversible compressibility of CT images: Tested on various CT examinations

    SciTech Connect

    Jeon, Chang Ho; Kim, Bohyoung; Gu, Bon Seung; Lee, Jong Min; Kim, Kil Joong; Lee, Kyoung Ho; Kim, Tae Ki

    2013-10-15

    Purpose: To modify the previously proposed preprocessing technique improving the compressibility of computed tomography (CT) images so that it covers the diverse three-dimensional configurations of different body parts, and to evaluate the robustness of the technique in terms of segmentation correctness and increase in reversible compression ratio (CR) for various CT examinations. Methods: This study had institutional review board approval with waiver of informed patient consent. A preprocessing technique was previously proposed to improve the compressibility of CT images by replacing pixel values outside the body region with a constant value, thereby maximizing data redundancy. Since the technique was developed with only chest CT images in mind, the authors modified the segmentation method to cover the diverse three-dimensional configurations of different body parts. The modified version was evaluated as follows. In 368 randomly selected CT examinations (352,787 images), each image was preprocessed using the modified technique. Radiologists visually confirmed whether the segmented region covered the body region. The images with and without the preprocessing were reversibly compressed using Joint Photographic Experts Group (JPEG), JPEG2000 two-dimensional (2D), and JPEG2000 three-dimensional (3D) compression. The percentage increase in CR per examination (CR_I) was measured. Results: The rate of correct segmentation was 100.0% (95% CI: 99.9%, 100.0%) for all the examinations. The medians of CR_I were 26.1% (95% CI: 24.9%, 27.1%), 40.2% (38.5%, 41.1%), and 34.5% (32.7%, 36.2%) for JPEG, JPEG2000 2D, and JPEG2000 3D, respectively. Conclusions: In various CT examinations, the modified preprocessing technique can increase the CR by 25% or more without degrading diagnostic information.
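
    The authors' segmentation is the substantive contribution and is not reproduced here; a minimal sketch of the core preprocessing idea (constant-fill the out-of-body pixels so the entropy coder sees long redundant runs), with a naive threshold standing in for the real segmentation and zlib standing in for JPEG/JPEG2000:

      import zlib
      import numpy as np

      def preprocess_ct(slice_hu, air_threshold=-400, fill=-1000):
          # Replace pixels outside a crude body mask with one constant value.
          body = slice_hu > air_threshold          # naive stand-in segmentation
          out = slice_hu.copy()
          out[~body] = fill                        # constant background -> redundancy
          return out

      def compressed_size(img):
          return len(zlib.compress(img.astype(np.int16).tobytes(), 9))

      # Toy slice: noisy air background around a denser elliptical "body"
      rng = np.random.default_rng(1)
      img = rng.integers(-1024, -900, (512, 512)).astype(np.int16)
      yy, xx = np.mgrid[:512, :512]
      body = ((yy - 256) / 180) ** 2 + ((xx - 256) / 120) ** 2 < 1
      img[body] = rng.integers(-100, 100, body.sum())

      gain = compressed_size(img) / compressed_size(preprocess_ct(img))
      print(f"compression-ratio increase: {(gain - 1) * 100:.1f}%")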

  17. Facilitating access to pre-processed research evidence in public health

    PubMed Central

    2010-01-01

    Background Evidence-informed decision making is accepted in Canada and worldwide as necessary for the provision of effective health services. This process involves: 1) clearly articulating a practice-based issue; 2) searching for and accessing relevant evidence; 3) appraising methodological rigor and choosing the most synthesized evidence of the highest quality and relevance to the practice issue and setting that is available; and 4) extracting, interpreting, and translating knowledge, in light of the local context and resources, into practice, program and policy decisions. While the public health sector in Canada is working toward evidence-informed decision making, considerable barriers exist, including a lack of efficient access to synthesized resources. Methods In this paper we map relevant resources that include public health-related effectiveness evidence onto a previously developed six-level pyramid of pre-processed research evidence. The resources were identified through extensive searches of both the published and unpublished domains. Results Many resources with public health-related evidence were identified. While there were very few resources dedicated solely to public health evidence, many clinically focused resources include public health-related evidence, making tools such as the pyramid, which identify these resources, particularly helpful for public health decision makers. A practical example illustrates the application of this model and highlights its potential to reduce the time and effort that would be required by public health decision makers to address their practice-based issues. Conclusions This paper describes an existing hierarchy of pre-processed evidence and its adaptation to the public health setting. A number of resources with public health-relevant content that are either freely accessible or require a subscription are identified. This will facilitate easier and faster access to pre-processed, public health-relevant evidence, with the intent of

  18. CEBS—Chemical Effects in Biological Systems: a public data repository integrating study design and toxicity data with microarray and proteomics data

    PubMed Central

    Waters, Michael; Stasiewicz, Stanley; Merrick, B. Alex; Tomer, Kenneth; Bushel, Pierre; Paules, Richard; Stegman, Nancy; Nehls, Gerald; Yost, Kenneth J.; Johnson, C. Harris; Gustafson, Scott F.; Xirasagar, Sandhya; Xiao, Nianqing; Huang, Cheng-Cheng; Boyer, Paul; Chan, Denny D.; Pan, Qinyan; Gong, Hui; Taylor, John; Choi, Danielle; Rashid, Asif; Ahmed, Ayazaddin; Howle, Reese; Selkirk, James; Tennant, Raymond; Fostel, Jennifer

    2008-01-01

    CEBS (Chemical Effects in Biological Systems) is an integrated public repository for toxicogenomics data, including the study design and timeline, clinical chemistry and histopathology findings, and microarray and proteomics data. CEBS contains data derived from studies of chemicals and of genetic alterations, and is compatible with clinical and environmental studies. CEBS is designed to permit the user to query the data using the study conditions and the subject responses and then, having identified an appropriate set of subjects, to move to the microarray module of CEBS to carry out gene signature and pathway analysis. Scope of CEBS: CEBS currently holds 22 studies of rats, four studies of mice and one study of Caenorhabditis elegans. CEBS can also accommodate data from studies of human subjects. Toxicogenomics studies currently in CEBS comprise over 4000 microarray hybridizations, and 75 2D gel images annotated with protein identifications performed by MALDI and MS/MS. CEBS contains raw microarray data collected in accordance with MIAME guidelines and provides tools for data selection, pre-processing and analysis resulting in annotated lists of genes of interest. Additionally, clinical chemistry and histopathology findings from over 1500 animals are included in CEBS. CEBS/BID: The BID (Biomedical Investigation Database) is another component of the CEBS system. BID is a relational database used to load and curate study data prior to export to CEBS, in addition to capturing and displaying novel data types such as PCR data, or additional fields of interest, including those defined by the HESI Toxicogenomics Committee (in preparation). BID has been shared with Health Canada and the US Environmental Protection Agency. CEBS is available at http://cebs.niehs.nih.gov. BID can be accessed via the user interface from https://dir-apps.niehs.nih.gov/arc/. Requests for a copy of BID and for depositing data into CEBS or BID are available at http://www.niehs.nih.gov/cebs-df/.

  20. Time-Frequency Analysis of Peptide Microarray Data: Application to Brain Cancer Immunosignatures

    PubMed Central

    O’Donnell, Brian; Maurer, Alexander; Papandreou-Suppappola, Antonia; Stafford, Phillip

    2015-01-01

    One of the gravest dangers facing cancer patients is an extended symptom-free lull between tumor initiation and first diagnosis. Detection of tumors is critical for effective intervention. Using the body's immune system to detect and amplify tumor-specific signals may enable detection of cancer using an inexpensive immunoassay. Immunosignatures are one such assay: they provide a map of antibody interactions with random-sequence peptides and enable detection of disease-specific patterns using classic train/test methods. However, to date, very little effort has gone into extracting information from the sequences of the peptides that interact with disease-specific antibodies. Because it is difficult to represent all possible antigen peptides in a microarray format, we chose to synthesize only 330,000 peptides on a single immunosignature microarray. The 330,000 random-sequence peptides on the microarray represent 83% of all tetramers and 27% of all pentamers, creating an unbiased but substantial gap in the coverage of total sequence space. We therefore chose to examine many relatively short motifs from these random-sequence peptides. Time-variant analysis of recurrent subsequences provided a means to dissect amino acid sequences from the peptides while simultaneously retaining the antibody–peptide binding intensities. We first used a simple experiment in which monoclonal antibodies with known linear epitopes were exposed to these random-sequence peptides, and their binding intensities were used to create our algorithm. We then demonstrated the performance of the proposed algorithm by examining immunosignatures from patients with glioblastoma multiforme (GBM), an aggressive form of brain cancer. Eight different frameshift targets were identified from the random-sequence peptides using this technique. If immune-reactive antigens can be identified using a relatively simple immune assay, it might enable a diagnostic test with sufficient sensitivity to detect tumors
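
    The authors' time-frequency formulation is not detailed in this record; a minimal sketch of the underlying motif idea, assuming a plain intensity-weighted k-mer tally as a stand-in, is:

      from collections import defaultdict

      def weighted_kmer_scores(peptides, intensities, k=4):
          # Score each k-mer by the total antibody-binding intensity of the
          # random-sequence peptides that contain it.
          scores = defaultdict(float)
          for seq, inten in zip(peptides, intensities):
              for i in range(len(seq) - k + 1):
                  scores[seq[i:i + k]] += inten
          return scores

      # Illustrative call; real inputs would come from the scanned array
      peptides = ["GKQSLVNRAW", "QSLVHHTRHE", "AWMPQSLVKK"]
      intensities = [9200.0, 8700.0, 450.0]
      top = sorted(weighted_kmer_scores(peptides, intensities).items(),
                   key=lambda kv: -kv[1])[:5]
      print(top)  # a shared high-scoring tetramer such as "QSLV" suggests an epitope motif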