Science.gov

Sample records for microarray preprocessing algorithms

  1. Micro-Analyzer: automatic preprocessing of Affymetrix microarray data.

    PubMed

    Guzzi, Pietro Hiram; Cannataro, Mario

    2013-08-01

    A current trend in genomics is the investigation of the cell mechanism using different technologies, in order to explain the relationships among genes, molecular processes and diseases. For instance, the combined use of gene-expression arrays and genomic arrays has been demonstrated to be an effective instrument in clinical practice. Consequently, different kinds of microarrays may be used in a single experiment, resulting in the production of different types of binary data (images and textual raw data). The analysis of microarray data requires an initial preprocessing phase that makes the raw data suitable for use on existing analysis platforms, such as the TIGR M4 (TM4) Suite. An additional challenge for emerging data analysis platforms is the ability to treat these different microarray formats, coupled with clinical data, in a combined way. In fact, the resulting integrated data may include both numerical and symbolic data (e.g. gene expression and SNPs, regarding molecular data) as well as temporal data (e.g. the response to a drug, time to progression and survival rate, regarding clinical data). Raw data preprocessing is a crucial step in analysis but is often performed in a manual and error-prone way using different software tools. Thus novel, platform-independent, and possibly open-source tools enabling the semi-automatic preprocessing and annotation of different microarray data are needed. The paper presents Micro-Analyzer (Microarray Analyzer), a cross-platform tool for the automatic normalization, summarization and annotation of Affymetrix gene expression and SNP binary data. It represents the evolution of the μ-CS tool, extending preprocessing to the SNP arrays that were not supported in μ-CS. Micro-Analyzer is provided as a standalone Java tool and enables users to read, preprocess and analyse binary microarray data (gene expression and SNPs) by invoking the TM4 platform. It avoids: (i) the manual invocation of external tools (e.g. the Affymetrix Power

  2. Impact of Microarray Preprocessing Techniques in Unraveling Biological Pathways.

    PubMed

    Deandrés-Galiana, Enrique J; Fernández-Martínez, Juan Luis; Saligan, Leorey N; Sonis, Stephen T

    2016-12-01

    To better understand the impact of microarray preprocessing normalization techniques on the analysis of biological pathways in the prediction of chronic fatigue (CF) following radiation therapy, this study compared the lists of predictive genes found using the Robust Multi-array Average (RMA) and Affymetrix MAS5 methods with the list obtained from raw data (without any preprocessing). First, we modeled a spiked-in data set in which differentially expressed genes were known and spiked in at different known concentrations, showing that the precision achieved by different gene-ranking methods was higher than that obtained with raw data. The results of the spiked-in experiment were then extrapolated to the CF data set to run learning and blind validation. RMA and MAS5 provided different sets of discriminatory genes that had higher predictive accuracy in the learning phase but lower predictive accuracy during blind validation, suggesting that the genetic signatures generated using either preprocessing technique may not generalize. The pathways found using the raw data set better described what is known a priori about CF. In addition, RMA produced more reliable pathways than MAS5. Understanding the strengths of these two preprocessing techniques in phenotype prediction is critical for precision medicine. In particular, this article concludes that biological pathways may be better unraveled when working with raw expression data, and that the predictive gene profiles generated by RMA and MAS5 should be interpreted with caution. This is an important conclusion with high translational impact that should be confirmed in other disease data sets.
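
    The RMA step most directly responsible for making arrays comparable is quantile normalization, which forces every array onto a common intensity distribution before summarization. Below is a minimal numpy sketch of that general technique (an illustration, not the authors' code; ties are handled naively and a probes x arrays layout is assumed):

```python
import numpy as np

def quantile_normalize(X):
    """Force each array (column) of X onto the mean empirical
    distribution, the normalization step used inside RMA."""
    order = np.argsort(X, axis=0)                 # per-array sort order
    ranks = np.argsort(order, axis=0)             # rank of each probe per array
    mean_dist = np.sort(X, axis=0).mean(axis=1)   # average sorted profile
    return mean_dist[ranks]

# Toy example: four probes measured on three arrays
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
print(quantile_normalize(X))  # columns now share the same distribution
```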

  3. User-friendly solutions for microarray quality control and pre-processing on ArrayAnalysis.org.

    PubMed

    Eijssen, Lars M T; Jaillard, Magali; Adriaens, Michiel E; Gaj, Stan; de Groot, Philip J; Müller, Michael; Evelo, Chris T

    2013-07-01

    Quality control (QC) is crucial for any scientific method producing data. Applying adequate QC introduces new challenges in the genomics field, where large amounts of data are produced with complex technologies. For DNA microarrays, specific algorithms for QC and pre-processing, including normalization, have been developed by the scientific community, especially for expression chips of the Affymetrix platform. Many of these have been implemented in the statistical scripting language R and are available from the Bioconductor repository. However, their application is hampered by the lack of integrative tools that can be used by users of any experience level. To fill this gap, we developed a freely available tool for QC and pre-processing of Affymetrix gene expression results, extending, integrating and harmonizing functionality of Bioconductor packages. The tool can be easily accessed through a wizard-like web portal at http://www.arrayanalysis.org or downloaded for local use in R. The portal provides extensive documentation, including user guides, interpretation help with real output illustrations and detailed technical documentation. It assists newcomers to the field in performing state-of-the-art QC and pre-processing while offering data analysts an integrated open-source package. Providing the scientific community with this easily accessible tool will help improve data quality, the reuse of data and the adoption of standards.

  4. Parallel pipeline algorithm of real time star map preprocessing

    NASA Astrophysics Data System (ADS)

    Wang, Hai-yong; Qin, Tian-mu; Liu, Jia-qi; Li, Zhi-feng; Li, Jian-hua

    2016-03-01

    To improve the preprocessing speed of star maps and reduce the resource consumption of the embedded system of a star tracker, a parallel pipeline real-time preprocessing algorithm is presented. Two characteristics of a star map, the mean and the standard deviation of the background gray level, are first obtained dynamically, with the interference of the star images themselves on the background removed in advance. A criterion for whether subsequent noise filtering is needed is established, and the extraction threshold is then assigned according to the background noise level, so that centroiding accuracy is guaranteed. In the processing algorithm, as few as two lines of pixel data are buffered, and only 100 shift registers are used to record the connected-domain labels, which solves the problems of resource waste and connected-domain overflow. Simulation results show that the necessary data of the selected bright stars can be accessed in a delay as short as 10 μs after the pipeline processing of a 496×496 star map at 50 Mb/s is finished, and the needed memory and register resources total less than 80 kb. To verify the accuracy of the proposed algorithm, different levels of background noise were added to the processed ideal star map; the statistical centroiding error is smaller than 1/23 pixel when the signal-to-noise ratio is greater than 1. The parallel pipeline algorithm for real-time star map preprocessing helps to increase the data output speed and the anti-dynamic performance of a star tracker.
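
    For readers unfamiliar with the pipeline above, the sequence it implements (background statistics with the stars' own contribution excluded, a noise-dependent extraction threshold, then centroiding) can be sketched in a few lines of non-pipelined numpy. The 3-sigma clipping rule and the threshold factor k below are assumptions for illustration, not values from the paper:

```python
import numpy as np

def preprocess_star_map(img, k=5.0):
    """Estimate background mean/sigma with star pixels excluded,
    threshold at mean + k*sigma, then take an intensity-weighted
    centroid of the above-threshold pixels (a single star here)."""
    med = np.median(img)
    bg = img[img < med + 3 * img.std()]     # crude star/background split
    mean, sigma = bg.mean(), bg.std()       # background gray mean / noise std
    thresh = mean + k * sigma               # extraction threshold (k assumed)
    ys, xs = np.nonzero(img > thresh)
    if xs.size == 0:
        return None
    w = img[ys, xs] - mean                  # background-subtracted weights
    return (xs @ w / w.sum(), ys @ w / w.sum()), thresh

rng = np.random.default_rng(0)
img = rng.normal(100, 5, (496, 496))
img[200:203, 300:303] += 400                # synthetic star
print(preprocess_star_map(img))             # centroid near (301, 201)
```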

  5. Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm

    PubMed Central

    Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein

    2015-01-01

    DNA microarrays are a powerful approach for studying the expression of thousands of genes simultaneously in a single experiment. The average fluorescent intensity can be calculated for each spot, and the calculated intensity values closely reflect the expression level of the corresponding gene. However, determining the correct position of every spot in microarray images is a major challenge, and one that underpins the accurate classification of normal and abnormal (cancer) cells. In this paper, a preprocessing step is first performed to eliminate the noise and artifacts present in microarray images using the nonlinear anisotropic diffusion filtering method. Then, the coordinate center of each spot is positioned using mathematical morphology operations. Finally, the position of each spot is determined exactly by applying a novel hybrid model based on principal component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in the SFCM algorithm improves the quality of complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on real microarray images available in the Stanford Microarray Database. Results show that the segmentation accuracy of the proposed algorithm reaches 100% and 98% for noiseless and noisy images, respectively. PMID:26284175

  6. Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm.

    PubMed

    Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein

    2015-01-01

    DNA microarrays are a powerful approach for studying the expression of thousands of genes simultaneously in a single experiment. The average fluorescent intensity can be calculated for each spot, and the calculated intensity values closely reflect the expression level of the corresponding gene. However, determining the correct position of every spot in microarray images is a major challenge, and one that underpins the accurate classification of normal and abnormal (cancer) cells. In this paper, a preprocessing step is first performed to eliminate the noise and artifacts present in microarray images using the nonlinear anisotropic diffusion filtering method. Then, the coordinate center of each spot is positioned using mathematical morphology operations. Finally, the position of each spot is determined exactly by applying a novel hybrid model based on principal component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in the SFCM algorithm improves the quality of complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on real microarray images available in the Stanford Microarray Database. Results show that the segmentation accuracy of the proposed algorithm reaches 100% and 98% for noiseless and noisy images, respectively.
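
    Both records above rest on fuzzy c-means. A plain intensity-only FCM (without the paper's spatial regularization and Gaussian kernel, which are omitted here) already separates spot foreground from background and shows the two alternating update equations involved. The fuzzifier m = 2 is the usual default, assumed rather than taken from the paper:

```python
import numpy as np

def fuzzy_cmeans(x, c=2, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means on 1-D pixel intensities: alternate between
    membership updates u_ik ~ d_ik^(-2/(m-1)) and weighted-mean centers."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, x.size))
    u /= u.sum(axis=0)                       # memberships sum to 1 per pixel
    for _ in range(iters):
        um = u ** m
        centers = (um @ x) / um.sum(axis=1)  # fuzzy-weighted means
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=0)
    return centers, u

rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(40, 5, 500),    # background
                         rng.normal(180, 10, 100)]) # spot foreground
centers, u = fuzzy_cmeans(pixels)
labels = u.argmax(axis=0)                    # hard segmentation
print(np.sort(centers))                      # approx. 40 and 180
```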

  7. Effects of different preprocessing algorithms on the prognostic value of breast tumour microscopic images.

    PubMed

    Kolarević, D; Vujasinović, T; Kanjer, K; Milovanović, J; Todorović-Raković, N; Nikolić-Vukosavljević, D; Radulovic, M

    2017-09-21

    The purpose of this study was to improve the prognostic value of tumour histopathology image analysis methodology by image preprocessing. Key image qualities were modified, including contrast, sharpness and brightness. Texture information was subsequently extracted from images of haematoxylin/eosin-stained tumour tissue sections by GLCM, monofractal and multifractal algorithms, without any analytical limitation to predefined structures. Images were derived from patient groups with invasive breast carcinoma (BC, 93 patients) and inflammatory breast carcinoma (IBC, 51 patients). Prognostic performance was indeed significantly enhanced by preprocessing, with the average AUCs of individual texture features improving from 0.68 ± 0.05 for original to 0.78 ± 0.01 for preprocessed images in the BC group, and from 0.75 ± 0.01 to 0.80 ± 0.02 in the IBC group. Image preprocessing also improved the prognostic independence of texture features, as indicated by multivariate analysis. Surprisingly, tonal histogram compression without normalisation prognostically outperformed the tested contrast-normalisation algorithms. In general, features without prognostic value showed higher susceptibility to prognostic enhancement by preprocessing, with the IDM texture feature being exceptionally susceptible. The results suggest the existence of distinct texture prognostic clues in the two examined types of breast cancer. The obtained enhancement of prognostic performance is essential for the anticipated clinical use of this method as a simple and cost-effective prognosticator of cancer outcome. © 2017 The Authors. Journal of Microscopy © 2017 Royal Microscopical Society.
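
    The GLCM features used above are simple functions of a gray-level co-occurrence matrix. A minimal numpy sketch for one pixel offset, computing contrast and the IDM feature that the abstract singles out, is shown below (quantization to 8 levels and the (1, 0) offset are assumptions for illustration):

```python
import numpy as np

def glcm_features(img, levels=8, dx=1, dy=0):
    """Build a gray-level co-occurrence matrix for one offset and return
    two Haralick texture features: contrast and IDM (inverse difference
    moment, also known as homogeneity)."""
    q = (img.astype(float) / img.max() * (levels - 1)).astype(int)
    glcm = np.zeros((levels, levels))
    a = q[:q.shape[0] - dy, :q.shape[1] - dx]   # reference pixels
    b = q[dy:, dx:]                             # offset neighbors
    np.add.at(glcm, (a.ravel(), b.ravel()), 1)
    p = glcm / glcm.sum()                       # co-occurrence probabilities
    i, j = np.indices(p.shape)
    contrast = ((i - j) ** 2 * p).sum()
    idm = (p / (1 + (i - j) ** 2)).sum()
    return contrast, idm

rng = np.random.default_rng(1)
print(glcm_features(rng.integers(0, 256, (64, 64))))
```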

  8. Super-resolution algorithm based on sparse representation and wavelet preprocessing for remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Ren, Ruizhi; Gu, Lingjia; Fu, Haoyang; Sun, Chenglin

    2017-04-01

    An effective super-resolution (SR) algorithm is proposed for real spectral remote sensing images based on sparse representation and wavelet preprocessing. The proposed SR algorithm mainly consists of dictionary training and image reconstruction. Wavelet preprocessing is used to establish four subbands, i.e., low frequency, horizontal, vertical, and diagonal high frequency, for an input image. Compared with traditional approaches that train directly on image patches, the proposed approach trains on features derived from these four subbands. The proposed algorithm is verified using different spectral remote sensing images, e.g., moderate-resolution imaging spectroradiometer (MODIS) images with different bands, and the latest Chinese Jilin-1 satellite images with high spatial resolution. According to the visual experimental results obtained from the MODIS remote sensing data, the SR images produced by the proposed SR algorithm are superior to those produced by a conventional bicubic interpolation algorithm or traditional SR algorithms without preprocessing. Fusion algorithms, e.g., standard intensity-hue-saturation, principal component analysis, wavelet transform, and the proposed SR algorithms are utilized to merge the multispectral and panchromatic images acquired by the Jilin-1 satellite. The effectiveness of the proposed SR algorithm is assessed by parameters such as peak signal-to-noise ratio, structural similarity index, correlation coefficient, root-mean-square error, relative dimensionless global error in synthesis, relative average spectral error, spectral angle mapper, and the quality index Q4, and its performance is better than that of the standard image fusion algorithms.
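
    The four subbands described above correspond exactly to a one-level 2-D discrete wavelet transform, which PyWavelets exposes directly. A small sketch of the preprocessing stage follows; the choice of the 'haar' wavelet and the feature layout are assumptions, since the abstract does not specify them:

```python
import numpy as np
import pywt  # PyWavelets

# One-level 2-D DWT splits an image into an approximation (low-frequency)
# subband plus horizontal, vertical and diagonal detail subbands.
img = np.random.rand(256, 256)
cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')
print(cA.shape, cH.shape, cV.shape, cD.shape)   # each 128 x 128

# Dictionary training would then draw features from the detail subbands
# rather than from raw image patches.
features = np.stack([cH, cV, cD], axis=-1).reshape(-1, 3)
print(features.shape)
```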

  9. Genetic Algorithm for Optimization: Preprocessing with n Dimensional Bisection and Error Estimation

    NASA Technical Reports Server (NTRS)

    Sen, S. K.; Shaykhian, Gholam Ali

    2006-01-01

    Knowledge of the appropriate values of the parameters of a genetic algorithm (GA), such as the population size, the shrunk search space containing the solution, and the crossover and mutation probabilities, is not available a priori for a general optimization problem. Recommended here is a polynomial-time preprocessing scheme that includes an n-dimensional bisection and determines the foregoing parameters before deciding upon an appropriate GA for all problems of a similar nature and type. Such preprocessing is not only fast but also enables us to obtain the global optimal solution and its reasonably narrow error bounds with a high degree of confidence.
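
    As a loose illustration of how a bisection-based preprocessing step can shrink a GA's search space and yield an error bound at the same time, consider the sketch below: each round halves the box along its longest axis and keeps the half whose sampled objective is better, so the final box both seeds the GA and bounds the solution's location. This is only one plausible reading of the idea, not the authors' algorithm:

```python
import numpy as np

def bisect_box(f, lo, hi, rounds=30, samples=128, seed=0):
    """Shrink an n-dimensional search box before running a GA. Each
    round bisects the longest axis and keeps the half whose random
    samples contain the smaller objective value; the final box width
    serves as a crude error bound on the minimizer's location."""
    rng = np.random.default_rng(seed)
    lo = np.asarray(lo, float).copy()
    hi = np.asarray(hi, float).copy()
    for _ in range(rounds):
        ax = int(np.argmax(hi - lo))
        mid = 0.5 * (lo[ax] + hi[ax])
        low_hi = hi.copy(); low_hi[ax] = mid    # upper corner of lower half
        upp_lo = lo.copy(); upp_lo[ax] = mid    # lower corner of upper half
        def sampled_min(l, h):
            return min(f(p) for p in rng.uniform(l, h, (samples, lo.size)))
        if sampled_min(lo, low_hi) <= sampled_min(upp_lo, hi):
            hi[ax] = mid
        else:
            lo[ax] = mid
    return lo, hi                               # shrunk GA search space

f = lambda x: ((x - 0.7) ** 2).sum()            # toy objective
print(bisect_box(f, [-5, -5], [5, 5]))          # box closes in on (0.7, 0.7)
```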

  10. Cancer microarray data feature selection using multi-objective binary particle swarm optimization algorithm.

    PubMed

    Annavarapu, Chandra Sekhara Rao; Dara, Suresh; Banka, Haider

    2016-01-01

    Cancer investigations in microarray data play a major role in cancer analysis and treatment. Cancer microarray data consist of complex gene expression patterns of cancer. In this article, a Multi-Objective Binary Particle Swarm Optimization (MOBPSO) algorithm is proposed for analyzing cancer gene expression data. Because of the data's high dimensionality, a fast heuristic-based pre-processing technique is employed to remove some of the crude domain features from the initial feature set. Since these pre-processed and reduced features are still high dimensional, the proposed MOBPSO algorithm is used to find further feature subsets. The objective functions are modeled by optimizing two conflicting objectives: the cardinality of the feature subsets and the discriminative capability of the selected subsets. As these two objective functions conflict, they are well suited to multi-objective modeling. The experiments are carried out on benchmark gene expression datasets from the literature, i.e., Colon, Lymphoma and Leukaemia. The performance of the selected feature subsets is measured by their classification accuracy and validated using 10-fold cross-validation. A detailed comparative study shows the improvement or competitiveness of the proposed algorithm.
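
    A single-objective binary PSO core (sigmoid transfer of velocities into bit-flip probabilities) is sketched below. Note that the scalarized fitness, trading an accuracy proxy against subset size, is an assumption standing in for the paper's genuinely multi-objective Pareto treatment:

```python
import numpy as np

def binary_pso(fitness, n_feats, n_particles=20, iters=50, seed=0):
    """Binary PSO: real-valued velocities are squashed by a sigmoid and
    sampled into 0/1 positions; each bit marks a selected feature."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, (n_particles, n_feats))
    V = rng.normal(0, 1, (n_particles, n_feats))
    pbest = X.copy()
    pbest_fit = np.array([fitness(x) for x in X])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(V.shape), rng.random(V.shape)
        V = 0.7 * V + 1.5 * r1 * (pbest - X) + 1.5 * r2 * (gbest - X)
        X = (rng.random(V.shape) < 1.0 / (1.0 + np.exp(-V))).astype(int)
        fit = np.array([fitness(x) for x in X])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = X[better], fit[better]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest

# Toy fitness: reward recovering 5 'informative' features, penalize size
informative = {3, 17, 42, 56, 78}
fit = lambda x: len(informative & set(np.flatnonzero(x))) - 0.01 * x.sum()
print(np.flatnonzero(binary_pso(fit, 100)))
```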

  11. Change detection using landsat time series: A review of frequencies, preprocessing, algorithms, and applications

    NASA Astrophysics Data System (ADS)

    Zhu, Zhe

    2017-08-01

    The free and open access to all archived Landsat images, granted in 2008, has completely changed the way Landsat data are used, and many novel change detection algorithms based on Landsat time series have been developed. We present a comprehensive review of four important aspects of change detection studies based on Landsat time series: frequencies, preprocessing, algorithms, and applications. We observed the trend that the more recent the study, the higher the frequency of the Landsat time series used. We reviewed a series of image preprocessing steps, including atmospheric correction, cloud and cloud shadow detection, and composite/fusion/metrics techniques. We divided all change detection algorithms into six categories: thresholding, differencing, segmentation, trajectory classification, statistical boundary, and regression. Within each category, six major characteristics of the algorithms, such as frequency, change index, univariate/multivariate, online/offline, abrupt/gradual change, and sub-pixel/pixel/spatial, were analyzed. Moreover, some of the widely used change detection algorithms were discussed. Finally, we reviewed different change detection applications by dividing them into two categories: change target and change agent detection.

  12. Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data.

    PubMed

    Tsuji, Junko; Weng, Zhiping

    2016-11-01

    Cytosine methylation regulates many biological processes such as gene expression, chromatin structure and chromosome stability. The whole genome bisulfite sequencing (WGBS) technique measures the methylation level at each cytosine throughout the genome. There are an increasing number of publicly available pipelines for analyzing WGBS data, reflecting many choices of read mapping algorithms as well as preprocessing and postprocessing methods. We simulated single-end and paired-end reads based on three experimental data sets, and comprehensively evaluated 192 combinations of three preprocessing, five postprocessing and five widely used read mapping algorithms. We also compared paired-end data with single-end data at the same sequencing depth for performance of read mapping and methylation level estimation. Bismark and LAST were the most robust mapping algorithms. We found that Mott trimming and quality filtering individually improved the performance of both read mapping and methylation level estimation, but combining them did not lead to further improvement. Furthermore, we confirmed that paired-end sequencing reduced error rate and enhanced sensitivity for both read mapping and methylation level estimation, especially for short reads and in repetitive regions of the human genome. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
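
    Downstream of mapping, the per-cytosine methylation level itself is a simple ratio: reads reporting a methylated C divided by total coverage at the site (bisulfite converts only unmethylated C to T). A minimal sketch follows; the (chrom, pos, is_methylated) input format is an assumed stand-in for an aligner's cytosine report:

```python
from collections import defaultdict

def methylation_levels(calls, min_depth=5):
    """Estimate per-cytosine methylation levels from bisulfite calls:
    level = methylated coverage / total coverage, with sites below
    min_depth dropped."""
    meth = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, is_meth in calls:
        total[(chrom, pos)] += 1
        meth[(chrom, pos)] += int(is_meth)
    return {site: meth[site] / n
            for site, n in total.items() if n >= min_depth}

calls = [("chr1", 100, True)] * 7 + [("chr1", 100, False)] * 3
print(methylation_levels(calls))   # {('chr1', 100): 0.7}
```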

  13. Experimental evaluation of video preprocessing algorithms for automatic target hand-off

    NASA Astrophysics Data System (ADS)

    McIngvale, P. H.; Guyton, R. D.

    It is pointed out that the Automatic Target Hand-Off Correlator (ATHOC) hardware has been modified to permit operation in a nonreal-time mode as a programmable laboratory test unit using video recordings as inputs and allowing several preprocessing algorithms to be software programmable. In parallel with this hardware modification effort, an analysis and simulation effort has been underway to help determine which of the many available preprocessing algorithms should be implemented in the ATHOC software. It is noted that videotapes from a current technology airborne target acquisition system and an imaging infrared missile seeker were recorded and used in the laboratory experiments. These experiments are described and the results are presented. A set of standard parameters is found for each case. Consideration of the background in the target scene is found to be important. Analog filter cutoff frequencies of 2.5 MHz for low pass and 300 kHz for high pass are found to give best results. EPNC = 1 is found to be slightly better than EPNC = 0. It is also shown that trilevel gives better results than bilevel.

  14. Performance comparison of SLFN training algorithms for DNA microarray classification.

    PubMed

    Huynh, Hieu Trung; Kim, Jung-Ja; Won, Yonggwan

    2011-01-01

    The classification of biological samples measured by DNA microarrays has been a major topic of interest in the last decade, and several approaches to this topic have been investigated. However, classifying the high-dimensional data of microarrays still presents a challenge to researchers. In this chapter, we focus on evaluating the performance of training algorithms for single hidden layer feedforward neural networks (SLFNs) in classifying DNA microarrays. The training algorithms consist of backpropagation (BP), the extreme learning machine (ELM), regularized least-squares ELM (RLS-ELM) and a recently proposed effective algorithm called neural-SVD. We also compare the performance of the neural network approaches with popular classifiers such as the support vector machine (SVM), principal component analysis (PCA) and Fisher discriminant analysis (FDA).

  15. A Data Preprocessing Algorithm for Classification Model Based On Rough Sets

    NASA Astrophysics Data System (ADS)

    Xiang-wei, Li; Yian-fang, Qi

    To address the limitation that abundant data pose for constructing classification models in data mining, this paper proposes a novel and effective preprocessing algorithm based on rough sets. First, we construct the relational information system from the original data sets. Second, we use the attribute-reduction theory of rough sets to produce the core of the information system; the core is the most important and necessary information, which cannot be reduced from the original information system, so it has the same effect as the original data sets for data analysis and can be used to construct the classification model. Third, we construct the indiscernibility matrix using the reduced information system and finally obtain the classification of the original data sets. Compared with existing techniques, the developed algorithm has the following advantages: (1) it avoids carrying abundant data into follow-up processing, (2) it avoids a large amount of computation in the whole data-mining process, and (3) the results are more effective because of the introduction of the attribute-reduction theory of rough sets.

  16. Cancer Classification in Microarray Data using a Hybrid Selective Independent Component Analysis and υ-Support Vector Machine Algorithm

    PubMed Central

    Saberkari, Hamidreza; Shamsi, Mousa; Joroughi, Mahsa; Golabi, Faegheh; Sedaaghi, Mohammad Hossein

    2014-01-01

    Microarray data play an important role in the identification and classification of cancer tissues. The small number of microarray samples available in cancer research is a persistent concern that complicates classifier design. For this reason, preprocessing gene selection techniques should be applied before classification to remove non-informative genes from the microarray data; an appropriate gene selection method can significantly improve the performance of cancer classification. In this paper, we use selective independent component analysis (SICA) to reduce the dimension of microarray data, which resolves the instability problem encountered with conventional independent component analysis (ICA) methods. First, the reconstruction error is analyzed and the selective set is formed from the independent components of each gene that contribute only a small part of the error when reconstructing a new sample. Then, several modified support vector machine (υ-SVM) sub-classifiers are trained simultaneously, and the sub-classifier with the highest recognition rate is selected. The proposed algorithm is applied to three cancer datasets (leukemia, breast cancer and lung cancer), and its results are compared with other existing methods. The results show that the proposed algorithm (SICA + υ-SVM) achieves higher accuracy and validity; in particular, it exhibits a relative improvement of 3.3% in correctness rate over the ICA + SVM and SVM algorithms on the lung cancer dataset. PMID:25426433

  17. Cancer Classification in Microarray Data using a Hybrid Selective Independent Component Analysis and υ-Support Vector Machine Algorithm.

    PubMed

    Saberkari, Hamidreza; Shamsi, Mousa; Joroughi, Mahsa; Golabi, Faegheh; Sedaaghi, Mohammad Hossein

    2014-10-01

    Microarray data play an important role in the identification and classification of cancer tissues. The small number of microarray samples available in cancer research is a persistent concern that complicates classifier design. For this reason, preprocessing gene selection techniques should be applied before classification to remove non-informative genes from the microarray data; an appropriate gene selection method can significantly improve the performance of cancer classification. In this paper, we use selective independent component analysis (SICA) to reduce the dimension of microarray data, which resolves the instability problem encountered with conventional independent component analysis (ICA) methods. First, the reconstruction error is analyzed and the selective set is formed from the independent components of each gene that contribute only a small part of the error when reconstructing a new sample. Then, several modified support vector machine (υ-SVM) sub-classifiers are trained simultaneously, and the sub-classifier with the highest recognition rate is selected. The proposed algorithm is applied to three cancer datasets (leukemia, breast cancer and lung cancer), and its results are compared with other existing methods. The results show that the proposed algorithm (SICA + υ-SVM) achieves higher accuracy and validity; in particular, it exhibits a relative improvement of 3.3% in correctness rate over the ICA + SVM and SVM algorithms on the lung cancer dataset.

  18. 3-D image pre-processing algorithms for improved automated tracing of neuronal arbors.

    PubMed

    Narayanaswamy, Arunachalam; Wang, Yu; Roysam, Badrinath

    2011-09-01

    The accuracy and reliability of automated neurite tracing systems is ultimately limited by image quality as reflected in the signal-to-noise ratio, contrast, and image variability. This paper describes a novel combination of image processing methods that operate on images of neurites captured by confocal and widefield microscopy, and produce synthetic images that are better suited to automated tracing. The algorithms are based on the curvelet transform (for denoising curvilinear structures and local orientation estimation), perceptual grouping by scalar voting (for elimination of non-tubular structures and improvement of neurite continuity while preserving branch points), adaptive focus detection, and depth estimation (for handling widefield images without deconvolution). The proposed methods are fast, and capable of handling large images. Their ability to handle images of unlimited size derives from automated tiling of large images along the lateral dimension, and processing of 3-D images one optical slice at a time. Their speed derives in part from the fact that the core computations are formulated in terms of the Fast Fourier Transform (FFT), and in part from parallel computation on multi-core computers. The methods are simple to apply to new images since they require very few adjustable parameters, all of which are intuitive. Examples of pre-processing DIADEM Challenge images are used to illustrate improved automated tracing resulting from our pre-processing methods.

  19. DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach

    NASA Astrophysics Data System (ADS)

    Tchagang, Alain B.; Tewfik, Ahmed H.

    2006-12-01

    Biclustering algorithms refer to a distinct class of clustering algorithms that perform simultaneous row-column clustering. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. When dealing with DNA microarray experimental data for example, the goal of biclustering algorithms is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed biclustering algorithms can be used to search for all biclusters with constant values, biclusters with constant values on rows, biclusters with constant values on columns, and biclusters with coherent values from a set of data in a timely manner and without solving any optimization problem. We also show how one of the proposed biclustering algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, while almost all previous biclustering approaches will miss some.
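
    The bicluster types listed above can all be checked with elementary residue computations, in the spirit of the paper's linear-algebra approach (the verification below is a generic sketch, not the authors' search procedure). Under the additive coherent model a_ij = mu + alpha_i + beta_j, the residual a_ij - rowmean_i - colmean_j + mean vanishes; the constant and constant-row/column types are special cases:

```python
import numpy as np

def bicluster_residues(A, rows, cols):
    """Score a candidate submatrix against the four bicluster types:
    lower scores mean a better fit (exactly 0 = perfect bicluster)."""
    B = A[np.ix_(rows, cols)]
    r = (B - B.mean(axis=1, keepdims=True)
           - B.mean(axis=0, keepdims=True) + B.mean())
    return {
        "constant":      B.var(),
        "constant_rows": B.var(axis=1).mean(),
        "constant_cols": B.var(axis=0).mean(),
        "coherent":      (r ** 2).mean(),   # mean squared residue
    }

# Additive coherent bicluster: only the "coherent" score is zero
alpha = np.array([0.0, 1.0, 3.0])[:, None]      # gene effects
beta = np.array([0.0, 2.0, 5.0, 6.0])[None, :]  # condition effects
A = 10 + alpha + beta
print(bicluster_residues(A, [0, 1, 2], [0, 1, 2, 3]))
```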

  20. Clustering Short Time-Series Microarray

    NASA Astrophysics Data System (ADS)

    Ping, Loh Wei; Hasan, Yahya Abu

    2008-01-01

    Most microarray analyses are carried out on static gene expression data. However, the dynamical study of microarrays has lately gained more attention. Most research on time-series microarrays emphasizes the bioscience and medical aspects; few studies address the numerical aspect. This study analyzes short time-series microarray data mathematically using the STEM clustering tool, which preprocesses the data before clustering. We then introduce the Circular Mould Distance (CMD) algorithm, which combines preprocessing with clustering analysis. The two methods are subsequently compared in terms of efficiency.

  1. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    NASA Astrophysics Data System (ADS)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU
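
    As a sequential reference point for what the GPU version parallelizes, an MFD flow accumulation on a depression-free DEM can be computed by visiting cells from highest to lowest elevation and splitting each cell's accumulated flow among all lower 8-neighbors. The linear slope weighting below is an assumption; MFD variants such as MFD-md use other weighting functions:

```python
import numpy as np

def mfd_flow_accumulation(dem):
    """Sequential multiple-flow-direction accumulation. Assumes the DEM
    has already been preprocessed to remove depressions and flats, so
    every non-outlet cell has at least one lower neighbor."""
    rows, cols = dem.shape
    acc = np.ones_like(dem, dtype=float)        # each cell contributes itself
    order = np.argsort(dem, axis=None)[::-1]    # highest elevation first
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]
    for idx in order:
        r, c = divmod(int(idx), cols)
        slopes, targets = [], []
        for dr, dc in nbrs:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and dem[rr, cc] < dem[r, c]:
                slopes.append((dem[r, c] - dem[rr, cc]) / np.hypot(dr, dc))
                targets.append((rr, cc))
        for s, (rr, cc) in zip(slopes, targets):
            acc[rr, cc] += acc[r, c] * s / sum(slopes)   # proportional split
    return acc

dem = np.array([[9., 8., 7.],
                [8., 6., 4.],
                [7., 4., 1.]])
print(mfd_flow_accumulation(dem))   # outlet (bottom-right) collects all flow
```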

  2. Microarrays

    ERIC Educational Resources Information Center

    Plomin, Robert; Schalkwyk, Leonard C.

    2007-01-01

    Microarrays are revolutionizing genetics by making it possible to genotype hundreds of thousands of DNA markers and to assess the expression (RNA transcripts) of all of the genes in the genome. Microarrays are slides the size of a postage stamp that contain millions of DNA sequences to which single-stranded DNA or RNA can hybridize. This…

  3. Fast-SNP: a fast matrix pre-processing algorithm for efficient loopless flux optimization of metabolic models.

    PubMed

    Saa, Pedro A; Nielsen, Lars K

    2016-12-15

    Computation of steady-state flux solutions in large metabolic models is routinely performed using flux balance analysis based on a simple LP (Linear Programming) formulation. A minimal requirement for thermodynamic feasibility of the flux solution is the absence of internal loops, which are enforced using 'loopless constraints'. The resulting loopless flux problem is a substantially harder MILP (Mixed Integer Linear Programming) problem, which is computationally expensive for large metabolic models. We developed a pre-processing algorithm that significantly reduces the size of the original loopless problem into an easier and equivalent MILP problem. The pre-processing step employs a fast matrix sparsification algorithm, Fast-SNP (Fast Sparse Null-space Pursuit), inspired by recent results on SNP. By finding a reduced feasible 'loop-law' matrix subject to known directionalities, Fast-SNP considerably improves the computational efficiency in several metabolic models running different loopless optimization problems. Furthermore, analysis of the topology encoded in the reduced loop matrix enabled identification of key directional constraints for the potential permanent elimination of infeasible loops in the underlying model. Overall, Fast-SNP is an effective and simple algorithm for efficient formulation of loop-law constraints, making loopless flux optimization feasible and numerically tractable at large scale. Source code for MATLAB including examples is freely available for download at http://www.aibn.uq.edu.au/cssb-resources under Software. Optimization uses Gurobi, CPLEX or GLPK (the latter is included with the algorithm). Contact: lars.nielsen@uq.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  4. Fast-SNP: a fast matrix pre-processing algorithm for efficient loopless flux optimization of metabolic models

    PubMed Central

    Saa, Pedro A.; Nielsen, Lars K.

    2016-01-01

    Motivation: Computation of steady-state flux solutions in large metabolic models is routinely performed using flux balance analysis based on a simple LP (Linear Programming) formulation. A minimal requirement for thermodynamic feasibility of the flux solution is the absence of internal loops, which are enforced using 'loopless constraints'. The resulting loopless flux problem is a substantially harder MILP (Mixed Integer Linear Programming) problem, which is computationally expensive for large metabolic models. Results: We developed a pre-processing algorithm that significantly reduces the size of the original loopless problem into an easier and equivalent MILP problem. The pre-processing step employs a fast matrix sparsification algorithm, Fast-SNP (Fast Sparse Null-space Pursuit), inspired by recent results on SNP. By finding a reduced feasible 'loop-law' matrix subject to known directionalities, Fast-SNP considerably improves the computational efficiency in several metabolic models running different loopless optimization problems. Furthermore, analysis of the topology encoded in the reduced loop matrix enabled identification of key directional constraints for the potential permanent elimination of infeasible loops in the underlying model. Overall, Fast-SNP is an effective and simple algorithm for efficient formulation of loop-law constraints, making loopless flux optimization feasible and numerically tractable at large scale. Availability and Implementation: Source code for MATLAB including examples is freely available for download at http://www.aibn.uq.edu.au/cssb-resources under Software. Optimization uses Gurobi, CPLEX or GLPK (the latter is included with the algorithm). Contact: lars.nielsen@uq.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27559155
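
    The 'loop-law' matrix both records refer to spans the null space of the stoichiometric matrix restricted to internal reactions: any flux pattern in that space is a thermodynamically infeasible cycle. The sketch below computes a dense null-space basis for a toy 3-reaction cycle with SciPy; Fast-SNP's actual contribution, finding a sparse reduced basis subject to known directionalities, is not reproduced here:

```python
import numpy as np
from scipy.linalg import null_space

# Toy internal network: metabolites A, B, C and reactions
# r0: C -> A, r1: A -> B, r2: B -> C (a closed internal cycle).
# Rows are metabolites, columns are internal reactions.
S_internal = np.array([
    [ 1, -1,  0],   # A: produced by r0, consumed by r1
    [ 0,  1, -1],   # B: produced by r1, consumed by r2
    [-1,  0,  1],   # C: produced by r2, consumed by r0
])

N = null_space(S_internal)   # basis of internal cycles (the loop law)
print(N.round(3))            # one vector ~ [1, 1, 1]: flux around the loop
```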

  5. LANDSAT data preprocessing

    NASA Technical Reports Server (NTRS)

    Austin, W. W.

    1983-01-01

    The effects on LANDSAT data of a Sun angle correction, an intersatellite LANDSAT-2 and LANDSAT-3 data range adjustment, and an atmospheric correction algorithm were evaluated. Fourteen 1978 crop year LACIE sites were used as the site data set. The preprocessing techniques were applied to multispectral scanner channel data, and the transformed data were plotted and used to analyze the effectiveness of the techniques. Ratio transformations effectively reduce the need for preprocessing techniques to be applied directly to the data. Subtractive transformations are more sensitive to Sun angle and atmospheric corrections than ratios. Preprocessing techniques other than those applied at the Goddard Space Flight Center should only be applied as a user option. While the study was performed on LANDSAT data, the results are also applicable to meteorological satellite data.
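
    Of the corrections evaluated above, the Sun angle correction is the simplest to state: scene brightness is divided by the cosine of the solar zenith angle so that acquisitions under different illumination become comparable. The sketch below applies it directly to digital numbers, which is a simplification (operationally it is applied to calibrated radiance or reflectance):

```python
import numpy as np

def sun_angle_correct(dn, solar_elevation_deg):
    """Normalize for illumination geometry: divide by cos(solar zenith),
    where zenith = 90 deg - solar elevation."""
    zenith = np.radians(90.0 - solar_elevation_deg)
    return dn / np.cos(zenith)

scene = np.array([40.0, 55.0, 80.0])       # DNs acquired at 30 deg elevation
print(sun_angle_correct(scene, 30.0))      # as if the sun were overhead

# A band ratio cancels the multiplicative cosine factor entirely, which
# is consistent with the study's finding that ratio transformations
# reduce the need for this correction.
corrected = sun_angle_correct(scene, 30.0)
print(scene[0] / scene[1], corrected[0] / corrected[1])   # identical ratios
```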

  6. Improved Data Preprocessing Algorithm for Time-Domain Induced Polarization Method with Digital Notch Filter

    NASA Astrophysics Data System (ADS)

    Ge, Shuang-Chao; Deng, Ming; Chen, Kai; Li, Bin; Li, Yuan

    2016-12-01

    Time-domain induced polarization (TDIP) measurement is seriously affected by power line interference and other field noise. Moreover, existing TDIP instruments generally output only the apparent chargeability, without providing complete secondary-field information. To increase the robustness of the TDIP method against interference and obtain more detailed secondary-field information, an improved data-processing algorithm is proposed here. The method includes an efficient digital notch filter that can effectively eliminate all the main components of the power line interference. A hardware model of this filter was constructed, and VHSIC Hardware Description Language (VHDL) code for it was generated using DSP Builder. In addition, a time-location method is proposed to extract secondary-field information in case of unexpected data loss or failure of the synchronization technology. Finally, the validity and accuracy of the method and the notch filter were verified using the Cole-Cole model implemented in Simulink, and indoor and field tests confirmed the effectiveness of the algorithm in fieldwork.
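
    The central idea, notching out the power line fundamental and its harmonics before extracting the secondary field, can be sketched with SciPy's standard IIR notch designer. The 50 Hz fundamental, the quality factor and the number of harmonics below are assumptions; the paper's filter is a custom hardware design, not this software stand-in:

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

def remove_powerline(x, fs, f0=50.0, n_harmonics=3, q=30.0):
    """Cascade digital notch filters at f0, 2*f0, ... to suppress power
    line interference; filtfilt keeps the decay shape phase-intact."""
    for k in range(1, n_harmonics + 1):
        b, a = iirnotch(k * f0, q, fs=fs)
        x = filtfilt(b, a, x)
    return x

fs = 1000.0
t = np.arange(0, 2, 1 / fs)
secondary = np.exp(-t / 0.5)                       # idealized IP decay curve
noisy = secondary + 0.5 * np.sin(2 * np.pi * 50 * t)
clean = remove_powerline(noisy, fs)
print(np.abs(clean - secondary).max())             # residual after notching
```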

  7. Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm

    NASA Astrophysics Data System (ADS)

    Frisca; Bustamam, Alhadi; Siswantining, Titin

    2017-03-01

    Clustering is a data analysis method that aims to group data with similar characteristics. Spectral clustering is one of the most popular modern clustering algorithms; as an effective clustering technique, it emerged from concepts in spectral graph theory. The spectral clustering method needs a partitioning algorithm, and several partitioning methods exist, including PAM, SOM, fuzzy c-means, and k-means. Based on research done by Capital and Choudhury in 2013, the k-means algorithm with Euclidean distance provides better accuracy than the PAM algorithm, so in this paper we use k-means as our partitioning algorithm. The major advantage of spectral clustering is in reducing the data dimension, especially, in this case, the dimension of a large microarray dataset. A microarray is a small chip made of a glass plate containing thousands and even tens of thousands of kinds of genes in DNA fragments derived from cDNA duplication. Microarray data are widely used to detect cancers, for example carcinoma, in which cancer cells express abnormalities in their genes. The purpose of this research is to group data with high similarity in the same cluster and data with low similarity in different clusters. This research uses carcinoma microarray data with 7457 genes; partitioning with the k-means algorithm yields two clusters.
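
    A compact version of the pipeline the abstract describes (affinity graph, normalized Laplacian, spectral embedding, then k-means partitioning) is sketched below with numpy and SciPy. The RBF affinity and its scale sigma are assumptions, since the abstract does not state the similarity measure:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def spectral_kmeans(X, k=2, sigma=1.0, seed=0):
    """Spectral clustering with k-means partitioning: RBF affinity,
    symmetric normalized Laplacian, embedding into the k smallest
    eigenvectors, then k-means in the reduced space."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))              # affinity matrix
    np.fill_diagonal(W, 0.0)
    Dm = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L = np.eye(len(X)) - Dm @ W @ Dm                # normalized Laplacian
    _, vecs = eigh(L, subset_by_index=[0, k - 1])   # k smallest eigenpairs
    U = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    _, labels = kmeans2(U, k, minit="++", seed=seed)
    return labels

# Toy stand-in for a dimension-reduced expression matrix (samples x genes)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 5)), rng.normal(3, 0.3, (10, 5))])
print(spectral_kmeans(X, k=2))                      # two recovered groups
```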

  8. Simultaneous data pre-processing and SVM classification model selection based on a parallel genetic algorithm applied to spectroscopic data of olive oils.

    PubMed

    Devos, Olivier; Downey, Gerard; Duponchel, Ludovic

    2014-04-01

    Classification is an important task in chemometrics. For several years now, support vector machines (SVMs) have proven to be powerful for infrared spectral data classification. However, such methods require the optimisation of parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, it is established that the prediction ability of classification models can be improved using pre-processing in order to remove unwanted variance in the spectra. In this paper we propose a new methodology based on a genetic algorithm (GA) for the simultaneous optimisation of SVM parameters and pre-processing (GENOPT-SVM). The method has been tested for the discrimination of the geographical origin of Italian olive oil (Ligurian and non-Ligurian) on the basis of near infrared (NIR) or mid infrared (FTIR) spectra. Different classification models (PLS-DA, SVM with mean-centred data, GENOPT-SVM) were tested and statistically compared using McNemar's test. For both datasets, SVM with optimised pre-processing gave models with higher accuracy than those obtained with PLS-DA on pre-processed data. In the case of the NIR dataset, most of this accuracy improvement (86.3% compared with 82.8% for PLS-DA) occurred using only a single pre-processing step. For the FTIR dataset, three optimised pre-processing steps were required to obtain an SVM model with a significant accuracy improvement (82.2%) over PLS-DA (78.6%). Furthermore, this study demonstrates that even SVM models have to be developed on the basis of well-corrected spectral data in order to obtain higher classification rates. Copyright © 2013 Elsevier Ltd. All rights reserved.

  9. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    PubMed

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Nature-inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm, which combines a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm with the goal of integrating the advantages of both. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. To test the accuracy of the proposed algorithm, extensive experiments were conducted on three binary microarray datasets (colon, leukemia, and lung) and three multi-class microarray datasets (SRBCT, lymphoma, and leukemia). Results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC), as well as with combinations of mRMR with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes. This demonstrates that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.

  10. Image microarrays derived from tissue microarrays (IMA-TMA): New resource for computer-aided diagnostic algorithm development.

    PubMed

    Hipp, Jennifer A; Hipp, Jason D; Lim, Megan; Sharma, Gaurav; Smith, Lauren B; Hewitt, Stephen M; Balis, Ulysses G J

    2012-01-01

    Conventional tissue microarrays (TMAs) consist of cores of tissue inserted into a recipient paraffin block such that a tissue section on a single glass slide can contain numerous patient samples in a spatially structured pattern. Scanning TMAs into digital slides for subsequent analysis by computer-aided diagnostic (CAD) algorithms offers the possibility of evaluating candidate algorithms against a near-complete repertoire of variable disease morphologies. This parallel interrogation approach simplifies the evaluation, validation, and comparison of such candidate algorithms. Recently developed digital tools, digital core (dCORE) and image microarray maker (iMAM), enable the capture of uniformly sized and resolution-matched images representing key morphologic features and fields of view, aggregated into a single monolithic digital image file in an array format, which we define as an image microarray (IMA). We further define the TMA-IMA construct as IMA-based images derived from whole slide images of TMAs themselves. Here we describe the first combined use of the previously described dCORE and iMAM tools toward the goal of generating a higher-order image construct, with multiple TMA cores from multiple distinct conventional TMAs assembled as a single digital image montage. This image construct served as the basis for carrying out a massively parallel image analysis exercise based on the previously described spatially invariant vector quantization (SIVQ) algorithm. Multicase, multifield TMA-IMAs of follicular lymphoma and follicular hyperplasia were separately rendered using the aforementioned tools. Each of these two IMAs contained a distinct spectrum of morphologic heterogeneity with respect to both tingible body macrophage (TBM) appearance and apoptotic body morphology. SIVQ-based pattern matching, with ring vectors selected to screen for either tingible body macrophages or apoptotic bodies, was subsequently carried out on the

  11. Image microarrays derived from tissue microarrays (IMA-TMA): New resource for computer-aided diagnostic algorithm development

    PubMed Central

    Hipp, Jennifer A.; Hipp, Jason D.; Lim, Megan; Sharma, Gaurav; Smith, Lauren B.; Hewitt, Stephen M.; Balis, Ulysses G. J.

    2012-01-01

    Background: Conventional tissue microarrays (TMAs) consist of cores of tissue inserted into a recipient paraffin block such that a tissue section on a single glass slide can contain numerous patient samples in a spatially structured pattern. Scanning TMAs into digital slides for subsequent analysis by computer-aided diagnostic (CAD) algorithms offers the possibility of evaluating candidate algorithms against a near-complete repertoire of variable disease morphologies. This parallel interrogation approach simplifies the evaluation, validation, and comparison of such candidate algorithms. Recently developed digital tools, digital core (dCORE) and image microarray maker (iMAM), enable the capture of uniformly sized and resolution-matched images representing key morphologic features and fields of view, aggregated into a single monolithic digital image file in an array format, which we define as an image microarray (IMA). We further define the TMA-IMA construct as IMA-based images derived from whole slide images of TMAs themselves. Methods: Here we describe the first combined use of the previously described dCORE and iMAM tools, toward the goal of generating a higher-order image construct, with multiple TMA cores from multiple distinct conventional TMAs assembled as a single digital image montage. This image construct served as the basis for carrying out a massively parallel image analysis exercise based on the previously described spatially invariant vector quantization (SIVQ) algorithm. Results: Multicase, multifield TMA-IMAs of follicular lymphoma and follicular hyperplasia were separately rendered using the aforementioned tools. Each of these two IMAs contained a distinct spectrum of morphologic heterogeneity with respect to both tingible body macrophage (TBM) appearance and apoptotic body morphology. SIVQ-based pattern matching, with ring vectors selected to screen for either tingible body macrophages or apoptotic bodies, was

  12. LS-CAP: an algorithm for identifying cytogenetic aberrations in hepatocellular carcinoma using microarray data.

    PubMed

    He, Xianmin; Wei, Qing; Sun, Meiqian; Fu, Xuping; Fan, Sichang; Li, Yao

    2006-05-01

    Biological techniques such as array comparative genomic hybridization (CGH), fluorescence in situ hybridization (FISH) and Affymetrix single nucleotide polymorphism (SNP) arrays have been used to detect cytogenetic aberrations. However, on a genomic scale, these techniques are labor-intensive and time-consuming. Comparative genomic microarray analysis (CGMA) has been used to identify cytogenetic changes in hepatocellular carcinoma (HCC) using gene expression microarray data. However, the CGMA algorithm cannot give a precise localization of aberrations, fails to identify small cytogenetic changes, and exhibits false negatives and positives. Locally un-weighted smoothing cytogenetic aberrations prediction (LS-CAP), based on local smoothing and the binomial distribution, can be expected to address these problems. The LS-CAP algorithm was built and applied to HCC microarray profiles. Eighteen cytogenetic abnormalities were identified; among them, 5 had been reported previously and 12 were confirmed by CGH studies. LS-CAP effectively reduced the false negatives and positives, and precisely located small fragments with cytogenetic aberrations.

  13. PMCR-Miner: parallel maximal confident association rules miner algorithm for microarray data set.

    PubMed

    Zakaria, Wael; Kotb, Yasser; Ghaleb, Fayed F M

    2015-01-01

    The MCR-Miner algorithm aims to mine all maximal high-confidence association rules from microarray up/down-expressed gene data sets. This paper introduces two new algorithms: IMCR-Miner and PMCR-Miner. The IMCR-Miner algorithm is an extension of the MCR-Miner algorithm with some improvements. These improvements implement a novel way of storing the samples of each gene as a list of unsigned integers in order to benefit from bitwise operations. In addition, the IMCR-Miner algorithm overcomes the drawbacks faced by the MCR-Miner algorithm by setting some restrictions to avoid repeated comparisons. The PMCR-Miner algorithm is a parallel version of the proposed IMCR-Miner algorithm, based on shared-memory systems and task parallelism, where no time is needed for sharing and combining data between processors. Experimental results on real microarray data sets show that the PMCR-Miner algorithm is more efficient and scalable than its counterparts.
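
    The bitwise representation mentioned above is worth making concrete: once each gene's up-regulated samples are packed into an integer bitmask, rule support and confidence reduce to an AND plus a popcount. A toy sketch (not the authors' data structures) follows; int.bit_count() needs Python 3.10+, otherwise use bin(x).count("1"):

```python
def bitmask(sample_flags):
    """Pack a gene's per-sample up-regulation flags into one integer."""
    m = 0
    for i, flag in enumerate(sample_flags):
        if flag:
            m |= 1 << i
    return m

def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: co-occurring
    samples (AND + popcount) over the antecedent's samples."""
    both = (antecedent & consequent).bit_count()
    ante = antecedent.bit_count()
    return both / ante if ante else 0.0

g1 = bitmask([1, 1, 0, 1, 1, 0, 1, 0])   # gene 1 up-regulated in 5 samples
g2 = bitmask([1, 1, 0, 1, 0, 0, 1, 1])   # gene 2 up-regulated in 5 samples
print(confidence(g1, g2))                # 4 shared of 5 -> 0.8
```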

  14. Crossword: A Fully Automated Algorithm for the Segmentation and Quality Control of Protein Microarray Images

    PubMed Central

    2015-01-01

    Biological assays formatted as microarrays have become a critical tool for the generation of the comprehensive data sets required for systems-level understanding of biological processes. Manual annotation of data extracted from images of microarrays, however, remains a significant bottleneck, particularly for protein microarrays due to the sensitivity of this technology to weak artifact signal. In order to automate the extraction and curation of data from protein microarrays, we describe an algorithm called Crossword that logically combines information from multiple approaches to fully automate microarray segmentation. Automated artifact removal is also accomplished by segregating structured pixels from the background noise using iterative clustering and pixel connectivity. Correlation of the location of structured pixels across image channels is used to identify and remove artifact pixels from the image prior to data extraction. This component improves the accuracy of data sets while reducing the requirement for time-consuming visual inspection of the data. Crossword enables a fully automated protocol that is robust to significant spatial and intensity aberrations. Overall, the average amount of user intervention is reduced by an order of magnitude and the data quality is increased through artifact removal and reduced user variability. The increase in throughput should aid the further implementation of microarray technologies in clinical studies. PMID:24417579

  15. Toward 'smart' DNA microarrays: algorithms for improving data quality and statistical inference

    NASA Astrophysics Data System (ADS)

    Bakewell, David J. G.; Wit, Ernst

    2007-12-01

    DNA microarrays are a laboratory tool for understanding biological processes at the molecular scale and future applications of this technology include healthcare, agriculture, and environment. Despite their usefulness, however, the information microarrays make available to the end-user is not used optimally, and the data is often noisy and of variable quality. This paper describes the use of hierarchical Maximum Likelihood Estimation (MLE) for generating algorithms that improve the quality of microarray data and enhance statistical inference about gene behavior. The paper describes examples of recent work that improves microarray performance, demonstrated using data from both Monte Carlo simulations and published experiments. One example looks at the variable quality of cDNA spots on a typical microarray surface. It is shown how algorithms, derived using MLE, are used to "weight" these spots according to their morphological quality, and subsequently lead to improved detection of gene activity. Another example, briefly discussed, addresses the "noisy data about too many genes" issue confronting many analysts who are also interested in the collective action of a group of genes, often organized as a pathway or complex. Preliminary work is described where MLE is used to "share" variance information across a pre-assigned group of genes of interest, leading to improved detection of gene activity.

  17. Stable feature selection and classification algorithms for multiclass microarray data

    PubMed Central

    2012-01-01

    Background Recent studies suggest that gene expression profiles are a promising alternative for clinical cancer classification. One major problem in applying DNA microarrays for classification is the dimension of the obtained data sets. In this paper we propose a multiclass gene selection method based on Partial Least Squares (PLS) for selecting genes for classification. The new idea is to solve the multiclass selection problem with the PLS method and a decomposition into a set of two-class sub-problems: one versus rest (OvR) and one versus one (OvO). We also apply OvR and OvO two-class decomposition to another recently published gene selection method. Ranked gene lists are highly unstable in the sense that a small change in the data set often leads to big changes in the obtained ordered lists. In this paper, we assess the stability of the proposed methods. We use the linear support vector machine (SVM) technique in different variants (one versus one, one versus rest, and multiclass SVM (MSVM)) and linear discriminant analysis (LDA) as classifiers. We use the balanced bootstrap to estimate the prediction error and to test the variability of the obtained ordered lists. Results This paper focuses on the effective identification of informative genes. As a result, a new strategy to find a small subset of significant genes is designed. Our results on real multiclass cancer data show that our method has a very high accuracy rate for different combinations of classification methods while giving very stable feature rankings. Conclusions This paper shows that the proposed strategies can substantially improve the performance of selected gene sets. OvR and OvO techniques applied to existing gene selection methods improve results as well. The presented method yields a more reliable classifier with a lower classification error and, at the same time, generates more stable ordered feature lists than existing methods.
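
    A minimal sketch of the OvR decomposition with PLS, assuming scikit-learn's PLSRegression and ranking genes by first-component weight magnitude; the paper's exact scoring and the OvO variant are omitted.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def ovr_pls_ranking(X, y, n_components=2):
    """One-versus-rest decomposition: fit a two-class PLS model per
    class against a binary target, score each gene by the magnitude of
    its weight on the first latent component, and merge the per-class
    scores by taking the maximum."""
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        target = (y == c).astype(float)
        pls = PLSRegression(n_components=n_components).fit(X, target)
        scores = np.maximum(scores, np.abs(pls.x_weights_[:, 0]))
    return np.argsort(scores)[::-1]           # gene indices, best first

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))                # 60 samples, 500 genes
y = rng.integers(0, 3, size=60)               # three tumour classes
top20 = ovr_pls_ranking(X, y)[:20]
```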

  18. Melanoma prognostic model using tissue microarrays and genetic algorithms.

    PubMed

    Gould Rothberg, Bonnie E; Berger, Aaron J; Molinaro, Annette M; Subtil, Antonio; Krauthammer, Michael O; Camp, Robert L; Bradley, William R; Ariyan, Stephan; Kluger, Harriet M; Rimm, David L

    2009-12-01

    As a result of the questionable risk-to-benefit ratio of adjuvant therapies, stage II melanoma is currently managed by observation because available clinicopathologic parameters cannot identify the 20% to 60% of such patients likely to develop metastatic disease. Here, we propose a multimarker molecular prognostic assay that can help triage patients at increased risk of recurrence. Protein expression for 38 candidates relevant to melanoma oncogenesis was evaluated using the automated quantitative analysis (AQUA) method for immunofluorescence-based immunohistochemistry in formalin-fixed, paraffin-embedded specimens from a cohort of 192 primary melanomas collected during 1959 to 1994. The prognostic assay was built using a genetic algorithm and validated on an independent cohort of 246 serial primary melanomas collected from 1997 to 2004. Multiple iterations of the genetic algorithm yielded a consistent five-marker solution. A favorable prognosis was predicted by ATF2 ln(non-nuclear/nuclear AQUA score ratio) of more than -0.052, p21(WAF1) nuclear compartment AQUA score of more than 12.98, p16(INK4A) ln(non-nuclear/nuclear AQUA score ratio) of ≤ -0.083, beta-catenin total AQUA score of more than 38.68, and fibronectin total AQUA score of ≤ 57.93. Primary tumors that met at least four of these five conditions were considered a low-risk group, and those that met three or fewer conditions formed a high-risk group (log-rank P < .0001). Multivariable proportional hazards analysis adjusting for clinicopathologic parameters shows that the high-risk group has significantly reduced survival in both the discovery (hazard ratio = 2.84; 95% CI, 1.46 to 5.49; P = .002) and validation (hazard ratio = 2.72; 95% CI, 1.12 to 6.58; P = .027) cohorts. This multimarker prognostic assay, an independent determinant of melanoma survival, might be beneficial in improving the selection of stage II patients for adjuvant therapy.
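
    Since the five marker conditions and the four-of-five cutoff are stated explicitly, the decision rule can be expressed directly; the function name and input variables below are hypothetical, while the thresholds are the abstract's.

```python
def melanoma_risk_group(atf2_ratio, p21_nuclear, p16_ratio,
                        beta_catenin, fibronectin):
    """Count how many of the five published conditions a tumour meets;
    four or five -> low-risk group, three or fewer -> high-risk group.
    Thresholds are the cut-points quoted in the abstract."""
    conditions = [
        atf2_ratio > -0.052,     # ATF2 ln(non-nuclear/nuclear AQUA ratio)
        p21_nuclear > 12.98,     # p21(WAF1) nuclear AQUA score
        p16_ratio <= -0.083,     # p16(INK4A) ln(non-nuclear/nuclear ratio)
        beta_catenin > 38.68,    # beta-catenin total AQUA score
        fibronectin <= 57.93,    # fibronectin total AQUA score
    ]
    return "low-risk" if sum(conditions) >= 4 else "high-risk"

print(melanoma_risk_group(-0.01, 15.2, -0.10, 40.0, 50.0))   # low-risk
```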

  19. Gene selection algorithms for microarray data based on least squares support vector machine

    PubMed Central

    Tang, E Ke; Suganthan, PN; Yao, Xin

    2006-01-01

    Background In discriminant analysis of microarray data, a small number of samples is usually described by a large number of genes. It is not only difficult but also unnecessary to conduct the discriminant analysis with all the genes. Hence, gene selection is usually performed to select important genes. Results A gene selection method searches for an optimal or near-optimal subset of genes with respect to a given evaluation criterion. In this paper, we propose a new evaluation criterion, named the leave-one-out calculation (LOOC) measure. A gene selection method, the leave-one-out calculation sequential forward selection (LOOCSFS) algorithm, is then presented by combining the LOOC measure with the sequential forward selection scheme. Further, a novel gene selection algorithm, the gradient-based leave-one-out gene selection (GLGS) algorithm, is also proposed. Both gene selection algorithms originate from an efficient and exact calculation of the leave-one-out cross-validation error of the least squares support vector machine (LS-SVM). The proposed approaches are applied to two microarray datasets and compared to other well-known gene selection methods using code available from the second author. Conclusion The proposed gene selection approaches can provide gene subsets leading to more accurate classification results, while their computational complexity is comparable to that of existing methods. The GLGS algorithm also scales better to datasets with a very large number of genes. PMID:16504159
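
    A sketch of forward selection driven by leave-one-out error, with scikit-learn's RidgeClassifier standing in for the LS-SVM (both minimize a regularized least-squares objective); the paper's LOOC measure exploits a closed-form LOO computation rather than the brute-force cross-validation used here.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def loo_sfs(X, y, n_select=3):
    """Sequential forward selection scored by leave-one-out accuracy:
    at each step, add the gene whose inclusion gives the best LOO
    accuracy of the regularized least-squares classifier."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select:
        best_gene, best_acc = None, -1.0
        for g in remaining:
            cols = selected + [g]
            acc = cross_val_score(RidgeClassifier(), X[:, cols], y,
                                  cv=LeaveOneOut()).mean()
            if acc > best_acc:
                best_gene, best_acc = g, acc
        selected.append(best_gene)
        remaining.remove(best_gene)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 40))       # keep the toy search space small
y = rng.integers(0, 2, size=30)
print(loo_sfs(X, y))
```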

  20. An algorithm for finding biologically significant features in microarray data based on a priori manifold learning.

    PubMed

    Hira, Zena M; Trigeorgis, George; Gillies, Duncan F

    2014-01-01

    Microarray databases are a large source of genetic data which, upon proper analysis, could enhance our understanding of biology and medicine. Many microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied to classify different types of cancer or distinguish between cancerous and non-cancerous tissue. However, microarrays are high-dimensional datasets with high levels of noise, which causes problems when using machine learning methods. A popular approach to this problem is to search for a set of features that will simplify the structure and to some degree remove the noise from the data. The most widely used approach to feature extraction is principal component analysis (PCA), which assumes a multivariate Gaussian model of the data. More recently, non-linear methods have been investigated. Among these, manifold learning algorithms, for example Isomap, aim to project the data from a higher-dimensional space onto a lower-dimensional one. We have proposed a priori manifold learning for finding a manifold in which a representative set of microarray data is fused with relevant data taken from the KEGG pathway database. Once the manifold has been constructed, the raw microarray data are projected onto it, and clustering and classification can take place. In contrast to earlier fusion-based methods, the prior knowledge from the KEGG database is not used in, and does not bias, the classification process; it merely acts as an aid to find the best space in which to search the data. In our experiments we have found that using our new manifold method gives better classification results than using either PCA or conventional Isomap.

  1. A novel feature extraction approach for microarray data based on multi-algorithm fusion.

    PubMed

    Jiang, Zhu; Xu, Rong

    2015-01-01

    Feature extraction is one of the most important and effective methods for reducing dimension in data mining, given the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capability. Depending on the purpose of gene selection, two types of feature extraction algorithms, ranking-based feature extraction and set-based feature extraction, are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are generally evaluated on an individual basis, without considering inter-relationships between features, while set-based feature extraction evaluates features based on their role in a feature set, taking into account dependencies between features. Like learning methods, feature extraction has a problem with its generalization ability, that is, its robustness; however, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested on gene expression datasets including the Colon cancer, CNS, DLBCL, and Leukemia data. The results show that the performance of this algorithm is better than that of existing solutions.
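
    One generic way to realize the fusion idea is sketched below: averaging per-gene ranks from a ranking-based scorer (ANOVA F statistic) and a dependency-aware scorer (mutual information). The paper's actual fusion scheme may differ; this is a stand-in built from scikit-learn scorers.

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

def fused_ranking(X, y):
    """Average per-gene ranks from two different feature scorers, then
    order genes by the fused rank (lower fused rank = more relevant)."""
    f_scores, _ = f_classif(X, y)
    mi_scores = mutual_info_classif(X, y, random_state=0)
    rank = lambda s: np.argsort(np.argsort(-s))     # 0 = best
    fused = (rank(f_scores) + rank(mi_scores)) / 2.0
    return np.argsort(fused)                        # gene indices, best first

rng = np.random.default_rng(0)
X = rng.normal(size=(62, 2000))     # Colon-cancer-sized toy matrix
y = rng.integers(0, 2, size=62)
top50 = fused_ranking(X, y)[:50]
```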

  2. SPACE: an algorithm to predict and quantify alternatively spliced isoforms using microarrays

    PubMed Central

    Anton, Miguel A; Gorostiaga, Dorleta; Guruceaga, Elizabeth; Segura, Victor; Carmona-Saez, Pedro; Pascual-Montano, Alberto; Pio, Ruben; Montuenga, Luis M; Rubio, Angel

    2008-01-01

    Exon and exon+junction microarrays are promising tools for studying alternative splicing. Current analytical tools applied to these arrays lack two relevant features: the ability to predict unknown spliced forms and the ability to quantify the concentration of known and unknown isoforms. SPACE is an algorithm that has been developed to (1) estimate the number of different transcripts expressed under several conditions, (2) predict the precursor mRNA splicing structure and (3) quantify the transcript concentrations including unknown forms. The results presented here show its robustness and accuracy for real and simulated data. PMID:18312629

  3. Sound quality prediction based on systematic metric selection and shrinkage: Comparison of stepwise, lasso, and elastic-net algorithms and clustering preprocessing

    NASA Astrophysics Data System (ADS)

    Gauthier, Philippe-Aubert; Scullion, William; Berry, Alain

    2017-07-01

    Sound quality is the impression of quality that is transmitted by the sound of a device. Its importance in the sound and acoustical design of consumer products no longer needs to be demonstrated. One of the challenges is the creation of a prediction model that is able to predict the results of a listening test using metrics derived from the sound stimuli. Often, these models are derived either using linear regression on a limited set of experimenter-selected metrics, or using more complex algorithms such as neural networks. In the former case, the user-selected metrics can bias the model and reflect the engineer's preconceived idea of sound quality while missing potential features. In the latter case, although prediction might be efficient, the model is often a black box that is difficult to use as a sound design guideline for engineers. In this paper, preprocessing by participant clustering and three different algorithms are compared in order to construct a sound quality prediction model that does not suffer from these limitations. The lasso, elastic-net and stepwise algorithms are tested on listening tests of a consumer product for which 91 metrics are used as potential predictors. Based on the reported results, the most promising algorithm is the lasso, which is able to (1) efficiently limit the number of metrics, (2) most accurately predict the results of listening tests, and (3) provide a meaningful model that can be used as understandable design guidelines.
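
    A compact illustration of the lasso stage with scikit-learn, using synthetic stand-ins for the 91 candidate metrics and the jury ratings; the shrinkage automatically zeroes out most coefficients, which is the metric-limiting behavior the authors highlight.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 91))                  # 91 candidate sound metrics
y = X[:, 3] - 0.5 * X[:, 17] + 0.1 * rng.normal(size=40)   # jury scores

X_std = StandardScaler().fit_transform(X)      # shrinkage needs a common scale
model = LassoCV(cv=5).fit(X_std, y)            # penalty chosen by cross-validation

kept = np.flatnonzero(model.coef_)             # metrics surviving the L1 penalty
print(f"{kept.size} of 91 metrics retained:", kept)
```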

  4. Robust gene signatures from microarray data using genetic algorithms enriched with biological pathway keywords.

    PubMed

    Luque-Baena, R M; Urda, D; Gonzalo Claros, M; Franco, L; Jerez, J M

    2014-06-01

    Genetic algorithms are widely used in the estimation of expression profiles from microarray data. However, these techniques are unable to produce stable and robust solutions suitable for use in clinical and biomedical studies. This paper presents a novel two-stage evolutionary strategy for gene feature selection combining a genetic algorithm with biological information extracted from the KEGG database. A comparative study is carried out on public data from three different types of cancer (leukemia, lung cancer and prostate cancer). Even though the analyses only use features having KEGG information, the results demonstrate that this two-stage evolutionary strategy increases the consistency, robustness and accuracy of a blind discrimination between relapsed and healthy individuals. Therefore, this approach could facilitate the definition of gene signatures for the clinical prognosis and diagnosis of cancer diseases in the near future. Additionally, it could also be used for biological knowledge discovery about the studied disease. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling.

    PubMed

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying the ABC algorithm to the analysis of a microarray gene expression profile. In addition, we combine the minimum redundancy maximum relevance (mRMR) feature selection algorithm with the ABC algorithm, yielding mRMR-ABC, to select informative genes from microarray profiles. The new approach uses a support vector machine (SVM) algorithm to measure the classification accuracy of selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques, reimplementing two of them with the same parameters for the sake of a fair comparison: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes on these datasets when compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  6. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

    PubMed Central

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying the ABC algorithm to the analysis of a microarray gene expression profile. In addition, we combine the minimum redundancy maximum relevance (mRMR) feature selection algorithm with the ABC algorithm, yielding mRMR-ABC, to select informative genes from microarray profiles. The new approach uses a support vector machine (SVM) algorithm to measure the classification accuracy of selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques, reimplementing two of them with the same parameters for the sake of a fair comparison: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results show that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes on these datasets when compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. PMID:25961028
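
    Only the mRMR filter stage lends itself to a short sketch; the ABC search and the SVM wrapper around it are omitted. The greedy criterion below (relevance minus mean redundancy, both measured by mutual information on median-discretized data) follows the standard mRMR formulation, not necessarily the authors' exact implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr(X, y, n_select=10):
    """Greedy mRMR: at each step pick the gene maximizing
    relevance(gene; class) minus the mean redundancy(gene; selected)."""
    Xd = (X > np.median(X, axis=0)).astype(int)    # discretize for MI
    relevance = mutual_info_classif(Xd, y, discrete_features=True,
                                    random_state=0)
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for g in range(X.shape[1]):
            if g in selected:
                continue
            redundancy = np.mean([mutual_info_score(Xd[:, g], Xd[:, s])
                                  for s in selected])
            score = relevance[g] - redundancy
            if score > best_score:
                best, best_score = g, score
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
y = rng.integers(0, 2, size=50)
print(mrmr(X, y, n_select=5))
```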

  7. Forward-Masked Frequency Selectivity Improvements in Simulated and Actual Cochlear Implant Users Using a Preprocessing Algorithm

    PubMed Central

    Jürgens, Tim

    2016-01-01

    Frequency selectivity can be quantified using masking paradigms, such as psychophysical tuning curves (PTCs). Normal-hearing (NH) listeners show sharp PTCs that are level- and frequency-dependent, whereas frequency selectivity is strongly reduced in cochlear implant (CI) users. This study aims at (a) assessing individual shapes of PTCs in CI users, (b) comparing these shapes to those of simulated CI listeners (NH listeners hearing through a CI simulation), and (c) increasing the sharpness of PTCs using a biologically inspired dynamic compression algorithm, BioAid, which has been shown to sharpen the PTC shape in hearing-impaired listeners. A three-alternative-forced-choice forward-masking technique was used to assess PTCs in 8 CI users (with their own speech processor) and 11 NH listeners (with and without listening through a vocoder to simulate electric hearing). CI users showed flat PTCs with large interindividual variability in shape, whereas simulated CI listeners had PTCs of the same average flatness, but more homogeneous shapes across listeners. The algorithm BioAid was used to process the stimuli before entering the CI users’ speech processor or the vocoder simulation. This algorithm was able to partially restore frequency selectivity in both groups, particularly in seven out of eight CI users, meaning significantly sharper PTCs than in the unprocessed condition. The results indicate that algorithms can improve the large-scale sharpness of frequency selectivity in some CI users. This finding may be useful for the design of sound coding strategies particularly for situations in which high frequency selectivity is desired, such as for music perception. PMID:27604785

  8. Classifier dependent feature preprocessing methods

    NASA Astrophysics Data System (ADS)

    Rodriguez, Benjamin M., II; Peterson, Gilbert L.

    2008-04-01

    In mobile applications, computational complexity is an issue that limits sophisticated algorithms from being implemented on these devices. This paper provides an initial solution to applying pattern recognition systems on mobile devices by combining existing preprocessing algorithms for recognition. In pattern recognition systems, it is essential to properly apply feature preprocessing tools prior to training classification models in order to reduce computational complexity and improve the overall classification accuracy. The feature preprocessing tools extended for the mobile environment are feature ranking, feature extraction, data preparation and outlier removal. Most desktop systems today are capable of running a majority of the available classification algorithms without concern for processing time, while the same is not true on mobile platforms. As an application of pattern recognition for mobile devices, the recognition system targets the problem of steganalysis, i.e., determining whether an image contains hidden information. The performance measurements show that feature preprocessing increases the overall steganalysis classification accuracy by an average of 22%. The methods in this paper are tested on a workstation and on a Nokia 6620 (Symbian operating system) camera phone with similar results.

  9. ParaSAM: a parallelized version of the significance analysis of microarrays algorithm

    PubMed Central

    Sharma, Ashok; Zhao, Jieping; Podolsky, Robert; McIndoe, Richard A.

    2010-01-01

    Motivation: Significance analysis of microarrays (SAM) is a widely used permutation-based approach to identifying differentially expressed genes in microarray datasets. While SAM is freely available as an Excel plug-in and as an R-package, analyses are often limited for large datasets due to very high memory requirements. Summary: We have developed a parallelized version of the SAM algorithm called ParaSAM to overcome the memory limitations. This high-performance multithreaded application provides the scientific community with an easy and manageable client-server Windows application with a graphical user interface and does not require programming experience to run. The parallel nature of the application comes from the use of web services to perform the permutations. Our results indicate that ParaSAM is not only faster than the serial version, but also can analyze extremely large datasets that cannot be processed using existing implementations. Availability: A web version open to the public is available at http://bioanalysis.genomics.mcg.edu/parasam. For local installations, both the Windows and web implementations of ParaSAM are available for free at http://www.amdcc.org/bioinformatics/software/parasam.aspx Contact: rmcindoe@mail.mcg.edu Supplementary information: Supplementary Data is available at Bioinformatics online. PMID:20400455

  10. ParaSAM: a parallelized version of the significance analysis of microarrays algorithm.

    PubMed

    Sharma, Ashok; Zhao, Jieping; Podolsky, Robert; McIndoe, Richard A

    2010-06-01

    Significance analysis of microarrays (SAM) is a widely used permutation-based approach to identifying differentially expressed genes in microarray datasets. While SAM is freely available as an Excel plug-in and as an R-package, analyses are often limited for large datasets due to very high memory requirements. We have developed a parallelized version of the SAM algorithm called ParaSAM to overcome the memory limitations. This high-performance multithreaded application provides the scientific community with an easy and manageable client-server Windows application with a graphical user interface and does not require programming experience to run. The parallel nature of the application comes from the use of web services to perform the permutations. Our results indicate that ParaSAM is not only faster than the serial version, but also can analyze extremely large datasets that cannot be processed using existing implementations. A web version open to the public is available at http://bioanalysis.genomics.mcg.edu/parasam. For local installations, both the Windows and web implementations of ParaSAM are available for free at http://www.amdcc.org/bioinformatics/software/parasam.aspx.
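
    The core idea, parallelizing the permutation loop of a SAM-style statistic, can be sketched with local worker processes. ParaSAM itself distributes permutations via web services, and its exact statistic includes a fitted fudge factor, simplified here to a constant.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def d_stat(X, labels):
    """SAM-style per-gene statistic: group-mean difference over a
    regularized standard error (the fudge factor s0 is simplified)."""
    a, b = X[:, labels == 0], X[:, labels == 1]
    num = a.mean(axis=1) - b.mean(axis=1)
    s = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] +
                b.var(axis=1, ddof=1) / b.shape[1])
    return num / (s + 0.01)

def one_permutation(args):
    """Recompute the statistic under one random relabeling."""
    X, labels, seed = args
    rng = np.random.default_rng(seed)
    return d_stat(X, rng.permutation(labels))

def permuted_stats(X, labels, n_perm=200, workers=4):
    """Farm the permutation loop out to worker processes to build the
    null distribution of the per-gene statistics."""
    jobs = [(X, labels, seed) for seed in range(n_perm)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return np.array(list(pool.map(one_permutation, jobs)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 20))               # genes x arrays
    labels = np.repeat([0, 1], 10)
    observed = d_stat(X, labels)
    null = permuted_stats(X, labels)              # null distribution
```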

  11. Implementation of spectral clustering with partitioning around medoids (PAM) algorithm on microarray data of carcinoma

    NASA Astrophysics Data System (ADS)

    Cahyaningrum, Rosalia D.; Bustamam, Alhadi; Siswantining, Titin

    2017-03-01

    Microarray technology has become one of the essential tools in the life sciences for observing gene expression levels, including the expression of genes in patients with carcinoma. Carcinoma is a cancer that forms in epithelial tissue. Such data can be analyzed, for example, to identify hereditary gene expression and to build classifiers that can improve the diagnosis of carcinoma. Microarray data usually come in such high dimension that most methods require long computing times for grouping. This study therefore uses the spectral clustering method, which can work with arbitrary objects and reduces the dimension. Spectral clustering is based on the spectral decomposition of a matrix representing the data as a graph. After the data dimension is reduced, the data are partitioned. One well-known partitioning method is Partitioning Around Medoids (PAM), which minimizes the objective function by iteratively swapping non-medoid points with medoid points until convergence. The objective of this research is to implement spectral clustering and the PAM partitioning algorithm to obtain groups of 7,457 carcinoma genes based on similarity values. The result of this study is two groups of carcinoma genes.
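
    A condensed sketch of the two-step pipeline, assuming scikit-learn's SpectralEmbedding for the dimension-reduction step and a bare-bones PAM swap loop for the partitioning step; the data and cluster count are toy stand-ins.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

def pam(X, k, max_sweeps=20, seed=0):
    """Bare-bones Partitioning Around Medoids: swap a medoid for a
    non-medoid whenever the summed distance to the nearest medoids
    drops, repeating until no swap improves the objective."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(len(X), size=k, replace=False)
    cost = lambda m: D[:, m].min(axis=1).sum()
    for _ in range(max_sweeps):
        improved = False
        for i in range(k):
            for cand in range(len(X)):
                if cand in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = cand
                if cost(trial) < cost(medoids):
                    medoids, improved = trial, True
        if not improved:
            break
    return medoids, D[:, medoids].argmin(axis=1)

# Reduce the high-dimensional expression profiles first, then partition.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))                       # toy gene profiles
emb = SpectralEmbedding(n_components=2).fit_transform(X)
medoids, cluster = pam(emb, k=2)
```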

  12. Development of a Physical Model-Based Algorithm for the Detection of Single-Nucleotide Substitutions by Using Tiling Microarrays

    PubMed Central

    Ono, Naoaki; Suzuki, Shingo; Furusawa, Chikara; Shimizu, Hiroshi; Yomo, Tetsuya

    2013-01-01

    High-density DNA microarrays are useful tools for analyzing sequence changes in DNA samples. Although microarray analysis provides informative signals from a large number of probes, the analysis and interpretation of these signals have certain inherent limitations, namely, complex dependency of signals on the probe sequences and the existence of false signals arising from non-specific binding between probe and target. In this study, we have developed a novel algorithm to detect the single-base substitutions by using microarray data based on a thermodynamic model of hybridization. We modified the thermodynamic model by introducing a penalty for mismatches that represent the effects of substitutions on hybridization affinity. This penalty results in significantly higher detection accuracy than other methods, indicating that the incorporation of hybridization free energy can improve the analysis of sequence variants by using microarray data. PMID:23382915
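
    The modification can be illustrated with a toy Langmuir-type hybridization model in which each mismatch adds a fixed free-energy penalty; all numeric values below are illustrative, not the paper's fitted parameters.

```python
import numpy as np

R = 1.987e-3        # gas constant, kcal/(mol*K)
T = 318.0           # hybridization temperature, ~45 degrees C in kelvin

def probe_signal(dG_perfect, n_mismatch, penalty=1.5, conc=1e-13):
    """Langmuir-type signal model: the binding constant follows
    K = exp(-dG/RT), and each mismatch adds a free-energy penalty
    (kcal/mol) that weakens the probe-target duplex."""
    dG = dG_perfect + penalty * n_mismatch     # substitutions raise dG
    K = np.exp(-dG / (R * T))
    return conc * K / (1 + conc * K)           # fractional occupancy

# A single-base substitution under a probe depresses its signal relative
# to the perfect-match expectation, which is what the detector exploits.
print(probe_signal(-18.0, 0), probe_signal(-18.0, 1))
```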

  13. Evaluation of multivariate calibration models with different pre-processing and processing algorithms for a novel resolution and quantitation of spectrally overlapped quaternary mixture in syrup

    NASA Astrophysics Data System (ADS)

    Moustafa, Azza A.; Hegazy, Maha A.; Mohamed, Dalia; Ali, Omnia

    2016-02-01

    A novel approach for the resolution and quantitation of severely overlapped quaternary mixture of carbinoxamine maleate (CAR), pholcodine (PHL), ephedrine hydrochloride (EPH) and sunset yellow (SUN) in syrup was demonstrated utilizing different spectrophotometric assisted multivariate calibration methods. The applied methods have used different processing and pre-processing algorithms. The proposed methods were partial least squares (PLS), concentration residuals augmented classical least squares (CRACLS), and a novel method; continuous wavelet transforms coupled with partial least squares (CWT-PLS). These methods were applied to a training set in the concentration ranges of 40-100 μg/mL, 40-160 μg/mL, 100-500 μg/mL and 8-24 μg/mL for the four components, respectively. The utilized methods have not required any preliminary separation step or chemical pretreatment. The validity of the methods was evaluated by an external validation set. The selectivity of the developed methods was demonstrated by analyzing the drugs in their combined pharmaceutical formulation without any interference from additives. The obtained results were statistically compared with the official and reported methods where no significant difference was observed regarding both accuracy and precision.

  14. Classification of a large microarray data set: Algorithm comparison and analysis of drug signatures

    PubMed Central

    Natsoulis, Georges; El Ghaoui, Laurent; Lanckriet, Gert R.G.; Tolley, Alexander M.; Leroy, Fabrice; Dunlea, Shane; Eynon, Barrett P.; Pearson, Cecelia I.; Tugendreich, Stuart; Jarnagin, Kurt

    2005-01-01

    A large gene expression database has been produced that characterizes the gene expression and physiological effects of hundreds of approved and withdrawn drugs, toxicants, and biochemical standards in various organs of live rats. In order to derive useful biological knowledge from this large database, a variety of supervised classification algorithms were compared using a 597-microarray subset of the data. Our studies show that several types of linear classifiers based on Support Vector Machines (SVMs) and Logistic Regression can be used to derive readily interpretable drug signatures with high classification performance. Both methods can be tuned to produce classifiers of drug treatments in the form of short, weighted gene lists which upon analysis reveal that some of the signature genes have a positive contribution (act as “rewards” for the class-of-interest) while others have a negative contribution (act as “penalties”) to the classification decision. The combination of reward and penalty genes enhances performance by keeping the number of false positive treatments low. The results of these algorithms are combined with feature selection techniques that further reduce the length of the drug signatures, an important step towards the development of useful diagnostic biomarkers and low-cost assays. Multiple signatures with no genes in common can be generated for the same classification end-point. Comparison of these gene lists identifies biological processes characteristic of a given class. PMID:15867433
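
    The reward/penalty reading of a signature falls directly out of the signs of a sparse linear classifier's weights, as this sketch with L1-regularized logistic regression on synthetic data shows; a linear SVM would expose its weights the same way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))            # 100 arrays, 300 genes
y = (X[:, 5] - X[:, 42] + 0.3 * rng.normal(size=100) > 0).astype(int)

# L1 regularization yields the short, weighted gene lists described above.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

w = clf.coef_.ravel()
signature = np.flatnonzero(w)              # genes with nonzero weight
rewards   = signature[w[signature] > 0]    # vote for the class of interest
penalties = signature[w[signature] < 0]    # vote against it, curbing false positives
print(len(signature), "signature genes:",
      len(rewards), "rewards /", len(penalties), "penalties")
```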

  15. Preprocessing of compressed digital video

    NASA Astrophysics Data System (ADS)

    Segall, C. Andrew; Karunaratne, Passant V.; Katsaggelos, Aggelos K.

    2000-12-01

    Pre-processing algorithms improve the performance of a video compression system by removing spurious noise and insignificant features from the original images. This increases compression efficiency and attenuates coding artifacts. Unfortunately, determining the appropriate amount of pre-filtering is a difficult problem, as it depends both on the content of an image and on the target bit-rate of the compression algorithm. In this paper, we explore a pre-processing technique that is loosely coupled to the quantization decisions of a rate control mechanism. This technique results in a pre-processing system that operates directly on the Displaced Frame Difference (DFD) and is applicable to any standard-compatible compression system. Results explore the effect of several standard filters on the DFD. An adaptive technique is then considered.

  16. Application of genetic algorithms and constructive neural networks for the analysis of microarray cancer data

    PubMed Central

    2014-01-01

    Background Extracting relevant information from microarray data is a very complex task due to the characteristics of the data sets, which comprise a large number of features while few samples are generally available. In this sense, feature selection is a very important aspect of the analysis, helping in the tasks of identifying relevant genes and maximizing predictive information. Methods Due to its simplicity and speed, Stepwise Forward Selection (SFS) is a widely used feature selection technique. In this work, we carry out a comparative study of SFS and Genetic Algorithms (GA) as general frameworks for the analysis of microarray data, with the aim of identifying groups of genes with high predictive capability and biological relevance. Six standard and machine learning-based techniques (Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Naive Bayes (NB), the C-MANTEC Constructive Neural Network, K-Nearest Neighbors (kNN) and the Multilayer Perceptron (MLP)) are used within both frameworks on six freely available datasets for the task of predicting cancer outcome. Results Better cancer outcome prediction results were obtained using the GA framework, noting that this approach, in comparison to the SFS one, leads to a larger selection set, uses a large number of comparisons between genetic profiles, and is thus computationally more intensive. The GA framework also permitted obtaining a set of genes that can be considered more biologically relevant. Regarding the different classifiers used, standard feedforward neural networks (MLP), LDA and SVM led to similar and best results, while C-MANTEC and kNN followed closely but with lower accuracy. Further, C-MANTEC, MLP and LDA permitted obtaining a more limited set of genes in comparison to SVM, NB and kNN, and in particular C-MANTEC was the most robust classifier with respect to changes in the parameter settings. Conclusions This study shows that if prediction accuracy is the objective, the GA-based approach leads to better results.

  17. Application of genetic algorithms and constructive neural networks for the analysis of microarray cancer data.

    PubMed

    Luque-Baena, Rafael Marcos; Urda, Daniel; Subirats, Jose Luis; Franco, Leonardo; Jerez, Jose M

    2014-05-07

    Extracting relevant information from microarray data is a very complex task due to the characteristics of the data sets, which comprise a large number of features while few samples are generally available. In this sense, feature selection is a very important aspect of the analysis, helping in the tasks of identifying relevant genes and maximizing predictive information. Due to its simplicity and speed, Stepwise Forward Selection (SFS) is a widely used feature selection technique. In this work, we carry out a comparative study of SFS and Genetic Algorithms (GA) as general frameworks for the analysis of microarray data, with the aim of identifying groups of genes with high predictive capability and biological relevance. Six standard and machine learning-based techniques (Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Naive Bayes (NB), the C-MANTEC Constructive Neural Network, K-Nearest Neighbors (kNN) and the Multilayer Perceptron (MLP)) are used within both frameworks on six freely available datasets for the task of predicting cancer outcome. Better cancer outcome prediction results were obtained using the GA framework, noting that this approach, in comparison to the SFS one, leads to a larger selection set, uses a large number of comparisons between genetic profiles, and is thus computationally more intensive. The GA framework also permitted obtaining a set of genes that can be considered more biologically relevant. Regarding the different classifiers used, standard feedforward neural networks (MLP), LDA and SVM led to similar and best results, while C-MANTEC and kNN followed closely but with lower accuracy. Further, C-MANTEC, MLP and LDA permitted obtaining a more limited set of genes in comparison to SVM, NB and kNN, and in particular C-MANTEC was the most robust classifier with respect to changes in the parameter settings. This study shows that if prediction accuracy is the objective, the GA-based approach leads to better results.

  18. A Regression-Based Differential Expression Detection Algorithm for Microarray Studies with Ultra-Low Sample Size

    PubMed Central

    McDonough, Molly; Rabe, Brian; Saha, Margaret

    2015-01-01

    Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely-used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality. PMID:25738861

  19. LTR: Linear Cross-Platform Integration of Microarray Data

    PubMed Central

    Boutros, Paul C.

    2010-01-01

    The size and scope of microarray experiments continue to increase. However, datasets generated on different platforms or at different centres contain biases. Improved techniques are needed to remove platform- and batch-specific biases. One experimental control is the replicate hybridization of a subset of samples at each site or on each platform to learn the relationship between the two platforms. To date, no algorithm exists to specifically use this type of control. LTR is a linear-modelling-based algorithm that learns the relationship between different microarray batches from replicate hybridizations. LTR was tested on a new benchmark dataset of 20 samples hybridized to different Affymetrix microarray platforms. Before LTR, the two platforms were significantly different; application of LTR removed this bias. LTR was tested with six separate data pre-processing algorithms, and its effectiveness was independent of the pre-processing algorithm. Sample-size experiments indicate that just three replicate hybridizations can significantly reduce bias. An R library implementing LTR is available. PMID:20838609
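
    The replicate-hybridization control can be exploited with a per-gene linear fit, sketched below on synthetic data with three replicates (the smallest design the abstract reports as effective); LTR's actual linear model is specified in the paper, so treat this as a minimal illustration of the idea.

```python
import numpy as np

def fit_platform_map(A, B):
    """Learn a per-gene linear relationship between two platforms from
    replicate hybridizations of the same RNA: B ~ slope * A + offset.
    A and B are genes x replicates matrices of expression values."""
    fits = [np.polyfit(a, b, deg=1) for a, b in zip(A, B)]
    slopes, offsets = np.array(fits).T
    return slopes, offsets

def translate(A_new, slopes, offsets):
    """Map new platform-A measurements onto the platform-B scale."""
    return slopes[:, None] * A_new + offsets[:, None]

rng = np.random.default_rng(0)
truth = rng.normal(8, 2, size=(1000, 3))                  # 3 replicates
A = truth + rng.normal(0, 0.2, truth.shape)               # platform A
B = 1.3 * truth - 2.0 + rng.normal(0, 0.2, truth.shape)   # biased platform B
slopes, offsets = fit_platform_map(A, B)                  # learned mapping
```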

  20. Computational identification of anthocyanin-specific transcription factors using a rice microarray and maximum boundary range algorithm

    PubMed Central

    Kim, Chang Kug; Kikuchi, Shoshi; Hahn, Jang Ho; Park, Soo Chul; Kim, Yong Hwan; Lee, Byun Woo

    2010-01-01

    This study identifies 2,617 candidate genes related to anthocyanin biosynthesis in rice using microarray analysis and a newly developed maximum boundary range algorithm. Three seed developmental stages were examined in a white cultivar and two black Dissociation insertion mutants. The resulting 235 transcription factor genes found to be associated with anthocyanin were classified into nine groups. These 235 genes from the transcription factor analysis were compared with 593 genes from clusters of orthologous groups (COGs) related to anthocyanin functions, and 32 genes in total were found to be commonly expressed. Among these, 9 unknown and hypothetical genes were shown to be expressed at each developmental stage and were verified by RT-PCR. These genes most likely play regulatory roles in either anthocyanin production or metabolism during flavonoid biosynthesis. While these genes require further validation, our results underline the potential usefulness of the newly developed algorithm. PMID:21079756

  1. An MCMC Algorithm for Target Estimation in Real-Time DNA Microarrays

    NASA Astrophysics Data System (ADS)

    Vikalo, Haris; Gokdemir, Mahsuni

    2010-12-01

    DNA microarrays detect the presence and quantify the amounts of nucleic acid molecules of interest. They rely on a chemical attraction between the target molecules and their Watson-Crick complements, which serve as biological sensing elements (probes). The attraction between these biomolecules leads to binding, in which probes capture target analytes. Recently developed real-time DNA microarrays are capable of observing kinetics of the binding process. They collect noisy measurements of the amount of captured molecules at discrete points in time. Molecular binding is a random process which, in this paper, is modeled by a stochastic differential equation. The target analyte quantification is posed as a parameter estimation problem, and solved using a Markov Chain Monte Carlo technique. In simulation studies where we test the robustness with respect to the measurement noise, the proposed technique significantly outperforms previously proposed methods. Moreover, the proposed approach is tested and verified on experimental data.
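
    A simplified version of the estimation problem: a deterministic Langmuir binding curve plus Gaussian noise replaces the paper's stochastic differential equation, and random-walk Metropolis samples the log-concentration. All kinetic constants are illustrative.

```python
import numpy as np

def binding_curve(t, c, ka=1e5, kd=1e-3, xmax=1.0):
    """Mean amount of captured target at time t for concentration c
    under first-order Langmuir binding kinetics (the deterministic
    skeleton of the stochastic binding model)."""
    keq = ka * c + kd
    return xmax * (ka * c / keq) * (1.0 - np.exp(-keq * t))

def mh_concentration(t, y, sigma=0.02, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis over log-concentration with a Gaussian
    likelihood; returns posterior samples after discarding burn-in."""
    rng = np.random.default_rng(seed)

    def loglik(logc):
        resid = y - binding_curve(t, np.exp(logc))
        return -0.5 * np.sum(resid ** 2) / sigma ** 2

    logc = np.log(1e-9)                           # crude initial guess
    ll = loglik(logc)
    samples = []
    for _ in range(n_iter):
        prop = logc + step * rng.normal()         # propose a random walk step
        ll_prop = loglik(prop)
        if np.log(rng.random()) < ll_prop - ll:   # Metropolis accept/reject
            logc, ll = prop, ll_prop
        samples.append(logc)
    return np.exp(np.array(samples[n_iter // 2:]))

t = np.linspace(0.0, 600.0, 30)                   # discrete sampling times
rng = np.random.default_rng(1)
y = binding_curve(t, 2e-9) + 0.02 * rng.normal(size=t.size)
print(mh_concentration(t, y).mean())              # roughly the true 2e-9
```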

  2. Biomarker Discovery Based on Hybrid Optimization Algorithm and Artificial Neural Networks on Microarray Data for Cancer Classification.

    PubMed

    Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Pirhadi, Shiva; Garshasbi, Masoud

    2015-01-01

    Improvements in high-throughput, microarray-based gene profiling technology have made it possible to monitor the expression values of thousands of genes simultaneously. Detailed examination of changes in gene expression levels can help physicians diagnose efficiently, classify tumors and cancer types, and choose effective treatments. Finding genes that can correctly classify groups of cancers based on hybrid optimization algorithms is the main purpose of this paper. A hybrid particle swarm optimization and genetic algorithm method is used for gene selection, and an artificial neural network (ANN) is adopted as the classifier. In this work, we improve the ability of the algorithm on the classification problem by finding a small group of biomarkers together with the best parameters of the classifier. The proposed approach is tested on three benchmark gene expression data sets: blood (acute myeloid leukemia, acute lymphoblastic leukemia), colon, and breast datasets. We used 10-fold cross-validation to measure accuracy, and a decision tree algorithm to find relations between the biomarkers from a biological point of view. To test the ability of the trained ANN models to categorize the cancers, we analyzed additional blinded samples that were not used in the training procedure. Experimental results show that the proposed method can reduce the dimension of the data set, identify the most informative gene subset, and improve classification accuracy with the best parameters for each dataset.

  3. Comparison of gene expression microarray data with count-based RNA measurements informs microarray interpretation.

    PubMed

    Richard, Arianne C; Lyons, Paul A; Peters, James E; Biasci, Daniele; Flint, Shaun M; Lee, James C; McKinney, Eoin F; Siegel, Richard M; Smith, Kenneth G C

    2014-08-04

    Although numerous investigations have compared gene expression microarray platforms, preprocessing methods and batch correction algorithms using constructed spike-in or dilution datasets, there remains a paucity of studies examining the properties of microarray data using diverse biological samples. Most microarray experiments seek to identify subtle differences between samples with variable background noise, a scenario poorly represented by constructed datasets. Thus, microarray users lack important information regarding the complexities introduced in real-world experimental settings. The recent development of a multiplexed, digital technology for nucleic acid measurement enables counting of individual RNA molecules without amplification and, for the first time, permits such a study. Using a set of human leukocyte subset RNA samples, we compared previously acquired microarray expression values with RNA molecule counts determined by the nCounter Analysis System (NanoString Technologies) in selected genes. We found that gene measurements across samples correlated well between the two platforms, particularly for high-variance genes, while genes deemed unexpressed by the nCounter generally had both low expression and low variance on the microarray. Confirming previous findings from spike-in and dilution datasets, this "gold-standard" comparison demonstrated signal compression that varied dramatically by expression level and, to a lesser extent, by dataset. Most importantly, examination of three different cell types revealed that noise levels differed across tissues. Microarray measurements generally correlate with relative RNA molecule counts within optimal ranges but suffer from expression-dependent accuracy bias and precision that varies across datasets. We urge microarray users to consider expression-level effects in signal interpretation and to evaluate noise properties in each dataset independently.

  4. Increased power of microarray analysis by use of an algorithm based on a multivariate procedure.

    PubMed

    Krohn, K; Eszlinger, M; Paschke, R; Roeder, I; Schuster, E

    2005-09-01

    The power of microarray analyses to detect differential gene expression strongly depends on the statistical and bioinformatical approaches used for data analysis. Moreover, the simultaneous testing of tens of thousands of genes for differential expression raises the 'multiple testing problem', increasing the probability of obtaining false positive test results. To achieve more reliable results, it is, therefore, necessary to apply adjustment procedures to restrict the family-wise type I error rate (FWE) or the false discovery rate. However, for the biologist the statistical power of such procedures often remains abstract, unless validated by an alternative experimental approach. In the present study, we discuss a multiplicity adjustment procedure applied to classical univariate as well as to recently proposed multivariate gene-expression scores. All procedures strictly control the FWE. We demonstrate that the use of multivariate scores leads to a more efficient identification of differentially expressed genes than the widely used MAS5 approach provided by the Affymetrix software tools (Affymetrix Microarray Suite 5 or GeneChip Operating Software). The practical importance of this finding is successfully validated using real time quantitative PCR and data from spike-in experiments. The R-code of the statistical routines can be obtained from the corresponding author. Schuster@imise.uni-leipzig.de

  5. The preprocessed doacross loop

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi

    1990-01-01

    Dependencies between loop iterations cannot always be characterized during program compilation. Doacross loops typically make use of a-priori knowledge of inter-iteration dependencies to carry out required synchronizations. A type of doacross loop is proposed that allows the scheduling of iterations of a loop among processors without advance knowledge of inter-iteration dependencies. The method proposed for loop iterations requires that parallelizable preprocessing and postprocessing steps be carried out during program execution.

  6. Comparing Binaural Pre-processing Strategies III

    PubMed Central

    Warzybok, Anna; Ernst, Stephan M. A.

    2015-01-01

    A comprehensive evaluation of eight signal pre-processing strategies, including directional microphones, coherence filters, single-channel noise reduction, binaural beamformers, and their combinations, was undertaken with normal-hearing (NH) and hearing-impaired (HI) listeners. Speech reception thresholds (SRTs) were measured in three noise scenarios (multitalker babble, cafeteria noise, and single competing talker). Predictions of three common instrumental measures were compared with the general perceptual benefit caused by the algorithms. The individual SRTs measured without pre-processing and individual benefits were objectively estimated using the binaural speech intelligibility model. Ten listeners with NH and 12 HI listeners participated. The participants varied in age and pure-tone threshold levels. Although HI listeners required a better signal-to-noise ratio to obtain 50% intelligibility than listeners with NH, no differences in SRT benefit from the different algorithms were found between the two groups. With the exception of single-channel noise reduction, all algorithms showed an improvement in SRT of between 2.1 dB (in cafeteria noise) and 4.8 dB (in single competing talker condition). Model predictions with binaural speech intelligibility model explained 83% of the measured variance of the individual SRTs in the no pre-processing condition. Regarding the benefit from the algorithms, the instrumental measures were not able to predict the perceptual data in all tested noise conditions. The comparable benefit observed for both groups suggests a possible application of noise reduction schemes for listeners with different hearing status. Although the model can predict the individual SRTs without pre-processing, further development is necessary to predict the benefits obtained from the algorithms at an individual level. PMID:26721922

  7. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm.

    PubMed

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu; Tian, Suyan

    2016-01-01

    Among non-small cell lung cancers (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. SAMGSR is thus indeed a feature selection algorithm. Additionally, we applied SAMGSR to the AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The minimal overlap between the two resulting gene signatures illustrates that AC and SCC are distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed.

  8. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm

    PubMed Central

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu

    2016-01-01

    Among non-small cell lung cancers (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. SAMGSR is thus indeed a feature selection algorithm. Additionally, we applied SAMGSR to the AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The minimal overlap between the two resulting gene signatures illustrates that AC and SCC are distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed. PMID:27446945

  9. Preoperative overnight parenteral nutrition (TPN) improves skeletal muscle protein metabolism indicated by microarray algorithm analyses in a randomized trial.

    PubMed

    Iresjö, Britt-Marie; Engström, Cecilia; Lundholm, Kent

    2016-06-01

    Loss of muscle mass is associated with increased risk of morbidity and mortality in hospitalized patients. Uncertainties about the efficiency of treatment by short-term artificial nutrition remain, specifically regarding improvement of protein balance in skeletal muscles. In this study, algorithmic microarray analysis was applied to map cellular changes related to muscle protein metabolism in human skeletal muscle tissue during provision of overnight preoperative total parenteral nutrition (TPN). Twenty-two patients (11/group) scheduled for upper GI surgery due to malignant or benign disease received a continuous peripheral all-in-one TPN infusion (30 kcal/kg/day, 0.16 gN/kg/day) or saline infusion for 12 h prior to the operation. Biopsies from the rectus abdominis muscle were taken at the start of the operation for isolation of muscle RNA. RNA expression microarray analyses were performed with Agilent SurePrint G3 8 × 60K arrays using one-color labeling. 447 mRNAs were differentially expressed between study and control patients (P < 0.1). mRNAs related to ribosomal biogenesis, mRNA processing, and translation were upregulated during overnight nutrition, particularly anabolic signaling via S6K1 (P < 0.01-0.1). Transcripts of genes associated with lysosomal degradation showed consistently lower expression during TPN, while mRNAs for ubiquitin-mediated degradation of proteins, as well as transcripts related to intracellular signaling pathways (PI3 kinase/MAP kinase), were either increased or decreased. In conclusion, muscle mRNA alterations during overnight standard TPN infusions at a constant rate altered mRNAs associated with mTOR signaling, increased initiation of protein translation, and suppressed autophagy/lysosomal degradation of proteins. This indicates that overnight preoperative parenteral nutrition is effective in promoting muscle protein metabolism. © 2016 The Authors. Physiological Reports published by Wiley Periodicals, Inc. on behalf of the American Physiological Society and The Physiological Society.

  10. Microarray data classification using the spectral-feature-based TLS ensemble algorithm.

    PubMed

    Sun, Zhan-Li; Wang, Han; Lau, Wai-Shing; Seet, Gerald; Wang, Danwei; Lam, Kin-Man

    2014-09-01

    The reliable and accurate identification of cancer categories is crucial to a successful diagnosis and a proper treatment of the disease. In most existing work, samples of gene expression data are treated as one-dimensional signals, and are analyzed by means of some statistical signal processing techniques or intelligent computation algorithms. In this paper, from an image-processing viewpoint, a spectral-feature-based Tikhonov-regularized least-squares (TLS) ensemble algorithm is proposed for cancer classification using gene expression data. In the TLS model, a test sample is represented as a linear combination of the atoms of a dictionary. Two types of dictionaries, namely singular value decomposition (SVD)-based eigenassays and independent component analysis (ICA)-based eigenassays, are proposed for the TLS model, and both are extracted via a two-stage approach. The proposed algorithm is inspired by our finding that, among these eigenassays, the categories of some of the testing samples can be assigned correctly by using the TLS models formed from some of the spectral features, but not for those formed from the original samples only. In order to retain the positive characteristics of these spectral features in making correct category assignments, a strategy of classifier committee learning (CCL) is designed to combine the results obtained from the different spectral features. Experimental results on standard databases demonstrate the feasibility and effectiveness of the proposed method.
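
    A stripped-down sketch of the SVD-eigenassay dictionary plus Tikhonov-regularized least squares, omitting the ICA dictionaries and the classifier-committee stage; the toy data and parameters are illustrative.

```python
import numpy as np

def tls_classify(train, labels, test, n_atoms=10, lam=0.1):
    """Build one dictionary of SVD eigenassays per class, then assign a
    test sample to the class whose dictionary reconstructs it best under
    Tikhonov-regularized least squares: min_a ||D a - x||^2 + lam ||a||^2."""
    dicts = {}
    for c in np.unique(labels):
        U, _, _ = np.linalg.svd(train[labels == c].T, full_matrices=False)
        dicts[c] = U[:, :n_atoms]              # class-specific eigenassays
    preds = []
    for x in test:
        err = {c: np.linalg.norm(x - D @ np.linalg.solve(
                   D.T @ D + lam * np.eye(D.shape[1]), D.T @ x))
               for c, D in dicts.items()}
        preds.append(min(err, key=err.get))    # smallest reconstruction error
    return np.array(preds)

rng = np.random.default_rng(0)
train = rng.normal(size=(80, 200))             # 80 assays, 200 genes
labels = np.repeat([0, 1], 40)
train[labels == 1] += 0.5                      # separate the two classes
test = rng.normal(size=(10, 200)) + 0.5        # all drawn from class 1
print(tls_classify(train, labels, test))
```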

  11. Retinex Preprocessing for Improved Multi-Spectral Image Classification

    NASA Technical Reports Server (NTRS)

    Thompson, B.; Rahman, Z.; Park, S.

    2000-01-01

    The goal of multi-image classification is to identify and label "similar regions" within a scene. The ability to correctly classify a remotely sensed multi-image of a scene is affected by the ability of the classification process to adequately compensate for the effects of atmospheric variations and sensor anomalies. Better classification may be obtained if the multi-image is preprocessed before classification, so as to reduce the adverse effects of image formation. In this paper, we discuss the overall impact on multi-spectral image classification when the retinex image enhancement algorithm is used to preprocess multi-spectral images. The retinex is a multi-purpose image enhancement algorithm that performs dynamic range compression, reduces the dependence on lighting conditions, and generally enhances apparent spatial resolution. The retinex has been successfully applied to the enhancement of many different types of grayscale and color images. We show in this paper that retinex preprocessing improves the spatial structure of multi-spectral images and thus provides better within-class variations than would otherwise be obtained without the preprocessing. For a series of multi-spectral images obtained with diffuse and direct lighting, we show that without retinex preprocessing the class spectral signatures vary substantially with the lighting conditions. Whereas multi-dimensional clustering without preprocessing produced one-class homogeneous regions, the classification on the preprocessed images produced multi-class non-homogeneous regions. This lack of homogeneity is explained by the interaction between different agronomic treatments applied to the regions: the preprocessed images are closer to ground truth. The principal advantage that the retinex offers is that for different lighting conditions classifications derived from the retinex preprocessed images look remarkably "similar", and thus more consistent, whereas classifications derived from the original
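
    The retinex idea can be sketched compactly: each single-scale retinex output is the difference between the log image and the log of a Gaussian-blurred surround, and the multiscale version averages several scales. The scale values and uniform weights below are illustrative assumptions; the NASA retinex implementation includes additional gain/offset and color-restoration steps not shown here.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def multiscale_retinex(band, sigmas=(15, 80, 250), eps=1e-6):
            """Average of single-scale retinex outputs:
            log(I) - log(Gaussian surround of I), over several scales."""
            band = band.astype(np.float64) + eps
            out = np.zeros_like(band)
            for s in sigmas:                      # sigmas are illustrative, not from the paper
                blurred = gaussian_filter(band, sigma=s) + eps
                out += np.log(band) - np.log(blurred)
            return out / len(sigmas)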

  12. Fast and automatic algorithm for optic disc extraction in retinal images using principle-component-analysis-based preprocessing and curvelet transform.

    PubMed

    Shahbeig, Saleh; Pourghassem, Hossein

    2013-01-01

    Optic disc or optic nerve (ON) head extraction in retinal images has widespread applications in retinal disease diagnosis and human identification in biometric systems. This paper introduces a fast and automatic algorithm for detecting and extracting the ON region accurately from retinal images without the use of blood-vessel information. In this algorithm, to compensate for destructive changes in illumination and to enhance the contrast of the retinal images, we estimate the background illumination and apply an adaptive correction function to the curvelet transform coefficients of the retinal images. In other words, we eliminate the confounding factors and pave the way to extract the ON region exactly. Then, we detect the ON region from the retinal images using morphology operators based on geodesic conversions, by applying a proper adaptive correction function to the reconstructed image's curvelet transform coefficients and a novel powerful criterion. Finally, using local thresholding on the detected area of the retinal images, we extract the ON region. The proposed algorithm is evaluated on available images from the DRIVE and STARE databases. The experimental results indicate that the proposed algorithm achieves accuracy rates of 100% and 97.53% for ON extraction on the DRIVE and STARE databases, respectively.

  13. Preprocessing for classification of thermograms in breast cancer detection

    NASA Astrophysics Data System (ADS)

    Neumann, Łukasz; Nowak, Robert M.; Okuniewski, Rafał; Oleszkiewicz, Witold; Cichosz, Paweł; Jagodziński, Dariusz; Matysiewicz, Mateusz

    2016-09-01

    Performance of binary classification of breast cancer suffers from high imbalance between classes. In this article, we present a preprocessing module designed to counteract the discrepancy in training examples. The module is based on standardization, the Synthetic Minority Oversampling Technique (SMOTE), and undersampling. We show how each algorithm influences classification accuracy. Results indicate that the described module improves the overall Area Under Curve by up to 10% on the tested dataset. Furthermore, we propose other methods of dealing with imbalanced datasets in breast cancer classification.
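
    A minimal sketch of such a preprocessing module, assuming the scikit-learn and imbalanced-learn libraries (the article does not name its implementation): standardize the features, oversample the minority class with SMOTE, then undersample the majority class.

        from imblearn.over_sampling import SMOTE
        from imblearn.under_sampling import RandomUnderSampler
        from sklearn.preprocessing import StandardScaler

        def make_balanced(X, y, random_state=0):
            """Standardize features, oversample the minority class, then
            undersample the majority class; X, y are the (assumed) thermogram
            feature matrix and binary labels."""
            X = StandardScaler().fit_transform(X)
            X, y = SMOTE(random_state=random_state).fit_resample(X, y)
            X, y = RandomUnderSampler(random_state=random_state).fit_resample(X, y)
            return X, y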

  14. Wavelet Preprocessing of Acoustic Signals

    DTIC Science & Technology

    1991-12-01

    This paper describes results using the wavelet transform to preprocess acoustic broadband signals in a system that discriminates between different classes of acoustic bursts. This is motivated by the similarity between the proportional bandwidth filters provided by the wavelet transform and those found in biological hearing systems. The experiment involves comparing statistical pattern classifier effects of wavelet- and FFT-preprocessed acoustic signals. The data used was from the DARPA Phase I database, which consists of artificially generated signals with real ocean background. The results show that the wavelet transform provided improved performance when classifying on a frame-by-frame basis.

  15. Context-based preprocessing of molecular docking data

    PubMed Central

    2013-01-01

    Background Data preprocessing is a major step in data mining. In data preprocessing, several known techniques can be applied, or new ones developed, to improve data quality such that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for generation of comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach to mine data from molecular docking simulation results. The test cases used a fully-flexible receptor (FFR) model of Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands. Results We generated an initial set of attributes as well as their respective instances. To improve this initial set, we applied two selection strategies. The first was based on our context-based approach while the second used the CFS (Correlation-based Feature Selection) machine learning algorithm. Additionally, we produced an extra dataset containing features selected by combining our context strategy and the CFS algorithm. To demonstrate the effectiveness of the proposed method, we evaluated its performance based on various predictive (RMSE, MAE, Correlation, and Nodes) and context (Precision, Recall and FScore) measures. Conclusions Statistical analysis of the results shows that the proposed context-based data preprocessing approach significantly improves predictive and context measures and outperforms the CFS algorithm. Context-based data preprocessing improves mining results by producing superior interpretable models, which makes it well-suited for practical applications in molecular docking simulations using FFR models. PMID:24564276

  16. Rapid large-scale oligonucleotide selection for microarrays.

    PubMed

    Rahmann, Sven

    2002-01-01

    We present the first algorithm that selects oligonucleotide probes (e.g. 25-mers) for microarray experiments on a large scale. For example, oligos for human genes can be found within 50 hours. This becomes possible by using the longest common substring as a specificity measure for candidate oligos. We present an algorithm based on a suffix array with additional information that is efficient both in terms of memory usage and running time to rank all candidate oligos according to their specificity. We also introduce the concept of master sequences to describe the sequences from which oligos are to be selected. Constraints such as oligo length, melting temperature, and self-complementarity are incorporated in the master sequence at a preprocessing stage and thus kept separate from the main selection problem. As a result, custom oligos can now be designed for any sequenced genome, just as the technology for on-site chip synthesis is becoming increasingly mature.
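
    The specificity measure is easy to state concretely: a candidate oligo is specific if its longest substring that also occurs elsewhere in the background sequences is short. The quadratic-time dynamic program below is only a didactic stand-in for the paper's suffix-array method, which ranks all candidates at once far more efficiently.

        def longest_common_substring(oligo, background):
            """Length of the longest substring of `oligo` that occurs in
            `background`; O(len(oligo) * len(background)) dynamic program."""
            m, n = len(oligo), len(background)
            prev, best = [0] * (n + 1), 0
            for i in range(1, m + 1):
                curr = [0] * (n + 1)
                for j in range(1, n + 1):
                    if oligo[i - 1] == background[j - 1]:
                        curr[j] = prev[j - 1] + 1
                        best = max(best, curr[j])
                prev = curr
            return best

        # A candidate 25-mer is more specific the shorter its longest match
        # in the other master sequences:
        # specificity ~ len(oligo) - longest_common_substring(oligo, other_seqs)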

  17. Optimal Preprocessing Of GPS Data

    NASA Technical Reports Server (NTRS)

    Wu, Sien-Chong; Melbourne, William G.

    1994-01-01

    Improved technique for preprocessing data from Global Positioning System (GPS) receiver reduces processing time and number of data to be stored. Technique optimal in sense it maintains strength of data. Also sometimes increases ability to resolve ambiguities in numbers of cycles of received GPS carrier signals.

  18. Optimal Preprocessing Of GPS Data

    NASA Technical Reports Server (NTRS)

    Wu, Sien-Chong; Melbourne, William G.

    1994-01-01

    Improved technique for preprocessing data from Global Positioning System receiver reduces processing time and number of data to be stored. Optimal in sense that it maintains strength of data. Also increases ability to resolve ambiguities in numbers of cycles of received GPS carrier signals.

  19. Arabic handwritten: pre-processing and segmentation

    NASA Astrophysics Data System (ADS)

    Maliki, Makki; Jassim, Sabah; Al-Jawad, Naseer; Sellahewa, Harin

    2012-06-01

    This paper is concerned with pre-processing and segmentation tasks that influence the performance of Optical Character Recognition (OCR) systems and handwritten/printed text recognition. In Arabic, these tasks are adversely affected by the fact that many words are made up of sub-words; many sub-words have one or more associated diacritics that are not connected to the sub-word's body; and there can be multiple instances of overlap between sub-words. To overcome these problems we investigate and develop segmentation techniques that first segment a document into sub-words, link the diacritics with their sub-words, and remove possible overlap between words and sub-words. We also investigate two approaches to pre-processing tasks that estimate sub-word baselines and determine parameters that yield appropriate slope correction and slant removal. We investigate the use of linear regression on sub-word pixels to determine their central x and y coordinates, as well as their high-density part. We also develop a new incremental rotation procedure, performed on sub-words, that determines the best rotation angle needed to realign baselines. We demonstrate the benefits of these proposals by conducting extensive experiments on publicly available databases and in-house created databases. These algorithms help improve character segmentation accuracy by transforming handwritten Arabic text into a form that can benefit from analysis of printed text.

  20. Wavelet preprocessing of acoustic signals

    NASA Astrophysics Data System (ADS)

    Huang, W. Y.; Solorzano, M. R.

    1991-12-01

    This paper describes results using the wavelet transform to preprocess acoustic broadband signals in a system that discriminates between different classes of acoustic bursts. This is motivated by the similarity between the proportional bandwidth filters provided by the wavelet transform and those found in biological hearing systems. The experiment involves comparing statistical pattern classifier effects of wavelet and FFT preprocessed acoustic signals. The data used was from the DARPA Phase 1 database, which consists of artificially generated signals with real ocean background. The results show that the wavelet transform did provide improved performance when classifying on a frame-by-frame basis. The DARPA Phase 1 database is well matched to proportional bandwidth filtering; i.e., signal classes that contain high frequencies do tend to have shorter duration in this database. It is also noted that the decreasing background levels at high frequencies compensate for the poor match of the wavelet transform for long duration (high frequency) signals.
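
    A minimal sketch of wavelet preprocessing for such a classifier, assuming the PyWavelets library and an arbitrary Daubechies wavelet (the report does not specify a family): decompose the burst into proportional-bandwidth subbands and use normalized per-band energies as the feature vector.

        import numpy as np
        import pywt

        def wavelet_band_energies(signal, wavelet="db4", level=5):
            """Proportional-bandwidth energy features from a discrete wavelet
            transform; wavelet and level are illustrative choices."""
            coeffs = pywt.wavedec(np.asarray(signal, dtype=float), wavelet, level=level)
            energies = np.array([np.sum(c ** 2) for c in coeffs])
            return energies / energies.sum()  # normalized per-band energy vector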

  1. Image preprocessing study on KPCA-based face recognition

    NASA Astrophysics Data System (ADS)

    Li, Xuan; Li, Dehua

    2015-12-01

    Face recognition as an important biometric identification method, with its friendly, natural, convenient advantages, has obtained more and more attention. This paper intends to research a face recognition system including face detection, feature extraction and face recognition, mainly through researching on related theory and the key technology of various preprocessing methods in face detection process, using KPCA method, focuses on the different recognition results in different preprocessing methods. In this paper, we choose YCbCr color space for skin segmentation and choose integral projection for face location. We use erosion and dilation of the opening and closing operation and illumination compensation method to preprocess face images, and then use the face recognition method based on kernel principal component analysis method for analysis and research, and the experiments were carried out using the typical face database. The algorithms experiment on MATLAB platform. Experimental results show that integration of the kernel method based on PCA algorithm under certain conditions make the extracted features represent the original image information better for using nonlinear feature extraction method, which can obtain higher recognition rate. In the image preprocessing stage, we found that images under various operations may appear different results, so as to obtain different recognition rate in recognition stage. At the same time, in the process of the kernel principal component analysis, the value of the power of the polynomial function can affect the recognition result.

  2. Weighted-SAMGSR: combining significance analysis of microarray-gene set reduction algorithm with pathway topology-based weights to select relevant genes.

    PubMed

    Tian, Suyan; Chang, Howard H; Wang, Chi

    2016-09-29

    It has been demonstrated that a pathway-based feature selection method that incorporates biological information within pathways during the process of feature selection usually outperforms a gene-based feature selection algorithm in terms of predictive accuracy and stability. The significance analysis of microarray-gene set reduction algorithm (SAMGSR), an extension to a gene set analysis method that further reduces the selected pathways to their respective core subsets, can be regarded as a pathway-based feature selection method. In SAMGSR, whether a gene is selected is mainly determined by its expression difference between the phenotypes, and partially by the number of pathways to which this gene belongs; it ignores the topology information among pathways. In this study, we propose a weighted version of the SAMGSR algorithm by constructing weights based on the connectivity among genes and then combining these weights with the test statistics. Using both simulated and real-world data, we evaluate the performance of the proposed SAMGSR extension and demonstrate that the weighted version outperforms its original version. CONCLUSIONS: The additional gene connectivity information does facilitate feature selection. This article was reviewed by Drs. Limsoon Wong, Lev Klebanov, and I. King Jordan.
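
    The paper's exact weighting scheme is not reproduced here, but the idea of combining topology-based weights with per-gene test statistics can be sketched as follows; treating normalized within-pathway connectivity (degree) as the weight is an assumption for illustration only.

        import numpy as np

        def weighted_gene_stats(t_stats, adjacency):
            """Combine per-gene test statistics with topology-based weights.
            t_stats: (n_genes,) moderated t-like statistics;
            adjacency: (n_genes, n_genes) 0/1 gene-gene connectivity within pathways.
            The degree-based weight is an illustrative assumption."""
            degree = adjacency.sum(axis=1).astype(float)
            weights = degree / degree.max() if degree.max() > 0 else np.ones_like(degree)
            # highly connected, strongly differential genes rank first
            return np.abs(t_stats) * weights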

  3. MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach

    PubMed Central

    Abduallah, Yasser; Byron, Kevin; Du, Zongxuan; Cervantes-Cervantes, Miguel

    2017-01-01

    Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool. PMID:28243601

  4. MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach.

    PubMed

    Abduallah, Yasser; Turki, Turki; Byron, Kevin; Du, Zongxuan; Cervantes-Cervantes, Miguel; Wang, Jason T L

    2017-01-01

    Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool.
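
    The information-theoretic core of such inference is typically a mutual-information score between the expression profiles of gene pairs, with the MapReduce layer distributing the pairwise computations across the cluster. A minimal histogram-based estimate (the binning choice is an illustrative assumption) might look like this:

        import numpy as np

        def mutual_information(x, y, bins=8):
            """Histogram-based mutual information between two expression
            time series x and y; higher values suggest a regulatory link."""
            joint, _, _ = np.histogram2d(x, y, bins=bins)
            pxy = joint / joint.sum()
            px = pxy.sum(axis=1, keepdims=True)   # marginal of x
            py = pxy.sum(axis=0, keepdims=True)   # marginal of y
            nz = pxy > 0                          # avoid log(0)
            return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))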

  5. A comparative analytical assay of gene regulatory networks inferred using microarray and RNA-seq datasets

    PubMed Central

    Izadi, Fereshteh; Zarrini, Hamid Najafi; Kiani, Ghaffar; Jelodar, Nadali Babaeian

    2016-01-01

    A Gene Regulatory Network (GRN) is a collection of interactions between molecular regulators and their targets in cells, governing gene expression levels. The explosion of omics data generated by high-throughput genomic assays such as microarray and RNA-Seq technologies, and the emergence of a number of pre-processing methods, demand suitable guidelines for determining the impact of transcript data platforms and normalization procedures on describing associations in GRNs. In this study, exploiting publicly available microarray and RNA-Seq datasets and a gold standard of transcriptional interactions in Arabidopsis, we performed a comparison between six GRNs derived from RNA-Seq and microarray data under different normalization procedures. We observed that the compared algorithms were highly data-specific, and networks reconstructed from RNA-Seq data showed considerable accuracy against the corresponding networks captured by microarrays. Topological analysis showed that GRNs inferred from the two platforms were similar in several topological features, although we observed more connectivity in the RNA-Seq-derived gene network. Taken together, transcriptional regulatory networks obtained from Robust Multiarray Averaging (RMA) and Variance-Stabilizing Transformed (VST) normalized data predicted a higher rate of true edges than the rest of the methods used in this comparison. PMID:28293077

  6. Analysis of High-Throughput ELISA Microarray Data

    SciTech Connect

    White, Amanda M.; Daly, Don S.; Zangar, Richard C.

    2011-02-23

    Our research group develops analytical methods and software for the high-throughput analysis of quantitative enzyme-linked immunosorbent assay (ELISA) microarrays. ELISA microarrays differ from DNA microarrays in several fundamental aspects and most algorithms for analysis of DNA microarray data are not applicable to ELISA microarrays. In this review, we provide an overview of the steps involved in ELISA microarray data analysis and how the statistically sound algorithms we have developed provide an integrated software suite to address the needs of each data-processing step. The algorithms discussed are available in a set of open-source software tools (http://www.pnl.gov/statistics/ProMAT).

  7. Adaptive fingerprint image enhancement with emphasis on preprocessing of data.

    PubMed

    Bartůnek, Josef Ström; Nilsson, Mikael; Sällberg, Benny; Claesson, Ingvar

    2013-02-01

    This article proposes several improvements to an adaptive fingerprint enhancement method that is based on contextual filtering. The term adaptive implies that parameters of the method are automatically adjusted based on the input fingerprint image. Five processing blocks comprise the adaptive fingerprint enhancement method, where four of these blocks are updated in our proposed system. Hence, the proposed overall system is novel. The four updated processing blocks are: 1) preprocessing; 2) global analysis; 3) local analysis; and 4) matched filtering. In the preprocessing and local analysis blocks, a nonlinear dynamic range adjustment method is used. In the global analysis and matched filtering blocks, different forms of order statistical filters are applied. These processing blocks yield an improved and new adaptive fingerprint image processing method. The performance of the updated processing blocks is presented in the evaluation part of this paper. The algorithm is evaluated against the NIST-developed NBIS software for fingerprint recognition on the FVC databases.

  8. Research on pre-processing of QR Code

    NASA Astrophysics Data System (ADS)

    Sun, Haixing; Xia, Haojie; Dong, Ning

    2013-10-01

    QR codes encode many kinds of information and offer several advantages: large storage capacity, high reliability, omnidirectional ultra-high-speed reading, small printing size, and highly efficient representation of Chinese characters. In order to obtain a clearer binarized image from a complex background and improve the recognition rate of QR codes, this paper investigates pre-processing methods for QR codes (Quick Response Codes) and presents algorithms and results of image pre-processing for QR code recognition. We improve on the conventional approach by modifying Sauvola's adaptive binarization method. Additionally, we introduce QR code extraction that adapts to different image sizes and a flexible image correction approach, improving the efficiency and accuracy of QR code image processing.
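
    Sauvola's method computes a local threshold from the local mean m and standard deviation s as T = m(1 + k(s/R - 1)). A minimal sketch using box filters follows; the window size and the parameters k and R are conventional defaults, not values from the paper.

        import numpy as np
        from scipy.ndimage import uniform_filter

        def sauvola_binarize(gray, window=25, k=0.2, R=128.0):
            """Sauvola adaptive threshold: T = m * (1 + k * (s / R - 1)).
            gray: 2-D grayscale image; parameters are conventional defaults."""
            gray = gray.astype(np.float64)
            mean = uniform_filter(gray, size=window)          # local mean m
            mean_sq = uniform_filter(gray ** 2, size=window)
            std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))  # local std s
            threshold = mean * (1.0 + k * (std / R - 1.0))
            return (gray > threshold).astype(np.uint8) * 255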

  9. Feature Detection Techniques for Preprocessing Proteomic Data

    PubMed Central

    Sellers, Kimberly F.; Miecznikowski, Jeffrey C.

    2010-01-01

    Numerous gel-based and nongel-based technologies are used to detect protein changes potentially associated with disease. The raw data, however, are abundant with technical and structural complexities, making statistical analysis a difficult task. Low-level analysis issues (including normalization, background correction, gel and/or spectral alignment, feature detection, and image registration) are substantial problems that need to be addressed, because any large-level data analyses are contingent on appropriate and statistically sound low-level procedures. Feature detection approaches are particularly interesting due to the increased computational speed associated with subsequent calculations. Such summary data corresponding to image features provide a significant reduction in overall data size and structure while retaining key information. In this paper, we focus on recent advances in feature detection as a tool for preprocessing proteomic data. This work highlights existing and newly developed feature detection algorithms for proteomic datasets, particularly relating to time-of-flight mass spectrometry, and two-dimensional gel electrophoresis. Note, however, that the associated data structures (i.e., spectral data, and images containing spots) used as input for these methods are obtained via all gel-based and nongel-based methods discussed in this manuscript, and thus the discussed methods are likewise applicable. PMID:20467457

  10. EMAAS: An extensible grid-based Rich Internet Application for microarray data analysis and management

    PubMed Central

    Barton, G; Abbott, J; Chiba, N; Huang, DW; Huang, Y; Krznaric, M; Mack-Smith, J; Saleem, A; Sherman, BT; Tiwari, B; Tomlinson, C; Aitman, T; Darlington, J; Game, L; Sternberg, MJE; Butcher, SA

    2008-01-01

    Background Microarray experimentation requires the application of complex analysis methods as well as the use of non-trivial computer technologies to manage the resultant large data sets. This, together with the proliferation of tools and techniques for microarray data analysis, makes it very challenging for a laboratory scientist to keep up-to-date with the latest developments in this field. Our aim was to develop a distributed e-support system for microarray data analysis and management. Results EMAAS (Extensible MicroArray Analysis System) is a multi-user rich internet application (RIA) providing simple, robust access to up-to-date resources for microarray data storage and analysis, combined with integrated tools to optimise real time user support and training. The system leverages the power of distributed computing to perform microarray analyses, and provides seamless access to resources located at various remote facilities. The EMAAS framework allows users to import microarray data from several sources to an underlying database, to pre-process, quality-assess and analyse the data, to perform functional analyses, and to track data analysis steps, all through a single easy-to-use web portal. This interface offers distance support to users both in the form of video tutorials and via live screen feeds using the web conferencing tool EVO. A number of analysis packages, including R-Bioconductor and Affymetrix Power Tools have been integrated on the server side and are available programmatically through the Postgres-PLR library or on grid compute clusters. Integrated distributed resources include the functional annotation tool DAVID, GeneCards and the microarray data repositories GEO, CELSIUS and MiMiR. EMAAS currently supports analysis of Affymetrix 3' and Exon expression arrays, and the system is extensible to cater for other microarray and transcriptomic platforms. Conclusion EMAAS enables users to track and perform microarray data management and analysis tasks

  11. Evaluation of the efficiency of continuous wavelet transform as processing and preprocessing algorithm for resolution of overlapped signals in univariate and multivariate regression analyses; an application to ternary and quaternary mixtures

    NASA Astrophysics Data System (ADS)

    Hegazy, Maha A.; Lotfy, Hayam M.; Mowaka, Shereen; Mohamed, Ekram Hany

    2016-07-01

    Wavelets have been adapted for a vast number of signal-processing applications due to the amount of information that can be extracted from a signal. In this work, a comparative study was conducted on the efficiency of the continuous wavelet transform (CWT) as a signal-processing tool in univariate regression and as a pre-processing tool in multivariate analysis using partial least squares (CWT-PLS). These were applied to complex spectral signals of ternary and quaternary mixtures. The CWT-PLS method succeeded in the simultaneous determination of a quaternary mixture of drotaverine (DRO), caffeine (CAF), paracetamol (PAR) and p-aminophenol (PAP, the major impurity of paracetamol). In contrast, the univariate CWT failed to determine the quaternary mixture components simultaneously; it was able to determine only PAR and PAP, the ternary mixture of DRO, CAF, and PAR, and the ternary mixture of CAF, PAR, and PAP. During the CWT calculations, different wavelet families were tested. The univariate CWT method was validated according to the ICH guidelines. For the development of the CWT-PLS model, a calibration set was prepared by means of an orthogonal experimental design, and the absorption spectra were recorded and processed by CWT. The CWT-PLS model was constructed by regression between the wavelet coefficients and concentration matrices, and validation was performed by both cross-validation and external validation sets. Both methods were successfully applied to the determination of the studied drugs in pharmaceutical formulations.

  12. Evaluation of the efficiency of continuous wavelet transform as processing and preprocessing algorithm for resolution of overlapped signals in univariate and multivariate regression analyses; an application to ternary and quaternary mixtures.

    PubMed

    Hegazy, Maha A; Lotfy, Hayam M; Mowaka, Shereen; Mohamed, Ekram Hany

    2016-07-05

    Wavelets have been adapted for a vast number of signal-processing applications due to the amount of information that can be extracted from a signal. In this work, a comparative study was conducted on the efficiency of the continuous wavelet transform (CWT) as a signal-processing tool in univariate regression and as a pre-processing tool in multivariate analysis using partial least squares (CWT-PLS). These were applied to complex spectral signals of ternary and quaternary mixtures. The CWT-PLS method succeeded in the simultaneous determination of a quaternary mixture of drotaverine (DRO), caffeine (CAF), paracetamol (PAR) and p-aminophenol (PAP, the major impurity of paracetamol). In contrast, the univariate CWT failed to determine the quaternary mixture components simultaneously; it was able to determine only PAR and PAP, the ternary mixture of DRO, CAF, and PAR, and the ternary mixture of CAF, PAR, and PAP. During the CWT calculations, different wavelet families were tested. The univariate CWT method was validated according to the ICH guidelines. For the development of the CWT-PLS model, a calibration set was prepared by means of an orthogonal experimental design, and the absorption spectra were recorded and processed by CWT. The CWT-PLS model was constructed by regression between the wavelet coefficients and concentration matrices, and validation was performed by both cross-validation and external validation sets. Both methods were successfully applied to the determination of the studied drugs in pharmaceutical formulations.
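
    A minimal sketch of the CWT-PLS idea, assuming PyWavelets and scikit-learn (the paper's own computations are not reproduced): transform each absorption spectrum with a CWT, flatten the coefficient matrix into a feature vector, and regress concentrations on those features with PLS. The Morlet wavelet, scale range, and number of latent variables are illustrative assumptions.

        import numpy as np
        import pywt
        from sklearn.cross_decomposition import PLSRegression

        def cwt_pls_fit(spectra, concentrations, scales=np.arange(1, 33), wavelet="morl"):
            """Fit PLS on CWT coefficients of absorption spectra.
            spectra: (n_samples, n_wavelengths); concentrations: (n_samples, n_drugs)."""
            features = []
            for spectrum in spectra:
                coefs, _ = pywt.cwt(spectrum, scales, wavelet)  # (n_scales, n_wavelengths)
                features.append(coefs.ravel())
            model = PLSRegression(n_components=4)  # latent-variable count is illustrative
            model.fit(np.asarray(features), concentrations)
            return model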

  13. Celsius: a community resource for Affymetrix microarray data.

    PubMed

    Day, Allen; Carlson, Marc R J; Dong, Jun; O'Connor, Brian D; Nelson, Stanley F

    2007-01-01

    Celsius is a data warehousing system to aggregate Affymetrix CEL files and associated metadata. It provides mechanisms for importing, storing, querying, and exporting large volumes of primary and pre-processed microarray data. Celsius contains ten billion assay measurements and affiliated metadata. It is the largest publicly available source of Affymetrix microarray data, and through sheer volume it allows a sophisticated, broad view of transcription that has not previously been possible.

  14. Linear model for fast background subtraction in oligonucleotide microarrays

    PubMed Central

    2009-01-01

    Background One important preprocessing step in the analysis of microarray data is background subtraction. In high-density oligonucleotide arrays this is recognized as a crucial step for the global performance of the data analysis from raw intensities to expression values. Results We propose here an algorithm for background estimation based on a model in which the cost function is quadratic in a set of fitting parameters such that minimization can be performed through linear algebra. The model incorporates two effects: 1) Correlated intensities between neighboring features in the chip and 2) sequence-dependent affinities for non-specific hybridization fitted by an extended nearest-neighbor model. Conclusion The algorithm has been tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values for different experiments as well as between the free-energy parameters and their counterparts in aqueous solution indicate that the model captures a significant part of the underlying physical chemistry. PMID:19917117
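
    Because the cost function is quadratic in the fitting parameters, the minimization reduces to a single linear solve. A generic sketch follows, with the design-matrix columns standing in for the paper's neighbor-correlation and nearest-neighbor sequence terms (the actual model terms are not reproduced here).

        import numpy as np

        def fit_background(design, intensities):
            """Least-squares fit of a background model linear in its parameters.
            design: (n_probes, n_params) matrix, e.g. neighbor intensities and
            sequence-affinity terms; intensities: (n_probes,) observed values.
            A quadratic cost in the parameters reduces to one linear solve."""
            params, *_ = np.linalg.lstsq(design, intensities, rcond=None)
            return params  # predicted background = design @ params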

  15. Chromosome Microarray.

    PubMed

    Anderson, Sharon

    2016-01-01

    Over the last half century, knowledge about genetics, genetic testing, and its complexity has flourished. Completion of the Human Genome Project provided a foundation upon which the accuracy of genetics, genomics, and integration of bioinformatics knowledge and testing has grown exponentially. What is lagging, however, are efforts to reach and engage nurses about this rapidly changing field. The purpose of this article is to familiarize nurses with several frequently ordered genetic tests including chromosomes and fluorescence in situ hybridization followed by a comprehensive review of chromosome microarray. It shares the complexity of microarray including how testing is performed and results analyzed. A case report demonstrates how this technology is applied in clinical practice and reveals benefits and limitations of this scientific and bioinformatics genetic technology. Clinical implications for maternal-child nurses across practice levels are discussed.

  16. PREPROCESSING MAGNETIC FIELDS WITH CHROMOSPHERIC LONGITUDINAL FIELDS

    SciTech Connect

    Yamamoto, Tetsuya T.; Kusano, K.

    2012-06-20

    Nonlinear force-free field (NLFFF) extrapolation is a powerful tool for the modeling of the magnetic field in the solar corona. However, since the photospheric magnetic field does not in general satisfy the force-free condition, some kind of processing is required to assimilate data into the model. In this paper, we report the results of new preprocessing for the NLFFF extrapolation. Through this preprocessing, we expect to obtain magnetic field data similar to those in the chromosphere. In our preprocessing, we add a new term concerning chromospheric longitudinal fields into the optimization function proposed by Wiegelmann et al. We perform a parameter survey of six free parameters to find minimum force- and torque-freeness with the simulated-annealing method. Analyzed data are a photospheric vector magnetogram of AR 10953 observed with the Hinode spectropolarimeter and a chromospheric longitudinal magnetogram observed with SOLIS spectropolarimeter. It is found that some preprocessed fields show the smallest force- and torque-freeness and are very similar to the chromospheric longitudinal fields. On the other hand, other preprocessed fields show noisy maps, although the force- and torque-freeness are of the same order. By analyzing preprocessed noisy maps in the wave number space, we found that small and large wave number components balance out on the force-free index. We also discuss our iteration limit of the simulated-annealing method and magnetic structure broadening in the chromosphere.

  17. Optimized LOWESS normalization parameter selection for DNA microarray data

    PubMed Central

    Berger, John A; Hautaniemi, Sampsa; Järvinen, Anna-Kaarina; Edgren, Henrik; Mitra, Sanjit K; Astola, Jaakko

    2004-01-01

    Background Microarray data normalization is an important step for obtaining data that are reliable and usable for subsequent analysis. One of the most commonly utilized normalization techniques is the locally weighted scatterplot smoothing (LOWESS) algorithm. However, a much overlooked concern with the LOWESS normalization strategy deals with choosing the appropriate parameters. Parameters are usually chosen arbitrarily, which may reduce the efficiency of the normalization and result in non-optimally normalized data. Thus, there is a need to explore LOWESS parameter selection in greater detail. Results and discussion In this work, we discuss how to choose parameters for the LOWESS method. Moreover, we present an optimization approach for obtaining the fraction of data points utilized in the local regression and analyze results for local print-tip normalization. The optimization procedure determines the bandwidth parameter for the local regression by minimizing a cost function that represents the mean-squared difference between the LOWESS estimates and the normalization reference level. We demonstrate the utility of the systematic parameter selection using two publicly available data sets. The first data set consists of three self versus self hybridizations, which allow for a quantitative study of the optimization method. The second data set contains a collection of DNA microarray data from a breast cancer study utilizing four breast cancer cell lines. Our results show that different parameter choices for the bandwidth window yield dramatically different calibration results in both studies. Conclusions Results derived from the self versus self experiment indicate that the proposed optimization approach is a plausible solution for estimating the LOWESS parameters, while results from the breast cancer experiment show that the optimization procedure is readily applicable to real-life microarray data normalization. In summary, the systematic approach to obtain critical
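
    In the two-color setting this normalization is usually applied on the M-A transform: M = log2(R/G) is regressed on A = (1/2)log2(RG) and the fitted trend is subtracted. A minimal sketch assuming the statsmodels implementation, where frac is exactly the bandwidth parameter whose selection the paper optimizes:

        import numpy as np
        from statsmodels.nonparametric.smoothers_lowess import lowess

        def lowess_normalize(red, green, frac=0.4):
            """Intensity-dependent (M-A) LOWESS normalization of a two-color
            array; frac is the bandwidth parameter (value is illustrative)."""
            M = np.log2(red / green)            # log-ratio
            A = 0.5 * np.log2(red * green)      # mean log-intensity
            fit = lowess(M, A, frac=frac, return_sorted=False)  # trend of M vs A
            return M - fit                      # normalized log-ratios centered on zero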

  18. Diagnostic challenges for multiplexed protein microarrays.

    PubMed

    Master, Stephen R; Bierl, Charlene; Kricka, Larry J

    2006-11-01

    Multiplexed protein analysis using planar microarrays or microbeads is growing in popularity for simultaneous assays of antibodies, cytokines, allergens, drugs and hormones. However, this new assay format presents several new operational issues for the clinical laboratory, such as the quality control of protein-microarray-based assays, the release of unrequested test data and the use of diagnostic algorithms to transform microarray data into diagnostic results.

  19. Automated Microarray Image Analysis Toolbox for MATLAB

    SciTech Connect

    White, Amanda M.; Daly, Don S.; Willse, Alan R.; Protic, Miroslava; Chandler, Darrell P.

    2005-09-01

    The Automated Microarray Image Analysis (AMIA) Toolbox for MATLAB is a flexible, open-source microarray image analysis tool that allows the user to customize analysis of sets of microarray images. This tool provides several methods of identifying and quantifying spot statistics, as well as extensive diagnostic statistics and images to identify poor data quality or processing. The open nature of this software allows researchers to understand the algorithms used to provide intensity estimates and to modify them easily if desired.

  20. Metric learning for DNA microarray data analysis

    NASA Astrophysics Data System (ADS)

    Takeuchi, Ichiro; Nakagawa, Masao; Seto, Masao

    2009-12-01

    In many microarray studies, gene set selection is an important preliminary step for subsequent main tasks such as tumor classification and cancer subtype identification. In this paper, we investigate the possibility of using metric learning as an alternative to gene set selection. We develop a simple metric learning algorithm intended for microarray data analysis. Exploiting a property of the algorithm, we introduce a novel approach for extending the metric learning to be adaptive. We apply the algorithm to previously studied microarray data on malignant lymphoma subtype identification.

  1. A survival guide to Landsat preprocessing.

    PubMed

    Young, Nicholas E; Anderson, Ryan S; Chignell, Stephen M; Vorster, Anthony G; Lawrence, Rick; Evangelista, Paul H

    2017-04-01

    Landsat data are increasingly used for ecological monitoring and research. These data often require preprocessing prior to analysis to account for sensor, solar, atmospheric, and topographic effects. However, ecologists using these data are faced with a literature containing inconsistent terminology, outdated methods, and a vast number of approaches with contradictory recommendations. These issues can, at best, make determining the correct preprocessing workflow a difficult and time-consuming task and, at worst, lead to erroneous results. We address these problems by providing a concise overview of the Landsat missions and sensors and by clarifying frequently conflated terms and methods. Preprocessing steps commonly applied to Landsat data are differentiated and explained, including georeferencing and co-registration, conversion to radiance, solar correction, atmospheric correction, topographic correction, and relative correction. We then synthesize this information by presenting workflows and a decision tree for determining the appropriate level of imagery preprocessing given an ecological research question, while emphasizing the need to tailor each workflow to the study site and question at hand. We recommend a parsimonious approach to Landsat preprocessing that avoids unnecessary steps and recommend approaches and data products that are well tested, easily available, and sufficiently documented. Our focus is specific to ecological applications of Landsat data, yet many of the concepts and recommendations discussed are also appropriate for other disciplines and remote sensing platforms. © 2017 The Authors. Ecology, published by Wiley Periodicals, Inc., on behalf of the Ecological Society of America.
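
    As one concrete example of the "conversion to radiance" and solar-correction steps discussed above, Landsat 8 digital numbers are rescaled to top-of-atmosphere reflectance with per-band coefficients from the scene metadata and divided by the sine of the sun elevation. A minimal sketch (coefficient names follow the USGS MTL-file conventions):

        import numpy as np

        def dn_to_toa_reflectance(dn, mult, add, sun_elevation_deg):
            """Convert Landsat 8 digital numbers to top-of-atmosphere reflectance.
            mult/add are the REFLECTANCE_MULT_BAND_x / REFLECTANCE_ADD_BAND_x
            rescaling coefficients from the scene's MTL metadata file."""
            rho = mult * dn.astype(np.float64) + add
            return rho / np.sin(np.deg2rad(sun_elevation_deg))  # solar-angle correction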

  2. An Automated, Adaptive Framework for Optimizing Preprocessing Pipelines in Task-Based Functional MRI.

    PubMed

    Churchill, Nathan W; Spring, Robyn; Afshin-Pour, Babak; Dong, Fan; Strother, Stephen C

    2015-01-01

    BOLD fMRI is sensitive to blood-oxygenation changes correlated with brain function; however, it is limited by relatively weak signal and significant noise confounds. Many preprocessing algorithms have been developed to control noise and improve signal detection in fMRI. Although the chosen set of preprocessing and analysis steps (the "pipeline") significantly affects signal detection, pipelines are rarely quantitatively validated in the neuroimaging literature, due to complex preprocessing interactions. This paper outlines and validates an adaptive resampling framework for evaluating and optimizing preprocessing choices by optimizing data-driven metrics of task prediction and spatial reproducibility. Compared to standard "fixed" preprocessing pipelines, this optimization approach significantly improves independent validation measures of within-subject test-retest, and between-subject activation overlap, and behavioural prediction accuracy. We demonstrate that preprocessing choices function as implicit model regularizers, and that improvements due to pipeline optimization generalize across a range of simple to complex experimental tasks and analysis models. Results are shown for brief scanning sessions (<3 minutes each), demonstrating that with pipeline optimization, it is possible to obtain reliable results and brain-behaviour correlations in relatively small datasets.

  3. An Automated, Adaptive Framework for Optimizing Preprocessing Pipelines in Task-Based Functional MRI

    PubMed Central

    Churchill, Nathan W.; Spring, Robyn; Afshin-Pour, Babak; Dong, Fan; Strother, Stephen C.

    2015-01-01

    BOLD fMRI is sensitive to blood-oxygenation changes correlated with brain function; however, it is limited by relatively weak signal and significant noise confounds. Many preprocessing algorithms have been developed to control noise and improve signal detection in fMRI. Although the chosen set of preprocessing and analysis steps (the “pipeline”) significantly affects signal detection, pipelines are rarely quantitatively validated in the neuroimaging literature, due to complex preprocessing interactions. This paper outlines and validates an adaptive resampling framework for evaluating and optimizing preprocessing choices by optimizing data-driven metrics of task prediction and spatial reproducibility. Compared to standard “fixed” preprocessing pipelines, this optimization approach significantly improves independent validation measures of within-subject test-retest, and between-subject activation overlap, and behavioural prediction accuracy. We demonstrate that preprocessing choices function as implicit model regularizers, and that improvements due to pipeline optimization generalize across a range of simple to complex experimental tasks and analysis models. Results are shown for brief scanning sessions (<3 minutes each), demonstrating that with pipeline optimization, it is possible to obtain reliable results and brain-behaviour correlations in relatively small datasets. PMID:26161667

  4. hemaClass.org: Online One-By-One Microarray Normalization and Classification of Hematological Cancers for Precision Medicine

    PubMed Central

    Falgreen, Steffen; Ellern Bilgrau, Anders; Brøndum, Rasmus Froberg; Hjort Jakobsen, Lasse; Have, Jonas; Lindblad Nielsen, Kasper; El-Galaly, Tarec Christoffer; Bødker, Julie Støve; Schmitz, Alexander; H. Young, Ken; Johnsen, Hans Erik; Dybkær, Karen; Bøgsted, Martin

    2016-01-01

    Background Dozens of omics-based cancer classification systems have been introduced with prognostic, diagnostic, and predictive capabilities. However, they often employ complex algorithms and are only applicable to whole cohorts of patients, making them difficult to apply in a personalized clinical setting. Results This prompted us to create hemaClass.org, an online web application providing an easy interface to one-by-one RMA normalization of microarrays and subsequent risk classification of diffuse large B-cell lymphoma (DLBCL) into cell-of-origin and chemotherapeutic sensitivity classes. Classification results for one-by-one array pre-processing with and without a laboratory-specific RMA reference dataset were compared to cohort-based classifiers in 4 publicly available datasets. Classifications showed high agreement between one-by-one and whole-cohort pre-processed data when a laboratory-specific reference set was supplied. The website is essentially the R package hemaClass accompanied by a Shiny web application. The well-documented package can be used to run the website locally or to use the developed methods programmatically. Conclusions The website and R package are relevant for biological and clinical lymphoma researchers using Affymetrix U-133 Plus 2 arrays, as they provide reliable and swift methods for calculation of disease subclasses. The proposed one-by-one pre-processing method is relevant for all researchers using microarrays. PMID:27701436

  5. GEPAS: a web-based resource for microarray gene expression data analysis

    PubMed Central

    Herrero, Javier; Al-Shahrour, Fátima; Díaz-Uriarte, Ramón; Mateos, Álvaro; Vaquerizas, Juan M.; Santoyo, Javier; Dopazo, Joaquín

    2003-01-01

    We present a web-based pipeline for microarray gene expression profile analysis, GEPAS, which stands for Gene Expression Profile Analysis Suite (http://gepas.bioinfo.cnio.es). GEPAS is composed of different interconnected modules which include tools for data pre-processing, two-condition comparison, unsupervised and supervised clustering (including some of the most popular methods as well as home-made algorithms) and several tests for differential gene expression among different classes, continuous variables or survival analysis. A multiple-purpose tool for data mining, based on Gene Ontology, is also linked to the tools, which constitutes a very convenient way of analysing clustering results. On-line tutorials are available from our main web server (http://bioinfo.cnio.es). PMID:12824345

  6. Efficient Preprocessing technique using Web log mining

    NASA Astrophysics Data System (ADS)

    Raiyani, Sheetal A.; jain, Shailendra

    2012-11-01

    Web Usage Mining can be described as the discovery and analysis of user access patterns through mining of log files and associated data from a particular website. Numerous visitors interact daily with web sites around the world; enormous amounts of data are generated, and this information can be very valuable to companies seeking to understand customer behavior. This paper presents a complete preprocessing methodology comprising data cleaning and user and session identification activities to improve data quality. User identification is a key issue in the preprocessing phase: the goal is to identify unique web users. Traditional user identification is based on the site structure, supported by some heuristic rules, which reduces the efficiency of user identification. To address this difficulty we introduce a proposed technique, DUI (Distinct User Identification), based on IP address, agent, and session time, along with referred pages within the desired session time. This can be used in counter-terrorism, fraud detection, and detection of unusual access to secure data; in addition, detection of users' regular access behavior can improve the overall design and performance of subsequent preprocessing runs.
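
    A minimal sketch of the distinct-user idea under stated assumptions: users are keyed by (IP address, user agent) and each user's requests are split into sessions wherever the inter-request gap exceeds a timeout. The 30-minute window, the field names, and the omission of the referred-page heuristic are simplifications, not the authors' DUI algorithm.

        from collections import defaultdict
        from datetime import timedelta

        SESSION_TIMEOUT = timedelta(minutes=30)  # assumed session window, not from the paper

        def identify_users(log_entries):
            """Group log entries into distinct users keyed by (IP, agent), then
            split each user's requests into sessions at gaps over the timeout.
            Each entry is a dict with 'ip', 'agent', and 'time' (a datetime)."""
            users = defaultdict(list)
            for entry in sorted(log_entries, key=lambda e: e["time"]):
                users[(entry["ip"], entry["agent"])].append(entry)
            sessions = []
            for user_key, entries in users.items():
                current = [entries[0]]
                for prev, curr in zip(entries, entries[1:]):
                    if curr["time"] - prev["time"] > SESSION_TIMEOUT:
                        sessions.append((user_key, current))
                        current = []
                    current.append(curr)
                sessions.append((user_key, current))
            return sessions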

  7. Preprocessing Moist Lignocellulosic Biomass for Biorefinery Feedstocks

    SciTech Connect

    Neal Yancey; Christopher T. Wright; Craig Conner; J. Richard Hess

    2009-06-01

    Biomass preprocessing is one of the primary operations in the feedstock assembly system of a lignocellulosic biorefinery. Preprocessing is generally accomplished using industrial grinders to format biomass materials into a suitable biorefinery feedstock for conversion to ethanol and other bioproducts. Many factors affect machine efficiency and the physical characteristics of preprocessed biomass. For example, moisture content of the biomass as received from the point of production has a significant impact on overall system efficiency and can significantly affect the characteristics (particle size distribution, flowability, storability, etc.) of the size-reduced biomass. Many different grinder configurations are available on the market, each with advantages under specific conditions. Ultimately, the capacity and/or efficiency of the grinding process can be enhanced by selecting the grinder configuration that optimizes grinder performance based on moisture content and screen size. This paper discusses the relationships of biomass moisture with respect to preprocessing system performance and product physical characteristics and compares data obtained on corn stover, switchgrass, and wheat straw as model feedstocks during Vermeer HG 200 grinder testing. During the tests, grinder screen configuration and biomass moisture content were varied and tested to provide a better understanding of their relative impact on machine performance and the resulting feedstock physical characteristics and uniformity relative to each crop tested.

  8. Computational biology of genome expression and regulation--a review of microarray bioinformatics.

    PubMed

    Wang, Junbai

    2008-01-01

    Microarray technology is being used widely in various biomedical research areas; the corresponding microarray data analysis is an essential step toward making the best use of array technologies. Here we review two components of microarray data analysis: low-level analysis, which emphasizes the design, quality control, and preprocessing of microarray experiments, and high-level analysis, which focuses on domain-specific microarray applications such as tumor classification, biomarker prediction, analysis of array CGH experiments, and reverse engineering of gene expression networks. Additionally, we review recent developments in building predictive models in genome expression and regulation studies. This review may help biologists grasp basic knowledge of microarray bioinformatics as well as its potential impact on the future evolution of biomedical research fields.

  9. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  10. The Effects of Pre-processing Strategies for Pediatric Cochlear Implant Recipients

    PubMed Central

    Rakszawski, Bernadette; Wright, Rose; Cadieux, Jamie H.; Davidson, Lisa S.; Brenner, Christine

    2016-01-01

    Background Cochlear implants (CIs) have been shown to improve children’s speech recognition over traditional amplification when severe to profound sensorineural hearing loss is present. Despite improvements, understanding speech at low-level intensities or in the presence of background noise remains difficult. In an effort to improve speech understanding in challenging environments, Cochlear Ltd. offers pre-processing strategies that apply various algorithms prior to mapping the signal to the internal array. Two of these strategies include Autosensitivity Control™ (ASC) and Adaptive Dynamic Range Optimization (ADRO®). Based on previous research, the manufacturer’s default pre-processing strategy for pediatrics’ everyday programs combines ASC+ADRO®. Purpose The purpose of this study is to compare pediatric speech perception performance across various pre-processing strategies while applying a specific programming protocol utilizing increased threshold (T) levels to ensure access to very low-level sounds. Research Design This was a prospective, cross-sectional, observational study. Participants completed speech perception tasks in four pre-processing conditions: no pre-processing, ADRO®, ASC, ASC+ADRO®. Study Sample Eleven pediatric Cochlear Ltd. cochlear implant users were recruited: six bilateral, one unilateral, and four bimodal. Intervention Four programs, with the participants’ everyday map, were loaded into the processor with different pre-processing strategies applied in each of the four positions: no pre-processing, ADRO®, ASC, and ASC+ADRO®. Data Collection and Analysis Participants repeated CNC words presented at 50 and 70 dB SPL in quiet and HINT sentences presented adaptively with competing R-Space noise at 60 and 70 dB SPL. Each measure was completed as participants listened with each of the four pre-processing strategies listed above. Test order and condition were randomized. A repeated-measures analysis of variance (ANOVA) was used to

  11. Image preprocessing for vehicle panoramic system

    NASA Astrophysics Data System (ADS)

    Wang, Ting; Chen, Liguo

    2016-10-01

    Because distortion is a problem for panoramic image stitching in vehicle panoramic systems, research on image preprocessing for such systems was carried out. First, the principle of vehicle panoramic vision was analyzed and the fundamental role that the image preprocessing procedure plays in panoramic vision was identified. The camera was then calibrated with a hyperboloid model, and correction of the distorted image was realized. With the panoramic system in mind, the effect of mounting position was taken into consideration for image correction, and suggested camera installation angles for different mounting positions are given. Through analysis of the existing problems of the bird's-eye image, a method of transforming twice with calibration plates is proposed. Experimental results indicate that the proposed method can effectively reduce the problems of the bird's-eye image and can help reduce distortion in image stitching for panoramic systems.

  12. Experimental variability and data pre-processing as factors affecting the discrimination power of some chemometric approaches (PCA, CA and a new algorithm based on linear regression) applied to (+/-)ESI/MS and RPLC/UV data: Application on green tea extracts.

    PubMed

    Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A

    2016-08-01

    The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were applied directly to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied to pairs of very large experimental data series successfully retain information resulting from high-frequency instrumental acquisition rates, better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrate (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods. Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data

  13. Pre-processing Tasks in Indonesian Twitter Messages

    NASA Astrophysics Data System (ADS)

    Hidayatullah, A. F.; Ma’arif, M. R.

    2017-01-01

    Twitter text messages are very noisy, and tweet data are unstructured and complicated. The focus of this work is to investigate pre-processing techniques for Twitter messages in Bahasa Indonesia. The main goal of the experiment is to clean the tweet data for further analysis; the objective of the pre-processing task is thus simply to remove all meaningless characters and keep the valuable words. We divide the proposed pre-processing experiments into two parts: the first part consists of common pre-processing tasks, and the second part is a specific pre-processing task for tweet data. From the experimental results we conclude that employing a specific pre-processing task tailored to the characteristics of tweet data yields a more valuable result: the output contains fewer meaningless words than the result obtained by running only the common pre-processing tasks.
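    A minimal sketch of a tweet-specific cleaning step of this kind is shown below in Python; the abstract does not publish the authors' exact rules, so the regular expressions are illustrative.

```python
import re

def clean_tweet(text: str) -> str:
    """Illustrative tweet-specific cleaning (not the authors' exact rules):
    strip URLs, retweet markers, @mentions and hashtag symbols, drop other
    punctuation, then collapse whitespace and lowercase."""
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"\bRT\b", " ", text)        # retweet marker
    text = re.sub(r"@\w+", " ", text)          # mentions
    text = re.sub(r"#(\w+)", r"\1", text)      # keep hashtag word, drop '#'
    text = re.sub(r"[^\w\s]", " ", text)       # remaining punctuation
    text = re.sub(r"\s+", " ", text)           # collapse whitespace
    return text.strip().lower()

print(clean_tweet("RT @user: Macet parah di jalan!! http://t.co/xyz #jakarta"))
# -> "macet parah di jalan jakarta"
```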

  14. Groundtruth approach to accurate quantitation of fluorescence microarrays

    SciTech Connect

    Mascio-Kegelmeyer, L; Tomascik-Cheeseman, L; Burnett, M S; van Hummelen, P; Wyrobek, A J

    2000-12-01

    To more accurately measure fluorescent signals from microarrays, we calibrated our acquisition and analysis systems by using groundtruth samples composed of known quantities of red and green gene-specific DNA probes hybridized to cDNA targets. We imaged the slides with a full-field, white light CCD imager and analyzed them with our custom analysis software. Here we compare, for multiple genes, results obtained with and without preprocessing (alignment, color crosstalk compensation, dark field subtraction, and integration time). We also evaluate the accuracy of various image processing and analysis techniques (background subtraction, segmentation, quantitation and normalization). This methodology calibrates and validates our system for accurate quantitative measurement of microarrays. Specifically, we show that preprocessing the images produces results significantly closer to the known groundtruth for these samples.
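    Two of the preprocessing steps named above, dark field subtraction and color crosstalk compensation, can be sketched as follows; the dark-field level and the bleed fractions in the mixing matrix are invented for illustration, not calibrated values from the study.

```python
import numpy as np

# Hypothetical two-channel microarray images (red/green), same shape.
rng = np.random.default_rng(0)
red = rng.uniform(100, 4000, size=(64, 64))
green = rng.uniform(100, 4000, size=(64, 64))
dark_field = np.full((64, 64), 90.0)  # assumed dark-field calibration frame

# 1) Dark field subtraction (clip at zero to avoid negative intensities).
red_c = np.clip(red - dark_field, 0.0, None)
green_c = np.clip(green - dark_field, 0.0, None)

# 2) Color crosstalk compensation: if a fraction of each dye's signal bleeds
#    into the other channel, observed = M @ [true_red, true_green]; invert M.
M = np.array([[1.00, 0.08],   # 8% of green bleeds into the red channel
              [0.05, 1.00]])  # 5% of red bleeds into the green channel
observed = np.stack([red_c.ravel(), green_c.ravel()])  # 2 x Npixels
true = np.linalg.solve(M, observed)
red_true, green_true = (t.reshape(red.shape) for t in true)
```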

  15. Acquisition and preprocessing of LANDSAT data

    NASA Technical Reports Server (NTRS)

    Horn, T. N.; Brown, L. E.; Anonsen, W. H. (Principal Investigator)

    1979-01-01

    The original configuration of the GSFC data acquisition, preprocessing, and transmission subsystem, designed to provide LANDSAT data inputs to the LACIE system at JSC, is described. Enhancements made to support LANDSAT -2, and modifications for LANDSAT -3 are discussed. Registration performance throughout the 3 year period of LACIE operations satisfied the 1 pixel root-mean-square requirements established in 1974, with more than two of every three attempts at data registration proving successful, notwithstanding cosmetic faults or content inadequacies to which the process is inherently susceptible. The cloud/snow rejection rate experienced throughout the last 3 years has approached 50%, as expected in most LANDSAT data use situations.

  16. Real-time preprocessing of holographic information

    NASA Astrophysics Data System (ADS)

    Schilling, Bradley W.; Poon, Ting-Chung

    1995-11-01

    Optical scanning holography (OSH) is a holographic recording technique that uses active optical heterodyne scanning to generate holographic information pertaining to an object. The holographic information manifests itself as an electrical signal suitable for real-time image reconstruction using a spatial light modulator. The electrical signal that carries the holographic information can also be digitized for computer storage and processing, allowing the image reconstruction to be performed numerically. In previous experiments with this technique, holographic information has been recorded using the interference pattern of a plane wave and a spherical wave of different temporal frequencies to scan an object. However, proper manipulation of the pupil functions in the recording stage can result in real-time processing of the holographic information. We present the holographic edge extraction technique as an important example of real-time preprocessing of holographic information that utilizes alternate pupils in the OSH recording stage. We investigate the theory of holographic preprocessing using a spatial frequency-domain analysis based on the recording system's optical transfer function. The theory is reinforced through computer simulation.

  17. Preprocessing and compression of Hyperspectral images captured onboard UAVs

    NASA Astrophysics Data System (ADS)

    Herrero, Rolando; Cadirola, Martin; Ingle, Vinay K.

    2015-10-01

    Advancements in image sensors and signal processing have led to the successful development of lightweight hyperspectral imaging systems that are critical to the deployment of Photometry and Remote Sensing (PaRS) capabilities in unmanned aerial vehicles (UAVs). In general, hyperspectral data cubes include a few dozen spectral bands that are extremely useful for remote sensing applications that range from detection of land vegetation to monitoring of atmospheric products derived from the processing of lower level radiance images. Because these data cubes are captured in the challenging environment of UAVs, where resources are limited, source encoding by means of compression is a fundamental mechanism that considerably improves overall system performance and reliability. In this paper, we focus on the hyperspectral images captured by a state-of-the-art commercial hyperspectral camera and show the results of applying ultraspectral data compression to the obtained data set. Specifically, the compression scheme that we introduce integrates two stages: (1) preprocessing and (2) compression itself. The outcomes of this procedure are linear prediction coefficients and an error signal that, when encoded, result in a compressed version of the original image. The preprocessing and compression algorithms are optimized and have their time complexity analyzed to guarantee their successful deployment on low-power ARM-based embedded processors in the context of UAVs. Lastly, we compare the proposed architecture against other well-known schemes and show that the compression scheme presented in this paper outperforms all of them, delivering both lower rates and lower distortion.
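    The prediction stage described above can be sketched as follows: fit linear prediction coefficients to a spectral band by least squares and keep the residual, from which the band can be rebuilt exactly. This is a toy Python stand-in; the paper's optimized encoder and the entropy coding of the residual are omitted.

```python
import numpy as np

def lpc_encode(x: np.ndarray, order: int = 4):
    """Least-squares linear prediction: return (warm-up samples, coefficients,
    residual). Coefficients plus residual suffice to rebuild the signal."""
    X = np.column_stack([x[order - 1 - j: len(x) - 1 - j] for j in range(order)])
    y = x[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x[:order].copy(), coeffs, y - X @ coeffs

def lpc_decode(warmup, coeffs, residual):
    """Rebuild the signal from warm-up samples, coefficients and residual."""
    x = list(warmup)
    for r in residual:
        x.append(np.dot(coeffs, x[-1: -len(coeffs) - 1: -1]) + r)
    return np.array(x)

band = np.cumsum(np.random.default_rng(1).normal(size=256))  # one toy band
warmup, coeffs, res = lpc_encode(band)
assert np.allclose(lpc_decode(warmup, coeffs, res), band)
```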

  18. Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines.

    PubMed

    del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J; Raboso, Mariano

    2015-06-17

    Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them, based on a Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering, segmentation (based on a Gaussian Mixture Model (GMM) to separate the person from the background), masking (to reduce the dimensions of the images) and binarization (to reduce the size of each image). An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements.

  20. Pre-Processing Effect on the Accuracy of Event-Based Activity Segmentation and Classification through Inertial Sensors.

    PubMed

    Fida, Benish; Bernabucci, Ivan; Bibbo, Daniele; Conforto, Silvia; Schmid, Maurizio

    2015-09-11

    Inertial sensors are increasingly being used to recognize and classify physical activities in a variety of applications. For monitoring and fitness applications, it is crucial to develop methods able to segment each activity cycle, e.g., a gait cycle, so that the successive classification step may be more accurate. To increase detection accuracy, pre-processing is often used, with a concurrent increase in computational cost. In this paper, the effect of pre-processing operations on the detection and classification of locomotion activities was investigated, to check whether the presence of pre-processing significantly contributes to an increase in accuracy. The pre-processing stages evaluated in this study were inclination correction and de-noising. Level walking, step ascending, descending and running were monitored by using a shank-mounted inertial sensor. Raw and filtered segments, obtained from a modified version of a rule-based gait detection algorithm optimized for sequential processing, were processed to extract time- and frequency-based features for physical activity classification through a support vector machine classifier. The proposed method accurately detected >99% of gait cycles from raw data and produced >98% classification accuracy on these segmented gait cycles. Pre-processing did not substantially increase classification accuracy, thus highlighting the possibility of reducing the amount of pre-processing for real-time applications.
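    Of the two pre-processing stages evaluated, de-noising is easy to illustrate. The sketch below applies zero-phase Butterworth low-pass filtering to a synthetic shank acceleration signal; the sampling rate, cutoff and filter order are assumptions, since the abstract does not specify the filter used.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 100.0  # assumed sampling rate (Hz) of the shank-mounted sensor

def denoise(accel: np.ndarray, cutoff_hz: float = 10.0, order: int = 4):
    """Zero-phase low-pass filtering as one plausible de-noising stage."""
    b, a = butter(order, cutoff_hz / (FS / 2.0), btype="low")
    return filtfilt(b, a, accel)

t = np.arange(0.0, 5.0, 1.0 / FS)
raw = np.sin(2 * np.pi * 1.2 * t)                          # gait-like component
raw += 0.3 * np.random.default_rng(2).normal(size=t.size)  # sensor noise
smooth = denoise(raw)
```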

  1. An effective preprocessing method for finger vein recognition

    NASA Astrophysics Data System (ADS)

    Peng, JiaLiang; Li, Qiong; Wang, Ning; Abd El-Latif, Ahmed A.; Niu, Xiamu

    2013-07-01

    Image preprocessing plays an important role in a finger vein recognition system. However, previous preprocessing schemes retain weaknesses that must be resolved to achieve high finger vein recognition performance. In this paper, we propose a new finger vein preprocessing scheme that includes finger region localization, alignment, finger vein ROI segmentation and enhancement. The experimental results show that the proposed scheme is capable of enhancing the quality of finger vein images effectively and reliably.

  2. User microprogrammable processors for high data rate telemetry preprocessing

    NASA Technical Reports Server (NTRS)

    Pugsley, J. H.; Ogrady, E. P.

    1973-01-01

    The use of microprogrammable processors for the preprocessing of high data rate satellite telemetry is investigated. The following topics are discussed along with supporting studies: (1) evaluation of commercial microprogrammable minicomputers for telemetry preprocessing tasks; (2) microinstruction sets for telemetry preprocessing; and (3) the use of multiple minicomputers to achieve high data rate processing. The simulation of small microprogrammed processors is discussed along with examples of microprogrammed processors.

  3. Layered recognition networks that pre-process, classify, and describe

    NASA Technical Reports Server (NTRS)

    Uhr, L.

    1971-01-01

    A brief overview is presented of six types of pattern recognition programs that: (1) preprocess, then characterize; (2) preprocess and characterize together; (3) preprocess and characterize into a recognition cone; (4) describe as well as name; (5) compose interrelated descriptions; and (6) converse. A computer program (of types 3 through 6) is presented that transforms and characterizes the input scene through the successive layers of a recognition cone, and then engages in a stylized conversation to describe the scene.

  4. Effect of two different preprocessing steps in detection of optic nerve head in fundus images

    NASA Astrophysics Data System (ADS)

    Tavakoli, Meysam; Nazar, Mahdieh; Mehdizadeh, Alireza

    2017-03-01

    Identification of the optic nerve head (ONH) is necessary in retinal image analysis to locate anatomical components such as the fovea and the retinal vessels in fundus images. In this study, we first examined two different methods for preprocessing of images, and then applied our main method for ONH detection in color fundus images. In the first preprocessing method we performed color space conversion, illumination equalization, and contrast enhancement; in the second method we applied a top-hat transformation to the image. In the next step, the Radon transform is applied to each of the two preprocessed fundus images to find candidates for the location of the ONH. The accurate location was then found using minimum mean square error estimation. The results confirm the accuracy of this method. Our method detected the ONH correctly in 110 out of 120 images in our local database and 38 out of 40 color images in the DRIVE database when using illumination equalization and contrast enhancement preprocessing. Moreover, with the top-hat transformation our approach correctly detected the ONH in 106 out of 120 images in the local database and 36 out of 40 images in the DRIVE set. In addition, the sensitivity and specificity of the pixel-based analysis of this algorithm appear acceptable in comparison with other methods.

  5. Preprocessing and Analysis of Digitized ECGs

    NASA Astrophysics Data System (ADS)

    Villalpando, L. E. Piña; Kurmyshev, E.; Ramírez, S. Luna; Leal, L. Delgado

    2008-08-01

    In this work we propose a methodology and MATLAB programs that perform the preprocessing and analysis of the D1 lead of ECGs. The program corrects the isoelectric line for each beat, calculates the average cardiac frequency and its standard deviation, and generates a file with the amplitudes of the P, Q and T waves, as well as the important segments and intervals of each beat. The software normalizes the beats to a standard rate of 80 beats per minute; the superposition of beats is done by centering the R waves, before and after normalizing the amplitude of each beat. The data and graphics provide relevant information to the doctor for diagnosis. In addition, some results are displayed similar to those presented by a Holter recording.
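    The heart rate statistics and the 80 beats-per-minute normalization can be sketched as follows; the sampling rate and R-peak indices are hypothetical, and linear interpolation stands in for whatever resampling the authors used.

```python
import numpy as np

FS = 500.0  # assumed ECG sampling rate (Hz)
r_peaks = np.array([120, 480, 850, 1215, 1580])  # hypothetical R-peak indices

# Average cardiac frequency and its standard deviation from the RR intervals.
rr_sec = np.diff(r_peaks) / FS
hr_bpm = 60.0 / rr_sec
print(f"mean HR = {hr_bpm.mean():.1f} bpm, SD = {hr_bpm.std(ddof=1):.1f} bpm")

def normalize_beat(beat: np.ndarray, target_bpm: float = 80.0) -> np.ndarray:
    """Resample one beat to the duration of a standard 80-bpm beat so beats
    can be superimposed (R-wave centering is left to the caller)."""
    n_target = int(round(FS * 60.0 / target_bpm))
    old_grid = np.linspace(0.0, 1.0, beat.size)
    new_grid = np.linspace(0.0, 1.0, n_target)
    return np.interp(new_grid, old_grid, beat)
```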

  6. Comparison of planar images and SPECT with Bayesian preprocessing for the demonstration of facial anatomy and craniomandibular disorders

    SciTech Connect

    Kircos, L.T.; Ortendahl, D.A.; Hattner, R.S.; Faulkner, D.; Taylor, R.L.

    1984-01-01

    Craniomandibular disorders involving the facial anatomy may be difficult to demonstrate in planar images. Although bone scanning is generally more sensitive than radiography, facial bone anatomy is complex and focal areas of increased or decreased radiotracer may become obscured by overlapping structures in planar images. Thus SPECT appears ideally suited to examination of the facial skeleton. A series of patients with craniomandibular disorders of unknown origin were imaged using 20 mCi Tc-99m MDP. Planar and SPECT (Siemens 7500 ZLC Orbiter) images were obtained four hours after injection. The SPECT images were reconstructed with a filtered back-projection algorithm. In order to improve image contrast and resolution in SPECT images, the rotation views were pre-processed with a Bayesian deblurring algorithm which has previously been shown to offer improved contrast and resolution in planar images. SPECT images using the pre-processed rotation views were obtained and compared to the SPECT images without pre-processing and to the planar images. TMJ arthropathy involving either the glenoid fossa or the mandibular condyle, orthopedic changes involving the mandible or maxilla, localized dental pathosis, as well as changes in structures peripheral to the facial skeleton were identified. Bayesian pre-processed SPECT depicted the facial skeleton more clearly and demonstrated the bony changes associated with craniomandibular disorders more obviously than either planar images or SPECT without pre-processing.

  7. A preprocessing tool for removing artifact from cardiac RR interval recordings using three-dimensional spatial distribution mapping.

    PubMed

    Stapelberg, Nicolas J C; Neumann, David L; Shum, David H K; McConnell, Harry; Hamilton-Craig, Ian

    2016-04-01

    Artifact is common in cardiac RR interval data that is recorded for heart rate variability (HRV) analysis. A novel algorithm for artifact detection and interpolation in RR interval data is described. It is based on spatial distribution mapping of RR interval magnitude and relationships to adjacent values in three dimensions. The characteristics of normal physiological RR intervals and artifact intervals were established using 24-h recordings from 20 technician-assessed human cardiac recordings. The algorithm was incorporated into a preprocessing tool and validated using 30 artificial RR (ARR) interval data files, to which known quantities of artifact (0.5%, 1%, 2%, 3%, 5%, 7%, 10%) were added. The impact of preprocessing ARR files with 1% added artifact was also assessed using 10 time domain and frequency domain HRV metrics. The preprocessing tool was also used to preprocess 69 24-h human cardiac recordings. The tool was able to remove artifact from technician-assessed human cardiac recordings (sensitivity 0.84, SD = 0.09, specificity of 1.00, SD = 0.01) and artificial data files. The removal of artifact had a low impact on time domain and frequency domain HRV metrics (ranging from 0% to 2.5% change in values). This novel preprocessing tool can be used with human 24-h cardiac recordings to remove artifact while minimally affecting physiological data and therefore having a low impact on HRV measures of that data.
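    The detect-then-interpolate workflow can be illustrated with a deliberately simple threshold rule; note that this stand-in does not implement the paper's three-dimensional spatial distribution mapping.

```python
import numpy as np

def clean_rr(rr_ms: np.ndarray, max_rel_dev: float = 0.3):
    """Flag RR intervals deviating more than `max_rel_dev` from the series
    median and replace them by linear interpolation over good neighbours.
    A simple stand-in for the paper's 3-D spatial-distribution mapping."""
    median = np.median(rr_ms)
    bad = np.abs(rr_ms - median) / median > max_rel_dev
    good_idx = np.flatnonzero(~bad)
    cleaned = rr_ms.astype(float).copy()
    cleaned[bad] = np.interp(np.flatnonzero(bad), good_idx, rr_ms[good_idx])
    return cleaned, bad

rr = np.array([810, 805, 1900, 800, 795, 120, 805], float)  # two artifacts
cleaned, flags = clean_rr(rr)
print(cleaned)  # 1900 and 120 replaced by interpolated values
```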

  8. Measurement data preprocessing in a radar-based system for monitoring of human movements

    NASA Astrophysics Data System (ADS)

    Morawski, Roman Z.; Miȩkina, Andrzej; Bajurko, Paweł R.

    2015-02-01

    The importance of research on new technologies that could be employed in care services for elderly people is highlighted. The need to examine the applicability of various sensor systems for non-invasive monitoring of the movements and vital bodily functions, such as heart beat or breathing rhythm, of elderly persons in their home environment is justified. An extensive overview of the literature concerning existing monitoring techniques is provided. A technological potential behind radar sensors is indicated. A new class of algorithms for preprocessing of measurement data from impulse radar sensors, when applied for elderly people monitoring, is proposed. Preliminary results of numerical experiments performed on those algorithms are demonstrated.

  9. Data preprocessing methods of FT-NIR spectral data for the classification of cooking oil

    NASA Astrophysics Data System (ADS)

    Ruah, Mas Ezatul Nadia Mohd; Rasaruddin, Nor Fazila; Fong, Sim Siong; Jaafar, Mohd Zuli

    2014-12-01

    This work describes data pre-processing methods for FT-NIR spectroscopy datasets of cooking oil and its quality parameters with chemometric methods. Pre-processing of near-infrared (NIR) spectral data has become an integral part of chemometrics modelling. Hence, this work is dedicated to investigating the utility and effectiveness of pre-processing algorithms, namely row scaling, column scaling and the single scaling process with Standard Normal Variate (SNV). The combinations of these scaling methods have an impact on exploratory analysis and on classification via Principal Component Analysis (PCA) plots. The samples were divided into palm oil and non-palm cooking oil. The classification model was built using FT-NIR cooking oil spectra in absorbance mode over the range 4000 cm-1 to 14000 cm-1. A Savitzky-Golay derivative was applied before developing the classification model. The data were then separated into a training set and a test set by the Duplex method, with the number in each class kept equal to 2/3 of the class with the minimum number of samples. The t-statistic was employed as a variable selection method in order to select the variables that are significant for the classification models. The data pre-processing was evaluated by the modified silhouette width (mSW), the PCA plots and the percentage correctly classified (%CC). The results show that the different data pre-processing strategies lead to substantial differences in model performance, as indicated by mSW and %CC for row scaling, column standardisation and the single scaling process with Standard Normal Variate. With a two-PC model, all five classifiers gave a high %CC except quadratic distance analysis.
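    Of the scaling methods compared, SNV is the most standard and can be stated compactly: each spectrum is centred and scaled by its own mean and standard deviation. A minimal sketch with a hypothetical absorbance matrix:

```python
import numpy as np

def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard Normal Variate: centre and scale each spectrum (row) by its
    own mean and standard deviation, a common row-wise scaling before PCA."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Hypothetical FT-NIR absorbance matrix: samples x wavenumber channels.
X = np.random.default_rng(3).uniform(0.1, 1.5, size=(20, 1500))
X_snv = snv(X)
```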

  10. Statistical approaches for the analysis of DNA methylation microarray data.

    PubMed

    Siegmund, Kimberly D

    2011-06-01

    Following the rapid development and adoption in DNA methylation microarray assays, we are now experiencing a growth in the number of statistical tools to analyze the resulting large-scale data sets. As is the case for other microarray applications, biases caused by technical issues are of concern. Some of these issues are old (e.g., two-color dye bias and probe- and array-specific effects), while others are new (e.g., fragment length bias and bisulfite conversion efficiency). Here, I highlight characteristics of DNA methylation that suggest standard statistical tools developed for other data types may not be directly suitable. I then describe the microarray technologies most commonly in use, along with the methods used for preprocessing and obtaining a summary measure. I finish with a section describing downstream analyses of the data, focusing on methods that model percentage DNA methylation as the outcome, and methods for integrating DNA methylation with gene expression or genotype data.

  11. Functionally-focused algorithmic analysis of high resolution microarray-CGH genomic landscapes demonstrates comparable genomic copy number aberrations in MSI and MSS sporadic colorectal cancer

    PubMed Central

    Ali, Hamad; Bitar, Milad S.; Al Madhoun, Ashraf; Marafie, Makia; Al-Mulla, Fahd

    2017-01-01

    Array-based comparative genomic hybridization (aCGH) emerged as a powerful technology for studying copy number variations at higher resolution in many cancers including colorectal cancer. However, the lack of standardized systematic protocols including bioinformatic algorithms to obtain and analyze genomic data resulted in significant variation in the reported copy number aberration (CNA) data. Here, we present genomic aCGH data obtained using highly stringent and functionally relevant statistical algorithms from 116 well-defined microsatellite instable (MSI) and microsatellite stable (MSS) colorectal cancers. We utilized aCGH to characterize genomic CNAs in 116 well-defined sets of colorectal cancer (CRC) cases. We further applied the significance testing for aberrant copy number (STAC) and Genomic Identification of Significant Targets in Cancer (GISTIC) algorithms to identify functionally relevant (nonrandom) chromosomal aberrations in the analyzed colorectal cancer samples. Our results produced high resolution genomic landscapes of both MSI and MSS sporadic CRC. We found that CNAs in MSI and MSS CRCs are heterogeneous in nature but may be divided into 3 distinct genomic patterns. Moreover, we show that although CNAs in MSI and MSS CRCs differ with respect to their size, number and chromosomal distribution, the functional copy number aberrations obtained from MSI and MSS CRCs were in fact comparable but not identical. These unifying CNAs were verified by an MLPA tumor-loss gene panel, which spans 15 different chromosomal locations and contains 50 probes for at least 20 tumor suppressor genes. Consistently, deletions/amplifications in these frequently altered cancer genes were identical in MSS and MSI CRCs. Our results suggest that MSI and MSS copy number aberrations driving CRC may be functionally comparable. PMID:28231327

  13. Overview of Protein Microarrays

    PubMed Central

    Reymond Sutandy, FX; Qian, Jiang; Chen, Chien-Sheng; Zhu, Heng

    2013-01-01

    Protein microarray is an emerging technology that provides a versatile platform for characterization of hundreds of thousands of proteins in a highly parallel and high-throughput way. Two major classes of protein microarrays are defined to describe their applications: analytical and functional protein microarrays. In addition, tissue or cell lysates can also be fractionated and spotted on a slide to form a reverse-phase protein microarray. While the fabrication technology is maturing, applications of protein microarrays, especially functional protein microarrays, have flourished during the past decade. Here, we will first review recent advances in the protein microarray technologies, and then present a series of examples to illustrate the applications of analytical and functional protein microarrays in both basic and clinical research. The research areas will include detection of various binding properties of proteins, study of protein posttranslational modifications, analysis of host-microbe interactions, profiling antibody specificity, and identification of biomarkers in autoimmune diseases. As a powerful technology platform, it would not be surprising if protein microarrays will become one of the leading technologies in proteomic and diagnostic fields in the next decade. PMID:23546620

  14. A comprehensive sensitivity analysis of microarray breast cancer classification under feature variability.

    PubMed

    Sontrop, Herman M J; Moerland, Perry D; van den Ham, René; Reinders, Marcel J T; Verhaegh, Wim F J

    2009-11-26

    Large discrepancies in signature composition and outcome concordance have been observed between different microarray breast cancer expression profiling studies. This is often ascribed to differences in array platform as well as biological variability. We conjecture that other reasons for the observed discrepancies are the measurement error associated with each feature and the choice of preprocessing method. Microarray data are known to be subject to technical variation and the confidence intervals around individual point estimates of expression levels can be wide. Furthermore, the estimated expression values also vary depending on the selected preprocessing scheme. In microarray breast cancer classification studies, however, these two forms of feature variability are almost always ignored and hence their exact role is unclear. We have performed a comprehensive sensitivity analysis of microarray breast cancer classification under the two types of feature variability mentioned above. We used data from six state of the art preprocessing methods, using a compendium consisting of eight different datasets, involving 1131 hybridizations, containing data from both one and two-color array technology. For a wide range of classifiers, we performed a joint study on performance, concordance and stability. In the stability analysis we explicitly tested classifiers for their noise tolerance by using perturbed expression profiles that are based on uncertainty information directly related to the preprocessing methods. Our results indicate that signature composition is strongly influenced by feature variability, even if the array platform and the stratification of patient samples are identical. In addition, we show that there is often a high level of discordance between individual class assignments for signatures constructed on data coming from different preprocessing schemes, even if the actual signature composition is identical. Feature variability can have a strong impact on

  15. Effective automated prediction of vertebral column pathologies based on logistic model tree with SMOTE preprocessing.

    PubMed

    Karabulut, Esra Mahsereci; Ibrikci, Turgay

    2014-05-01

    This study develops a logistic model tree based automation system for accurate recognition of types of vertebral column pathologies. Six biomechanical measures are used for this purpose: pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius and grade of spondylolisthesis. A two-phase classification model is employed, in which the first step is preprocessing the data by use of the Synthetic Minority Over-sampling Technique (SMOTE), and the second is feeding the preprocessed data to the Logistic Model Tree (LMT) classifier. We have achieved an accuracy of 89.73% and an Area Under the Curve (AUC) of 0.964 in computer-based automatic detection of the pathology. This was validated via a 10-fold cross-validation experiment conducted on clinical records of 310 patients. The study also presents a comparative analysis of the vertebral column data with the use of several machine learning algorithms.
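    The two-phase design maps naturally onto an imbalanced-learn pipeline. In the sketch below SMOTE runs inside each cross-validation fold, a logistic regression stands in for the Logistic Model Tree (which scikit-learn does not provide), and a synthetic dataset stands in for the clinical records.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in for the 310-patient, six-feature vertebral column dataset.
X, y = make_classification(n_samples=310, n_features=6, weights=[0.85],
                           random_state=0)

# SMOTE inside the pipeline so synthetic minority samples are generated
# only from the training fold, never from held-out data.
model = Pipeline([("smote", SMOTE(random_state=0)),
                  ("clf", LogisticRegression(max_iter=1000))])
print(cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean())
```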

  16. Wavelet-based illumination invariant preprocessing in face recognition

    NASA Astrophysics Data System (ADS)

    Goh, Yi Zheng; Teoh, Andrew Beng Jin; Goh, Kah Ong Michael

    2009-04-01

    The performance of contemporary two-dimensional face-recognition systems has not been satisfactory due to variation in lighting. As a result, many works addressing illumination variation in face recognition have been carried out in past decades. Among them, the Illumination-Reflectance model is one of the generic models used to separate the reflectance and illumination components of an object. The illumination component can be removed by means of image-processing techniques to regain the intrinsic face features, which are depicted by the reflectance component. We present a wavelet-based illumination-invariant algorithm as a preprocessing technique for face recognition. On the basis of the multiresolution nature of wavelet analysis, we decompose the illumination and reflectance components of a face image in a systematic way. The illumination component, which resides in the low-spatial-frequency subband, can then be eliminated efficiently. This technique proves advantageous, achieving higher recognition performance on the YaleB, CMU PIE, and FRGC face databases.

  17. Zseq: An Approach for Preprocessing Next-Generation Sequencing Data.

    PubMed

    Alkhateeb, Abedalrhman; Rueda, Luis

    2017-08-01

    Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique k-mers in each sequence as its corresponding score, and also takes into account other factors such as ambiguous nucleotides or a high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters out those with a z-score less than the user-defined threshold. The Zseq algorithm is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than those of other tested methods. Estimating the threshold of the cutoff point is introduced using labeling rules, with promising results.
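    The core of the score, unique k-mer counting followed by a z-score cutoff, can be sketched in a few lines; the published Zseq also penalises ambiguous bases and extreme GC content, which this minimal version omits.

```python
import numpy as np

def kmer_complexity(seq: str, k: int = 4) -> int:
    """Number of distinct k-mers in a read, the core of the Zseq score."""
    return len({seq[i:i + k] for i in range(len(seq) - k + 1)})

def zseq_filter(reads, k: int = 4, z_cut: float = -0.5):
    """Keep reads whose unique-k-mer count lies no more than |z_cut|
    standard deviations below the mean (a minimal sketch of the idea)."""
    scores = np.array([kmer_complexity(r, k) for r in reads], float)
    z = (scores - scores.mean()) / scores.std()
    return [r for r, zi in zip(reads, z) if zi >= z_cut]

reads = ["ACGTACGTACGTACGT", "AAAAAAAAAAAAAAAA", "ACGGTCCATGGACTTA"]
print(zseq_filter(reads))  # the low-complexity poly-A read is dropped
```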

  18. A perceptual preprocess method for 3D-HEVC

    NASA Astrophysics Data System (ADS)

    Shi, Yawen; Wang, Yongfang; Wang, Yubing

    2015-08-01

    A perceptual preprocessing method for 3D-HEVC coding is proposed in this paper. First, we propose a new JND model, which accounts for the luminance contrast masking effect, the spatial masking effect, the temporal masking effect, saliency characteristics and depth information. We utilize the spectral residual approach to obtain the saliency map and build a visual saliency factor based on it. In order to distinguish the sensitivity of objects at different depths, we segment each texture frame into foreground and background with an automatic threshold selection algorithm using the corresponding depth information, and then build a depth weighting factor. A JND modulation factor is built as a linear combination of the visual saliency factor and the depth weighting factor to adjust the JND threshold. We then apply the proposed JND model to 3D-HEVC for residual filtering and distortion coefficient processing. In the filtering process, the residual value is set to zero if the JND threshold is greater than the residual value, and the JND threshold is subtracted from the residual value otherwise. Experimental results demonstrate that the proposed method achieves an average bit rate reduction of 15.11% compared to the original HTM12.1 coding scheme, while maintaining the same subjective quality.
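    The residual filtering rule is simple enough to state directly. The sketch below applies it element-wise, preserving the sign of the residual (one plausible reading of the rule); constructing the JND map itself, with its masking, saliency and depth factors, is outside its scope.

```python
import numpy as np

def jnd_filter_residual(residual: np.ndarray, jnd: np.ndarray) -> np.ndarray:
    """Zero residuals whose magnitude is below the JND threshold (assumed
    perceptually invisible); otherwise subtract the threshold, keeping the
    sign of the residual."""
    mag = np.abs(residual)
    return np.where(mag <= jnd, 0.0, np.sign(residual) * (mag - jnd))

residual = np.array([[5.0, -2.0, 9.0], [-7.0, 1.0, 3.0]])
jnd = np.full_like(residual, 3.0)
print(jnd_filter_residual(residual, jnd))
# [[ 2.  0.  6.]
#  [-4.  0.  0.]]
```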

  19. Antibiotic treatment algorithm development based on a microarray nucleic acid assay for rapid bacterial identification and resistance determination from positive blood cultures.

    PubMed

    Rödel, Jürgen; Karrasch, Matthias; Edel, Birgit; Stoll, Sylvia; Bohnert, Jürgen; Löffler, Bettina; Saupe, Angela; Pfister, Wolfgang

    2016-03-01

    Rapid diagnosis of bloodstream infections remains a challenge for the early targeting of an antibiotic therapy in sepsis patients. In recent studies, the reliability of the Nanosphere Verigene Gram-positive and Gram-negative blood culture (BC-GP and BC-GN) assays for the rapid identification of bacteria and resistance genes directly from positive BCs has been demonstrated. In this work, we have developed a model to define treatment recommendations by combining Verigene test results with knowledge on local antibiotic resistance patterns of bacterial pathogens. The data of 275 positive BCs were analyzed. Two hundred sixty-three isolates (95.6%) were included in the Verigene assay panels, and 257 isolates (93.5%) were correctly identified. The agreement of the detection of resistance genes with subsequent phenotypic susceptibility testing was 100%. The hospital antibiogram was used to develop a treatment algorithm on the basis of Verigene results that may contribute to a faster patient management.

  20. SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read

    PubMed Central

    2010-01-01

    Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148

  1. How cyanobacteria pose new problems to old methods: challenges in microarray time series analysis

    PubMed Central

    2013-01-01

    Background The transcriptomes of several cyanobacterial strains have been shown to exhibit diurnal oscillation patterns, reflecting the diurnal phototrophic lifestyle of the organisms. The analysis of such genome-wide transcriptional oscillations is often facilitated by the use of clustering algorithms in conjunction with a number of pre-processing steps. Biological interpretation is usually focussed on the time and phase of expression of the resulting groups of genes. However, the use of microarray technology in such studies requires normalization during data pre-processing, with unclear impact on the qualitative and quantitative features of the derived information on the number of oscillating transcripts and their respective phases. Results A microarray based evaluation of diurnal expression in the cyanobacterium Synechocystis sp. PCC 6803 is presented. As expected, the temporal expression patterns reveal strong oscillations in transcript abundance. We compare the Fourier transformation-based expression phase before and after the application of quantile normalization, median polishing, cyclical LOESS, and least oscillating set (LOS) normalization. Whereas LOS normalization mostly preserves the phases of the raw data, the remaining methods introduce systematic biases. In particular, quantile normalization is found to introduce a phase shift of 180°, effectively changing night-expressed genes into day-expressed ones. Comparison of a large number of clustering results for differently normalized data shows that the normalization method determines the result. Subsequent steps, such as the choice of data transformation, similarity measure, and clustering algorithm, only play minor roles. We find that standardization and the DFT transformation are favorable for the clustering of time series, in contrast to the 12 m transformation. We use the cluster-wise functional enrichment of a clustering derived by LOS normalization, clustering using flowClust, and DFT
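    For reference, the quantile normalization whose phase-shifting side effect the study reports can be sketched as follows: every column of the matrix is forced to share one reference distribution, the mean of the per-column sorted values (ties are ignored for brevity).

```python
import numpy as np

def quantile_normalize(X: np.ndarray) -> np.ndarray:
    """Force every array (column) to share the same distribution, namely
    the mean of the per-column sorted values; ties are ignored for brevity."""
    order = np.argsort(X, axis=0)           # per-column sort order
    ref = np.sort(X, axis=0).mean(axis=1)   # reference distribution
    Xn = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        Xn[order[:, j], j] = ref
    return Xn

# Toy expression matrix: genes x time points.
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
print(quantile_normalize(X))
```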

  2. Pre-processing variance reducing techniques in multispectral positron emission tomography.

    PubMed

    Yao, R; Msaki, P; Lecomte, R

    1997-11-01

    Stochastic fluctuations and systematic errors severely restrict the potential of multispectral acquisition to improve scatter correction by energy-dependent processing in high-resolution positron emission tomography (PET). To overcome this limitation, three pre-processing approaches which reduce stochastic fluctuations and systematic errors without degrading spatial resolution were investigated: statistical variance was reduced by smoothing acquired data in energy space, systematic errors due to nonuniform detector efficiency were minimized by normalizing the data in the spatial domain and the overall variance was further reduced by selecting an optimal pre-processing sequence. Selection of the best protocol to reduce stochastic fluctuations entailed comparisons between four smoothing algorithms (prior constrained (PC) smoothing, weighted smoothing (WS), ideal low-pass filtering (ILF) and mean median (MM) smoothing) and permutations of three pre-processing procedures (smoothing, normalization and subtraction of random events). Results demonstrated that spectral smoothing by WS, ILF and MM efficiently reduces the statistical variance in both the energy and spatial domains without observable spatial resolution loss. The ILF algorithm was found to be the most convenient in terms of simplicity and efficiency. Regardless of the position of subtraction of randoms in the sequence, reduction of the systematic errors by normalization followed by spectral smoothing to suppress statistical noise produced the best results. However, subtraction of random events first in the sequence reduces computation load by half since the need to pre-process this distribution before subtraction is removed. In summary, normalizing data in the spatial domain and smoothing data in energy space are essential steps required to reduce systematic errors and statistical variance independently without degrading spatial resolution of multispectral PET data.

  3. Spatial-spectral preprocessing for endmember extraction on GPU's

    NASA Astrophysics Data System (ADS)

    Jimenez, Luis I.; Plaza, Javier; Plaza, Antonio; Li, Jun

    2016-10-01

    Spectral unmixing is focused on the identification of spectrally pure signatures, called endmembers, and their corresponding abundances in each pixel of a hyperspectral image. While mainly focused on the spectral information contained in hyperspectral images, endmember extraction techniques have recently begun to include spatial information in order to achieve more accurate results. Several algorithms have been developed for automatic or semi-automatic identification of endmembers using spatial and spectral information, including spectral-spatial endmember extraction (SSEE) where, within a preprocessing step, both sources of information are extracted from the hyperspectral image and used equally for this purpose. Previous works have implemented the SSEE technique in four main steps: 1) local eigenvector calculation in each sub-region into which the original hyperspectral image is divided; 2) computation of the maximum and minimum projections of all eigenvectors over the entire hyperspectral image in order to obtain a set of candidate pixels; 3) expansion and averaging of the signatures of the candidate set; 4) ranking based on the spectral angle distance (SAD). The result of this method is a list of candidate signatures from which the endmembers can be extracted using various spectral-based techniques, such as orthogonal subspace projection (OSP), vertex component analysis (VCA) or N-FINDR. Considering the large volume of data and the complexity of the calculations, there is a need for efficient implementations. Latest-generation hardware accelerators such as commodity graphics processing units (GPUs) offer a good chance for improving the computational performance in this context. In this paper, we develop two different implementations of the SSEE algorithm using GPUs. Both are based on the eigenvector computation within each sub-region in the first step, one using singular value decomposition (SVD) and the other using principal component analysis (PCA). Based

  4. Comparing Binaural Pre-processing Strategies I

    PubMed Central

    Krawczyk-Becker, Martin; Marquardt, Daniel; Völker, Christoph; Hu, Hongmei; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Ernst, Stephan M. A.; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

    2015-01-01

    In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios. PMID:26721920

  5. A survey of visual preprocessing and shape representation techniques

    NASA Technical Reports Server (NTRS)

    Olshausen, Bruno A.

    1988-01-01

    Many recent theories and methods proposed for visual preprocessing and shape representation are summarized. The survey brings together research from the fields of biology, psychology, computer science, electrical engineering, and most recently, neural networks. It was motivated by the need to preprocess images for a sparse distributed memory (SDM), but the techniques presented may also prove useful for applying other associative memories to visual pattern recognition. The material of this survey is divided into three sections: an overview of biological visual processing; methods of preprocessing (extracting parts of shape, texture, motion, and depth); and shape representation and recognition (form invariance, primitives and structural descriptions, and theories of attention).

  6. Software for pre-processing Illumina next-generation sequencing short read sequences

    PubMed Central

    2014-01-01

    Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157 H7. Results Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacteria genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference

  7. Optimization and clinical validation of a pathogen detection microarray

    PubMed Central

    Wong, Christopher W; Heng, Charlie Lee Wah; Wan Yee, Leong; Soh, Shirlena WL; Kartasasmita, Cissy B; Simoes, Eric AF; Hibberd, Martin L; Sung, Wing-Kin; Miller, Lance D

    2007-01-01

    DNA microarrays used as 'genomic sensors' have great potential in clinical diagnostics. Biases inherent in random PCR-amplification, cross-hybridization effects, and inadequate microarray analysis, however, limit detection sensitivity and specificity. Here, we have studied the relationships between viral amplification efficiency, hybridization signal, and target-probe annealing specificity using a customized microarray platform. Novel features of this platform include the development of a robust algorithm that accurately predicts PCR bias during DNA amplification and can be used to improve PCR primer design, as well as a powerful statistical concept for inferring pathogen identity from probe recognition signatures. Compared to real-time PCR, the microarray platform identified pathogens with 94% accuracy (76% sensitivity and 100% specificity) in a panel of 36 patient specimens. Our findings show that microarrays can be used for the robust and accurate diagnosis of pathogens, and further substantiate the use of microarray technology in clinical diagnostics. PMID:17531104

  8. High confidence rule mining for microarray analysis.

    PubMed

    McIntosh, Tara; Chawla, Sanjay

    2007-01-01

    We present an association rule mining method for mining high confidence rules, which describe interesting gene relationships from microarray datasets. Microarray datasets typically contain an order of magnitude more genes than experiments, rendering many data mining methods impractical as they are optimised for sparse datasets. A new family of row-enumeration rule mining algorithms have emerged to facilitate mining in dense datasets. These algorithms rely on pruning infrequent relationships to reduce the search space by using the support measure. This major shortcoming results in the pruning of many potentially interesting rules with low support but high confidence. We propose a new row-enumeration rule mining method, MaxConf, to mine high confidence rules from microarray data. MaxConf is a support-free algorithm which directly uses the confidence measure to effectively prune the search space. Experiments on three microarray datasets show that MaxConf outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach -- the rules discovered by MaxConf are substantially more interesting and meaningful compared with support-based methods.

  9. Solid Earth ARISTOTELES mission data preprocessing simulation of gravity gradiometer

    NASA Astrophysics Data System (ADS)

    Avanzi, G.; Stolfa, R.; Versini, B.

    Data preprocessing for the ARISTOTELES mission, which measures the Earth's gravity gradient in a near-polar orbit, was studied. The mission measures the gravity field at sea level through indirect measurements performed in orbit, so the evaluation steps consist of processing data from GRADIO accelerometer measurements. Due to the physical phenomena involved in the data collection experiment, it is possible to isolate at an initial stage a preprocessing of the gradiometer data based only on GRADIO measurements, which does not need detailed knowledge of the attitude and attitude rate sensor outputs. This preprocessing produces intermediate quantities used in later stages of the reduction. Software was designed and run to evaluate, for this level of data reduction, the achievable accuracy as a function of the knowledge of instrument and satellite status parameters. The architecture of this element of preprocessing is described.

  10. Preprocessing of emotional visual information in the human piriform cortex.

    PubMed

    Schulze, Patrick; Bestgen, Anne-Kathrin; Lech, Robert K; Kuchinke, Lars; Suchan, Boris

    2017-08-23

    This study examines the processing of visual information by the olfactory system in humans. Recent data point to the processing of visual stimuli by the piriform cortex, a region mainly known as part of the primary olfactory cortex. Moreover, the piriform cortex generates predictive templates of olfactory stimuli to facilitate olfactory processing. This study addresses the open question of whether this region is also capable of preprocessing emotional visual information. To gain insight into the preprocessing and transfer of emotional visual information into olfactory processing, we recorded hemodynamic responses during affective priming using functional magnetic resonance imaging (fMRI). Odors of different valence (pleasant, neutral and unpleasant) were primed by images of emotional facial expressions (happy, neutral and disgust). Our findings are the first to demonstrate that the piriform cortex preprocesses emotional visual information prior to any olfactory stimulation and that the emotional connotation of this preprocessing is subsequently transferred and integrated into an extended olfactory network for olfactory processing.

  11. Microarray in parasitic infections

    PubMed Central

    Sehgal, Rakesh; Misra, Shubham; Anand, Namrata; Sharma, Monika

    2012-01-01

    Modern biology and genomic sciences are rooted in parasitic disease research. Genome sequencing efforts have provided a wealth of new biological information that promises to have a major impact on our understanding of parasites. Microarrays provide one of the major high-throughput platforms by which this information can be exploited in the laboratory. Many excellent reviews and technique articles have recently been published on applying microarrays to organisms for which fully annotated genomes are at hand. However, many parasitologists work on organisms whose genomes have been only partially sequenced. This review is mainly focused on how to use microarrays in these situations. PMID:23508469

  12. Characterizing the continuously acquired cardiovascular time series during hemodialysis, using median hybrid filter preprocessing noise reduction

    PubMed Central

    Wilson, Scott; Bowyer, Andrea; Harrap, Stephen B

    2015-01-01

    The clinical characterization of cardiovascular dynamics during hemodialysis (HD) has important pathophysiological implications from diagnostic, cardiovascular risk assessment, and treatment efficacy perspectives. Currently the diagnosis of significant intradialytic systolic blood pressure (SBP) changes among HD patients is imprecise and opportunistic, reliant upon the presence of hypotensive symptoms in conjunction with coincident but isolated noninvasive brachial cuff blood pressure (NIBP) readings. Considering hemodynamic variables as a time series makes a continuous recording approach more desirable than intermittent measures; however, in the clinical environment, the data signal is susceptible to corruption due to both impulsive and Gaussian-type noise. Signal preprocessing is an attractive solution to this problem. Prospectively collected continuous noninvasive SBP data over the short-break intradialytic period in ten patients were preprocessed using a novel median hybrid filter (MHF) algorithm and compared with 50 time-coincident pairs of intradialytic NIBP measures from routine HD practice. The median hybrid preprocessing technique for continuously acquired cardiovascular data yielded a dynamic regression without significant noise and artifact, suitable for high-level profiling of time-dependent SBP behavior. Signal accuracy is highly comparable with standard NIBP measurement, with the added clinical benefit of dynamic real-time hemodynamic information. PMID:25678827
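
    The abstract does not spell out the MHF algorithm, so as a hedged illustration here is a minimal Python sketch of one classical median hybrid structure, the FIR-median hybrid (FMH) filter of Heinonen and Neuvo, which combines the impulse rejection of a median with the Gaussian-noise smoothing of moving averages (the window length k is an assumed parameter):

        import numpy as np

        def fmh_filter(x, k=5):
            # Output = median of (left-window mean, current sample, right-window mean).
            # The median step rejects impulsive spikes; the FIR means smooth
            # Gaussian-type noise while preserving edges in the SBP trend.
            x = np.asarray(x, dtype=float)
            y = x.copy()
            for n in range(k, len(x) - k):
                left = x[n - k:n].mean()
                right = x[n + 1:n + k + 1].mean()
                y[n] = np.median([left, x[n], right])
            return y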

  13. [Study of near infrared spectral preprocessing and wavelength selection methods for endometrial cancer tissue].

    PubMed

    Zhao, Li-Ting; Xiang, Yu-Hong; Dai, Yin-Mei; Zhang, Zhuo-Yong

    2010-04-01

    Near infrared spectroscopy was applied to measure tissue slices of endometrial tissue and collect the spectra. A total of 154 spectra were obtained from 154 samples. The numbers of normal, hyperplasia, and malignant samples were 36, 60, and 58, respectively. Raw near infrared spectra comprise many variables, including interference such as instrument errors and physical effects like particle size and light scatter. In order to reduce these influences, the original spectra should be processed with different spectral preprocessing methods to compress variables and extract useful information. The methods of spectral preprocessing and wavelength selection therefore play an important role in near infrared spectroscopy. In the present paper the raw spectra were processed using various preprocessing methods including first derivative, multiplicative scatter correction, the Savitzky-Golay first derivative algorithm, standard normal variate, smoothing, and moving-window median. Standard deviation was used to select the optimal spectral region of 4 000-6 000 cm(-1). Then principal component analysis was used for classification. Principal component analysis showed that the three types of samples could be discriminated completely, with accuracy approaching 100%. This study demonstrated that near infrared spectroscopy and chemometric methods could be a fast, efficient, and novel means to diagnose cancer. The proposed methods could be a promising and significant diagnostic technique for early stage cancer.
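
    Two of the named preprocessing steps are easy to sketch in Python with NumPy/SciPy (a minimal illustration; the window and polynomial order are assumed values, not those of the paper):

        import numpy as np
        from scipy.signal import savgol_filter

        def snv(spectra):
            # Standard normal variate: centre and scale each spectrum (row),
            # reducing multiplicative scatter differences between samples.
            spectra = np.asarray(spectra, dtype=float)
            mean = spectra.mean(axis=1, keepdims=True)
            std = spectra.std(axis=1, keepdims=True)
            return (spectra - mean) / std

        def sg_first_derivative(spectra, window=11, polyorder=2):
            # Savitzky-Golay first derivative along the wavelength axis,
            # which removes additive baseline offsets.
            return savgol_filter(spectra, window, polyorder, deriv=1, axis=1)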

  14. Flexibility and utility of pre-processing methods in converting STXM setups for ptychography - Final Paper

    SciTech Connect

    Fromm, Catherine

    2015-08-20

    Ptychography is an advanced diffraction-based imaging technique that can achieve resolutions of 5 nm and below. It is done by scanning a sample through a beam of focused x-rays using discrete yet overlapping scan steps. Scattering data are collected on a CCD camera, and the phase of the scattered light is reconstructed with sophisticated iterative algorithms. Because the experimental setup is similar, ptychography setups can be created by retrofitting existing STXM beamlines with new hardware. The other challenge lies in the reconstruction of the collected scattering images. Scattering data must be adjusted and packaged with experimental parameters to calibrate the reconstruction software. The necessary pre-processing of data prior to reconstruction is unique to each beamline setup, and even to the optical alignment used on a particular day. Pre-processing software must be flexible and efficient in order to allow experimenters appropriate control and freedom in the analysis of their hard-won data. This paper describes the implementation of pre-processing software which successfully connects the data collection steps to the reconstruction steps, letting the user accomplish accurate and reliable ptychography.

  15. Comparing Binaural Pre-processing Strategies II

    PubMed Central

    Hu, Hongmei; Krawczyk-Becker, Martin; Marquardt, Daniel; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Bomke, Katrin; Plotz, Karsten; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

    2015-01-01

    Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary, complex, and included a significant amount of reverberation. Other aspects, such as the perfectly frontal target position, were idealized laboratory settings, allowing the algorithms to perform better than in corresponding real-world conditions. Eight bilaterally implanted CI users, wearing devices from three manufacturers, participated in the study. In all noise conditions, a substantial improvement in SRT50 compared to the unprocessed signal was observed for most of the algorithms tested, with the largest improvements generally provided by binaural minimum variance distortionless response (MVDR) beamforming algorithms. The largest overall improvement in speech intelligibility was achieved by an adaptive binaural MVDR in a spatially separated, single competing talker noise scenario. A no-pre-processing condition and adaptive differential microphones without a binaural link served as the two baseline conditions. SRT50 improvements provided by the binaural MVDR beamformers surpassed the performance of the adaptive differential microphones in most cases. Speech intelligibility improvements predicted by instrumental measures were shown to account for some but not all aspects of the perceptually obtained SRT50 improvements measured in bilaterally implanted CI users. PMID:26721921
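
    The abstract names binaural MVDR beamforming as the best performer; as a hedged sketch of the core computation (the textbook narrowband MVDR weights, not the adaptive binaural variants evaluated in the study), in Python:

        import numpy as np

        def mvdr_weights(R, d):
            # w = R^{-1} d / (d^H R^{-1} d): minimizes output noise power
            # subject to a distortionless response toward steering vector d.
            Rinv_d = np.linalg.solve(R, d)
            return Rinv_d / (d.conj() @ Rinv_d)

        # Applied per frequency bin: y = w.conj() @ x for each multichannel
        # STFT frame x, with R the estimated noise covariance in that bin.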

  16. DNA Microarray Technology

    SciTech Connect

    WERNER-WASHBURNE, MARGARET; DAVIDSON, GEORGE S.

    2002-01-01

    Collaboration between Sandia National Laboratories and the University of New Mexico Biology Department resulted in the capability to train students in microarray techniques and the interpretation of data from microarray experiments. These studies provide for a better understanding of the role of stationary phase and the gene regulation involved in exit from stationary phase, which may eventually have important clinical implications. Importantly, this research trained numerous students and is the basis for three new Ph.D. projects.

  17. Microarray Data Analysis and Mining Tools

    PubMed Central

    Selvaraj, Saravanakumar; Natarajan, Jeyakumar

    2011-01-01

    Microarrays are one of the latest breakthroughs in experimental molecular biology, allowing the expression levels of tens of thousands of genes to be monitored simultaneously. Arrays have been applied to studies in gene expression, genome mapping, SNP discrimination, transcription factor activity, toxicity, pathogen identification and many other applications. In this paper we discuss various bioinformatics tools used for microarray data mining tasks, with their underlying algorithms, web resources and relevant references. The paper is aimed mainly at digital biologists, to make them aware of the plethora of tools and programs available for microarray data analysis. First, we report the common data mining applications such as selecting differentially expressed genes, clustering, and classification. Next, we focus on gene expression based knowledge discovery studies such as transcription factor binding site analysis, pathway analysis, protein-protein interaction network analysis and gene enrichment analysis. PMID:21584183

  18. Microarray data analysis and mining tools.

    PubMed

    Selvaraj, Saravanakumar; Natarajan, Jeyakumar

    2011-04-22

    Microarrays are one of the latest breakthroughs in experimental molecular biology, allowing the expression levels of tens of thousands of genes to be monitored simultaneously. Arrays have been applied to studies in gene expression, genome mapping, SNP discrimination, transcription factor activity, toxicity, pathogen identification and many other applications. In this paper we discuss various bioinformatics tools used for microarray data mining tasks, with their underlying algorithms, web resources and relevant references. The paper is aimed mainly at digital biologists, to make them aware of the plethora of tools and programs available for microarray data analysis. First, we report the common data mining applications such as selecting differentially expressed genes, clustering, and classification. Next, we focus on gene expression based knowledge discovery studies such as transcription factor binding site analysis, pathway analysis, protein-protein interaction network analysis and gene enrichment analysis.

  19. Designing microarray phantoms for hyperspectral imaging validation

    PubMed Central

    Clarke, Matthew L.; Lee, Ji Youn; Samarov, Daniel V.; Allen, David W.; Litorja, Maritoni; Nossal, Ralph; Hwang, Jeeseong

    2012-01-01

    The design and fabrication of custom-tailored microarrays for use as phantoms in the characterization of hyperspectral imaging systems are described. Corresponding analysis methods for biologically relevant samples are also discussed. An image-based phantom design was used to program a microarrayer robot to print prescribed mixtures of dyes onto microscope slides. The resulting arrays were imaged by a hyperspectral imaging microscope. The shape of the spots results in significant scattering signals, which can be used to test image analysis algorithms. Separation of the scattering signals allowed elucidation of individual dye spectra. In addition, spectral fitting of the absorbance spectra of complex dye mixtures was performed in order to determine local dye concentrations. Such microarray phantoms provide a robust testing platform for comparisons of hyperspectral imaging acquisition and analysis methods. PMID:22741076
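
    The spectral fitting step amounts to linear unmixing under the Beer-Lambert law; a minimal Python sketch follows (the Gaussian bands stand in for measured pure-dye spectra and are entirely hypothetical):

        import numpy as np
        from scipy.optimize import nnls

        wavelengths = np.linspace(400, 700, 151)

        def band(center, width):
            # Hypothetical absorbance band standing in for a measured dye spectrum.
            return np.exp(-((wavelengths - center) / width) ** 2)

        E = np.column_stack([band(450, 30), band(550, 25), band(650, 35)])
        c_true = np.array([0.2, 0.7, 0.1])
        a = E @ c_true + 0.01 * np.random.default_rng(0).standard_normal(wavelengths.size)

        # Beer-Lambert mixing is linear in concentration, so local dye
        # concentrations follow from a non-negative least-squares fit.
        c_est, _ = nnls(E, a)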

  20. Identification of Pou5f1, Sox2, and Nanog downstream target genes with statistical confidence by applying a novel algorithm to time course microarray and genome-wide chromatin immunoprecipitation data

    PubMed Central

    Sharov, Alexei A; Masui, Shinji; Sharova, Lioudmila V; Piao, Yulan; Aiba, Kazuhiro; Matoba, Ryo; Xin, Li; Niwa, Hitoshi; Ko, Minoru SH

    2008-01-01

    Background Target genes of the transcription factor (TF) Pou5f1 (Oct3/4 or Oct4), which is essential for pluripotency maintenance and self-renewal of embryonic stem (ES) cells, have previously been identified based on their response to Pou5f1 manipulation and the occurrence of chromatin immunoprecipitation (ChIP)-binding sites in promoters. However, many responding genes with binding sites may not be direct targets, because the response may be mediated by other genes and a ChIP-binding site may not be functional in terms of transcription regulation. Results To reduce the number of false positives, we propose to separate responding genes into groups according to direction, magnitude, and time of response, and to apply the false discovery rate (FDR) criterion to each group individually. Applying this novel algorithm with stringent statistical criteria (FDR < 0.2) to a compendium of published and new microarray data (3, 6, 12, and 24 hr after Pou5f1 suppression) and published ChIP data, we identified 420 tentative target genes (TTGs) for Pou5f1. The majority of TTGs (372) were down-regulated after Pou5f1 suppression, indicating that Pou5f1 functions as an activator of gene expression when it binds to promoters. Interestingly, many activated genes are potent suppressors of transcription, including polycomb genes, zinc finger TFs, chromatin remodeling factors, and suppressors of signaling. Similar analysis showed that Sox2 and Nanog also function mostly as transcription activators in cooperation with Pou5f1. Conclusion We have identified the most reliable sets of direct target genes for the key pluripotency genes Pou5f1, Sox2, and Nanog, and found that they predominantly function as activators of downstream gene expression. Thus, most genes related to cell differentiation are suppressed indirectly. PMID:18522731
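
    The per-group FDR control described here is typically implemented with the Benjamini-Hochberg step-up procedure; a minimal Python sketch under that assumption (the abstract does not name the paper's FDR estimator):

        import numpy as np

        def benjamini_hochberg(pvals, alpha=0.2):
            # Step-up procedure: boolean mask of discoveries at FDR <= alpha.
            p = np.asarray(pvals)
            order = np.argsort(p)
            m = len(p)
            below = p[order] <= alpha * np.arange(1, m + 1) / m
            keep = np.zeros(m, dtype=bool)
            idx = np.nonzero(below)[0]
            if idx.size:
                keep[order[:idx.max() + 1]] = True
            return keep

        # Applied separately per response group, as the paper proposes:
        # for genes in groups: discoveries = benjamini_hochberg(pvals[genes], 0.2)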

  1. Preprocessing Raw Data in Clinical Medicine for a Data Mining Purpose

    NASA Astrophysics Data System (ADS)

    Peterková, Andrea; Michaľčonok, German

    2016-12-01

    Dealing with data from the field of medicine is both topical and challenging. On a global scale, a large amount of medical data is produced every day. For the purposes of our research, we understand medical data as patient data such as laboratory results, screening examinations (CT, echocardiography) and clinical parameters. These data are usually raw, difficult to understand, non-standard and not suitable for further processing or analysis. This paper describes a possible method of preparing and preprocessing such raw medical data into a form to which further analysis algorithms can be applied.

  2. Comparing Binaural Pre-processing Strategies III: Speech Intelligibility of Normal-Hearing and Hearing-Impaired Listeners.

    PubMed

    Völker, Christoph; Warzybok, Anna; Ernst, Stephan M A

    2015-12-30

    A comprehensive evaluation of eight signal pre-processing strategies, including directional microphones, coherence filters, single-channel noise reduction, binaural beamformers, and their combinations, was undertaken with normal-hearing (NH) and hearing-impaired (HI) listeners. Speech reception thresholds (SRTs) were measured in three noise scenarios (multitalker babble, cafeteria noise, and single competing talker). Predictions of three common instrumental measures were compared with the general perceptual benefit provided by the algorithms. The individual SRTs measured without pre-processing and the individual benefits were objectively estimated using the binaural speech intelligibility model. Ten listeners with NH and 12 HI listeners participated. The participants varied in age and pure-tone threshold levels. Although HI listeners required a better signal-to-noise ratio than NH listeners to obtain 50% intelligibility, no differences in SRT benefit from the different algorithms were found between the two groups. With the exception of single-channel noise reduction, all algorithms improved the SRT by between 2.1 dB (in cafeteria noise) and 4.8 dB (in the single competing talker condition). Predictions with the binaural speech intelligibility model explained 83% of the measured variance of the individual SRTs in the no pre-processing condition. Regarding the benefit from the algorithms, the instrumental measures were not able to predict the perceptual data in all tested noise conditions. The comparable benefit observed for both groups suggests a possible application of noise reduction schemes for listeners with different hearing status. Although the model can predict the individual SRTs without pre-processing, further development is necessary to predict the benefits obtained from the algorithms at an individual level.

  3. Design of radial basis function neural network classifier realized with the aid of data preprocessing techniques: design and analysis

    NASA Astrophysics Data System (ADS)

    Oh, Sung-Kwun; Kim, Wook-Dong; Pedrycz, Witold

    2016-05-01

    In this paper, we introduce a new architecture of an optimized Radial Basis Function neural network classifier developed with the aid of fuzzy clustering and data preprocessing techniques, and discuss its comprehensive design methodology. In the preprocessing part, the Linear Discriminant Analysis (LDA) or Principal Component Analysis (PCA) algorithm forms a front end of the network. The transformed data produced here are used as the inputs of the network. In the premise part, the Fuzzy C-Means (FCM) algorithm determines the receptive field associated with the condition part of the rules. The connection weights of the classifier are of a functional nature and come as polynomial functions forming the consequent part. The Particle Swarm Optimization algorithm optimizes a number of essential parameters needed to improve the accuracy of the classifier. These optimized parameters include the type of data preprocessing, the dimensionality of the feature vectors produced by the LDA (or PCA), the number of clusters (rules), the fuzzification coefficient used in the FCM algorithm and the orders of the network polynomials. The performance of the proposed classifier is reported for several benchmark datasets and is compared with the performance of other classifiers reported in previous studies.

  4. E-Predict: a computational strategy for species identification based on observed DNA microarray hybridization patterns.

    PubMed

    Urisman, Anatoly; Fischer, Kael F; Chiu, Charles Y; Kistler, Amy L; Beck, Shoshannah; Wang, David; DeRisi, Joseph L

    2005-01-01

    DNA microarrays may be used to identify microbial species present in environmental and clinical samples. However, automated tools for reliable species identification based on observed microarray hybridization patterns are lacking. We present an algorithm, E-Predict, for microarray-based species identification. E-Predict compares observed hybridization patterns with theoretical energy profiles representing different species. We demonstrate the application of the algorithm to viral detection in a set of clinical samples and discuss its relevance to other metagenomic applications.
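
    As a hedged sketch of the core idea, matching an observed hybridization vector against per-species reference profiles can be reduced to a similarity ranking (toy data; E-Predict itself matches against theoretical free-energy profiles and uses its own similarity statistic):

        import numpy as np

        profiles = {
            "virus_A": np.array([0.9, 0.1, 0.8, 0.0]),
            "virus_B": np.array([0.1, 0.9, 0.2, 0.7]),
        }
        observed = np.array([0.85, 0.15, 0.75, 0.05])

        def pearson(u, v):
            u = (u - u.mean()) / u.std()
            v = (v - v.mean()) / v.std()
            return float(np.mean(u * v))

        ranking = sorted(profiles, key=lambda s: pearson(observed, profiles[s]),
                         reverse=True)
        print(ranking[0])  # best-matching species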

  5. Effects of preprocessing method on TVOC emission of car mat

    NASA Astrophysics Data System (ADS)

    Wang, Min; Jia, Li

    2013-02-01

    The effects of the preprocessing method on total volatile organic compound (TVOC) emission of car mats are studied in this paper, and an appropriate TVOC emission period for car mats is suggested. The emission factors for TVOC from three kinds of new car mats are discussed. The car mats were preprocessed by washing, baking and ventilation. When car mats were preprocessed by washing, the TVOC emissions of all samples tested were lower than with the other preprocessing methods. The TVOC emission remained stable for a minimum of 4 days. The TVOC emitted from some samples may exceed 2500 μg/kg, but the TVOC emitted from washed polyamide (PA) and wool mats was less than 2500 μg/kg. The emission factors of TVOC were experimentally investigated for the different preprocessing methods. The air temperature in the environment chamber and the water temperature used for washing are important factors influencing the emission of car mats.

  6. EARLINET Single Calculus Chain - technical - Part 1: Pre-processing of raw lidar data

    NASA Astrophysics Data System (ADS)

    D'Amico, Giuseppe; Amodeo, Aldo; Mattis, Ina; Freudenthaler, Volker; Pappalardo, Gelsomina

    2016-02-01

    In this paper we describe an automatic tool for the pre-processing of aerosol lidar data called ELPP (EARLINET Lidar Pre-Processor). It is one of two calculus modules of the EARLINET Single Calculus Chain (SCC), the automatic tool for the analysis of EARLINET data. ELPP is an open source module that executes instrumental corrections and data handling of the raw lidar signals, making the lidar data ready to be processed by the optical retrieval algorithms. According to the specific lidar configuration, ELPP automatically performs dead-time correction, atmospheric and electronic background subtraction, gluing of lidar signals, and trigger-delay correction. Moreover, the signal-to-noise ratio of the pre-processed signals can be improved by means of configurable time integration of the raw signals and/or spatial smoothing. ELPP delivers the statistical uncertainties of the final products by means of error propagation or Monte Carlo simulations. During the development of ELPP, particular attention has been paid to make the tool flexible enough to handle all lidar configurations currently used within the EARLINET community. Moreover, it has been designed in a modular way to allow an easy extension to lidar configurations not yet implemented. The primary goal of ELPP is to enable the application of quality-assured procedures in the lidar data analysis starting from the raw lidar data. This provides the added value of full traceability of each delivered lidar product. Several tests have been performed to check the proper functioning of ELPP. The whole SCC has been tested with the same synthetic data sets, which were used for the EARLINET algorithm inter-comparison exercise. ELPP has been successfully employed for the automatic near-real-time pre-processing of the raw lidar data measured during several EARLINET inter-comparison campaigns as well as during intense field campaigns.
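
    Of the corrections listed, dead-time correction has a compact standard form; a minimal Python sketch for a non-paralyzable photon-counting channel (the dead-time value is an assumed placeholder, not an EARLINET parameter):

        def deadtime_correct(rate_mhz, tau_ns=3.6):
            # Non-paralyzable model: N_true = N_meas / (1 - N_meas * tau).
            # rate_mhz is counts per microsecond; tau is converted to microseconds.
            tau_us = tau_ns * 1e-3
            return rate_mhz / (1.0 - rate_mhz * tau_us)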

  7. EARLINET Single Calculus Chain - technical - Part 1: Pre-processing of raw lidar data

    NASA Astrophysics Data System (ADS)

    D'Amico, G.; Amodeo, A.; Mattis, I.; Freudenthaler, V.; Pappalardo, G.

    2015-10-01

    In this paper we describe an automatic tool for the pre-processing of lidar data called ELPP (EARLINET Lidar Pre-Processor). It is one of two calculus modules of the EARLINET Single Calculus Chain (SCC), the automatic tool for the analysis of EARLINET data. The ELPP is an open source module that executes instrumental corrections and data handling of the raw lidar signals, making the lidar data ready to be processed by the optical retrieval algorithms. According to the specific lidar configuration, the ELPP automatically performs dead-time correction, atmospheric and electronic background subtraction, gluing of lidar signals, and trigger-delay correction. Moreover, the signal-to-noise ratio of the pre-processed signals can be improved by means of configurable time integration of the raw signals and/or spatial smoothing. The ELPP delivers the statistical uncertainties of the final products by means of error propagation or Monte Carlo simulations. During the development of the ELPP module, particular attention has been paid to make the tool flexible enough to handle all lidar configurations currently used within the EARLINET community. Moreover, it has been designed in a modular way to allow an easy extension to lidar configurations not yet implemented. The primary goal of the ELPP module is to enable the application of quality-assured procedures in the lidar data analysis starting from the raw lidar data. This provides the added value of full traceability of each delivered lidar product. Several tests have been performed to check the proper functioning of the ELPP module. The whole SCC has been tested with the same synthetic data sets, which were used for the EARLINET algorithm inter-comparison exercise. The ELPP module has been successfully employed for the automatic near-real-time pre-processing of the raw lidar data measured during several EARLINET inter-comparison campaigns as well as during intense field campaigns.

  8. Functional GPCR microarrays.

    PubMed

    Hong, Yulong; Webb, Brian L; Su, Hui; Mozdy, Eric J; Fang, Ye; Wu, Qi; Liu, Li; Beck, Jonathan; Ferrie, Ann M; Raghavan, Srikanth; Mauro, John; Carre, Alain; Müeller, Dirk; Lai, Fang; Rasnow, Brian; Johnson, Michael; Min, Hosung; Salon, John; Lahiri, Joydeep

    2005-11-09

    This paper describes G-protein-coupled receptor (GPCR) microarrays on porous glass substrates and functional assays based on the binding of a europium-labeled GTP analogue. The porous glass slides were made by casting a glass frit on impermeable glass slides and then coating with gamma-aminopropyl silane (GAPS). The emitted fluorescence was captured on an imager with a time-gated intensified CCD detector. Microarrays of the neurotensin receptor 1, the cholinergic receptor muscarinic 2, the opioid receptor mu, and the cannabinoid receptor 1 were fabricated by pin printing. The selective agonism of each of the receptors was observed. The screening of potential antagonists was demonstrated using a cocktail of agonists. The amount of activation observed was sufficient to permit determinations of EC50 and IC50. Such microarrays could potentially streamline drug discovery by helping integrate primary screening with selectivity and safety screening without compromising the essential functional information obtainable from cellular assays.
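
    EC50/IC50 values like those mentioned here are conventionally extracted by fitting a four-parameter logistic (Hill) curve to dose-response data; a minimal Python sketch with hypothetical readouts:

        import numpy as np
        from scipy.optimize import curve_fit

        def four_pl(x, bottom, top, ec50, hill):
            # Four-parameter logistic dose-response model.
            return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)

        doses = np.array([1e-9, 1e-8, 1e-7, 1e-6, 1e-5])     # mol/L, assumed
        response = np.array([0.05, 0.12, 0.48, 0.85, 0.95])  # normalized signal, assumed

        popt, _ = curve_fit(four_pl, doses, response, p0=[0.0, 1.0, 1e-7, 1.0])
        print(f"EC50 = {popt[2]:.2e} M")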

  9. Nanotechnologies in protein microarrays.

    PubMed

    Krizkova, Sona; Heger, Zbynek; Zalewska, Marta; Moulick, Amitava; Adam, Vojtech; Kizek, Rene

    2015-01-01

    Protein microarray technology has become an important research tool for the study and detection of proteins, protein-protein interactions and a number of other applications. The use of nanoparticle-based materials and nanotechnology-based immobilization techniques extends the surface available for biomolecule immobilization, resulting in enhanced substrate binding properties, decreased background signals and more sensitive reporter systems. Generally, contemporary microarray systems combine multiple nanotechnology-based techniques. In this review, applications of nanoparticles and nanotechnologies in creating protein microarrays, immobilizing proteins and detection are summarized. We anticipate that advanced nanotechnologies can be exploited to expand the promising fields of protein identification, monitoring of protein-protein or drug-protein interactions, and protein structure analysis.

  10. Multievidence microarray mining.

    PubMed

    Seifert, Martin; Scherf, Matthias; Epple, Anton; Werner, Thomas

    2005-10-01

    Microarray mining is a challenging task because of the superposition of several processes in the data. We believe that the combination of microarray data-based analyses (statistical significance analysis of gene expression) with array-independent analyses (literature-mining and promoter analysis) enables some of the problems of traditional array analysis to be overcome. As a proof-of-principle, we revisited publicly available microarray data derived from an experiment with platelet-derived growth factor (PDGF)-stimulated fibroblasts. Our strategy revealed results beyond the detection of the major metabolic pathway known to be linked to the PDGF response: we were able to identify the crosstalking regulatory networks underlying the metabolic pathway without using a priori knowledge about the experiment.

  11. Real-time acquisition and preprocessing system of transient electromagnetic data based on LabVIEW

    NASA Astrophysics Data System (ADS)

    Zhao, Huinan; Zhang, Shuang; Gu, Lingjia; Sun, Jian

    2014-09-01

    The transient electromagnetic method (TEM) is an enduring topic in geological exploration. It is widely used in many research fields, such as mineral exploration, hydrogeology survey, engineering exploration and unexploded ordnance detection. Traditional measurement systems are often based on ARM, DSP or FPGA platforms, which lack real-time display, data preprocessing and data playback functions. To overcome these defects, a real-time data acquisition and preprocessing system based on the LabVIEW virtual instrument development platform is proposed in this paper; moreover, a calibration model is established for the TEM system based on a conductivity loop. The test results demonstrate that the system can complete real-time data acquisition and system calibration. For the Transmit-Loop-Receive (TLR) response, the correlation coefficient between the measured results and the calculated results is 0.987, so the measured results are essentially consistent with the calculated results. Through subsequent inversion of the TLR response, the signal of an underground conductor was obtained. In complex test environments, abnormal values usually exist in the measured data. To solve this problem, an algorithm for judging and revising abnormal values is proposed in this paper. The test results prove that the proposed algorithm can effectively eliminate serious disturbance signals from the measured transient electromagnetic data.
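
    The abstract does not give the abnormal-value rule, so as a hedged stand-in here is a common choice for judging and revising impulsive outliers in such data, the Hampel identifier, in Python:

        import numpy as np

        def hampel(x, window=7, n_sigmas=3.0):
            # Replace samples deviating from the local median by more than
            # n_sigmas robust standard deviations (MAD-based) with that median.
            x = np.asarray(x, dtype=float)
            y = x.copy()
            half = window // 2
            k = 1.4826  # scales MAD to a Gaussian-consistent sigma
            for i in range(half, len(x) - half):
                seg = x[i - half:i + half + 1]
                med = np.median(seg)
                mad = k * np.median(np.abs(seg - med))
                if mad > 0 and abs(x[i] - med) > n_sigmas * mad:
                    y[i] = med
            return y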

  12. EEG preprocessing for synchronization estimation and epilepsy lateralization.

    PubMed

    Vélez-Pérez, H; Romo-Vázquez, R; Ranta, R; Louis-Dorr, V; Maillard, L

    2011-01-01

    The global framework of this paper is the synchronization analysis in EEG recordings. Two main objectives are pursued: the evaluation of the synchronization estimation for lateralization purposes in epileptic EEGs and the evaluation of the effect of the preprocessing (artifact and noise cancelling by blind source separation, wavelet denoising and classification) on the synchronization analysis. We propose a new global synchronization index, based on the classical cross power spectrum, estimated for each cerebral hemisphere. After preprocessing, the proposed index is able to correctly lateralize the epileptic zone in over 90% of the cases.
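
    A hedged Python sketch of a cross-power-spectrum-based hemispheric index (the channel layout, frequency band and window length are assumed; the paper's exact index definition is not given in the abstract):

        import numpy as np
        from scipy.signal import coherence

        def hemisphere_sync(channels, fs=256.0, band=(1.0, 30.0)):
            # Mean magnitude-squared coherence over all channel pairs within
            # one hemisphere, averaged across the frequency band.
            vals = []
            for i in range(len(channels)):
                for j in range(i + 1, len(channels)):
                    f, cxy = coherence(channels[i], channels[j], fs=fs, nperseg=512)
                    mask = (f >= band[0]) & (f <= band[1])
                    vals.append(cxy[mask].mean())
            return float(np.mean(vals))

        # Comparing the left- and right-hemisphere indices then supports lateralization.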

  13. Enhancing Interdisciplinary Mathematics and Biology Education: A Microarray Data Analysis Course Bridging These Disciplines

    PubMed Central

    Evans, Irene M.

    2010-01-01

    BIO2010 put forth the goal of improving the mathematical educational background of biology students. The analysis and interpretation of microarray high-dimensional data can be very challenging and is best done by a statistician and a biologist working and teaching in a collaborative manner. We set up such a collaboration and designed a course on microarray data analysis. We started using Genome Consortium for Active Teaching (GCAT) materials and Microarray Genome and Clustering Tool software and added R statistical software along with Bioconductor packages. In response to student feedback, one microarray data set was fully analyzed in class, starting from preprocessing to gene discovery to pathway analysis using the latter software. A class project was to conduct a similar analysis where students analyzed their own data or data from a published journal paper. This exercise showed the impact that filtering, preprocessing, and different normalization methods had on gene inclusion in the final data set. We conclude that this course achieved its goals to equip students with skills to analyze data from a microarray experiment. We offer our insight about collaborative teaching as well as how other faculty might design and implement a similar interdisciplinary course. PMID:20810954

  14. Assessing quality and normalization of microarrays: case studies using neurological genomic data.

    PubMed

    Hershey, A D; Burdine, D; Liu, C; Nick, T G; Gilbert, D L; Glauser, T A

    2008-07-01

    Genomic analysis using microarray tools has the potential benefit of enhancing our understanding of neurological diseases. The analysis of these data is complex due to the large amount of data generated. Many tools have been developed to assist with this, but standard methods of analysis have not been established. This study analyzed the sensitivity and specificity of different analytical methods for gene identification and presents a standardized approach. Affymetrix HG-U133 Plus 2.0 microarray datasets from two neurological diseases, chronic migraine and new-onset epilepsy, were used as source data, and methods of analysis for data normalization and identification of gene changes were compared. Housekeeping genes were used to identify non-specific changes and gender-related genes were used to identify specific changes. Initial normalization of the data revealed that 5-10% of the microarrays were potential outliers due to technical errors. Two separate methods of analysis (dChip and Bioconductor) identified the same microarray chips as outliers. For specificity and sensitivity testing, performing a per-gene normalization was found to be inferior to standard preprocessing procedures using robust multichip average analysis. Technical variation in microarray preprocessing may account for chip-to-chip and batch-to-batch variations, and outliers need to be removed prior to analysis. Specificity and sensitivity of the final results are best achieved by following this identification and removal with standard genomic analysis techniques. Future tools may benefit from the use of standard tools of measurement.

  15. Enhancing interdisciplinary mathematics and biology education: a microarray data analysis course bridging these disciplines.

    PubMed

    Tra, Yolande V; Evans, Irene M

    2010-01-01

    BIO2010 put forth the goal of improving the mathematical educational background of biology students. The analysis and interpretation of microarray high-dimensional data can be very challenging and is best done by a statistician and a biologist working and teaching in a collaborative manner. We set up such a collaboration and designed a course on microarray data analysis. We started using Genome Consortium for Active Teaching (GCAT) materials and Microarray Genome and Clustering Tool software and added R statistical software along with Bioconductor packages. In response to student feedback, one microarray data set was fully analyzed in class, starting from preprocessing to gene discovery to pathway analysis using the latter software. A class project was to conduct a similar analysis where students analyzed their own data or data from a published journal paper. This exercise showed the impact that filtering, preprocessing, and different normalization methods had on gene inclusion in the final data set. We conclude that this course achieved its goals to equip students with skills to analyze data from a microarray experiment. We offer our insight about collaborative teaching as well as how other faculty might design and implement a similar interdisciplinary course.

  16. Microarray data analysis for differential expression: a tutorial.

    PubMed

    Suárez, Erick; Burguete, Ana; Mclachlan, Geoffrey J

    2009-06-01

    DNA microarray is a technology that simultaneously evaluates quantitative measurements of the expression of thousands of genes. DNA microarrays have been used to assess gene expression between groups of cells of different organs or different populations. In order to understand the role and function of genes, one needs complete information about their mRNA transcripts and proteins. Unfortunately, exploring protein functions is very difficult, due to their unique and complicated 3-dimensional structures. To overcome this difficulty, one may concentrate on the mRNA molecules produced by gene expression. In this paper, we describe some of the methods for preprocessing gene expression data and for pairwise comparison from genomic experiments. Previous studies assessing the efficiency of different methods for pairwise comparisons have found little agreement in the lists of significant genes. Finally, we describe procedures to control false discovery rates, sample size approaches for these experiments, and available software for microarray data analysis. This paper is written for professionals who are new to microarray data analysis for differential expression and want an overview of the specific steps or different approaches for this sort of analysis.

  17. Microarrays for Undergraduate Classes

    ERIC Educational Resources Information Center

    Hancock, Dale; Nguyen, Lisa L.; Denyer, Gareth S.; Johnston, Jill M.

    2006-01-01

    A microarray experiment is presented that, in six laboratory sessions, takes undergraduate students from the tissue sample right through to data analysis. The model chosen, the murine erythroleukemia cell line, can be easily cultured in sufficient quantities for class use. Large changes in gene expression can be induced in these cells by…

  18. Protein Microarray Technology

    PubMed Central

    Hall, David A.; Ptacek, Jason

    2007-01-01

    Protein chips have emerged as a promising approach for a wide variety of applications including the identification of protein-protein interactions, protein-phospholipid interactions, small molecule targets, and substrates of protein kinases. They can also be used for clinical diagnostics and monitoring disease states. This article reviews current methods in the generation and applications of protein microarrays. PMID:17126887

  20. Shared probe design and existing microarray reanalysis using PICKY.

    PubMed

    Chou, Hui-Hsien

    2010-04-20

    Large genomes contain families of highly similar genes that cannot be individually identified by microarray probes. This limitation is due to thermodynamic restrictions and cannot be resolved by any computational method. Since gene annotations are updated more frequently than microarrays, another common issue facing microarray users is that existing microarrays must be routinely reanalyzed to determine probes that are still useful with respect to the updated annotations. PICKY 2.0 can design shared probes for sets of genes that cannot be individually identified using unique probes. PICKY 2.0 uses novel algorithms to track sharable regions among genes and to strictly distinguish them from other highly similar but nontarget regions during thermodynamic comparisons. Therefore, PICKY does not sacrifice the quality of shared probes when choosing them. The latest PICKY 2.1 includes the new capability to reanalyze existing microarray probes against updated gene sets to determine probes that are still valid to use. In addition, more precise nonlinear salt effect estimates and other improvements are added, making PICKY 2.1 more versatile for microarray users. Shared probes allow expressed gene family members to be detected; this capability is generally more desirable than not knowing anything about these genes. Shared probes also enable the design of cross-genome microarrays, which facilitate multiple species identification in environmental samples. The new nonlinear salt effect calculation significantly increases the precision of probes at a lower buffer salt concentration, and the probe reanalysis function improves existing microarray result interpretations.
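
    Probe selection hinges on thermodynamic comparisons; as a toy illustration, the classic Wallace rule gives a first-pass melting temperature estimate (PICKY itself uses full nearest-neighbor thermodynamics with salt corrections, which this sketch does not reproduce):

        def wallace_tm(probe):
            # Wallace rule: 2 degrees C per A/T base, 4 degrees C per G/C base.
            probe = probe.upper()
            at = probe.count("A") + probe.count("T")
            gc = probe.count("G") + probe.count("C")
            return 2 * at + 4 * gc

        print(wallace_tm("ATGCGCATTAGC"))  # 36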

  1. An efficient data preprocessing approach for large scale medical data mining.

    PubMed

    Hu, Ya-Han; Lin, Wei-Chao; Tsai, Chih-Fong; Ke, Shih-Wen; Chen, Chih-Wen

    2015-01-01

    The size of medical datasets is usually very large, which directly affects the computational cost of the data mining process. Instance selection is a data preprocessing step in the knowledge discovery process that can be employed to reduce storage requirements while maintaining mining quality. It aims to filter out outliers (or noisy data) from a given (training) dataset. However, when the dataset is very large, more time is required to accomplish the instance selection task. In this paper, we introduce an efficient data preprocessing approach (EDP) composed of two steps. The first step trains a model over a small amount of training data after performing instance selection. The model is then used to label the rest of the large training set as good or noisy data. Experiments are conducted on two medical datasets, for breast cancer and protein homology prediction problems, that contain over 100000 data samples. Three well-known instance selection algorithms are used: IB3, DROP3, and genetic algorithms. In addition, three popular classification techniques are used to construct the learning models for comparison, namely the CART decision tree, k-nearest neighbor (k-NN), and support vector machine (SVM). The results show that our proposed approach not only reduces the computational cost by nearly a factor of two or three relative to the three state-of-the-art algorithms, but also maintains the final classification accuracy. Directly executing existing instance selection algorithms over large scale medical datasets requires a large computational cost. Our proposed EDP approach solves this problem by training a learning model to recognize good and noisy data, and it has demonstrated its efficiency and effectiveness on the large scale instance selection problem while considering both computational complexity and final classification accuracy.

  2. OPSN: The IMS COMSYS 1 and 2 Data Preprocessing System.

    ERIC Educational Resources Information Center

    Yu, John

    The Instructional Management System (IMS) developed by the Southwest Regional Laboratory (SWRL) processes student and teacher-generated data through the use of an optical scanner that produces a magnetic tape (Scan Tape) for input to IMS. A series of computer routines, OPSN, preprocesses the Scan Tape and prepares the data for transmission to the…

  3. The Minimal Preprocessing Pipelines for the Human Connectome Project

    PubMed Central

    Glasser, Matthew F.; Sotiropoulos, Stamatios N; Wilson, J Anthony; Coalson, Timothy S; Fischl, Bruce; Andersson, Jesper L; Xu, Junqian; Jbabdi, Saad; Webster, Matthew; Polimeni, Jonathan R; Van Essen, David C; Jenkinson, Mark

    2013-01-01

    The Human Connectome Project (HCP) faces the challenging task of bringing multiple magnetic resonance imaging (MRI) modalities together in a common automated preprocessing framework across a large cohort of subjects. The MRI data acquired by the HCP differ in many ways from data acquired on conventional 3 Tesla scanners and often require newly developed preprocessing methods. We describe the minimal preprocessing pipelines for structural, functional, and diffusion MRI that were developed by the HCP to accomplish many low level tasks, including spatial artifact/distortion removal, surface generation, cross-modal registration, and alignment to standard space. These pipelines are specially designed to capitalize on the high quality data offered by the HCP. The final standard space makes use of a recently introduced CIFTI file format and the associated grayordinates spatial coordinate system. This allows for combined cortical surface and subcortical volume analyses while reducing the storage and processing requirements for high spatial and temporal resolution data. Here, we provide the minimum image acquisition requirements for the HCP minimal preprocessing pipelines and additional advice for investigators interested in replicating the HCP’s acquisition protocols or using these pipelines. Finally, we discuss some potential future improvements for the pipelines. PMID:23668970

  5. Biolog phenotype microarrays.

    PubMed

    Shea, April; Wolcott, Mark; Daefler, Simon; Rozak, David A

    2012-01-01

    Phenotype microarrays nicely complement traditional genomic, transcriptomic, and proteomic analysis by offering opportunities for researchers to ground microbial systems analysis and modeling in a broad yet quantitative assessment of the organism's physiological response to different metabolites and environments. Biolog phenotype assays achieve this by coupling tetrazolium dyes with minimally defined nutrients to measure the impact of hundreds of carbon, nitrogen, phosphorus, and sulfur sources on redox reactions that result from compound-induced effects on the electron transport chain. Over the years, we have used Biolog's reproducible and highly sensitive assays to distinguish closely related bacterial isolates, to understand their metabolic differences, and to model their metabolic behavior using flux balance analysis. This chapter describes Biolog phenotype microarray system components, reagents, and methods, particularly as they apply to bacterial identification, characterization, and metabolic analysis.

  6. Adjustment method for microarray data generated using two-cycle RNA labeling protocol.

    PubMed

    Wang, Fugui; Chen, Rui; Ji, Dong; Bai, Shunong; Qian, Minping; Deng, Minghua

    2013-01-16

    Microarray technology is widely utilized for monitoring the expression changes of thousands of genes simultaneously. However, the requirement for a relatively large amount of RNA for labeling and hybridization makes it difficult to perform microarray experiments with limited biological material, which has led to the development of many methods for preparing and amplifying mRNA. It has been noted that amplification methods usually introduce bias, which may strongly hamper the subsequent interpretation of the results. A major challenge is how to correct for the bias before further analysis. In this article, we observed the bias in rice gene expression microarray data generated with the Affymetrix one-cycle and two-cycle RNA labeling protocols, followed by validation with real-time PCR. Based on these data, we propose a statistical framework to model the process of mRNA two-cycle linear amplification, and establish a linear model for probe-level correction. Maximum likelihood estimation (MLE) was applied to perform robust estimation of the retaining rate for each probe. After bias correction, known preprocessing methods, such as PDNN, can be combined with our model to finish the preprocessing. We evaluated our model, and the results suggest that it can effectively increase the quality of the microarray raw data: (i) it decreases the coefficient of variation for PM intensities of probe sets; (ii) it distinguishes the microarray samples of five stages of rice stamen development more clearly; and (iii) it improves the correlation coefficients among stamen microarray samples. We also discuss the necessity of the model adjustment by comparing it with another simple adjustment method. We conclude that the adjustment model is necessary and can effectively increase the quality of estimation for gene expression from the microarray raw data.

  7. Analyzing Microarray Data.

    PubMed

    Hung, Jui-Hung; Weng, Zhiping

    2017-03-01

    Because there is no widely used software for analyzing RNA-seq data that has a graphical user interface, this protocol provides an example of analyzing microarray data using Babelomics. This analysis entails performing quantile normalization and then detecting differentially expressed genes associated with the transgenesis of a human oncogene c-Myc in mice. Finally, hierarchical clustering is performed on the differentially expressed genes using the Cluster program, and the results are visualized using TreeView.
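
    Babelomics performs the quantile normalization internally; for readers who want the idea itself, here is a minimal NumPy version (a sketch that ignores tie handling; X holds genes in rows and arrays in columns):

        import numpy as np

        def quantile_normalize(X):
            # Give every array (column) the same empirical distribution by
            # replacing each value with the mean of its rank across arrays.
            ranks = np.argsort(np.argsort(X, axis=0), axis=0)
            reference = np.sort(X, axis=0).mean(axis=1)
            return reference[ranks]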

  8. Membrane-based microarrays

    NASA Astrophysics Data System (ADS)

    Dawson, Elliott P.; Hudson, James; Steward, John; Donnell, Philip A.; Chan, Wing W.; Taylor, Richard F.

    1999-11-01

    Microarrays represent a new approach to the rapid detection and identification of analytes. Studies to date have shown that the immobilization of receptor molecules (such as DNA, oligonucleotides, antibodies, enzymes and binding proteins) onto silicon and polymeric substrates can result in arrays able to detect hundreds of analytes in a single step. The formation of the receptor/analyte complex can, itself, lead to detection, or the complex can be interrogated through the use of fluorescent, chemiluminescent or radioactive probes and ligands.

  9. Fingerprint verification by correlation using wavelet compression of preprocessing digital images

    NASA Astrophysics Data System (ADS)

    Daza, Yaileth Morales; Torres, Cesar O.

    2008-04-01

    In this paper, we implement a practical digital security system that combines digital correlation and wavelet compression with preprocessing of digital images; we apply this biometric system to fingerprint verification and authentication, using the fingerprint as the unique recognition pattern. The system has a biometric sensor that captures and digitizes the images. These images are preprocessed using wavelet-based image compression and skeletonized to emphasize the elements of the image considered most important, while eliminating undesired signals introduced by the acquisition method or the capture conditions. The preprocessed images are stored in a database and passed through different algorithms to build a composite filter, which serves as the reference fingerprint pattern for the comparisons. The comparisons, performed with the Fourier transform and correlation operations, quantify the degree of similarity between two images and are displayed as a graph showing a different correlation amplitude for each fingerprint. Depending on the result of comparing a new fingerprint with the composite filter, the system grants or denies the user access.
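
    The correlation stage can be sketched compactly in Python (a minimal illustration of FFT-based normalized cross-correlation only; the wavelet compression, skeletonization and composite-filter construction are not reproduced):

        import numpy as np

        def correlation_peak(image, reference):
            # Circular cross-correlation via the FFT; the peak value is a
            # similarity score, approaching 1.0 for identical images.
            a = (image - image.mean()) / (image.std() * image.size)
            b = (reference - reference.mean()) / reference.std()
            corr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
            return corr.max()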

  10. Molecular diagnosis and prognosis with DNA microarrays.

    PubMed

    Wiltgen, Marco; Tilz, Gernot P

    2011-05-01

    Microarray analysis makes it possible to determine thousands of gene expression values simultaneously. Changes in gene expression, as a response to diseases, can be detected allowing a better understanding and differentiation of diseases at a molecular level. By comparing different kinds of tissue, for example healthy tissue and cancer tissue, the microarray analysis indicates induced gene activity, repressed gene activity or when there is no change in the gene activity level. Fundamental patterns in gene expression are extracted by several clustering and machine learning algorithms. Certain kinds of cancer can be divided into subtypes, with different clinical outcomes, by their specific gene expression patterns. This enables a better diagnosis and tailoring of individual patient treatments.

  11. Linguistic Preprocessing and Tagging for Problem Report Trend Analysis

    NASA Technical Reports Server (NTRS)

    Beil, Robert J.; Malin, Jane T.

    2012-01-01

    Mr. Robert Beil, Systems Engineer at Kennedy Space Center (KSC), requested the NASA Engineering and Safety Center (NESC) develop a prototype tool suite that combines complementary software technology used at Johnson Space Center (JSC) and KSC for problem report preprocessing and semantic tag extraction, to improve input to data mining and trend analysis. This document contains the outcome of the assessment and the Findings, Observations and NESC Recommendations.

  12. Integration of geometric modeling and advanced finite element preprocessing

    NASA Technical Reports Server (NTRS)

    Shephard, Mark S.; Finnigan, Peter M.

    1987-01-01

    The structure to a geometry based finite element preprocessing system is presented. The key features of the system are the use of geometric operators to support all geometric calculations required for analysis model generation, and the use of a hierarchic boundary based data structure for the major data sets within the system. The approach presented can support the finite element modeling procedures used today as well as the fully automated procedures under development.

  13. Empirical evaluation of oligonucleotide probe selection for DNA microarrays.

    PubMed

    Mulle, Jennifer G; Patel, Viren C; Warren, Stephen T; Hegde, Madhuri R; Cutler, David J; Zwick, Michael E

    2010-03-29

    DNA-based microarrays are increasingly central to biomedical research. Selecting oligonucleotide sequences that will behave consistently across experiments is essential to the design, production and performance of DNA microarrays. Here our aim was to improve on probe design parameters by empirically and systematically evaluating probe performance in a multivariate context. We used experimental data from 19 array CGH hybridizations to assess the probe performance of 385,474 probes tiled in the Duchenne muscular dystrophy (DMD) region of the X chromosome. Our results demonstrate that probe melting temperature, single nucleotide polymorphisms (SNPs), and homocytosine motifs all have a strong effect on probe behavior. These findings, when incorporated into future microarray probe selection algorithms, may improve microarray performance for a wide variety of applications.

  14. A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps

    PubMed Central

    Tugizimana, Fidele; Steenkamp, Paul A.; Piater, Lizelle A.; Dubery, Ian A.

    2016-01-01

    Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the “exhaustive” extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for a maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomic dataset, this study explored the influence of collection parameters in the data pre-processing step, scaling and data transformation on the statistical models generated, and feature selection, thereafter. Data obtained in positive mode generated from a LC-MS-based untargeted metabolomic study (sorghum plants responding dynamically to infection by a fungal pathogen) were used. Raw data were pre-processed with MarkerLynx(TM) software (Waters Corporation, Manchester, UK). Here, two parameters were varied: the intensity threshold (50–100 counts) and the mass tolerance (0.005–0.01 Da). After the pre-processing, the datasets were imported into SIMCA (Umetrics, Umea, Sweden) for more data cleaning and statistical modeling. In addition, different scaling (unit variance, Pareto, etc.) and data transformation (log and power) methods were explored. The results showed that the pre-processing parameters (or algorithms) influence the output dataset with regard to the number of defined features. Furthermore, the study demonstrates that the pre-treatment of data prior to statistical modeling affects the subspace approximation outcome: e.g., the amount of variation in X-data that the model can explain and predict. The pre-processing and pre-treatment steps subsequently influence the number of statistically significant extracted/selected features (variables). Thus, as informed by the results, to maximize the value of untargeted metabolomic data

  15. A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps.

    PubMed

    Tugizimana, Fidele; Steenkamp, Paul A; Piater, Lizelle A; Dubery, Ian A

    2016-11-03

    Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the "exhaustive" extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for a maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomic dataset, this study explored the influence of collection parameters in the data pre-processing step, scaling and data transformation on the statistical models generated, and feature selection, thereafter. Data obtained in positive mode generated from a LC-MS-based untargeted metabolomic study (sorghum plants responding dynamically to infection by a fungal pathogen) were used. Raw data were pre-processed with MarkerLynx(TM) software (Waters Corporation, Manchester, UK). Here, two parameters were varied: the intensity threshold (50-100 counts) and the mass tolerance (0.005-0.01 Da). After the pre-processing, the datasets were imported into SIMCA (Umetrics, Umea, Sweden) for more data cleaning and statistical modeling. In addition, different scaling (unit variance, Pareto, etc.) and data transformation (log and power) methods were explored. The results showed that the pre-processing parameters (or algorithms) influence the output dataset with regard to the number of defined features. Furthermore, the study demonstrates that the pre-treatment of data prior to statistical modeling affects the subspace approximation outcome: e.g., the amount of variation in X-data that the model can explain and predict. The pre-processing and pre-treatment steps subsequently influence the number of statistically significant extracted/selected features (variables). Thus, as informed by the results, to maximize the value of untargeted metabolomic data, understanding

  16. Application of filtering techniques in preprocessing magnetic data

    NASA Astrophysics Data System (ADS)

    Liu, Haijun; Yi, Yongping; Yang, Hongxia; Hu, Guochuang; Liu, Guoming

    2010-08-01

    High-precision magnetic exploration is a popular geophysical technique owing to its simplicity and effectiveness. Interpretation of high-precision magnetic data is difficult in the presence of noise and other disturbances, so an effective preprocessing method is needed to remove the influence of interference factors before further processing. The usual approach is filtering, and many filtering methods exist. In this paper we describe in detail three popular filtering techniques: regularized filtering, sliding-average filtering, and compensation smoothing filtering. We then designed the workflow of a filtering program based on these techniques and implemented it in Delphi. To verify the program, we applied it to magnetic data from a site in China. Comparing the initial contour map with the filtered one clearly shows the effect of the program: the filtered contour map is smooth, and the high-frequency components of the data are removed. After filtering, useful signals are separated from noise, minor anomalies from major anomalies, and local anomalies from regional anomalies, making it easy to focus on the useful information. The results demonstrate the effectiveness of the program for preprocessing magnetic data.
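
    A minimal sketch of the sliding-average filtering step described above; the 5x5 window and the synthetic grid are assumptions for illustration, not values from the paper.

    import numpy as np
    from scipy.ndimage import uniform_filter

    grid = np.random.default_rng(0).normal(size=(200, 200))  # stand-in magnetic grid (nT)
    smoothed = uniform_filter(grid, size=5)                  # 5x5 sliding average
    high_freq = grid - smoothed                              # separated high-frequency part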

  17. A Stereo Music Preprocessing Scheme for Cochlear Implant Users.

    PubMed

    Buyens, Wim; van Dijk, Bas; Wouters, Jan; Moonen, Marc

    2015-10-01

    Listening to music is still one of the more challenging aspects of using a cochlear implant (CI) for most users. Simple musical structures, a clear rhythm/beat, and lyrics that are easy to follow are among the top factors contributing to music appreciation for CI users. Modifying the audio mix of complex music potentially improves music enjoyment in CI users. A stereo music preprocessing scheme is described in which vocals, drums, and bass are emphasized based on the representation of the harmonic and the percussive components in the input spectrogram, combined with the spatial allocation of instruments in typical stereo recordings. The scheme is assessed with postlingually deafened CI subjects (N = 7) using pop/rock music excerpts with different complexity levels. The scheme is capable of modifying relative instrument level settings, with the aim of improving music appreciation in CI users, and allows individual preference adjustments. The assessment with CI subjects confirms the preference for more emphasis on vocals, drums, and bass as offered by the preprocessing scheme, especially for songs with higher complexity. The stereo music preprocessing scheme has the potential to improve music enjoyment in CI users by modifying the audio mix in widespread (stereo) music recordings. Since music enjoyment in CI users is generally poor, this scheme can assist the music listening experience of CI users as a training or rehabilitation tool.
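
    The scheme itself is not published as code, but its two main ingredients can be sketched: harmonic/percussive separation of the signal and use of the mid (centre) channel of a stereo mix, where vocals, bass, and drums typically sit. The file name and gain values below are assumptions, not the authors' settings.

    import librosa

    y, sr = librosa.load("song.wav", mono=False)       # stereo signal, shape (2, n)
    mid, side = (y[0] + y[1]) / 2, (y[0] - y[1]) / 2   # centre carries vocals/drums/bass
    harmonic, percussive = librosa.effects.hpss(mid)   # harmonic/percussive separation
    remix = 1.2 * harmonic + 1.5 * percussive + 0.6 * side  # emphasize centre content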

  18. Face recognition by using optical correlator with wavelet preprocessing

    NASA Astrophysics Data System (ADS)

    Strzelecki, Jacek; Chalasinska-Macukow, Katarzyna

    2004-08-01

    A method of face recognition using an optical correlator with wavelet preprocessing is presented. The wavelet transform is used to improve the performance of a standard Vander Lugt correlator with a phase-only filter (POF). The influence of various wavelet transforms of images of human faces on the recognition results has been analyzed. The quality of the face recognition process was tested according to two criteria: the peak-to-correlation-energy ratio (PCE) and the discrimination capability (DC). Additionally, correct localization of the correlation peak was verified. During the preprocessing step, a set of three wavelets -- Mexican hat, Haar, and Gabor -- was used at various scales; in addition, Gabor wavelets were tested at various orientation angles. During the recognition procedure the input scene and the POF are transformed by the same wavelet. We show the results of computer simulations for a variety of images of human faces: original images without any distortions, noisy images, and images with non-uniform illumination. A comparison of recognition results obtained with and without wavelet preprocessing is given.
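
    A numerical sketch of the correlation step: a phase-only filter applied in the Fourier domain, with a Mexican-hat pass (Laplacian of Gaussian) standing in for the wavelet preprocessing. The random image, the scale, and the use of autocorrelation are assumptions for illustration.

    import numpy as np
    from scipy.ndimage import gaussian_laplace

    def pof_correlate(scene, reference):
        S = np.fft.fft2(scene)
        R = np.fft.fft2(reference, s=scene.shape)
        pof = np.conj(R) / (np.abs(R) + 1e-12)   # phase-only filter
        return np.abs(np.fft.ifft2(S * pof))

    face = np.random.default_rng(1).random((128, 128))  # stand-in face image
    w = gaussian_laplace(face, sigma=2.0)               # Mexican-hat preprocessing
    plane = pof_correlate(w, w)
    pce = plane.max() ** 2 / (plane ** 2).sum()         # PCE quality criterion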

  19. Review of feed forward neural network classification preprocessing techniques

    NASA Astrophysics Data System (ADS)

    Asadi, Roya; Kareem, Sameem Abdul

    2014-06-01

    The best feature of artificial intelligence Feed Forward Neural Network (FFNN) classification models is the learning of input data through their weights. Data preprocessing and pre-training are the contributing factors in developing efficient techniques that achieve low training time and high classification accuracy. In this study, we investigate and review powerful preprocessing functions for FFNN models. Currently, weights are initialized at random, which is a main source of problems; even the latest techniques, such as multilayer auto-encoder networks, are unable to solve them. Weight Linear Analysis (WLA) is a combination of data pre-processing and pre-training that generates real weights through the use of normalized input values. Using WLA, the FFNN model increases classification accuracy and improves training time in a single epoch, without any training cycle, without computing the gradient of the mean square error function, and without updating the weights. The results of comparison and evaluation show that WLA is a powerful technique in the FFNN classification area.

  20. Improving Drift Correction by Double Projection Preprocessing in Gas Sensor Arrays

    NASA Astrophysics Data System (ADS)

    Padilla, M.; Perera, A.; Montoliu, I.; Chaudry, A.; Persaud, K.; Marco, S.

    2009-05-01

    It is well known that gas chemical sensors are strongly affected by drift, i.e., changes in sensor responses over time that render initial statistical models for gas or odor recognition useless within a period of about weeks. Instruments based on gas sensor arrays therefore need periodic calibrations, which are expensive and laborious. Many statistical methods have been proposed to extend the time between recalibrations. In this work, a simple preprocessing technique based on a double projection is proposed as a prior step to a subsequent drift correction algorithm (in this particular case, Direct Orthogonal Signal Correction). The method substantially improves the time stability of the data relative to using the drift correction method alone. The performance of the technique is evaluated on a dataset comprising measurements of three analytes with a polymer sensor array over ten months.
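
    The paper's exact double-projection step is not reproduced here; as a hedged stand-in, the sketch below deflates the dominant principal direction of early reference measurements -- a common way to remove a drift component before a correction such as Direct Orthogonal Signal Correction.

    import numpy as np

    def deflate_drift(X, X_ref, n_dirs=1):
        Xc = X_ref - X_ref.mean(axis=0)
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        D = vt[:n_dirs]               # dominant drift direction(s)
        return X - X @ D.T @ D        # project data onto the orthogonal complement

    X = np.random.default_rng(2).normal(size=(300, 17))  # ten months of array responses
    X_stable = deflate_drift(X, X[:30])                  # early window as drift reference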

  1. A glance at DNA microarray technology and applications.

    PubMed

    Saei, Amir Ata; Omidi, Yadollah

    2011-01-01

    Because of the huge impact of "OMICS" technologies in the life sciences, many researchers aim to implement such high-throughput approaches to address cellular and/or molecular functions in response to any influential intervention at the genomics, proteomics, or metabolomics level. However, in many cases, use of such technologies encounters cybernetic difficulties in extracting knowledge from a mass of data using the related software, and there is little guidance on data mining for novices. The main goal of this article is to provide a brief review of the different steps of microarray data handling and mining for novices, and finally to introduce different PC- and/or web-based software tools that can be used in preprocessing and/or data mining of microarray data. To pursue this aim, recently published papers and microarray software tools were reviewed. It was found that defining the true place of genes in cell networks is the main phase in our understanding of the programming and functioning of living cells. This can be obtained with global/selected gene expression profiling. Studying the regulation patterns of genes in groups, using clustering and classification methods, helps us understand different pathways in the cell, their functions and regulation, and the way one component in the system affects another. These networks can act as starting points for data mining and hypothesis generation, helping us reverse engineer.

  2. A Glance at DNA Microarray Technology and Applications

    PubMed Central

    Saei, Amir Ata; Omidi, Yadollah

    2011-01-01

    Introduction Because of the huge impact of “OMICS” technologies in the life sciences, many researchers aim to implement such high-throughput approaches to address cellular and/or molecular functions in response to any influential intervention at the genomics, proteomics, or metabolomics level. However, in many cases, use of such technologies encounters cybernetic difficulties in extracting knowledge from a mass of data using the related software, and there is little guidance on data mining for novices. The main goal of this article is to provide a brief review of the different steps of microarray data handling and mining for novices, and finally to introduce different PC- and/or web-based software tools that can be used in preprocessing and/or data mining of microarray data. Methods To pursue this aim, recently published papers and microarray software tools were reviewed. Results It was found that defining the true place of genes in cell networks is the main phase in our understanding of the programming and functioning of living cells. This can be obtained with global/selected gene expression profiling. Conclusion Studying the regulation patterns of genes in groups, using clustering and classification methods, helps us understand different pathways in the cell, their functions and regulation, and the way one component in the system affects another. These networks can act as starting points for data mining and hypothesis generation, helping us reverse engineer. PMID:23678411

  3. Surface chemistries for antibody microarrays

    SciTech Connect

    Seurynck-Servoss, Shannon L.; Baird, Cheryl L.; Rodland, Karin D.; Zangar, Richard C.

    2007-05-01

    Enzyme-linked immunosorbent assay (ELISA) microarrays promise to be a powerful tool for the detection of disease biomarkers. The original technology for printing ELISA microarray chips and capturing antibodies on slides was derived from the DNA microarray field. However, due to the need to maintain antibody structure and function when immobilized, surface chemistries used for DNA microarrays are not always appropriate for ELISA microarrays. In order to identify better surface chemistries for antibody capture, a number of commercial companies and academic research groups have developed new slide types that could improve antibody function in microarray applications. In this review we compare and contrast the commercially available slide chemistries, as well as highlight some promising recent advances in the field.

  4. Ecotoxicogenomics: Microarray interlaboratory comparability.

    PubMed

    Vidal-Dorsch, Doris E; Bay, Steven M; Moore, Shelly; Layton, Blythe; Mehinto, Alvine C; Vulpe, Chris D; Brown-Augustine, Marianna; Loguinov, Alex; Poynton, Helen; Garcia-Reyero, Natàlia; Perkins, Edward J; Escalon, Lynn; Denslow, Nancy D; Cristina, Colli-Dula R; Doan, Tri; Shukradas, Shweta; Bruno, Joy; Brown, Lorraine; Van Agglen, Graham; Jackman, Paula; Bauer, Megan

    2016-02-01

    Transcriptomic analysis can complement traditional ecotoxicology data by providing mechanistic insight, and by identifying sub-lethal organismal responses and contaminant classes underlying observed toxicity. Before transcriptomic information can be used in monitoring and risk assessment, it is necessary to determine its reproducibility and detect key steps impacting the reliable identification of differentially expressed genes. A custom 15K-probe microarray was used to conduct transcriptomics analyses across six laboratories with estuarine amphipods exposed to cyfluthrin-spiked or control sediments (10 days). Two sample types were generated: one consisted of total RNA extracts (Ex) from exposed and control samples (extracted by one laboratory), and the other consisted of exposed and control whole-body amphipods (WB) from which each laboratory extracted RNA. Our findings indicate that gene expression microarray results are repeatable. Differentially expressed data had a higher degree of repeatability across all laboratories in samples with similar RNA quality (Ex) when compared to WB samples with more variable RNA quality. Despite such variability, a subset of genes was consistently identified as differentially expressed across all laboratories and sample types. We found that the differences among the individual laboratory results can be attributed to several factors including RNA quality and technical expertise, but the overall results can be improved by following consistent protocols and with appropriate training. Published by Elsevier Ltd.

  5. The Effect of LC-MS Data Preprocessing Methods on the Selection of Plasma Biomarkers in Fed vs. Fasted Rats

    PubMed Central

    Gürdeniz, Gözde; Kristensen, Mette; Skov, Thomas; Dragsted, Lars O.

    2012-01-01

    The metabolic composition of plasma is affected by time passed since the last meal and by individual variation in metabolite clearance rates. Rat plasma in fed and fasted states was analyzed with liquid chromatography quadrupole-time-of-flight mass spectrometry (LC-QTOF) for an untargeted investigation of these metabolite patterns. The dataset was used to investigate the effect of data preprocessing on biomarker selection using three different software packages, MarkerLynxTM, MZmine, and XCMS, along with a customized preprocessing method that performs binning of m/z channels followed by summation through retention time. Direct comparison of selected features representing the fed or fasted state showed large differences between the software packages. Many false positive markers were obtained from the custom data preprocessing compared with the dedicated packages, while MarkerLynxTM provided better coverage of markers. However, marker selection was more reliable with the gap filling (or peak finding) algorithms present in MZmine and XCMS. Further identification of the putative markers revealed that many of the differences between the markers selected were due to variations in features representing adducts or daughter ions of the same metabolites or of compounds from the same chemical subclasses, e.g., lyso-phosphatidylcholines (LPCs) and lyso-phosphatidylethanolamines (LPEs). We conclude that despite considerable differences in the performance of the preprocessing tools, we could extract the same biological information with any of them. Carnitine, branched-chain amino acids, LPCs and LPEs were identified by all methods as markers of the fed state, whereas acetylcarnitine was abundant during fasting in rats. PMID:24957369
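
    The customized method is described only briefly; a minimal sketch of "binning of m/z channels followed by summation through retention time" could look as follows, with the bin width and the synthetic peak list as assumptions.

    import numpy as np

    def bin_mz_sum_rt(mz, intensity, mz_min=100.0, mz_max=1000.0, width=0.01):
        edges = np.arange(mz_min, mz_max + width, width)
        # Each ion's intensity is added to its m/z bin regardless of retention time,
        # which sums the signal through the chromatographic dimension.
        binned, _ = np.histogram(mz, bins=edges, weights=intensity)
        return edges[:-1], binned

    rng = np.random.default_rng(3)
    mz = rng.uniform(100, 1000, 5000)     # detected ion m/z values
    inten = rng.gamma(2.0, 1e4, 5000)     # their intensities
    channels, features = bin_mz_sum_rt(mz, inten)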

  6. A new approach to pre-processing digital image for wavelet-based watermark

    NASA Astrophysics Data System (ADS)

    Agreste, Santa; Andaloro, Guido

    2008-11-01

    The growth of the Internet has increased digital piracy of multimedia objects such as software, images, video, audio, and text. It is therefore strategic to develop numerically stable, low-cost methods and algorithms that address this problem. We describe a digital watermarking algorithm for color image protection and authenticity: robust, non-blind, and wavelet-based. The use of the Discrete Wavelet Transform is motivated by its good time-frequency features and its good match with Human Visual System directives; combined, these two elements are important for building an invisible and robust watermark. Moreover, our algorithm can work with any image, thanks to a pre-processing step that includes resizing techniques adapting the original image size for the wavelet transform. The watermark signal is calculated in correlation with the image features and statistical properties. In the detection step we apply a re-synchronization between the original and watermarked images according to the Neyman-Pearson statistical criterion. Experimentation on a large set of different images has shown the method to be resistant against geometric, filtering, and StirMark attacks, with a low false alarm rate.
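
    A minimal, non-blind wavelet-domain watermark sketch in the spirit of the paper: a pseudo-random signal is added to detail coefficients and later detected by correlation. The plain additive rule, strength, and wavelet are assumptions; the authors' feature-correlated watermark and Neyman-Pearson detector are not reproduced.

    import numpy as np
    import pywt

    rng = np.random.default_rng(4)
    image = rng.random((256, 256))                  # stand-in for one colour channel
    cA, (cH, cV, cD) = pywt.dwt2(image, "haar")
    mark = rng.standard_normal(cD.shape)
    watermarked = pywt.idwt2((cA, (cH, cV, cD + 0.05 * mark)), "haar")
    # Non-blind detection: compare the coefficient difference with the known mark.
    _, (_, _, cD_test) = pywt.dwt2(watermarked, "haar")
    score = np.corrcoef((cD_test - cD).ravel(), mark.ravel())[0, 1]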

  7. Microarrays in cancer research.

    PubMed

    Grant, Geraldine M; Fortney, Amanda; Gorreta, Francesco; Estep, Michael; Del Giacco, Luca; Van Meter, Amy; Christensen, Alan; Appalla, Lakshmi; Naouar, Chahla; Jamison, Curtis; Al-Timimi, Ali; Donovan, Jean; Cooper, James; Garrett, Carleton; Chandhoke, Vikas

    2004-01-01

    Microarray technology has presented the scientific community with a compelling approach that allows for simultaneous evaluation of all cellular processes at once. Cancer, being one of the most challenging diseases due to its polygenic nature, presents itself as a perfect candidate for evaluation by this approach. Several recent articles have provided significant insight into the strengths and limitations of microarrays. Nevertheless, there are strong indications that this approach will provide new molecular markers that could be used in diagnosis and prognosis of cancers. To achieve these goals it is essential that there is a seamless integration of clinical and molecular biological data that allows us to elucidate genes and pathways involved in various cancers. To this effect we are currently evaluating gene expression profiles in human brain, ovarian, breast, hematopoietic, lung, colorectal, head and neck, and biliary tract cancers. To address these issues we have a joint team of scientists, doctors and computer scientists from two Virginia universities and a major healthcare provider. The study has been divided into several focus groups that include: Tissue Bank, Clinical & Pathology Laboratory Data, Chip Fabrication, QA/QC, Tissue Devitalization, and Database Design and Data Analysis, using multiple microarray platforms. Currently over 300 consenting patients have been enrolled in the study, with the largest number being that of breast cancer patients. Clinical data on each patient are being compiled into a secure and interactive relational database, and integration of these data elements will be accomplished by a common programming interface. This clinical database contains several key parameters on each patient, including demographic (risk factors, nutrition, co-morbidity, familial history), histopathology (non-genetic predictors), tumor, treatment and follow-up information. Gene expression data derived from the tissue samples will be linked to this database, which

  8. The Genopolis Microarray Database

    PubMed Central

    Splendiani, Andrea; Brandizi, Marco; Even, Gael; Beretta, Ottavio; Pavelka, Norman; Pelizzola, Mattia; Mayhaus, Manuel; Foti, Maria; Mauri, Giancarlo; Ricciardi-Castagnoli, Paola

    2007-01-01

    Background Gene expression databases are key resources for microarray data management and analysis and the importance of a proper annotation of their content is well understood. Public repositories as well as microarray database systems that can be implemented by single laboratories exist. However, there is not yet a tool that can easily support a collaborative environment where different users with different rights of access to data can interact to define a common highly coherent content. The scope of the Genopolis database is to provide a resource that allows different groups performing microarray experiments related to a common subject to create a common coherent knowledge base and to analyse it. The Genopolis database has been implemented as a dedicated system for the scientific community studying dendritic and macrophage cells functions and host-parasite interactions. Results The Genopolis Database system allows the community to build an object based MIAME compliant annotation of their experiments and to store images, raw and processed data from the Affymetrix GeneChip® platform. It supports dynamical definition of controlled vocabularies and provides automated and supervised steps to control the coherence of data and annotations. It allows a precise control of the visibility of the database content to different sub groups in the community and facilitates exports of its content to public repositories. It provides an interactive user interface for data analysis: this allows users to visualize data matrices based on functional lists and sample characterization, and to navigate to other data matrices defined by similarity of expression values as well as functional characterizations of genes involved. A collaborative environment is also provided for the definition and sharing of functional annotation by users. Conclusion The Genopolis Database supports a community in building a common coherent knowledge base and in analysing it. This fills a gap between a local

  9. An Introduction to MAMA (Meta-Analysis of MicroArray data) System.

    PubMed

    Zhang, Zhe; Fenstermacher, David

    2005-01-01

    Analyzing microarray data across multiple experiments has been proven advantageous. To support this kind of analysis, we are developing a software system called MAMA (Meta-Analysis of MicroArray data). MAMA utilizes a client-server architecture with a relational database on the server-side for the storage of microarray datasets collected from various resources. The client-side is an application running on the end user's computer that allows the user to manipulate microarray data and analytical results locally. MAMA implementation will integrate several analytical methods, including meta-analysis within an open-source framework offering other developers the flexibility to plug in additional statistical algorithms.

  10. DNA Microarray-Based Diagnostics.

    PubMed

    Marzancola, Mahsa Gharibi; Sedighi, Abootaleb; Li, Paul C H

    2016-01-01

    The DNA microarray technology is currently a useful biomedical tool which has been developed for a variety of diagnostic applications. However, the development pathway has not been smooth and the technology has faced some challenges. The reliability of the microarray data and also the clinical utility of the results in the early days were criticized. These criticisms added to the severe competition from other techniques, such as next-generation sequencing (NGS), impacting the growth of microarray-based tests in the molecular diagnostic market. Thanks to the advances in the underlying technologies as well as the tremendous effort offered by the research community and commercial vendors, these challenges have mostly been addressed. Nowadays, the microarray platform has achieved sufficient standardization and method validation as well as efficient probe printing, liquid handling and signal visualization. Integration of various steps of the microarray assay into a harmonized and miniaturized handheld lab-on-a-chip (LOC) device has been a goal for the microarray community. In this respect, notable progress has been achieved in coupling the DNA microarray with the liquid manipulation microsystem as well as the supporting subsystem that will generate the stand-alone LOC device. In this chapter, we discuss the major challenges that microarray technology has faced in its almost two decades of development and also describe the solutions to overcome the challenges. In addition, we review the advancements of the technology, especially the progress toward developing the LOC devices for DNA diagnostic applications.

  11. Living-cell microarrays.

    PubMed

    Yarmush, Martin L; King, Kevin R

    2009-01-01

    Living cells are remarkably complex. To unravel this complexity, living-cell assays have been developed that allow delivery of experimental stimuli and measurement of the resulting cellular responses. High-throughput adaptations of these assays, known as living-cell microarrays, which are based on microtiter plates, high-density spotting, microfabrication, and microfluidics technologies, are being developed for two general applications: (a) to screen large-scale chemical and genomic libraries and (b) to systematically investigate the local cellular microenvironment. These emerging experimental platforms offer exciting opportunities to rapidly identify genetic determinants of disease, to discover modulators of cellular function, and to probe the complex and dynamic relationships between cells and their local environment.

  12. Preprocessing and Analysis of LC-MS-Based Proteomic Data

    PubMed Central

    Tsai, Tsung-Heng; Wang, Minkun; Ressom, Habtom W.

    2016-01-01

    Liquid chromatography coupled with mass spectrometry (LC-MS) has been widely used for profiling protein expression levels. This chapter is focused on LC-MS data preprocessing, which is a crucial step in the analysis of LC-MS based proteomics. We provide a high-level overview, highlight associated challenges, and present a step-by-step example for analysis of data from LC-MS based untargeted proteomic study. Furthermore, key procedures and relevant issues with the subsequent analysis by multiple reaction monitoring (MRM) are discussed. PMID:26519169

  13. Microarray oligonucleotide probe designer (MOPeD): A web service.

    PubMed

    Patel, Viren C; Mondal, Kajari; Shetty, Amol Carl; Horner, Vanessa L; Bedoyan, Jirair K; Martin, Donna; Caspary, Tamara; Cutler, David J; Zwick, Michael E

    2010-11-01

    Methods of genomic selection that combine high-density oligonucleotide microarrays with next-generation DNA sequencing allow investigators to characterize genomic variation in selected portions of complex eukaryotic genomes. Yet choosing which specific oligonucleotides to use can pose a major technical challenge. To address this issue, we have developed a software package called MOPeD (Microarray Oligonucleotide Probe Designer), which automates the process of designing genomic selection microarrays. This web-based software allows individual investigators to design custom genomic selection microarrays optimized for synthesis with Roche NimbleGen's maskless photolithography. Design parameters include uniqueness of the probe sequences, melting temperature, hairpin formation, and the presence of single nucleotide polymorphisms. We generated probe databases for the human, mouse, and rhesus macaque genomes and conducted experimental validation of MOPeD-designed microarrays in human samples by sequencing the human X chromosome exome, where relevant sequence metrics indicated superior performance relative to a microarray designed by the Roche NimbleGen proprietary algorithm. We also performed validation in the mouse to identify known mutations contained within a 487-kb region from mouse chromosome 16, the mouse chromosome 16 exome (1.7 Mb), and the mouse chromosome 12 exome (3.3 Mb). Our results suggest that the open source MOPeD software package and website (http://moped.genetics.emory.edu/) will be a valuable resource for investigators in their sequence-based studies of complex eukaryotic genomes.

  14. A brief introduction to tiling microarrays: principles, concepts, and applications.

    PubMed

    Lemetre, Christophe; Zhang, Zhengdong D

    2013-01-01

    Technological achievements have always contributed to the advancement of biomedical research. It has never been more so than in recent times, when the development and application of innovative cutting-edge technologies have transformed biology into a data-rich quantitative science. This stunning revolution in biology primarily ensued from the emergence of microarrays over two decades ago. The completion of whole-genome sequencing projects and the advance in microarray manufacturing technologies enabled the development of tiling microarrays, which gave unprecedented genomic coverage. Since their first description, several types of application of tiling arrays have emerged, each aiming to tackle a different biological problem. Although numerous algorithms have already been developed to analyze microarray data, new method development is still needed not only for better performance but also for integration of available microarray data sets, which without doubt constitute one of the largest collections of biological data ever generated. In this chapter we first introduce the principles behind the emergence and the development of tiling microarrays, and then discuss with some examples how they are used to investigate different biological problems.

  15. The Longhorn Array Database (LAD): An Open-Source, MIAME compliant implementation of the Stanford Microarray Database (SMD)

    PubMed Central

    Killion, Patrick J; Sherlock, Gavin; Iyer, Vishwanath R

    2003-01-01

    Background The power of microarray analysis can be realized only if data is systematically archived and linked to biological annotations as well as analysis algorithms. Description The Longhorn Array Database (LAD) is a MIAME compliant microarray database that operates on PostgreSQL and Linux. It is a fully open source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available at Conclusions Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data. PMID:12930545

  16. Multi-platform microarray integration research

    NASA Astrophysics Data System (ADS)

    Wu, Ronghui; Lu, Lei

    2017-08-01

    More and more tumor data are being produced by different laboratories on different technology platforms. Data from a single independent laboratory are often difficult to interpret and unreliable, so a challenging task arises: developing robust integration algorithms that can combine microarray data from various experiments and platforms. We studied traditional integration algorithms and found ComBat to be effective for integrating such data: it performs well not only on small samples but also on large ones, is comparable with other integration algorithms, and scores well on various evaluation indicators. However, ComBat pools the means and variances of several batches and integrates directly, without fully exploiting the information in the sample data; we believe this ignores the characteristics of individual batches and biases the parameter estimates. We therefore propose an improved algorithm that estimates the mean and variance for each single batch, segments the samples using the first principal component of that batch, and then integrates. In the experimental part, we first evaluate our algorithm using commonly used evaluation indices.
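
    The location/scale adjustment at the core of ComBat-style integration can be sketched as below, without the empirical-Bayes shrinkage of the real method and without the per-batch first-principal-component segmentation the paper proposes; the batch layout and data are synthetic.

    import numpy as np

    def location_scale_adjust(X, batches):
        Xa = np.empty_like(X, dtype=float)
        grand_mu, grand_sd = X.mean(axis=0), X.std(axis=0) + 1e-12
        for b in np.unique(batches):
            idx = batches == b
            mu, sd = X[idx].mean(axis=0), X[idx].std(axis=0) + 1e-12
            Xa[idx] = (X[idx] - mu) / sd * grand_sd + grand_mu  # align each batch
        return Xa

    rng = np.random.default_rng(5)
    batches = np.repeat([0, 1, 2], 20)     # three batches of 20 arrays each
    X = rng.normal(size=(60, 500)) + np.array([0.0, 2.0, -1.0])[batches][:, None]
    X_integrated = location_scale_adjust(X, batches)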

  17. A novel pre-processing technique for improving image quality in digital breast tomosynthesis.

    PubMed

    Kim, Hyeongseok; Lee, Taewon; Hong, Joonpyo; Sabir, Sohail; Lee, Jung-Ryun; Choi, Young Wook; Kim, Hak Hee; Chae, Eun Young; Cho, Seungryong

    2017-02-01

    Nonlinear pre-reconstruction processing of the projection data in computed tomography (CT), where accurate recovery of the CT numbers is important for diagnosis, is usually discouraged, since such processing would violate the physics of image formation in CT. However, one can devise a pre-processing step to enhance the detectability of lesions in digital breast tomosynthesis (DBT), where accurate recovery of the CT numbers is fundamentally impossible due to the incompleteness of the scanned data. Since detecting lesions such as micro-calcifications and masses in breasts is the purpose of DBT, a technique that yields higher lesion detectability is justified. A histogram modification technique was developed in the projection data domain. The histogram of the raw projection data was first divided into two parts: one for the breast projection data and the other for the background. Background pixel values were set to a single value representing the boundary between breast and background. After that, both histogram parts were shifted by an appropriate offset and the histogram-modified projection data were log-transformed. A filtered-backprojection (FBP) algorithm was used for DBT image reconstruction. To evaluate the performance of the proposed method, we computed the detectability index for images reconstructed from clinically acquired data. Typical breast border enhancement artifacts were greatly suppressed and the detectability of calcifications and masses was increased by the proposed method. Compared to a global threshold-based post-reconstruction processing technique, the proposed method produced images of higher contrast without introducing additional image artifacts. In this work, we report a novel pre-processing technique that improves the detectability of lesions in DBT and has potential advantages over the global threshold-based post-reconstruction processing technique. The proposed method not only increased the lesion detectability
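
    A compact sketch of the described projection-domain step: background pixels are clamped to the breast/background boundary value, both histogram parts are shifted by an offset, and the data are log-transformed before filtered backprojection. The boundary, offset, and synthetic projection are assumptions; the original works on clinical data.

    import numpy as np

    def modify_histogram(proj, boundary, offset):
        out = proj.astype(float).copy()
        out[out >= boundary] = boundary   # background (air) set to the boundary value
        out += offset                     # shift both histogram parts
        return -np.log(out / out.max())   # log transform expected by FBP

    proj = np.random.default_rng(6).uniform(100, 4000, size=(512, 512))
    prepped = modify_histogram(proj, boundary=3000.0, offset=50.0)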

  18. Development, Characterization and Experimental Validation of a Cultivated Sunflower (Helianthus annuus L.) Gene Expression Oligonucleotide Microarray

    PubMed Central

    Fernandez, Paula; Soria, Marcelo; Blesa, David; DiRienzo, Julio; Moschen, Sebastian; Rivarola, Maximo; Clavijo, Bernardo Jose; Gonzalez, Sergio; Peluffo, Lucila; Príncipi, Dario; Dosio, Guillermo; Aguirrezabal, Luis; García-García, Francisco; Conesa, Ana; Hopp, Esteban; Dopazo, Joaquín; Heinz, Ruth Amelia; Paniego, Norma

    2012-01-01

    Oligonucleotide-based microarrays with accurate gene coverage represent a key strategy for transcriptional studies in orphan species such as sunflower, H. annuus L., which lacks full genome sequences. The goal of this study was the development and functional annotation of a comprehensive sunflower unigene collection and the design and validation of a custom sunflower oligonucleotide-based microarray. A large scale EST (>130,000 ESTs) curation, assembly and sequence annotation was performed using Blast2GO (www.blast2go.de). The EST assembly comprises 41,013 putative transcripts (12,924 contigs and 28,089 singletons). The resulting Sunflower Unigen Resource (SUR version 1.0) was used to design an oligonucleotide-based Agilent microarray for cultivated sunflower. This microarray includes a total of 42,326 features: 1,417 Agilent controls, 74 control probes for sunflower replicated 10 times (740 controls) and 40,169 different non-control probes. Microarray performance was validated using a model experiment examining the induction of senescence by water deficit. Pre-processing and differential expression analysis of Agilent microarrays was performed using the Bioconductor limma package. The analyses based on p-values calculated by eBayes (p<0.01) allowed the detection of 558 differentially expressed genes between water stress and control conditions; from these, ten genes were further validated by qPCR. Over-represented ontologies were identified using FatiScan in the Babelomics suite. This work generated a curated and trustable sunflower unigene collection, and a custom, validated sunflower oligonucleotide-based microarray using Agilent technology. Both the curated unigene collection and the validated oligonucleotide microarray provide key resources for sunflower genome analysis, transcriptional studies, and molecular breeding for crop improvement. PMID:23110046

  19. Development, characterization and experimental validation of a cultivated sunflower (Helianthus annuus L.) gene expression oligonucleotide microarray.

    PubMed

    Fernandez, Paula; Soria, Marcelo; Blesa, David; DiRienzo, Julio; Moschen, Sebastian; Rivarola, Maximo; Clavijo, Bernardo Jose; Gonzalez, Sergio; Peluffo, Lucila; Príncipi, Dario; Dosio, Guillermo; Aguirrezabal, Luis; García-García, Francisco; Conesa, Ana; Hopp, Esteban; Dopazo, Joaquín; Heinz, Ruth Amelia; Paniego, Norma

    2012-01-01

    Oligonucleotide-based microarrays with accurate gene coverage represent a key strategy for transcriptional studies in orphan species such as sunflower, H. annuus L., which lacks full genome sequences. The goal of this study was the development and functional annotation of a comprehensive sunflower unigene collection and the design and validation of a custom sunflower oligonucleotide-based microarray. A large scale EST (>130,000 ESTs) curation, assembly and sequence annotation was performed using Blast2GO (www.blast2go.de). The EST assembly comprises 41,013 putative transcripts (12,924 contigs and 28,089 singletons). The resulting Sunflower Unigen Resource (SUR version 1.0) was used to design an oligonucleotide-based Agilent microarray for cultivated sunflower. This microarray includes a total of 42,326 features: 1,417 Agilent controls, 74 control probes for sunflower replicated 10 times (740 controls) and 40,169 different non-control probes. Microarray performance was validated using a model experiment examining the induction of senescence by water deficit. Pre-processing and differential expression analysis of Agilent microarrays was performed using the Bioconductor limma package. The analyses based on p-values calculated by eBayes (p<0.01) allowed the detection of 558 differentially expressed genes between water stress and control conditions; from these, ten genes were further validated by qPCR. Over-represented ontologies were identified using FatiScan in the Babelomics suite. This work generated a curated and trustable sunflower unigene collection, and a custom, validated sunflower oligonucleotide-based microarray using Agilent technology. Both the curated unigene collection and the validated oligonucleotide microarray provide key resources for sunflower genome analysis, transcriptional studies, and molecular breeding for crop improvement.

  20. Automatic and robust system for correcting microarray images' rotations and isolating spots.

    PubMed

    Wang, Anlei; Kaabouch, Naima; Hu, Wen-Chen

    2011-01-01

    Microarray images contain a large volume of genetic data in the form of thousands of spots that need to be extracted and analyzed using digital image processing. Automatic extraction, gridding, is therefore necessary to save time, to remove user-dependent variations, and, hence, to obtain repeatable results. In this research paper, an algorithm that involves four steps is proposed to efficiently grid microarray images. A set of real and synthetic microarray images of different sizes and degrees of rotations is used to assess the proposed algorithm, and its efficiency is compared with the efficiencies of other methods from the literature.

  1. An accelerated procedure for recursive feature ranking on microarray data.

    PubMed

    Furlanello, C; Serafini, M; Merler, S; Jurman, G

    2003-01-01

    We describe a new wrapper algorithm for fast feature ranking in classification problems. The Entropy-based Recursive Feature Elimination (E-RFE) method eliminates chunks of uninteresting features according to the entropy of the weights distribution of a SVM classifier. With specific regard to DNA microarray datasets, the method is designed to support computationally intensive model selection in classification problems in which the number of features is much larger than the number of samples. We test E-RFE on synthetic and real data sets, comparing it with other SVM-based methods. The speed-up obtained with E-RFE supports predictive modeling on high dimensional microarray data.
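
    A hedged sketch of the E-RFE idea: refit a linear SVM and use the entropy of its weight distribution to size the chunk of low-|w| features eliminated each round. The chunk-size rule and the random data are illustrative assumptions, not the paper's exact schedule.

    import numpy as np
    from sklearn.svm import SVC

    def e_rfe(X, y, n_keep=50, n_bins=20):
        active = np.arange(X.shape[1])
        while active.size > n_keep:
            w = np.abs(SVC(kernel="linear").fit(X[:, active], y).coef_).ravel()
            counts, _ = np.histogram(w, bins=n_bins)
            p = counts[counts > 0] / counts.sum()
            h = -(p * np.log(p)).sum() / np.log(n_bins)         # normalized weight entropy
            chunk = max(1, int((1.0 - h) * 0.5 * active.size))  # low entropy -> big chunk
            chunk = min(chunk, active.size - n_keep)
            active = active[np.argsort(w)[chunk:]]              # drop the smallest weights
        return active

    rng = np.random.default_rng(7)
    X, y = rng.normal(size=(40, 2000)), rng.integers(0, 2, 40)
    selected = e_rfe(X, y)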

  2. [The net analyte preprocessing combined with radial basis partial least squares regression applied in noninvasive measurement of blood glucose].

    PubMed

    Li, Qing-Bo; Huang, Zheng-Wei

    2014-02-01

    In order to improve the prediction accuracy of quantitative analysis models for near-infrared spectroscopy of blood glucose, this paper combines the net analyte preprocessing (NAP) algorithm with radial basis function partial least squares (RBFPLS) regression to build a nonlinear modeling method suitable for human glucose measurement, named NAP-RBFPLS. First, NAP is used to preprocess the near-infrared spectra of blood glucose in order to extract from the original spectra only the information related to the glucose signal. This weakens the chance correlations between glucose changes and interference factors caused by the absorption of water, albumin, hemoglobin, fat and other blood components, changes in body temperature, drift of the measuring instruments, and changes in the measurement environment and conditions. A nonlinear quantitative analysis model is then built from the NAP-processed spectra to account for the nonlinear relationship between glucose concentration and the near-infrared spectra caused by strong scattering in body tissue. The new method is compared with three other quantitative analysis models built on partial least squares (PLS), net analyte preprocessing partial least squares (NAP-PLS), and RBFPLS, respectively. The experimental results show that the nonlinear calibration model combining the NAP algorithm with RBFPLS regression greatly improves the prediction accuracy on the prediction sets, indicating that this nonlinear modeling method has practical value for research on non-invasive detection of human glucose concentrations.
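
    A minimal sketch of the NAP projection itself: spectra are projected onto the orthogonal complement of the space spanned by interference spectra, so that, ideally, only glucose-related variation remains. The interference matrix and spectra below are synthetic placeholders.

    import numpy as np

    def nap_project(X, Z):
        # F = I - pinv(Z) @ Z projects onto the complement of Z's row space.
        F = np.eye(X.shape[1]) - np.linalg.pinv(Z) @ Z
        return X @ F

    rng = np.random.default_rng(8)
    Z = rng.normal(size=(5, 300))    # e.g. water/albumin/haemoglobin/fat/drift spectra
    X = rng.normal(size=(40, 300))   # measured near-infrared spectra
    X_net = nap_project(X, Z)        # input to the subsequent RBFPLS calibration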

  3. Automatic pre-processing for an object-oriented distributed hydrological model using GRASS-GIS

    NASA Astrophysics Data System (ADS)

    Sanzana, P.; Jankowfsky, S.; Branger, F.; Braud, I.; Vargas, X.; Hitschfeld, N.

    2012-04-01

    Landscapes are very heterogeneous, which impacts the hydrological processes occurring in catchments, especially in the modeling of peri-urban catchments. Hydrological Response Units (HRUs), resulting from the intersection of different maps, such as land use, soil types and geology, and flow networks, allow these elements to be represented explicitly, preserving the natural and artificial contours of the different layers. These HRUs are used as the model mesh in some distributed object-oriented hydrological models, allowing the application of a topology-oriented approach. The connectivity between polygons and polylines provides a detailed representation of the water balance and overland flow in these distributed hydrological models based on irregular hydro-landscape units. When computing fluxes between these HRUs, geometrical parameters, such as the distance between the centroid of an HRU and the river network, or the length of the perimeter, can impact the realism of the calculated overland, sub-surface and groundwater fluxes. It is therefore necessary to process the original model mesh in order to avoid these numerical problems. We present an automatic pre-processing tool implemented in the open-source GRASS-GIS software, built from several Python scripts and some already available algorithms, such as the Triangle software. First, scripts were developed to improve the topology of the various elements, such as snapping the river network to the closest contours. When layers such as vegetation areas are derived from remote sensing, their perimeters contain many right angles, which were smoothed. Second, the algorithms more particularly address badly shaped elements of the model mesh, such as polygons with narrow shapes, markedly irregular contours and/or centroids outside the polygons. To identify these elements we used shape descriptors. The convexity index was considered the best descriptor to identify them with a threshold
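
    A small sketch of the convexity-index check used to flag badly shaped HRUs, written with Shapely; the 0.8 threshold and the L-shaped example polygon are assumptions for illustration.

    from shapely.geometry import Polygon

    def convexity_index(poly):
        return poly.area / poly.convex_hull.area    # equals 1.0 for convex shapes

    hru = Polygon([(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)])  # L-shaped unit
    if convexity_index(hru) < 0.8:
        print("flag HRU for re-meshing:", round(convexity_index(hru), 2))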

  4. Microarray simulator as educational tool.

    PubMed

    Ruusuvuori, Pekka; Nykter, Matti; Mäkiraatikka, Eeva; Lehmussola, Antti; Korpelainen, Tomi; Erkkilä, Timo; Yli-Harja, Olli

    2007-01-01

    Like many real-world applications, microarray measurements are impractical for large-scale teaching purposes due to their laborious preparation process and expense. Fortunately, many phases of the array preparation process can be efficiently demonstrated using a software simulator tool. Here we propose the use of a microarray simulator as an aiding tool in the teaching of computational biology. Three case studies on educational use of the simulator are presented, demonstrating the effect of gene knock-out, synthetic time series, and the effect of noise sources. We conclude that the simulator, used for teaching the principles of microarray measurement technology, proved to be a useful tool in education.

  5. Reverse Phase Protein Microarrays.

    PubMed

    Baldelli, Elisa; Calvert, Valerie; Hodge, Alex; VanMeter, Amy; Petricoin, Emanuel F; Pierobon, Mariaelena

    2017-01-01

    While genes and RNA encode information about cellular status, proteins are considered the engine of the cellular machine, as they are the effective elements that drive all cellular functions including proliferation, migration, differentiation, and apoptosis. Consequently, investigations of the cellular protein network are considered a fundamental tool for understanding cellular functions. Alteration of the cellular homeostasis driven by elaborate intra- and extracellular interactions has become one of the most studied fields in the era of personalized medicine and targeted therapy. Increasing interest has been focused on developing and improving proteomic technologies that are suitable for analysis of clinical samples. In this context, reverse-phase protein microarrays (RPPA) is a sensitive, quantitative, high-throughput immunoassay for protein analyses of tissue samples, cells, and body fluids. RPPA is well suited for broad proteomic profiling and is capable of capturing protein activation as well as biochemical reactions such as phosphorylation, glycosylation, ubiquitination, protein cleavage, and conformational alterations across hundreds of samples using a limited amount of biological material. For these reasons, RPPA represents a valid tool for protein analyses and generates data that help elucidate the functional signaling architecture through protein-protein interaction and protein activation mapping for the identification of critical nodes for individualized or combinatorial targeted therapy.

  6. Combining results of microarray experiments: a rank aggregation approach.

    PubMed

    DeConde, Robert P; Hawley, Sarah; Falcon, Seth; Clegg, Nigel; Knudsen, Beatrice; Etzioni, Ruth

    2006-01-01

    As technology for microarray analysis becomes widespread, it is becoming increasingly important to be able to compare and combine the results of experiments that explore the same scientific question. In this article, we present a rank-aggregation approach for combining results from several microarray studies. The motivation for this approach is twofold: first, the final results of microarray studies are typically expressed as lists of genes, rank-ordered by a measure of the strength of evidence that they are functionally involved in the disease process, and second, using the information on this rank-ordered metric means that we do not have to concern ourselves with data on the actual expression levels, which may not be comparable across experiments. Our approach draws on methods for combining top-k lists from the computer science literature on meta-search. The meta-search problem shares several important features with that of combining microarray experiments, including the fact that there are typically few lists with many elements and the elements may not be common to all lists. We implement two meta-search algorithms, which use a Markov chain framework to convert pairwise preferences between list elements into a stationary distribution that represents an aggregate ranking (Dwork et al., 2001). We explore the behavior of the algorithms in hypothetical examples and a simulated dataset and compare their performance with that of an algorithm based on the order-statistics model of Thurstone (Thurstone, 1927). We apply all three algorithms to aggregate the results of five microarray studies of prostate cancer.
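
    A sketch of an MC4-style Markov chain scheme (Dwork et al., 2001) of the kind referenced above: a transition matrix is built from pairwise majority preferences among list elements, and its stationary distribution yields the aggregate ranking. Handling of items missing from a list is simplified here, and the gene lists are toy inputs.

    import numpy as np

    def mc4(rank_lists, items, eps=0.15):
        n = len(items)
        pos = [{g: r for r, g in enumerate(lst)} for lst in rank_lists]
        P = np.zeros((n, n))
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                # Move from a to b when a majority of lists rank b above a.
                if i != j and sum(p.get(b, n) < p.get(a, n) for p in pos) > len(pos) / 2:
                    P[i, j] = 1.0 / n
            P[i, i] = 1.0 - P[i].sum()
        P = (1 - eps) * P + eps / n                   # teleportation ensures ergodicity
        vals, vecs = np.linalg.eig(P.T)
        pi = np.real(vecs[:, np.argmax(np.real(vals))])
        pi /= pi.sum()
        return [g for _, g in sorted(zip(-pi, items))]

    lists = [["g1", "g2", "g3"], ["g2", "g1", "g4"], ["g2", "g3", "g1"]]
    print(mc4(lists, ["g1", "g2", "g3", "g4"]))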

  7. Design and implementation of a preprocessing system for a sodium lidar

    NASA Technical Reports Server (NTRS)

    Voelz, D. G.; Sechrist, C. F., Jr.

    1983-01-01

    A preprocessing system, designed and constructed for use with the University of Illinois sodium lidar system, was developed to increase the altitude resolution and range of the lidar system and also to decrease the processing burden of the main lidar computer. The preprocessing system hardware and the software required to implement the system are described. Some preliminary results of an airborne sodium lidar experiment conducted with the preprocessing system installed in the sodium lidar are presented.

  8. A multichannel order-statistic technique for cDNA microarray image processing.

    PubMed

    Lukac, Rastislav; Plataniotis, Konstantinos N; Smolka, Bogdan; Venetsanopoulos, Anastasios N

    2004-12-01

    This paper introduces an automated image processing procedure capable of processing complementary deoxyribonucleic acid (cDNA) microarray images. Microarray data is contaminated by noise and suffers from broken edges and visual artifacts. Without the utilization of a filter, subsequent tasks such as spot identification and gene expression determination cannot be completed. By employing, in a unique cascade processing cycle, nonlinear filtering solutions based on robust order statistics, the procedure: 1) removes both background and high-frequency corrupting noise and 2) correctly identifies edges and spots in cDNA microarray data. The proposed solution operates directly on the microarray data, does not rely on explicit data normalization or spot separation preprocessing, and operates in a robust manner without using heuristically determined design parameters. Other routine microarray processing operations such as shape manipulations and grid adjustments can be used in conjunction with the developed solution in the processing pipeline. Experimentation reported in this paper indicates that the proposed solution yields excellent performance by removing noise and enhancing spot location determination.
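
    A sketch of the kind of multichannel order-statistic operation such a cascade builds on: within each window, the vector median -- the pixel minimizing the summed distance to its neighbours -- replaces the centre pixel. The window size and the toy two-channel image are assumptions.

    import numpy as np

    def vector_median(window_pixels):
        # window_pixels: (k, channels); keep the sample closest to all the others.
        d = np.linalg.norm(window_pixels[:, None] - window_pixels[None, :], axis=2)
        return window_pixels[np.argmin(d.sum(axis=1))]

    img = np.random.default_rng(10).random((64, 64, 2))  # red/green cDNA channels
    out = img.copy()
    for i in range(1, 63):
        for j in range(1, 63):
            out[i, j] = vector_median(img[i-1:i+2, j-1:j+2].reshape(9, 2))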

  9. Chemistry of Natural Glycan Microarray

    PubMed Central

    Song, Xuezheng; Heimburg-Molinaro, Jamie; Cummings, Richard D.; Smith, David F.

    2014-01-01

    Glycan microarrays have become indispensable tools for studying protein-glycan interactions. Along with chemo-enzymatic synthesis, glycans isolated from natural sources have played important roles in array development and will continue to be a major source of glycans. N- and O-glycans from glycoproteins, and glycans from glycosphingolipids can be released from corresponding glycoconjugates with relatively mature methods, although isolation of large numbers and quantities of glycans are still very challenging. Glycosylphosphatidylinositol (GPI)-anchors and glycosaminoglycans (GAGs) are less represented on current glycan microarrays. Glycan microarray development has been greatly facilitated by bifunctional fluorescent linkers, which can be applied in a “Shotgun Glycomics” approach to incorporate isolated natural glycans. Glycan presentation on microarrays may affect glycan binding by GBPs, often through multivalent recognition by the GBP. PMID:24487062

  10. An evaluation of two-channel ChIP-on-chip and DNA methylation microarray normalization strategies.

    PubMed

    Adriaens, Michiel E; Jaillard, Magali; Eijssen, Lars M T; Mayer, Claus-Dieter; Evelo, Chris T A

    2012-01-25

    The combination of chromatin immunoprecipitation with two-channel microarray technology enables genome-wide mapping of binding sites of DNA-interacting proteins (ChIP-on-chip) or sites with methylated CpG di-nucleotides (DNA methylation microarray). These powerful tools are the gateway to understanding gene transcription regulation. Since the goals of such studies, the sample preparation procedures, the microarray content and study design are all different from transcriptomics microarrays, the data pre-processing strategies traditionally applied to transcriptomics microarrays may not be appropriate. Particularly, the main challenge of the normalization of "regulation microarrays" is (i) to make the data of individual microarrays quantitatively comparable and (ii) to keep the signals of the enriched probes, representing DNA sequences from the precipitate, as distinguishable as possible from the signals of the un-enriched probes, representing DNA sequences largely absent from the precipitate. We compare several widely used normalization approaches (VSN, LOWESS, quantile, T-quantile, Tukey's biweight scaling, Peng's method) applied to a selection of regulation microarray datasets, ranging from DNA methylation to transcription factor binding and histone modification studies. Through comparison of the data distributions of control probes and gene promoter probes before and after normalization, and assessment of the power to identify known enriched genomic regions after normalization, we demonstrate that there are clear differences in performance between normalization procedures. T-quantile normalization applied separately on the channels and Tukey's biweight scaling outperform other methods in terms of the conservation of enriched and un-enriched signal separation, as well as in identification of genomic regions known to be enriched. T-quantile normalization is preferable as it additionally improves comparability between microarrays. In contrast, popular normalization
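
    For reference, a minimal quantile normalization sketch (columns are arrays); the T-quantile variant favoured by the paper normalizes the two channels separately, which this plain version does not attempt. The data are synthetic.

    import numpy as np

    def quantile_normalize(X):
        ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # per-array ranks (ties ignored)
        reference = np.sort(X, axis=0).mean(axis=1)        # mean distribution across arrays
        return reference[ranks]

    X = np.random.default_rng(9).lognormal(size=(1000, 4))  # probes x arrays
    Xq = quantile_normalize(X)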

  11. Pre-processing technologies to prepare solid waste for composting

    SciTech Connect

    Gould, M.

    1996-09-01

    The organic constituents of municipal solid waste can be converted into compost for use as a safe, beneficial soil amendment, conserving landfill space. The solid waste must first be processed to remove contaminants and prepare the organics for composting. This paper describes five different preprocessing systems, covering a broad range of technical approaches. Three are described briefly; two, from projects managed by the author, are presented as more detailed case histories: (1) a pilot study at a refuse-derived fuel (RDF) plant in Hartford, Connecticut and (2) a solid waste composting facility in East Hampton, New York. Materials flow diagrams and mass balances are presented for each process, showing that 100 tons of solid waste will yield 32 to 44 tons of compost, requiring disposal of 3 to 10 tons of metal, grit, and glass and 16 to 40 tons of light residue that can be landfilled or used as RDF.

  12. Preprocessing of Satellite Data for Urban Object Extraction

    NASA Astrophysics Data System (ADS)

    Krauß, T.

    2015-03-01

    Very high resolution (VHR) DSMs (digital surface models) derived from stereo- or multi-stereo images from current VHR satellites like WorldView-2 or Pléiades can be produced up to the ground sampling distance (GSD) of the sensors in the range of 50 cm to 1 m. From such DSMs the digital terrain model (DTM) representing the ground and also a so called nDEM (normalized digital elevation model) describing the height of objects above the ground can be derived. In parallel these sensors deliver multispectral imagery which can be used for a spectral classification of the imagery. Fusion of the multispectral classification and the nDEM allows a simple classification and detection of urban objects. In further processing steps these detected urban objects can be modeled and exported in a suitable description language like CityGML. In this work we present the pre-processing steps up to the classification and detection of the urban objects. The modeling is not part of this work. The pre-processing steps described here cover briefly the coregistration of the input images and the generation of the DSM. In more detail the improvement of the DSM, the extraction of the DTM and nDEM, the multispectral classification and the object detection and extraction are explained. The methods described are applied to two test regions from two satellites: First the center of Munich acquired by WorldView-2 and second the center of Melbourne acquired by Pléiades. From both acquisitions a stereo-pair from the panchromatic bands is used for creation of the DSM and the pan-sharpened multispectral images are used for spectral classification. Finally the quality of the detected urban objects is discussed.
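
    A minimal sketch of the nDEM step described above, assuming the DSM is a numpy array of heights in metres; the DTM is approximated here by a grayscale morphological opening, a common simple choice that is not necessarily the author's method:

        import numpy as np
        from scipy.ndimage import grey_opening

        def ndem_from_dsm(dsm, window=51):
            """Approximate the DTM by morphological opening, then nDEM = DSM - DTM."""
            dtm = grey_opening(dsm, size=(window, window))  # removes objects narrower than window
            ndem = dsm - dtm                                # object heights above ground
            return dtm, ndem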

  13. Pre-processing of ultraviolet resonance Raman spectra.

    PubMed

    Simpson, John V; Oshokoya, Olayinka; Wagner, Nicole; Liu, Jing; JiJi, Renee D

    2011-03-21

    The application of UV excitation sources coupled with resonance Raman has the potential to offer information unavailable with the current inventory of commonly used structural techniques, including X-ray, NMR and IR analysis. However, for ultraviolet resonance Raman (UVRR) spectroscopy to become a mainstream method for the determination of protein secondary structure content and monitoring protein dynamics, the application of multivariate data analysis methodologies must be made routine. Typically, the application of higher order data analysis methods requires robust pre-processing methods in order to standardize the data arrays. The application of such methods can be problematic in UVRR datasets due to spectral shifts arising from day-to-day fluctuations in the instrument response. Additionally, the non-linear increase in spectral resolution in wavenumbers (increasing spectral data points for the same spectral region) that results from increasing excitation wavelengths can make the alignment of multi-excitation datasets problematic. Last, a uniform and standardized methodology for the subtraction of the water band has also been a systematic issue for multivariate data analysis, as the water band overlaps the amide I mode. Here we present a two-pronged preprocessing approach using correlation optimized warping (COW) to alleviate spectrum-to-spectrum and day-to-day alignment errors, coupled with a method whereby the relative intensity of the water band is determined through a least-squares determination of the signal intensity between 1750 and 1900 cm⁻¹, to make complex multi-excitation datasets more homogeneous and usable with multivariate analysis methods.
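
    A sketch of the water-band step under the stated least-squares criterion; the wavenumber axis `wn`, measured spectrum `spec`, and a pure-water reference `water` are assumed inputs:

        import numpy as np

        def subtract_water(wn, spec, water, lo=1750.0, hi=1900.0):
            """Scale the water reference by least squares over the 1750-1900 cm^-1
            window (where only water contributes) and subtract it."""
            m = (wn >= lo) & (wn <= hi)
            scale = np.dot(water[m], spec[m]) / np.dot(water[m], water[m])
            return spec - scale * water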

  14. The Effect Of Pre-Processing On Image Compression

    NASA Astrophysics Data System (ADS)

    Cookson, J.; Thoma, G.

    1986-10-01

    The Lister Hill National Center for Biomedical Communications, the National Library of Medicine's research division, is currently engaged in studying the application of Electronic Document Storage and Retrieval (EDSR) systems to a library environment. To accomplish this, an EDSR prototype has been built and is currently in use as a laboratory test-bed. The system consists of CCD scanners for document digitization, high resolution CRT document displays, hardcopy output devices, and optical and magnetic disk storage devices, all under the control of a PDP-11/44 computer. Prior to storage and transmission, the captured document images undergo processing operations that enhance their quality, eliminate degradations and remove redundancy. It is postulated that a "pre-processing" stage that removes extraneous material from the raw image data could improve the performance of the processing operations. The processing operation selected to prove this hypothesis is image compression, an important feature to economically extend on-line image storage capacity and increase image transfer speed in the EDSR system. The particular technique selected for implementation is one-dimensional run-length coding (CCITT Recommendation T.4), because it is an established standard and appropriate as a baseline system. The preprocessing operations on the raw image data are border removal and page centering. After centering the images, which are approximately 6 by 9 inches in the examples selected, in an 8.5 by 11 inch field, the "noisy" border areas are then made white. These operations are done electronically in a digital memory under operator control. For a selected set of pages, mostly comprising title pages and tables of contents, the result is an average improvement in compression ratios by a factor of over 3.
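
    A toy sketch of one-dimensional run-length coding on a binarized scan line; the actual CCITT T.4 scheme additionally Huffman-codes the run lengths, which is omitted here:

        def runlengths(line):
            """Encode a binary scan line as alternating white/black run lengths,
            starting with a white run as in CCITT T.4 conventions."""
            runs, current, count = [], 0, 0   # 0 = white pixel
            for px in line:
                if px == current:
                    count += 1
                else:
                    runs.append(count)        # emits a zero-length white run
                    current, count = px, 1    # if the line starts black
            runs.append(count)
            return runs

        print(runlengths([0, 0, 0, 1, 1, 0, 0, 0, 0]))  # [3, 2, 4]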

  15. Systematic interpretation of microarray data using experiment annotations

    PubMed Central

    Fellenberg, Kurt; Busold, Christian H; Witt, Olaf; Bauer, Andrea; Beckmann, Boris; Hauser, Nicole C; Frohme, Marcus; Winter, Stefan; Dippon, Jürgen; Hoheisel, Jörg D

    2006-01-01

    Background Up to now, microarray data are mostly assessed in the context of only one or a few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data, when available in a statistically accessible format. Results We provide means to preprocess these additional data, and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well-suited for mapping such extracted traits. It visualizes associations both among and between the traits, the hereby annotated experiments, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single channel) and two-channel data, stemming from model organisms such as yeast and Drosophila up to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design. Conclusion Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details. PMID:17181856
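
    A compact sketch of the correspondence-analysis mapping the authors found well-suited, in its textbook form (SVD of the standardized residuals of a nonnegative table, e.g. genes or experiments by annotated traits):

        import numpy as np

        def correspondence_analysis(table, k=2):
            """Textbook CA: SVD of the standardized residuals of a
            contingency-like table; returns principal coordinates."""
            P = table / table.sum()                    # correspondence matrix
            r, c = P.sum(axis=1), P.sum(axis=0)        # row / column masses
            S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
            U, d, Vt = np.linalg.svd(S, full_matrices=False)
            rows = (U[:, :k] * d[:k]) / np.sqrt(r)[:, None]
            cols = (Vt.T[:, :k] * d[:k]) / np.sqrt(c)[:, None]
            return rows, cols                          # map rows and columns jointly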

  16. Study on Construction of a Medical X-Ray Direct Digital Radiography System and Hybrid Preprocessing Methods

    PubMed Central

    Ren, Yong; Wu, Sheng; Wang, Mijian; Cen, Zhongjie

    2014-01-01

    We construct a medical X-ray direct digital radiography (DDR) system based on a CCD (charge-coupled device) camera. For the original images captured from X-ray exposure, the computer first executes image flat-field correction and image gamma correction, and then carries out image contrast enhancement. A hybrid image contrast enhancement algorithm, based on the sharp frequency localization contourlet transform (SFL-CT) and contrast limited adaptive histogram equalization (CLAHE), is proposed and verified on clinical DDR images. Experimental results show that, for medical X-ray DDR images, the proposed comprehensive preprocessing algorithm can not only greatly enhance contrast and detail information, but also improve the resolution capability of the DDR system. PMID:25013452
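
    A sketch of the correction chain described above, assuming `raw`, `dark`, and `flat` are float images from the CCD; the SFL-CT part is too long for a short example, so plain CLAHE via OpenCV stands in for the contrast-enhancement stage:

        import numpy as np
        import cv2

        def preprocess_ddr(raw, dark, flat, gamma=0.6):
            gain = flat - dark
            corrected = (raw - dark) / np.maximum(gain, 1e-6) * gain.mean()  # flat-field
            corrected = np.clip(corrected / corrected.max(), 0, 1) ** gamma  # gamma correction
            img8 = (corrected * 255).astype(np.uint8)
            clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))      # contrast step
            return clahe.apply(img8)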

  17. Advanced spot quality analysis in two-colour microarray experiments

    PubMed Central

    Yatskou, Mikalai; Novikov, Eugene; Vetter, Guillaume; Muller, Arnaud; Barillot, Emmanuel; Vallar, Laurent; Friederich, Evelyne

    2008-01-01

    Background Image analysis of microarrays and, in particular, spot quantification and spot quality control, is one of the most important steps in statistical analysis of microarray data. Recent methods of spot quality control are still at an early stage of development, often leading to underestimation of true positive microarray features and, consequently, to loss of important biological information. Therefore, improving and standardizing the statistical approaches of spot quality control is essential to facilitate the overall analysis of microarray data and the subsequent extraction of biological information. Findings We evaluated the performance of two image analysis packages, MAIA and GenePix (GP), using two complementary experimental approaches with a focus on the statistical analysis of spot quality factors. First, we developed control microarrays with a priori known fluorescence ratios to verify the accuracy and precision of the ratio estimation of signal intensities. Next, we developed advanced semi-automatic protocols of spot quality evaluation in MAIA and GP and compared their performance with the available facilities for quantitative spot filtering in GP. We evaluated these algorithms for standardised spot quality analysis in a whole-genome microarray experiment assessing well-characterised transcriptional modifications induced by the transcription regulator SNAI1. Using a set of RT-PCR or qRT-PCR validated microarray data, we found that the semi-automatic protocol of spot quality control we developed with MAIA allowed the recovery of approximately 13% more spots and 38% more differentially expressed genes (at FDR = 5%) than GP with default spot filtering conditions. Conclusion Careful control of spot quality characteristics with advanced spot quality evaluation can significantly increase the amount of confident and accurate data, resulting in more meaningful biological conclusions. PMID:18798985

  18. WebArray: an online platform for microarray data analysis

    PubMed Central

    Xia, Xiaoqin; McClelland, Michael; Wang, Yipeng

    2005-01-01

    Background Many cutting-edge microarray analysis tools and algorithms, including the commonly used limma and affy packages in Bioconductor, need sophisticated knowledge of mathematics, statistics and computer skills for implementation. Commercially available software can provide a user-friendly interface at considerable cost. To facilitate the use of these tools for microarray data analysis on an open platform we developed an online microarray data analysis platform, WebArray, for bench biologists to utilize these tools to explore data from single/dual color microarray experiments. Results The currently implemented functions are based on the limma and affy packages from Bioconductor, the spacings LOESS histogram (SPLOSH) method, a PCA-assisted normalization method and a genome mapping method. WebArray incorporates these packages and provides a user-friendly interface for accessing a wide range of key functions of limma and others, such as spot quality weighting, background correction, graphical plotting, normalization, linear modeling, empirical Bayes statistical analysis, false discovery rate (FDR) estimation, and chromosomal mapping for genome comparison. Conclusion WebArray offers a convenient platform for bench biologists to access several cutting-edge microarray data analysis tools. The website is freely available at . It runs on a Linux server with Apache and MySQL. PMID:16371165
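
    Of the listed analysis steps, FDR estimation is compact enough to show inline; a minimal sketch of Benjamini-Hochberg adjusted p-values (limma's empirical Bayes moderation is more involved and not reproduced here):

        import numpy as np

        def bh_fdr(pvals):
            """Benjamini-Hochberg adjusted p-values (q-values)."""
            p = np.asarray(pvals, dtype=float)
            n = p.size
            order = np.argsort(p)
            ranked = p[order] * n / np.arange(1, n + 1)
            q = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
            out = np.empty(n)
            out[order] = np.clip(q, 0, 1)
            return out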

  19. Classification of large microarray datasets using fast random forest construction.

    PubMed

    Manilich, Elena A; Özsoyoğlu, Z Meral; Trubachev, Valeriy; Radivoyevitch, Tomas

    2011-04-01

    Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance or impact of each factor on a predicted class label. These characteristics make the algorithm ideal for microarray data. It was shown to build models with high accuracy when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining over large datasets, as they require that the entire dataset remains permanently in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. The implementation's excellent computational performance makes the algorithm useful for interactive data analyses and data mining.
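
    The ensemble recipe described above (bootstrap samples plus a restricted attribute subset at each split) and the variable-importance output are all exposed by off-the-shelf libraries; a scikit-learn sketch on stand-in data (the paper's own memory-optimized implementation is not shown here):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(60, 2000))   # stand-in for a samples x probes expression matrix
        y = rng.integers(0, 2, size=60)   # stand-in class labels

        rf = RandomForestClassifier(
            n_estimators=500,       # each tree grown on a bootstrap sample
            max_features="sqrt",    # restricted attribute subset at every split
            n_jobs=-1,
        )
        rf.fit(X, y)
        top = rf.feature_importances_.argsort()[::-1][:20]  # most informative probes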

  20. Fourier Lucas-Kanade algorithm.

    PubMed

    Lucey, Simon; Navarathna, Rajitha; Ashraf, Ahmed Bilal; Sridharan, Sridha

    2013-06-01

    In this paper, we propose a framework for both gradient descent image and object alignment in the Fourier domain. Our method centers upon the classical Lucas & Kanade (LK) algorithm where we represent the source and template/model in the complex 2D Fourier domain rather than in the spatial 2D domain. We refer to our approach as the Fourier LK (FLK) algorithm. The FLK formulation is advantageous when one preprocesses the source image and template/model with a bank of filters (e.g., oriented edges, Gabor, etc.) as 1) it can handle substantial illumination variations, 2) the inefficient preprocessing filter bank step can be subsumed within the FLK algorithm as a sparse diagonal weighting matrix, 3) unlike traditional LK, the computational cost is invariant to the number of filters and as a result is far more efficient, and 4) this approach can be extended to the Inverse Compositional (IC) form of the LK algorithm where nearly all steps (including Fourier transform and filter bank preprocessing) can be precomputed, leading to an extremely efficient and robust approach to gradient descent image matching. Further, these computational savings translate to nonrigid object alignment tasks that are considered extensions of the LK algorithm, such as those found in Active Appearance Models (AAMs).
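
    A sketch of the central FLK identity: by Parseval's theorem, filtering the error image with a bank of kernels and summing squared responses equals a single diagonally weighted distance in the Fourier domain, so the filter bank collapses into one precomputable elementwise weight (kernel shapes and normalization are assumptions of this sketch):

        import numpy as np

        def flk_weighted_error(source, template, filters):
            """sum_i ||g_i * (source - template)||^2, computed as one weighted
            Fourier-domain distance; the weight is independent of the images."""
            shape = source.shape
            F_diff = np.fft.fft2(source - template)
            weight = sum(np.abs(np.fft.fft2(g, s=shape)) ** 2 for g in filters)
            return (weight * np.abs(F_diff) ** 2).sum() / np.prod(shape)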

  1. D-MaPs - DNA-microarray projects: Web-based software for multi-platform microarray analysis

    PubMed Central

    2009-01-01

    The web application D-MaPs provides a user-friendly interface to researchers performing studies based on microarrays. The program was developed to manage and process one- or two-color microarray data obtained from several platforms (currently, GeneTAC, ScanArray, CodeLink, NimbleGen and Affymetrix). Despite the availability of many algorithms and many software programs designed to perform microarray analysis on the internet, these usually require sophisticated knowledge of mathematics, statistics and computation. D-MaPs was developed to overcome the requirement of high-performance computers or programming experience. D-MaPs performs raw data processing, normalization and statistical analysis, allowing access to the analyzed data in text or graphical format. An original feature presented by D-MaPs is the GEO (Gene Expression Omnibus) submission format service. The D-MaPs application has already been used for analysis of oligonucleotide microarrays and PCR-spotted arrays (one- and two-color, laser and light scanner). In conclusion, D-MaPs is a valuable tool for the microarray research community, especially for groups without a bioinformatics core. PMID:21637530

  2. D-MaPs - DNA-microarray projects: Web-based software for multi-platform microarray analysis.

    PubMed

    Carazzolle, Marcelo F; Herig, Taís S; Deckmann, Ana C; Pereira, Gonçalo A G

    2009-07-01

    The web application D-MaPs provides a user-friendly interface to researchers performing studies based on microarrays. The program was developed to manage and process one- or two-color microarray data obtained from several platforms (currently, GeneTAC, ScanArray, CodeLink, NimbleGen and Affymetrix). Despite the availability of many algorithms and many software programs designed to perform microarray analysis on the internet, these usually require sophisticated knowledge of mathematics, statistics and computation. D-MaPs was developed to overcome the requirement of high-performance computers or programming experience. D-MaPs performs raw data processing, normalization and statistical analysis, allowing access to the analyzed data in text or graphical format. An original feature presented by D-MaPs is the GEO (Gene Expression Omnibus) submission format service. The D-MaPs application has already been used for analysis of oligonucleotide microarrays and PCR-spotted arrays (one- and two-color, laser and light scanner). In conclusion, D-MaPs is a valuable tool for the microarray research community, especially for groups without a bioinformatics core.

  3. Inferring genetic networks from microarray data.

    SciTech Connect

    May, Elebeoba Eni; Davidson, George S.; Martin, Shawn Bryan; Werner-Washburne, Margaret C.; Faulon, Jean-Loup Michel

    2004-06-01

    In theory, it should be possible to infer realistic genetic networks from time series microarray data. In practice, however, network discovery has proved problematic. The three major challenges are: (1) inferring the network; (2) estimating the stability of the inferred network; and (3) making the network visually accessible to the user. Here we describe a method, tested on publicly available time series microarray data, which addresses these concerns. The inference of genetic networks from genome-wide experimental data is an important biological problem which has received much attention. Approaches to this problem have typically included application of clustering algorithms [6]; the use of Boolean networks [12, 1, 10]; the use of Bayesian networks [8, 11]; and the use of continuous models [21, 14, 19]. Overviews of the problem and general approaches to network inference can be found in [4, 3]. Our approach to network inference is similar to earlier methods in that we use both clustering and Boolean network inference. However, we have attempted to extend the process to better serve the end-user, the biologist. In particular, we have incorporated a system to assess the reliability of our network, and we have developed tools which allow interactive visualization of the proposed network.

  4. A Java-based tool for the design of classification microarrays.

    PubMed

    Meng, Da; Broschat, Shira L; Call, Douglas R

    2008-08-04

    Classification microarrays are used for purposes such as identifying strains of bacteria and determining genetic relationships to understand the epidemiology of an infectious disease. For these cases, mixed microarrays, which are composed of DNA from more than one organism, are more effective than conventional microarrays composed of DNA from a single organism. Selection of probes is a key factor in designing successful mixed microarrays because redundant sequences are inefficient and limited representation of diversity can restrict application of the microarray. We have developed a Java-based software tool, called PLASMID, for use in selecting the minimum set of probe sequences needed to classify different groups of plasmids or bacteria. The software program was successfully applied to several different sets of data. The utility of PLASMID was illustrated using existing mixed-plasmid microarray data as well as data from a virtual mixed-genome microarray constructed from different strains of Streptococcus. Moreover, use of data from expression microarray experiments demonstrated the generality of PLASMID. In this paper we describe a new software tool for selecting a set of probes for a classification microarray. While the tool was developed for the design of mixed microarrays (and mixed-plasmid microarrays in particular), it can also be used to design expression arrays. The user can choose from several clustering methods (including hierarchical, non-hierarchical, and a model-based genetic algorithm), several probe ranking methods, and several different display methods. A novel approach is used for probe redundancy reduction, and probe selection is accomplished via stepwise discriminant analysis. Data can be entered in different formats (including Excel and comma-delimited text), and dendrogram, heat map, and scatter plot images can be saved in several different formats (including JPEG and TIFF). Weights generated using stepwise discriminant analysis can be stored for

  5. Comparing Bacterial DNA Microarray Fingerprints

    SciTech Connect

    Willse, Alan R.; Chandler, Darrell P.; White, Amanda M.; Protic, Miroslava; Daly, Don S.; Wunschel, Sharon C.

    2005-08-15

    Detecting subtle genetic differences between microorganisms is an important problem in molecular epidemiology and microbial forensics. In a typical investigation, gel electrophoresis is used to compare randomly amplified DNA fragments between microbial strains, where the patterns of DNA fragment sizes are proxies for a microbe's genotype. The limited genomic sample captured on a gel is often insufficient to discriminate nearly identical strains. This paper examines the application of microarray technology to DNA fingerprinting as a high-resolution alternative to gel-based methods. The so-called universal microarray, which uses short oligonucleotide probes that do not target specific genes or species, is intended to be applicable to all microorganisms because it does not require prior knowledge of genomic sequence. In principle, closely related strains can be distinguished if the number of probes on the microarray is sufficiently large, i.e., if the genome is sufficiently sampled. In practice, we confront noisy data, imperfectly matched hybridizations, and a high-dimensional inference problem. We describe the statistical problems of microarray fingerprinting, outline similarities with and differences from more conventional microarray applications, and illustrate the statistical fingerprinting problem for 10 closely related strains from three Bacillus species, and 3 strains from non-Bacillus species.

  6. Microarray Technologies in Fungal Diagnostics.

    PubMed

    Rupp, Steffen

    2017-01-01

    Microarray technologies have been a major research tool in the last decades. In addition, they have been introduced into several fields of diagnostics, including diagnostics of infectious diseases. Microarrays are highly parallelized assay systems that were initially developed for multiparametric nucleic acid detection. From there on they rapidly developed towards a tool for the detection of all kinds of biological compounds (DNA, RNA, proteins, cells, nucleic acids, carbohydrates, etc.) or their modifications (methylation, phosphorylation, etc.). The combination of closed-tube systems and lab-on-chip devices with microarrays further enabled a higher degree of automation with a reduced contamination risk. Microarray-based diagnostic applications currently complement, and may in the future replace, classical methods in clinical microbiology like blood cultures, resistance determination, microscopic and metabolic analyses, as well as biochemical or immunohistochemical assays. In addition, novel diagnostic markers appear, like noncoding RNAs and miRNAs, providing additional room for novel nucleic acid based biomarkers. Here I focus on microarray technologies in diagnostics and as research tools, with an emphasis on nucleic acid-based arrays.

  7. Image microarrays (IMA): Digital pathology's missing tool.

    PubMed

    Hipp, Jason; Cheng, Jerome; Pantanowitz, Liron; Hewitt, Stephen; Yagi, Yukako; Monaco, James; Madabhushi, Anant; Rodriguez-Canales, Jaime; Hanson, Jeffrey; Roy-Chowdhuri, Sinchita; Filie, Armando C; Feldman, Michael D; Tomaszewski, John E; Shih, Natalie Nc; Brodsky, Victor; Giaccone, Giuseppe; Emmert-Buck, Michael R; Balis, Ulysses J

    2011-01-01

    The increasing availability of whole slide imaging (WSI) data sets (digital slides) from glass slides offers new opportunities for the development of computer-aided diagnostic (CAD) algorithms. With the all-digital pathology workflow that these data sets will enable in the near future, literally millions of digital slides will be generated and stored. Consequently, the field in general and pathologists, specifically, will need tools to help extract actionable information from this new and vast collective repository. To address this limitation, we designed and implemented a tool (dCORE) to enable the systematic capture of image tiles with constrained size and resolution that contain desired histopathologic features. In this communication, we describe a user-friendly tool that will enable pathologists to mine digital slide archives to create image microarrays (IMAs). IMAs are to digital slides as tissue microarrays (TMAs) are to cell blocks. Thus, a single digital slide could be transformed into an array of hundreds to thousands of high quality digital images, with each containing key diagnostic morphologies and appropriate controls. Current manual digital image cut-and-paste methods that allow for the creation of a grid of images (such as an IMA) of matching resolutions are tedious. The ability to create IMAs representing hundreds to thousands of vetted morphologic features has numerous applications in education, proficiency testing, consensus case review, and research. Lastly, in a manner analogous to the way conventional TMA technology has significantly accelerated in situ studies of tissue specimens, the use of IMAs has similar potential to significantly accelerate CAD algorithm development.

  8. Image microarrays (IMA): Digital pathology's missing tool

    PubMed Central

    Hipp, Jason; Cheng, Jerome; Pantanowitz, Liron; Hewitt, Stephen; Yagi, Yukako; Monaco, James; Madabhushi, Anant; Rodriguez-canales, Jaime; Hanson, Jeffrey; Roy-Chowdhuri, Sinchita; Filie, Armando C.; Feldman, Michael D.; Tomaszewski, John E.; Shih, Natalie NC.; Brodsky, Victor; Giaccone, Giuseppe; Emmert-Buck, Michael R.; Balis, Ulysses J.

    2011-01-01

    Introduction: The increasing availability of whole slide imaging (WSI) data sets (digital slides) from glass slides offers new opportunities for the development of computer-aided diagnostic (CAD) algorithms. With the all-digital pathology workflow that these data sets will enable in the near future, literally millions of digital slides will be generated and stored. Consequently, the field in general and pathologists, specifically, will need tools to help extract actionable information from this new and vast collective repository. Methods: To address this limitation, we designed and implemented a tool (dCORE) to enable the systematic capture of image tiles with constrained size and resolution that contain desired histopathologic features. Results: In this communication, we describe a user-friendly tool that will enable pathologists to mine digital slide archives to create image microarrays (IMAs). IMAs are to digital slides as tissue microarrays (TMAs) are to cell blocks. Thus, a single digital slide could be transformed into an array of hundreds to thousands of high quality digital images, with each containing key diagnostic morphologies and appropriate controls. Current manual digital image cut-and-paste methods that allow for the creation of a grid of images (such as an IMA) of matching resolutions are tedious. Conclusion: The ability to create IMAs representing hundreds to thousands of vetted morphologic features has numerous applications in education, proficiency testing, consensus case review, and research. Lastly, in a manner analogous to the way conventional TMA technology has significantly accelerated in situ studies of tissue specimens, the use of IMAs has similar potential to significantly accelerate CAD algorithm development. PMID:22200030

  9. Preprocessing effects of 22 linear univariate features on the performance of seizure prediction methods.

    PubMed

    Rasekhi, Jalil; Mollaei, Mohammad Reza Karami; Bandarabadi, Mojtaba; Teixeira, Cesar A; Dourado, Antonio

    2013-07-15

    Combining multiple linear univariate features in one feature space and classifying the feature space using machine learning methods could predict epileptic seizures in patients suffering from refractory epilepsy. For each patient, a set of twenty-two linear univariate features was extracted from 6 electroencephalogram (EEG) signals to make a 132-dimensional feature space. Preprocessing and normalization methods for the features, which affect the output of the seizure prediction algorithm, were studied in terms of alarm sensitivity and false prediction rate (FPR). The problem of choosing an optimal preictal time was tackled using 4 distinct values of 10, 20, 30, and 40 min. The seizure prediction problem has traditionally been considered a two-class classification problem, which is also the approach taken here. These studies were conducted on the features obtained from 10 patients. For each patient, 48 different combinations of methods were compared to find the best configuration. Normalization by dividing by the maximum, followed by smoothing, was found to be the best configuration in most of the patients. The results also indicate that applying machine learning methods on a multidimensional feature space of 22 univariate features predicted seizure onsets with high performance. On average, seizures were predicted in 73.9% of the cases (34 out of 46 in 737.9 h of test data), with an FPR of 0.15 h⁻¹.
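
    A sketch of the best-performing configuration reported above, i.e. per-feature division by the maximum followed by smoothing (a plain moving average stands in for whatever smoother the authors used); `features` is assumed to be a time-by-132 array:

        import numpy as np

        def preprocess_features(features, win=5):
            """Normalize each feature column by its maximum, then smooth over time."""
            normed = features / np.abs(features).max(axis=0, keepdims=True)
            kernel = np.ones(win) / win
            smoothed = np.apply_along_axis(
                lambda c: np.convolve(c, kernel, mode="same"), 0, normed)
            return smoothed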

  10. Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

    PubMed Central

    Wong, Raymond

    2013-01-01

    Voice biometrics relies on a physiological characteristic, the voice, that differs for each individual person. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features for training a classification model, based on piecewise transformation treating an audio waveform as a time-series. Using SFX we can faithfully remodel the statistical characteristics of the time-series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on the comparison of the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684
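
    A sketch of the statistical side of such a pipeline: treating the waveform as a time-series, framing it, and extracting simple distributional and spectral features per frame (the published abstract does not specify the SFX feature set, so the moments below are illustrative):

        import numpy as np
        from scipy.stats import skew, kurtosis

        def frame_features(signal, frame_len=1024, hop=512):
            """Per-frame statistical and spectral features from a 1-D waveform."""
            feats = []
            for start in range(0, len(signal) - frame_len + 1, hop):
                frame = signal[start:start + frame_len]
                feats.append([frame.mean(), frame.std(),
                              skew(frame), kurtosis(frame),
                              np.abs(np.fft.rfft(frame)).argmax()])  # dominant bin
            return np.array(feats)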

  11. Road Sign Recognition with Fuzzy Adaptive Pre-Processing Models

    PubMed Central

    Lin, Chien-Chuan; Wang, Ming-Shi

    2012-01-01

    A road sign recognition system based on adaptive image pre-processing models using two fuzzy inference schemes has been proposed. The first fuzzy inference scheme checks the changes in light illumination and rich red color of a frame image using checking areas. The other checks the variance of the vehicle's speed and the steering-wheel angle to select an adaptive size and position for the detection area. The Adaboost classifier was employed to detect the road sign candidates in an image, and the support vector machine technique was employed to recognize the content of the road sign candidates. Prohibitory and warning road traffic signs are the processing targets in this research. The detection rate in the detection phase is 97.42%. In the recognition phase, the recognition rate is 93.04%. The total accuracy rate of the system is 92.47%. For video sequences, the best accuracy rate is 90.54%, and the average accuracy rate is 80.17%. The average computing time is 51.86 milliseconds per frame. The proposed system can not only overcome the problems of low illumination and rich red color around road signs but also offer high detection rates and high computing performance. PMID:22778650

  12. 2.52-THz scanning reflection imaging and image preprocessing

    NASA Astrophysics Data System (ADS)

    Li, Qi; Yao, Rui; Yin, Qiquo; Shan, Jixin; Wang, Qi

    2008-12-01

    Terahertz radiation has high penetration capability and does not cause harmful ionization to human beings; therefore THz reflection imaging has wide application prospects in security inspection and counter-terrorism. However, the reflection signal in THz active imaging is weak and the sensitivity of existing detectors in this wave band is low. In this paper a 2.52-THz laser reflection imaging system based on scanning imaging is constructed. A SIFIR-50FPL FIR laser made by Coherent Inc. (USA) is chosen as the THz radiation source, with a power of about 30 mW. A P4-42 detector made by the Molectron Corporation is adopted, which works at room temperature with a spectral range from 1 nm to over 1 mm. Imaging experiments on knives and some other objects were made using this system. Images of concealed objects are obtained, and the images are clearer after image preprocessing. The experimental results show that this imaging system has wide application potential in security inspection.

  13. Software for Preprocessing Data from Rocket-Engine Tests

    NASA Technical Reports Server (NTRS)

    Cheng, Chiu-Fu

    2004-01-01

    Three computer programs have been written to preprocess digitized outputs of sensors during rocket-engine tests at Stennis Space Center (SSC). The programs apply exclusively to the SSC E test-stand complex and utilize the SSC file format. The programs are the following: Engineering Units Generator (EUGEN) converts sensor-output-measurement data to engineering units. The inputs to EUGEN are raw binary test-data files, which include the voltage data, a list identifying the data channels, and time codes. EUGEN effects conversion by use of a file that contains calibration coefficients for each channel. QUICKLOOK enables immediate viewing of a few selected channels of data, in contradistinction to viewing only after post-test processing (which can take 30 minutes to several hours depending on the number of channels and other test parameters) of data from all channels. QUICKLOOK converts the selected data into a form in which they can be plotted in engineering units by use of Winplot (a free graphing program written by Rick Paris). EUPLOT provides a quick means for looking at data files generated by EUGEN without the necessity of relying on the PV-WAVE based plotting software.
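
    A sketch of the EUGEN-style conversion step, applying per-channel calibration coefficients to raw voltages; the SSC file format and channel lists are site-specific, so a polynomial calibration (highest order first) is assumed here purely for illustration:

        import numpy as np

        # volts: samples x channels array of raw sensor voltages
        # coeffs[ch]: polynomial calibration coefficients for channel ch
        def to_engineering_units(volts, coeffs):
            return np.column_stack(
                [np.polyval(coeffs[ch], volts[:, ch]) for ch in range(volts.shape[1])])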

  14. Software for Preprocessing Data From Rocket-Engine Tests

    NASA Technical Reports Server (NTRS)

    Cheng, Chiu-Fu

    2003-01-01

    Three computer programs have been written to preprocess digitized outputs of sensors during rocket-engine tests at Stennis Space Center (SSC). The programs apply exclusively to the SSC E test-stand complex and utilize the SSC file format. The programs are the following: (1) Engineering Units Generator (EUGEN) converts sensor-output-measurement data to engineering units. The inputs to EUGEN are raw binary test-data files, which include the voltage data, a list identifying the data channels, and time codes. EUGEN effects conversion by use of a file that contains calibration coefficients for each channel. (2) QUICKLOOK enables immediate viewing of a few selected channels of data, in contradistinction to viewing only after post-test processing (which can take 30 minutes to several hours depending on the number of channels and other test parameters) of data from all channels. QUICKLOOK converts the selected data into a form in which they can be plotted in engineering units by use of Winplot. (3) EUPLOT provides a quick means for looking at data files generated by EUGEN without the necessity of relying on the PVWAVE based plotting software.

  15. Software for Preprocessing Data From Rocket-Engine Tests

    NASA Technical Reports Server (NTRS)

    Cheng, Chiu-Fu

    2002-01-01

    Three computer programs have been written to preprocess digitized outputs of sensors during rocket-engine tests at Stennis Space Center (SSC). The programs apply exclusively to the SSC "E" test-stand complex and utilize the SSC file format. The programs are the following: 1) Engineering Units Generator (EUGEN) converts sensor-output-measurement data to engineering units. The inputs to EUGEN are raw binary test-data files, which include the voltage data, a list identifying the data channels, and time codes. EUGEN effects conversion by use of a file that contains calibration coefficients for each channel; 2) QUICKLOOK enables immediate viewing of a few selected channels of data, in contradistinction to viewing only after post test processing (which can take 30 minutes to several hours depending on the number of channels and other test parameters) of data from all channels. QUICKLOOK converts the selected data into a form in which they can be plotted in engineering units by use of Winplot (a free graphing program written by Rick Paris); and 3) EUPLOT provides a quick means for looking at data files generated by EUGEN without the necessity of relying on the PVWAVE based plotting software.

  16. Breast image pre-processing for mammographic tissue segmentation.

    PubMed

    He, Wenda; Hogg, Peter; Juette, Arne; Denton, Erika R E; Zwiggelaar, Reyer

    2015-12-01

    During mammographic image acquisition, a compression paddle is used to even the breast thickness in order to obtain optimal image quality. Clinical observation has indicated that some mammograms may exhibit abrupt intensity change and low visibility of tissue structures in the breast peripheral areas. Such appearance discrepancies can affect image interpretation and may not be desirable for computer aided mammography, leading to incorrect diagnosis and/or detection which can have a negative impact on sensitivity and specificity of screening mammography. This paper describes a novel mammographic image pre-processing method to improve image quality for analysis. An image selection process is incorporated to better target problematic images. The processed images show improved mammographic appearances not only in the breast periphery but also across the mammograms. Mammographic segmentation and risk/density classification were performed to facilitate a quantitative and qualitative evaluation. When using the processed images, the results indicated more anatomically correct segmentation in tissue specific areas, and subsequently better classification accuracies were achieved. Visual assessments were conducted in a clinical environment to determine the quality of the processed images and the resultant segmentation. The developed method has shown promising results. It is expected to be useful in early breast cancer detection, risk-stratified screening, and aiding radiologists in the process of decision making prior to surgery and/or treatment.

  17. Macular Preprocessing of Linear Acceleratory Stimuli: Implications for the Clinic

    NASA Technical Reports Server (NTRS)

    Ross, M. D.; Hargens, Alan R. (Technical Monitor)

    1996-01-01

    Three-dimensional reconstructions of innervation patterns in rat maculae were carried out using serial section images sent to a Silicon Graphics workstation from a transmission electron microscope. Contours were extracted from mosaicked sections, then registered and visualized using Biocomputation Center software. Purposes were to determine innervation patterns of type II cells and areas encompassed by vestibular afferent receptive fields. Terminals on type II cells typically are elongated and compartmentalized into parts varying in vesicular content; reciprocal and serial synapses are common. The terminals originate as processes of nearby calyces or from nerve fibers passing to calyces outside the immediate vicinity. Thus, receptive fields of the afferents overlap in unique ways. Multiple processes are frequent; from 4 to 6 afferents supply 12-16 terminals on a type II cell. Processes commonly communicate with two type II cells. The morphology indicates that extensive preprocessing of linear acceleratory stimuli occurs peripherally, as is true also of visual and olfactory systems. Clinically, this means that loss of individual nerve fibers may not be noticed behaviorally, due to redundancy (receptive field overlap). However, peripheral processing implies the presence of neuroactive agents whose loss can acutely or chronically alter normal peripheral function and cause balance disorders.

  20. Road sign recognition with fuzzy adaptive pre-processing models.

    PubMed

    Lin, Chien-Chuan; Wang, Ming-Shi

    2012-01-01

    A road sign recognition system based on adaptive image pre-processing models using two fuzzy inference schemes has been proposed. The first fuzzy inference scheme checks the changes in light illumination and rich red color of a frame image using checking areas. The other checks the variance of the vehicle's speed and the steering-wheel angle to select an adaptive size and position for the detection area. The Adaboost classifier was employed to detect the road sign candidates in an image, and the support vector machine technique was employed to recognize the content of the road sign candidates. Prohibitory and warning road traffic signs are the processing targets in this research. The detection rate in the detection phase is 97.42%. In the recognition phase, the recognition rate is 93.04%. The total accuracy rate of the system is 92.47%. For video sequences, the best accuracy rate is 90.54%, and the average accuracy rate is 80.17%. The average computing time is 51.86 milliseconds per frame. The proposed system can not only overcome the problems of low illumination and rich red color around road signs but also offer high detection rates and high computing performance.

  2. Impact of spatial complexity preprocessing on hyperspectral data unmixing

    NASA Astrophysics Data System (ADS)

    Robila, Stefan A.; Pirate, Kimberly; Hall, Terrance

    2013-05-01

    Historically, most successful hyperspectral image processing techniques have their origins in multidimensional signal processing, with a special emphasis on optimization based on objective functions. Many of these techniques (ICA, PCA, NMF, OSP, etc.) operate on collections of one-dimensional data and do not take into consideration any spatially based characteristics (such as the shape of objects in the scene). Recently, in an effort to improve processing results, several approaches that characterize spatial complexity (based on neighborhood information) were introduced. Our goal is to investigate how spatial complexity based approaches can be employed as preprocessing techniques for other previously established methods. First, we designed, for each spatial complexity based technique, a step that generates a hyperspectral cube scaled based on spatial information. Next, we feed the new cubes to a group of processing techniques such as ICA and PCA, and compare the results obtained on the scaled data with those obtained on the original data. We built upon these initial results by employing additional spatial complexity approaches. We also introduced new hybrid approaches that embed the spatial complexity step into the main processing stage.
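
    A sketch of the general idea, with local variance as one simple neighborhood-based complexity measure (the authors' specific measures are not detailed in the abstract): scale the cube by a spatial-complexity map, then hand it to an established method such as PCA:

        import numpy as np
        from scipy.ndimage import uniform_filter
        from sklearn.decomposition import PCA

        def spatially_scaled_pca(cube, win=5, k=10):
            """cube: rows x cols x bands. Scale by local variance, then run PCA."""
            mean = uniform_filter(cube, size=(win, win, 1))
            var = uniform_filter(cube ** 2, size=(win, win, 1)) - mean ** 2
            scaled = cube * (1.0 + var / (var.mean() + 1e-12))  # emphasize structured areas
            flat = scaled.reshape(-1, cube.shape[2])
            return PCA(n_components=k).fit_transform(flat).reshape(
                cube.shape[0], cube.shape[1], k)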

  3. Localization of spatially distributed brain sources after a tensor-based preprocessing of interictal epileptic EEG data.

    PubMed

    Albera, L; Becker, H; Karfoul, A; Gribonval, R; Kachenoura, A; Bensaid, S; Senhadji, L; Hernandez, A; Merlet, I

    2015-01-01

    This paper addresses the localization of spatially distributed sources from interictal epileptic electroencephalographic data after a tensor-based preprocessing. Justifying the Canonical Polyadic (CP) model of the space-time-frequency and space-time-wave-vector tensors is not an easy task when two or more extended sources have to be localized. On the other hand, the occurrence of several amplitude-modulated spikes originating from the same epileptic region can be used to build a space-time-spike tensor from the EEG data. While the CP model of this tensor appears more justified, the exact computation of its loading matrices can be limited by the presence of highly correlated sources and/or a strong background noise. An efficient extended-source localization scheme following the tensor-based preprocessing then has to be set up. Different strategies are thus investigated and compared on realistic simulated data: the "disk algorithm" using a precomputed dictionary of circular patches, a standardized Tikhonov regularization, and a fused LASSO scheme.
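
    A sketch of the CP step on a space-time-spike tensor using the open-source tensorly package (stand-in data; building the tensor from spike-locked epochs and the downstream localization are assumed done elsewhere):

        import numpy as np
        import tensorly as tl
        from tensorly.decomposition import parafac

        # T: electrodes x time x spikes tensor stacked from spike-locked EEG epochs
        T = tl.tensor(np.random.randn(32, 200, 40))   # stand-in data
        weights, factors = parafac(T, rank=3)         # CP model with 3 components
        spatial, temporal, spike = factors            # loading matrices
        # columns of 'spatial' serve as mixing vectors for subsequent source localization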

  4. Technical Advances of the Recombinant Antibody Microarray Technology Platform for Clinical Immunoproteomics.

    PubMed

    Delfani, Payam; Dexlin Mellby, Linda; Nordström, Malin; Holmér, Andreas; Ohlsson, Mattias; Borrebaeck, Carl A K; Wingren, Christer

    2016-01-01

    In the quest for deciphering disease-associated biomarkers, high-performing tools for multiplexed protein expression profiling of crude clinical samples will be crucial. Affinity proteomics, mainly represented by antibody-based microarrays, has during recent years been established as a proteomic tool providing unique opportunities for parallelized protein expression profiling. But despite the progress, several main technical features and assay procedures remain to be (fully) resolved. Among these issues, the handling of protein microarray data, i.e. the biostatistics part, is one of the key issues to resolve. In this study, we have therefore further optimized, validated, and standardized our in-house designed recombinant antibody microarray technology platform. To this end, we addressed the main remaining technical issues (e.g. antibody quality, array production, sample labelling, and selected assay conditions) and, most importantly, key biostatistics subjects (e.g. array data pre-processing and biomarker panel condensation). This represents one of the first antibody array studies in which these key biostatistics subjects have been studied in detail. Here, we thus present the next generation of the recombinant antibody microarray technology platform designed for clinical immunoproteomics.

  5. Technical Advances of the Recombinant Antibody Microarray Technology Platform for Clinical Immunoproteomics

    PubMed Central

    Delfani, Payam; Dexlin Mellby, Linda; Nordström, Malin; Holmér, Andreas; Ohlsson, Mattias; Borrebaeck, Carl A. K.; Wingren, Christer

    2016-01-01

    In the quest for deciphering disease-associated biomarkers, high-performing tools for multiplexed protein expression profiling of crude clinical samples will be crucial. Affinity proteomics, mainly represented by antibody-based microarrays, has during recent years been established as a proteomic tool providing unique opportunities for parallelized protein expression profiling. But despite the progress, several main technical features and assay procedures remain to be (fully) resolved. Among these issues, the handling of protein microarray data, i.e. the biostatistics part, is one of the key issues to resolve. In this study, we have therefore further optimized, validated, and standardized our in-house designed recombinant antibody microarray technology platform. To this end, we addressed the main remaining technical issues (e.g. antibody quality, array production, sample labelling, and selected assay conditions) and, most importantly, key biostatistics subjects (e.g. array data pre-processing and biomarker panel condensation). This represents one of the first antibody array studies in which these key biostatistics subjects have been studied in detail. Here, we thus present the next generation of the recombinant antibody microarray technology platform designed for clinical immunoproteomics. PMID:27414037

  6. Validation of MIMGO: a method to identify differentially expressed GO terms in a microarray dataset

    PubMed Central

    2012-01-01

    Background We previously proposed an algorithm for the identification of GO terms that commonly annotate genes whose expression is upregulated or downregulated in some microarray data compared with other microarray data. We call these “differentially expressed GO terms” and have named the algorithm “matrix-assisted identification method of differentially expressed GO terms” (MIMGO). MIMGO can also identify microarray data in which genes annotated with a differentially expressed GO term are upregulated or downregulated. However, MIMGO has not yet been validated on a real microarray dataset using all available GO terms. Findings We combined Gene Set Enrichment Analysis (GSEA) with MIMGO to identify differentially expressed GO terms in a yeast cell cycle microarray dataset. GSEA followed by MIMGO (GSEA + MIMGO) correctly identified (p < 0.05) microarray data in which genes annotated to differentially expressed GO terms are upregulated. We found that GSEA + MIMGO was slightly less effective than, or comparable to, GSEA (Pearson), a method that uses Pearson’s correlation as a metric, at detecting true differentially expressed GO terms. However, unlike other methods including GSEA (Pearson), GSEA + MIMGO can comprehensively identify the microarray data in which genes annotated with a differentially expressed GO term are upregulated or downregulated. Conclusions MIMGO is a reliable method to comprehensively identify differentially expressed GO terms. PMID:23232071

  7. Understanding the effects of pre-processing on extracted signal features from gait accelerometry signals.

    PubMed

    Millecamps, Alexandre; Lowry, Kristin A; Brach, Jennifer S; Perera, Subashan; Redfern, Mark S; Sejdić, Ervin

    2015-07-01

    Gait accelerometry is an important approach for gait assessment. Previous contributions have adopted various pre-processing approaches for gait accelerometry signals, but none have thoroughly investigated the effects of such pre-processing operations on the obtained results. Therefore, this paper investigated the influence of pre-processing operations on signal features extracted from gait accelerometry signals. These signals were collected from 35 participants aged over 65 years: 14 of them were healthy controls (HC), 10 had Parkinson's disease (PD) and 11 had peripheral neuropathy (PN). The participants walked on a treadmill at their preferred speed. Signal features in the time, frequency and time-frequency domains were computed for both raw and pre-processed signals. The pre-processing stage consisted of applying tilt correction and denoising operations to the acquired signals. We first examined the effects of these operations separately, followed by the investigation of their joint effects. Several important observations were made based on the obtained results. First, the denoising operation alone had almost no effect in comparison to the trends observed in the raw data. Second, the tilt correction affected the reported results to a certain degree, which could lead to a better discrimination between groups. Third, the combination of the two pre-processing operations yielded similar trends to the tilt correction alone. These results indicate that while gait accelerometry is a valuable approach for gait assessment, any pre-processing steps must be adopted carefully as they alter the observed findings.
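
    A sketch of a tilt-correction step of the kind studied above: estimate the gravity direction as the mean acceleration over the recording and rotate the axes so gravity aligns with the vertical (Rodrigues' formula; the authors' exact procedure may differ):

        import numpy as np

        def tilt_correct(acc):
            """acc: samples x 3 accelerometry. Rotate so mean gravity -> +z axis."""
            g = acc.mean(axis=0)
            g = g / np.linalg.norm(g)
            z = np.array([0.0, 0.0, 1.0])
            v = np.cross(g, z)
            s, c = np.linalg.norm(v), np.dot(g, z)
            if s < 1e-12:                  # already aligned (or exactly opposite)
                return acc if c > 0 else -acc
            K = np.array([[0, -v[2], v[1]],
                          [v[2], 0, -v[0]],
                          [-v[1], v[0], 0]])
            R = np.eye(3) + K + K @ K * ((1 - c) / s ** 2)  # Rodrigues rotation
            return acc @ R.T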

  8. Comparison of preprocessing methods and storage times for touch DNA samples

    PubMed Central

    Dong, Hui; Wang, Jing; Zhang, Tao; Ge, Jian-ye; Dong, Ying-qiang; Sun, Qi-fan; Liu, Chao; Li, Cai-xia

    2017-01-01

    Aim To select appropriate preprocessing methods for different substrates by comparing the effects of four different preprocessing methods on touch DNA samples, and to determine the effect of various storage times on the results of touch DNA sample analysis. Method Hand touch DNA samples were used to investigate the detection and inspection results of DNA on different substrates. Four preprocessing methods, including the direct cutting method, the stubbing procedure, the double swab technique, and the vacuum cleaner method, were used in this study. DNA was extracted from mock samples with the four preprocessing methods. The best preprocessing protocol determined from the study was further used to compare performance after various storage times. DNA extracted from all samples was quantified and amplified using standard procedures. Results The amounts of DNA and the numbers of alleles detected on the porous substrates were greater than those on the non-porous substrates. The performance of the four preprocessing methods varied with different substrates. The direct cutting method displayed advantages for porous substrates, and the vacuum cleaner method was advantageous for non-porous substrates. No significant degradation trend was observed as the storage times increased. Conclusion Different substrates require the use of different preprocessing methods in order to obtain the highest DNA amount and allele number from touch DNA samples. This study provides a theoretical basis for the exploration of touch DNA samples and may be used as a reference when dealing with touch DNA samples in casework. PMID:28252870

  9. Object localization based on smoothing preprocessing and cascade classifier

    NASA Astrophysics Data System (ADS)

    Zhang, Xingfu; Liu, Lei; Zhao, Feng

    2017-01-01

    An improved algorithm for image localization is proposed in this paper. First, the image is smoothed to remove part of the noise. A cascade classifier is then trained to obtain a template. Finally, the template is used to detect related images. The advantages of the algorithm are that it is robust to noise, insensitive to changes in image scale, and fast to compute. In this paper, a real image of a truck underframe is chosen as the experimental object; the image samples include both normal components and faulty components. Experimental results show that the detection accuracy exceeds 90 percent when the grade is more than 40. We can therefore conclude that the algorithm proposed in this paper can be applied to practical image localization projects.
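
    A rough sketch of the smoothing-then-cascade pipeline described above, using OpenCV. The cascade file and image paths are hypothetical placeholders (a cascade would be trained offline, e.g. with opencv_traincascade), and the parameters are illustrative rather than taken from the paper.

    ```python
    import cv2

    # Hypothetical cascade trained offline on component templates; the file
    # name and image paths are placeholders, not artifacts from the paper.
    cascade = cv2.CascadeClassifier("truck_component_cascade.xml")

    img = cv2.imread("truck_bottom.png", cv2.IMREAD_GRAYSCALE)

    # Smoothing preprocessing: suppress sensor noise before detection.
    smoothed = cv2.GaussianBlur(img, (5, 5), sigmaX=1.5)

    # Multi-scale detection makes localization insensitive to object scale.
    boxes = cascade.detectMultiScale(smoothed, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in boxes:
        cv2.rectangle(img, (x, y), (x + w, y + h), 255, 2)
    cv2.imwrite("localized.png", img)
    ```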

  10. DNA microarray integromics analysis platform.

    PubMed

    Waller, Tomasz; Gubała, Tomasz; Sarapata, Krzysztof; Piwowar, Monika; Jurkowski, Wiktor

    2015-01-01

    The study of interactions between molecules belonging to different biochemical families (such as lipids and nucleic acids) requires specialized data analysis methods. This article describes the DNA Microarray Integromics Analysis Platform, a unique web application that focuses on computational integration and analysis of "multi-omics" data. Our tool supports a range of complex analyses, including - among others - low- and high-level analyses of DNA microarray data, integrated analysis of transcriptomics and lipidomics data and the ability to infer miRNA-mRNA interactions. We demonstrate the characteristics and benefits of the DNA Microarray Integromics Analysis Platform using two different test cases. The first test case involves the analysis of the nutrimouse dataset, which contains measurements of the expression of genes involved in nutritional problems and the concentrations of hepatic fatty acids. The second test case involves the analysis of miRNA-mRNA interactions in polysaccharide-stimulated human dermal fibroblasts infected with porcine endogenous retroviruses. The DNA Microarray Integromics Analysis Platform is a web-based graphical user interface for "multi-omics" data management and analysis. Its intuitive nature and wide range of available workflows make it an effective tool for molecular biology research. The platform is hosted at https://lifescience.plgrid.pl/.

  11. Improving FoRe: A New Inlet Design for Filtering Samples through Individual Microarray Spots.

    PubMed

    de Lange, Victoria; Habegger, Marco; Schmidt, Marco; Vörös, János

    2017-03-24

    In this publication we present an improvement to our previously introduced vertical flow microarray, the FoRe array, which capitalizes on the fusion of immunofiltration and densely packed micron test sites. Filtering samples through individual microarray spots allows us to rapidly analyze dilute samples with high-throughput and high signal-to-noise. Unlike other flowthrough microarrays, in the FoRe design samples are injected into micron channels and sequentially exposed to different targets. This arrangement makes it possible to increase the sensitivity of the microarray by simply increasing the sample volume or to rapidly reconcentrate samples after preprocessing steps dilute the analyte. Here we present a new inlet system which allows us to increase the analyzed sample volume without compromising the micron spot size and dense layout. We combined this with a model assay to demonstrate that the device is sensitive to the amount of antigen, and as a result, sample volume directly correlates to sensitivity. We introduced a simple technique for analysis of blood, which previously clogged the nanometer-sized pores, requiring only microliter volumes expected from an infant heel prick. A drop of blood is mixed with buffer to separate the plasma before reconcentrating the sample on the microarray spot. We demonstrated the success of this procedure by spiking TNF-α into blood and achieved a limit of detection of 18 pM. Compared to traditional protein microarrays, the FoRe array is still inexpensive, customizable, and simple to use, and thanks to these improvements has a broad range of applications from small animal studies to environmental monitoring.

  12. Comparing Binaural Pre-processing Strategies I: Instrumental Evaluation.

    PubMed

    Baumgärtel, Regina M; Krawczyk-Becker, Martin; Marquardt, Daniel; Völker, Christoph; Hu, Hongmei; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Ernst, Stephan M A; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

    2015-12-30

    In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios. © The Author(s) 2015.

  13. Microfluidic microarray systems and methods thereof

    DOEpatents

    West, Jay A. A. [Castro Valley, CA; Hukari, Kyle W [San Ramon, CA; Hux, Gary A [Tracy, CA

    2009-04-28

    Disclosed are systems that include a manifold in fluid communication with a microfluidic chip having a microarray, an illuminator, and a detector in optical communication with the microarray. Methods for using these systems for biological detection are also disclosed.

  14. Automated Pre-processing for NMR Assignments with Reduced Tedium

    SciTech Connect

    Pawley, Norma; Gans, Jason

    2004-05-11

    An important rate-limiting step in the resonance assignment process is accurate identification of resonance peaks in NMR spectra. NMR spectra are noisy. Hence, automatic peak-picking programs must navigate between the Scylla of reliable but incomplete picking and the Charybdis of noisy but complete picking. Each of these extremes complicates the assignment process: incomplete peak-picking results in the loss of essential connectivities, while noisy picking conceals the true connectivities under a combinatorial explosion of false positives. Intermediate processing can simplify the assignment process by preferentially removing false peaks from noisy peak lists. This is accomplished by requiring consensus between multiple NMR experiments, exploiting a priori information about NMR spectra, and drawing on empirical statistical distributions of chemical shifts extracted from the BioMagResBank. Experienced NMR practitioners currently apply many of these techniques "by hand", which is tedious and may appear arbitrary to the novice. To increase efficiency, we have created a systematic and automated approach to this process, known as APART. Automated pre-processing has three main advantages: reduced tedium, standardization, and pedagogy. In the hands of experienced spectroscopists, the main advantage is reduced tedium (a rapid increase in the ratio of true peaks to false peaks with minimal effort). When a project is passed from hand to hand, the main advantage is standardization. APART automatically documents the peak filtering process by archiving its original recommendations, the accompanying justifications, and whether a user accepted or overrode a given filtering recommendation. In the hands of a novice, this tool can reduce the stumbling block of learning to differentiate between real peaks and noise by providing real-time examples of how such decisions are made.
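
    The consensus-filtering idea, i.e. keeping a peak only when it is corroborated across multiple experiments, can be sketched as follows. This illustrates the principle, not the APART code; the tolerance and support threshold are invented values.

    ```python
    import numpy as np

    def consensus_filter(peak_lists, tol=0.05, min_support=2):
        """Keep candidate peaks whose chemical shift is corroborated by at
        least `min_support` of the peak lists (within `tol` ppm).

        peak_lists : list of 1-D arrays of peak positions (ppm), one per
                     NMR experiment; tol and min_support are illustrative."""
        kept = []
        for i, peaks in enumerate(peak_lists):
            others = [p for j, p in enumerate(peak_lists) if j != i]
            for pk in peaks:
                support = 1 + sum(np.any(np.abs(o - pk) <= tol) for o in others)
                if support >= min_support:
                    kept.append((i, pk))
        return kept

    # toy usage: three noisy peak lists sharing true peaks near 4.7 and 8.2 ppm
    lists = [np.array([4.71, 8.19, 2.3]),
             np.array([4.69, 8.21, 6.6]),
             np.array([4.70, 8.20, 1.1])]
    print(consensus_filter(lists))
    ```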

  15. Independent component analysis of Alzheimer's DNA microarray gene expression data

    PubMed Central

    Kong, Wei; Mou, Xiaoyang; Liu, Qingzhong; Chen, Zhongxue; Vanderburg, Charles R; Rogers, Jack T; Huang, Xudong

    2009-01-01

    Background Gene microarray technology is an effective tool to investigate the simultaneous activity of multiple cellular pathways from hundreds to thousands of genes. However, because the colossal amounts of data generated by DNA microarray technology are usually complex, noisy, high-dimensional, and often hindered by low statistical power, they are difficult to exploit. To overcome these problems, two kinds of unsupervised analysis methods for microarray data, principal component analysis (PCA) and independent component analysis (ICA), have been developed to accomplish the task. PCA projects the data into a new space spanned by the principal components, which are mutually orthonormal. However, the constraint of mutual orthogonality and the second-order statistics underlying PCA algorithms may not be appropriate for the biological systems studied. Extracting and characterizing the most informative features of biological signals requires higher-order statistics. Results ICA is an unsupervised algorithm that can extract higher-order statistical structures from data and has been applied to DNA microarray gene expression data analysis. We applied the FastICA method to DNA microarray gene expression data from Alzheimer's disease (AD) hippocampal tissue samples and performed subsequent gene clustering. Experimental results showed that the ICA method can improve the clustering results of AD samples and identify significant genes. More than 50 significant genes with high expression levels in severe AD were extracted, representing immunity-related proteins, metal-related proteins, membrane proteins, lipoproteins, neuropeptides, cytoskeleton proteins, cellular binding proteins, and ribosomal proteins. Within the aforementioned categories, our method also found 37 significant genes with low expression levels. Moreover, it is worth noting that some oncogenes and phosphorylation-related proteins are expressed at low levels. In comparison to the PCA and support
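
    A minimal sketch of applying FastICA to an expression matrix, assuming scikit-learn's implementation as a stand-in for the FastICA variant used in the study; the data below are random placeholders for the real AD expression values.

    ```python
    import numpy as np
    from sklearn.decomposition import FastICA

    # Toy expression matrix: rows = genes, columns = samples.
    rng = np.random.default_rng(0)
    expression = rng.normal(size=(2000, 12))          # placeholder for real data

    # Decompose genes into statistically independent components; each component
    # loads a group of co-varying genes that may share a biological process.
    ica = FastICA(n_components=5, random_state=0, max_iter=1000)
    S = ica.fit_transform(expression)                 # (genes x components)

    # Rank genes by |loading| on one component to nominate candidate genes.
    comp = 0
    top = np.argsort(-np.abs(S[:, comp]))[:50]
    print("Top gene indices for component", comp, ":", top[:10])
    ```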

  16. The Microarray Revolution: Perspectives from Educators

    ERIC Educational Resources Information Center

    Brewster, Jay L.; Beason, K. Beth; Eckdahl, Todd T.; Evans, Irene M.

    2004-01-01

    In recent years, microarray analysis has become a key experimental tool, enabling the analysis of genome-wide patterns of gene expression. This review approaches the microarray revolution with a focus upon four topics: 1) the early development of this technology and its application to cancer diagnostics; 2) a primer of microarray research,…

  18. Pre-processing of data coming from a laser-EMAT system for non-destructive testing of steel slabs.

    PubMed

    Sgarbi, Mirko; Colla, Valentina; Cateni, Sivia; Higson, Stuart

    2012-01-01

    Non-destructive test systems are increasingly applied in industry for their strong potential to improve and standardize quality control. Especially in the intermediate manufacturing stages, early detection of defects on semi-finished products allows them to be directed to later production processes according to their quality, with considerable savings in time, energy, materials and labour. However, the raw data coming from non-destructive test systems are not always immediately suitable for sophisticated defect detection algorithms, due to unavoidable noise and disturbances, especially in harsh operating conditions such as those typical of the steelmaking cycle. The paper describes some pre-processing operations which are required in order to exploit the data coming from a non-destructive test system. Such a system is based on the joint exploitation of Laser and Electro-Magnetic Acoustic Transducer technologies and is applied to the detection of surface and sub-surface cracks in cold and hot steel slabs.

  19. Mining pathway signatures from microarray data and relevant biological knowledge.

    PubMed

    Panteris, Eleftherios; Swift, Stephen; Payne, Annette; Liu, Xiaohui

    2007-12-01

    High-throughput technologies such as DNA microarrays are revolutionizing the way modern biological research is done. Bioinformatics tools are becoming increasingly important to assist biomedical scientists in their quest to understand complex biological processes. Gene expression analysis has attracted a large amount of attention over the last few years, mostly in the form of algorithms exploring cluster and regulatory relationships among genes of interest, and programs that display the multidimensional microarray data in formats that make biological sense. To reduce the dimensionality of microarray data and make the corresponding analysis more biologically relevant, in this paper we propose a biologically led approach to biochemical pathway analysis using microarray data and relevant biological knowledge. The method selects a subset of genes for each pathway that describes the behaviour of the pathway under a given experimental condition, and transforms them into pathway signatures. The metabolic pathways of Escherichia coli are used as a case study.

  20. Improved preprocessing and data clustering for land mine discrimination

    NASA Astrophysics Data System (ADS)

    Mereddy, Pramodh; Agarwal, Sanjeev; Rao, Vittal S.

    2000-08-01

    In this paper we discuss an improved algorithm for sensor-specific data processing and unsupervised data clustering for landmine discrimination. Pre-processor and data-clustering modules form a central part of a modular sensor fusion architecture for landmine detection and discrimination. The dynamic unsupervised clustering algorithm is based on Dignet clustering. The self-organizing capability of Dignet is based on the idea of competitive generation and elimination of attraction wells, where each attraction well is characterized by its center, width and depth. The original Dignet architecture assumes prior knowledge of the data characteristics in the form of a predefined well width. In this paper some modifications to the Dignet architecture are presented in order to make Dignet a truly self-organizing and data-independent clustering algorithm. Information-theoretic pre-processing is used to capture the underlying statistical properties of the sensor data, which in turn are used to define important parameters for Dignet clustering, such as the similarity metric and the initial cluster width. The width of each cluster is also adapted online, so that a fixed width is not enforced. A suitable procedure for online merge and clean operations is defined to re-organize the cluster development. A concept of dual width is employed to satisfy the competing requirements of compact clusters and high coverage of the data space. The performance of the improved clustering algorithm is compared with the baseline Dignet algorithm using simulated data.
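
    A toy sketch of Dignet-style online clustering with attraction wells characterized by center, width, and depth. The update rules and parameter values are simplified illustrations of the competitive generate-or-capture idea, not the improved algorithm of the paper (which also includes merge/clean operations and dual widths).

    ```python
    import numpy as np

    class AttractionWell:
        """A cluster characterized by its center, width, and depth."""
        def __init__(self, center, width):
            self.center = np.asarray(center, float)
            self.width = width      # similarity radius, adapted online
            self.depth = 1          # number of captured samples

    def online_cluster(data, init_width, lr=0.1):
        """Each sample either deepens the nearest well (updating its center
        and width) or spawns a new well; init_width and lr are illustrative."""
        wells = []
        for x in data:
            x = np.asarray(x, float)
            if wells:
                d = [np.linalg.norm(x - w.center) for w in wells]
                i = int(np.argmin(d))
                if d[i] <= wells[i].width:
                    w = wells[i]
                    w.center += (x - w.center) / (w.depth + 1)  # running mean
                    w.width += lr * (d[i] - w.width)            # adapt width
                    w.depth += 1
                    continue
            wells.append(AttractionWell(x, init_width))
        return wells

    rng = np.random.default_rng(1)
    pts = np.vstack([rng.normal(0, .3, (50, 2)), rng.normal(3, .3, (50, 2))])
    print(len(online_cluster(pts, init_width=1.0)))   # expect ~2 wells
    ```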

  1. Performance of Multi-User Transmitter Pre-Processing Assisted Multi-Cell IDMA System for Downlink Transmission

    NASA Astrophysics Data System (ADS)

    Partibane, B.; Nagarajan, V.; Vishvaksenan, K. S.; Kalidoss, R.

    2015-06-01

    In this paper, we present the performance of multi-user transmitter pre-processing (MUTP) assisted coded interleave-division multiple access (IDMA) systems over correlated frequency-selective channels for downlink communication. We realize MUTP using the singular value decomposition (SVD) technique, which exploits the channel state information (CSI) of all active users acquired via feedback channels. We consider the MUTP technique to alleviate the effects of co-channel interference (CCI) and multiple access interference (MAI). To be specific, we estimate the CSI using the least square error (LSE) algorithm at each of the mobile stations (MSs), perform vector quantization using Lloyd's algorithm, and feed back the bits that represent the quantized magnitudes and phases to the base station (BS) through a dedicated low-rate noisy channel. Finally, we recover the quantized bits at the BS to formulate the pre-processing matrix. The performance of the MUTP-aided IDMA system is evaluated for five types of delay spread distributions pertaining to long-term evolution (LTE) and Stanford University Interim (SUI) channel models. We also compare the performance of MUTP with a minimum mean square error (MMSE) detector for the coded IDMA system. The considered pre-processing scheme alleviates the effects of CCI with less complex signal detection at the MSs when compared to the MMSE detector. Further, our simulation results reveal that the SVD-based MUTP-assisted coded IDMA system outperforms the MMSE detector in terms of achievable bit error rate (BER) at lower signal-to-noise ratio (SNR) by mitigating the effects of CCI and MAI.
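
    The core SVD-based pre-processing idea, diagonalizing a known channel into parallel interference-free subchannels, can be sketched in a few lines. This toy omits coding, interleaving, quantized feedback and noise, all of which the actual system includes.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_rx, n_tx = 4, 4

    # Rayleigh-fading MIMO channel, assumed known at the transmitter via
    # (idealized) feedback of the channel state information.
    H = (rng.normal(size=(n_rx, n_tx)) + 1j * rng.normal(size=(n_rx, n_tx))) / np.sqrt(2)

    U, s, Vh = np.linalg.svd(H)        # H = U @ diag(s) @ Vh

    symbols = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)  # QPSK
    x = Vh.conj().T @ symbols          # transmitter pre-processing (precoding)
    y = H @ x                          # propagation (noise omitted for clarity)
    z = U.conj().T @ y                 # receiver combining

    # The channel decouples into parallel subchannels with gains s:
    print(np.allclose(z, s * symbols))  # True
    ```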

  2. Ontology-Based Analysis of Microarray Data.

    PubMed

    Giuseppe, Agapito; Milano, Marianna

    2016-01-01

    The importance of semantic-based methods and algorithms for the analysis and management of biological data is growing for two main reasons: on the biological side, the knowledge contained in ontologies is increasingly accurate and complete; on the computational side, recent algorithms exploit such knowledge in valuable ways. Here we focus on semantic-based management and analysis of protein interaction networks, covering all approaches to the analysis of protein-protein interaction data that use knowledge encoded in biological ontologies. Semantic approaches for studying high-throughput data have been widely used in the past to mine genomic and expression data. Recently, the emergence of network approaches for investigating molecular machineries has stimulated, in parallel, the introduction of semantic-based techniques for the analysis and management of network data. Applying these computational approaches to the study of microarray data can broaden their application scenarios and at the same time help the understanding of disease development and progression.

  3. Multisensor data fusion algorithm development

    SciTech Connect

    Yocky, D.A.; Chadwick, M.D.; Goudy, S.P.; Johnson, D.K.

    1995-12-01

    This report presents a two-year LDRD research effort into multisensor data fusion. We approached the problem by addressing the available types of data, preprocessing that data, and developing fusion algorithms using that data. The report reflects these three distinct areas. First, the possible data sets for fusion are identified. Second, automated registration techniques for imagery data are analyzed. Third, two fusion techniques are presented. The first fusion algorithm is based on the two-dimensional discrete wavelet transform. Using test images, the wavelet algorithm is compared against intensity modulation and intensity-hue-saturation image fusion algorithms that are available in commercial software. The wavelet approach outperforms the other two fusion techniques by preserving spectral/spatial information more precisely. The wavelet fusion algorithm was also applied to Landsat Thematic Mapper and SPOT panchromatic imagery data. The second algorithm is based on a linear-regression technique. We analyzed the technique using the same Landsat and SPOT data.
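
    A minimal sketch of wavelet-transform image fusion in the spirit described above, assuming the PyWavelets package. The fusion rules used here (averaged approximation band, maximum-magnitude detail coefficients) are common textbook choices and not necessarily those of the report.

    ```python
    import numpy as np
    import pywt

    def wavelet_fuse(img_a, img_b, wavelet="db2"):
        """Single-level 2-D DWT fusion sketch: average the approximation bands,
        keep the detail coefficient with the larger magnitude (preserving
        edges). A multi-level transform would repeat the pattern per level."""
        cA1, (cH1, cV1, cD1) = pywt.dwt2(img_a, wavelet)
        cA2, (cH2, cV2, cD2) = pywt.dwt2(img_b, wavelet)

        cA = 0.5 * (cA1 + cA2)                      # low-frequency content
        fuse = lambda d1, d2: np.where(np.abs(d1) >= np.abs(d2), d1, d2)
        details = (fuse(cH1, cH2), fuse(cV1, cV2), fuse(cD1, cD2))
        return pywt.idwt2((cA, details), wavelet)

    # toy usage with random stand-ins for co-registered images
    a = np.random.rand(128, 128)
    b = np.random.rand(128, 128)
    fused = wavelet_fuse(a, b)
    print(fused.shape)
    ```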

  4. Microarray analysis in pulmonary hypertension.

    PubMed

    Hoffmann, Julia; Wilhelm, Jochen; Olschewski, Andrea; Kwapiszewska, Grazyna

    2016-07-01

    Microarrays are a powerful and effective tool that allows the detection of genome-wide gene expression differences between controls and disease conditions. They have been broadly applied to investigate the pathobiology of diverse forms of pulmonary hypertension, namely group 1, including patients with idiopathic pulmonary arterial hypertension, and group 3, including pulmonary hypertension associated with chronic lung diseases such as chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis. To date, numerous human microarray studies have been conducted to analyse global (lung homogenate samples), compartment-specific (laser capture microdissection), cell type-specific (isolated primary cells) and circulating cell (peripheral blood) expression profiles. Combined, they provide important information on development, progression and the end-stage disease. In the future, systems biology approaches, expression of noncoding RNAs that regulate coding RNAs, and direct comparison between animal models and human disease might be of importance.

  5. DNA microarray technology in dermatology.

    PubMed

    Kunz, Manfred

    2008-03-01

    In recent years, DNA microarray technology has been used for the analysis of gene expression patterns in a variety of skin diseases, including malignant melanoma, psoriasis, lupus erythematosus, and systemic sclerosis. Many of the studies described herein confirmed earlier results on individual genes or functional groups of genes. However, a plethora of new candidate genes, gene patterns, and regulatory pathways have been identified. Major progress was made with the identification of a prognostic gene pattern in malignant melanoma, an immune signaling cluster in psoriasis, and a so-called interferon signature in systemic lupus erythematosus. In the future, interference with genes or regulatory pathways using RNA interference technologies or targeted therapy may not only underscore the functional significance of microarray data but also open interesting therapeutic perspectives. Large-scale gene expression analyses may also help to design more individualized treatment approaches for cutaneous diseases.

  6. Microarray analysis in pulmonary hypertension

    PubMed Central

    Hoffmann, Julia; Wilhelm, Jochen; Olschewski, Andrea

    2016-01-01

    Microarrays are a powerful and effective tool that allows the detection of genome-wide gene expression differences between controls and disease conditions. They have been broadly applied to investigate the pathobiology of diverse forms of pulmonary hypertension, namely group 1, including patients with idiopathic pulmonary arterial hypertension, and group 3, including pulmonary hypertension associated with chronic lung diseases such as chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis. To date, numerous human microarray studies have been conducted to analyse global (lung homogenate samples), compartment-specific (laser capture microdissection), cell type-specific (isolated primary cells) and circulating cell (peripheral blood) expression profiles. Combined, they provide important information on development, progression and the end-stage disease. In the future, systems biology approaches, expression of noncoding RNAs that regulate coding RNAs, and direct comparison between animal models and human disease might be of importance. PMID:27076594

  7. Statistical designs for two-color spotted microarray experiments.

    PubMed

    Chai, Feng-Shun; Liao, Chen-Tuo; Tsai, Shin-Fu

    2007-04-01

    Two-color cDNA or oligonucleotide-based spotted microarrays have been commonly used in measuring the expression levels of thousands of genes simultaneously. To realize the immense potential of this powerful new technology within limited resources or other constraints, practical designs with high efficiencies are in demand. In this study, we address the design issue concerning the arrangement of the mRNA samples labeled with fluorescent dyes and hybridized on the slides. A normalization model is proposed to characterize major sources of systematic variation in a two-color microarray experiment. This normalization model establishes a connection between designs for two-color microarray experiments and a particular class of classical row-column designs. A heuristic algorithm for constructing A-optimal or highly efficient designs is provided. Statistical optimality results are found for some of the designs generated from the algorithm. It is believed that the constructed designs are the best or very close to the best possible for estimating the relative gene expression levels among the mRNA samples of interest.

  8. The Current Status of DNA Microarrays

    NASA Astrophysics Data System (ADS)

    Shi, Leming; Perkins, Roger G.; Tong, Weida

    DNA microarray technology that allows simultaneous assay of thousands of genes in a single experiment has steadily advanced to become a mainstream method used in research, and has reached a stage that envisions its use in medical applications and personalized medicine. Many different strategies have been developed for manufacturing DNA microarrays. In this chapter, we discuss the manufacturing characteristics of seven microarray platforms that were used in a recently completed large study by the MicroArray Quality Control (MAQC) consortium, which evaluated the concordance of results across these platforms. The platforms can be grouped into three categories: (1) in situ synthesis of oligonucleotide probes on microarrays (Affymetrix GeneChip® arrays based on photolithography synthesis and Agilent's arrays based on inkjet synthesis); (2) spotting of presynthesized oligonucleotide probes on microarrays (GE Healthcare's CodeLink system, Applied Biosystems' Genome Survey Microarrays, and the custom microarrays printed with Operon's oligonucleotide set); and (3) deposition of presynthesized oligonucleotide probes on bead-based microarrays (Illumina's BeadChip microarrays). We conclude this chapter with our views on the challenges and opportunities toward acceptance of DNA microarray data in clinical and regulatory settings.

  10. Comparison of contamination of femoral heads and pre-processed bone chips during hip revision arthroplasty.

    PubMed

    Mathijssen, N M C; Sturm, P D; Pilot, P; Bloem, R M; Buma, P; Petit, P L; Schreurs, B W

    2013-12-01

    With bone impaction grafting, cancellous bone chips made from allograft femoral heads are impacted in a bone defect, which introduces an additional source of infection. The potential benefit of the use of pre-processed bone chips was investigated by comparing the bacterial contamination of bone chips prepared intraoperatively with that of pre-processed bone chips at different stages in the surgical procedure. To investigate baseline contamination of the bone grafts, specimens were collected during 88 procedures before actual use or preparation of the bone chips: in 44 procedures intraoperatively prepared chips were used (Group A) and in the other 44 procedures pre-processed bone chips were used (Group B). In 64 of these procedures (32 using locally prepared bone chips and 32 using pre-processed bone chips) specimens were also collected later in the procedure to investigate contamination after use and preparation of the bone chips. In total, 8 procedures had one or more positive specimens (12.5%). Contamination rates were not significantly different between bone chips prepared in the operating theatre and pre-processed bone chips. In conclusion, there was no difference in bacterial contamination between bone chips prepared from whole femoral heads in the operating room and pre-processed bone chips; therefore, both types of bone allografts are comparable with respect to the risk of infection.

  11. Spectral subtraction denoising preprocessing block to improve P300-based brain-computer interfacing

    PubMed Central

    2014-01-01

    Background The signals acquired in brain-computer interface (BCI) experiments usually involve complicated sampling, artifact and noise conditions. This has mandated the use of several preprocessing strategies to extract the meaningful components of the measured signals and pass them along to further processing steps. In spite of the success of present preprocessing methods in improving the reliability of BCI, there is still room to boost performance further. Methods A new preprocessing method for denoising P300-based brain-computer interface data is presented that allows better performance with a lower number of channels and blocks. The new denoising technique is based on a modified version of spectral subtraction denoising and works on each temporal signal channel independently, thus offering seamless integration with existing preprocessing and allowing low channel counts to be used. Results The new method is verified using experimental data and compared with the classification results of the same data without denoising and with denoising using an existing wavelet-shrinkage-based technique. Enhanced performance in different experiments, quantitatively assessed using classification block accuracy as well as bit-rate estimates, was confirmed. Conclusion The new preprocessing method based on spectral subtraction denoising offers superior performance to existing methods and has potential for practical use as a new standard preprocessing block in BCI signal processing. PMID:24708647
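
    The classical magnitude spectral subtraction that the paper's modified version builds on can be sketched for a single channel as follows, assuming a noise-only segment is available for estimating the noise spectrum. The parameter values and toy signal are illustrative, not the paper's algorithm or data.

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    def spectral_subtract(x, fs, noise_seg, nperseg=128, beta=0.02):
        """Magnitude spectral subtraction for one temporal channel.
        noise_seg : stretch of signal assumed to be noise-only
        beta      : spectral floor so magnitudes cannot go negative"""
        _, _, N = stft(noise_seg, fs, nperseg=nperseg)
        noise_mag = np.abs(N).mean(axis=1, keepdims=True)   # mean noise spectrum

        f, t, X = stft(x, fs, nperseg=nperseg)
        mag = np.abs(X) - noise_mag                         # subtract estimate
        mag = np.maximum(mag, beta * noise_mag)             # apply floor
        X_hat = mag * np.exp(1j * np.angle(X))              # keep original phase
        _, x_hat = istft(X_hat, fs, nperseg=nperseg)
        return x_hat[: len(x)]

    fs = 250                                # typical EEG sampling rate
    tt = np.arange(0, 4, 1 / fs)
    noisy = np.sin(2 * np.pi * 5 * tt) + 0.5 * np.random.randn(len(tt))
    denoised = spectral_subtract(noisy, fs, noise_seg=noisy[:500])
    ```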

  12. Microarrays, antiobesity and the liver

    PubMed Central

    Castro-Chávez, Fernando

    2013-01-01

    In this review, microarray technology and especially oligonucleotide arrays are exemplified with a practical example taken from perilipin−/− mice, using the dChip software, which is available for non-profit purposes. It was found that the liver of perilipin−/− mice was healthy and normal, even under a high-fat diet, when compared with the results published for scd1−/− mice, which under high-fat diets had a darker liver, suggestive of hepatic steatosis. Scd1 is required for the biosynthesis of monounsaturated fatty acids and plays a key role in the hepatic synthesis of triglycerides and of very-low-density lipoproteins. Both models of obesity resistance share many similar phenotypic antiobesity features; however, the perilipin−/− mice had a significant downregulation of the stearoyl-CoA desaturases scd1 and scd2 in white adipose tissue, but normal levels of both genes in the liver, even under a high-fat diet. Here, different microarray methodologies are discussed, along with some of the most recent discoveries and perspectives regarding the use of microarrays, with an emphasis on obesity gene expression, and a personal remark on my findings of increased expression of hemoglobin transcripts and other hemo-related (hemo-like) genes, and of leukocyte-like (leuko-like) genes, in the white adipose tissue of perilipin−/− mice. In conclusion, microarrays have much to offer in comparative studies such as those in antiobesity, and they are also methodologies adequate for astounding new molecular discoveries. PMID:15657555

  13. A Comparative Investigation of the Combined Effects of Pre-Processing, Wavelength Selection, and Regression Methods on Near-Infrared Calibration Model Performance.

    PubMed

    Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N

    2017-07-01

    Near-infrared (NIR) spectroscopy is widely used in fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include the available physical interpretation of spectral data, its nondestructive nature and high speed of measurement, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others; selection of those wavelengths that contribute useful information; and identification of suitable calibration models using linear/nonlinear regression. Several methods have been developed for each of these three aspects, and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies of the interactions among these three aspects, which could shed light on what role each aspect plays in the calibration and how to combine the various methods of each aspect to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); and four popular regression methods, namely partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant
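
    A compact sketch of one pre-processing / wavelength-selection / regression combination in the spirit of the study. Standard normal variate (SNV) scatter correction and correlation-based channel selection are simple stand-ins for the methods actually compared (OSC/EMSC/OPLEC and SFS/GAPLSSP); the data are random placeholders.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    # Toy NIR-like data: 60 spectra x 200 wavelengths, one property to predict.
    rng = np.random.default_rng(0)
    spectra = rng.normal(size=(60, 200))
    y = spectra[:, 50] * 2.0 + rng.normal(scale=0.1, size=60)

    # Pre-processing stand-in: standard normal variate per spectrum.
    snv = (spectra - spectra.mean(axis=1, keepdims=True)) \
          / spectra.std(axis=1, keepdims=True)

    # Crude wavelength-selection stand-in: keep channels most correlated with y.
    corr = np.abs([np.corrcoef(snv[:, j], y)[0, 1] for j in range(snv.shape[1])])
    selected = np.argsort(corr)[-20:]

    # Regression: PLS, one of the four regression methods compared in the paper.
    model = PLSRegression(n_components=5)
    print(cross_val_score(model, snv[:, selected], y, cv=5).mean())
    ```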

  14. Automatic design of decision-tree algorithms with evolutionary algorithms.

    PubMed

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.
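
    A toy analogue of the idea of evolving decision-tree designs by cross-validated fitness. HEAD-DT evolves complete induction algorithms (split criteria, stopping rules, pruning, and so on), whereas this sketch only mutates a few scikit-learn hyperparameters on a stock dataset for illustration.

    ```python
    import random
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    space = {"criterion": ["gini", "entropy"],
             "max_depth": [2, 4, 8, None],
             "min_samples_leaf": [1, 5, 20]}

    def fitness(genome):
        """Cross-validated accuracy of one candidate tree design."""
        return cross_val_score(DecisionTreeClassifier(random_state=0, **genome),
                               X, y, cv=5).mean()

    random.seed(0)
    pop = [{k: random.choice(v) for k, v in space.items()} for _ in range(8)]
    for gen in range(5):                           # tiny evolutionary loop
        pop.sort(key=fitness, reverse=True)
        parents = pop[:4]
        children = []
        for p in parents:                          # mutation-only reproduction
            c = dict(p)
            k = random.choice(list(space))
            c[k] = random.choice(space[k])
            children.append(c)
        pop = parents + children
    best = max(pop, key=fitness)
    print(best, round(fitness(best), 3))
    ```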

  15. Washing scaling of GeneChip microarray expression

    PubMed Central

    2010-01-01

    Background Post-hybridization washing is an essential part of microarray experiments. Both the quality of the experimental washing protocol and adequate consideration of washing in intensity calibration ultimately affect the quality of the expression estimates extracted from the microarray intensities. Results We conducted experiments on GeneChip microarrays with altered protocols for washing, scanning and staining to study the probe-level intensity changes as a function of the number of washing cycles. For calibration and analysis of the intensity data we make use of the 'hook' method which allows intensity contributions due to non-specific and specific hybridization of perfect match (PM) and mismatch (MM) probes to be disentangled in a sequence specific manner. On average, washing according to the standard protocol removes about 90% of the non-specific background and about 30-50% and less than 10% of the specific targets from the MM and PM, respectively. Analysis of the washing kinetics shows that the signal-to-noise ratio doubles roughly every ten stringent washing cycles. Washing can be characterized by time-dependent rate constants which reflect the heterogeneous character of target binding to microarray probes. We propose an empirical washing function which estimates the survival of probe bound targets. It depends on the intensity contribution due to specific and non-specific hybridization per probe which can be estimated for each probe using existing methods. The washing function allows probe intensities to be calibrated for the effect of washing. On a relative scale, proper calibration for washing markedly increases expression measures, especially in the limit of small and large values. Conclusions Washing is among the factors which potentially distort expression measures. The proposed first-order correction method allows direct implementation in existing calibration algorithms for microarray data. We provide an experimental 'washing data set' which might
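
    As an illustration of the washing-survival idea, the following sketch uses a constant-rate, first-order simplification: targets bound to a probe decay exponentially with the number of stringent washing cycles, at very different rates for specific and non-specific binding. The paper reports time-dependent rate constants and a probe-specific washing function; the rate constants below are invented for the example.

    ```python
    import numpy as np

    def washed_intensity(I_specific, I_nonspecific, n_cycles, k_s=0.01, k_ns=0.23):
        """Illustrative first-order washing model: each binding mode decays
        exponentially with the number of stringent washing cycles."""
        return (I_specific * np.exp(-k_s * n_cycles)
                + I_nonspecific * np.exp(-k_ns * n_cycles))

    def wash_correct(I_measured, frac_specific, n_cycles, k_s=0.01, k_ns=0.23):
        """Invert the model: divide the measured intensity by the survival
        fraction, given a per-probe estimate of the specific fraction (which
        a method such as the hook calibration can supply)."""
        surv = (frac_specific * np.exp(-k_s * n_cycles)
                + (1 - frac_specific) * np.exp(-k_ns * n_cycles))
        return I_measured / surv

    # toy usage: a probe whose signal is 70% specific, after 10 washing cycles
    print(wash_correct(I_measured=800.0, frac_specific=0.7, n_cycles=10))
    ```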

  16. Recommendations for the use of microarrays in prenatal diagnosis.

    PubMed

    Suela, Javier; López-Expósito, Isabel; Querejeta, María Eugenia; Martorell, Rosa; Cuatrecasas, Esther; Armengol, Lluis; Antolín, Eugenia; Domínguez Garrido, Elena; Trujillo-Tiebas, María José; Rosell, Jordi; García Planells, Javier; Cigudosa, Juan Cruz

    2017-04-07

    Microarray technology, recently implemented in international prenatal diagnosis systems, has become one of the main techniques in this field in terms of detection rate and objectivity of the results. This guideline attempts to provide background information on this technology, including technical and diagnostic aspects to be considered. Specifically, this guideline defines: the different prenatal sample types to be used, as well as their characteristics (chorionic villi samples, amniotic fluid, fetal cord blood or miscarriage tissue material); variant reporting policies (including variants of uncertain significance) to be considered in informed consents and prenatal microarray reports; microarray limitations inherent to the technique, which must be taken into account when recommending microarray testing for diagnosis; a detailed clinical algorithm recommending the use of microarray testing and its introduction into routine clinical practice within the context of other genetic tests, including pregnancies in families with a genetic history or specific syndrome suspicion, first trimester increased nuchal translucency or second trimester heart malformation and ultrasound findings not related to a known or specific syndrome. This guideline has been coordinated by the Spanish Association for Prenatal Diagnosis (AEDP, «Asociación Española de Diagnóstico Prenatal»), the Spanish Human Genetics Association (AEGH, «Asociación Española de Genética Humana») and the Spanish Society of Clinical Genetics and Dysmorphology (SEGCyD, «Sociedad Española de Genética Clínica y Dismorfología»). Copyright © 2017 Elsevier España, S.L.U. All rights reserved.

  17. Grouping preprocess for haplotype inference from SNP and CNV data

    NASA Astrophysics Data System (ADS)

    Shindo, Hiroyuki; Chigira, Hiroshi; Nagaoka, Tomoyo; Kamatani, Naoyuki; Inoue, Masato

    2009-12-01

    The method of statistical haplotype inference is an indispensable technique in the field of medical science. The authors previously reported Hardy-Weinberg equilibrium-based haplotype inference that could manage single nucleotide polymorphism (SNP) data. We recently extended the method to cover copy number variation (CNV) data. Haplotype inference from mixed data is important because SNPs and CNVs are occasionally in linkage disequilibrium. The idea underlying the proposed method is simple, but the algorithm for it needs to be quite elaborate to reduce the calculation cost. Consequently, we have focused on the details of the algorithm in this study. Although the main advantage of the method is accuracy, in that it does not use any approximation, its main disadvantage is still the calculation cost, which is sometimes intractable for large data sets with missing values.

  18. Microarray Inspector: tissue cross contamination detection tool for microarray data.

    PubMed

    Stępniak, Piotr; Maycock, Matthew; Wojdan, Konrad; Markowska, Monika; Perun, Serhiy; Srivastava, Aashish; Wyrwicz, Lucjan S; Świrski, Konrad

    2013-01-01

    Microarray technology changed the landscape of contemporary life sciences by providing vast amounts of expression data. Researchers are building up repositories of experiment results with various conditions and samples which serve the scientific community as a precious resource. Ensuring that the sample is of high quality is of utmost importance to this effort. The task is complicated by the fact that in many cases datasets lack information concerning pre-experimental quality assessment. Transcription profiling of tissue samples may be invalidated by an error caused by heterogeneity of the material. The risk of tissue cross contamination is especially high in oncological studies, where it is often difficult to extract the sample. Therefore, there is a need for a method that detects tissue contamination in the post-experimental phase. We propose Microarray Inspector: customizable, user-friendly software that enables easy detection of samples containing mixed tissue types. The advantage of the tool is that it uses raw expression data files and analyses each array independently. In addition, the system allows the user to adjust the criteria of the analysis to conform to individual needs and research requirements. The final output of the program contains easy-to-read reports on the tissue contamination assessment, with detailed information about the test parameters and results. Microarray Inspector provides a list of contaminant biomarkers needed in the analysis of adipose tissue contamination. Using real data (datasets from public repositories) and our tool, we confirmed the high specificity of the software in detecting contamination. The results indicated the presence of adipose tissue admixture in a range from approximately 4% to 13% in several tested surgical samples.
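
    A toy sketch of the post-experimental contamination check idea: score a single array by how strongly contaminant-tissue biomarkers are expressed relative to the array-wide background. The probe names, threshold, and scoring rule are illustrative only, not Microarray Inspector's internals.

    ```python
    import numpy as np

    def contamination_score(expr, probe_ids, biomarkers, background_q=0.5):
        """Score one array: median expression of contaminant-tissue biomarker
        probes divided by the array-wide background level (a quantile)."""
        expr = np.asarray(expr, float)
        idx = [i for i, p in enumerate(probe_ids) if p in biomarkers]
        background = np.quantile(expr, background_q)
        return float(np.median(expr[idx]) / background)

    # hypothetical probes; LEP and ADIPOQ stand in for adipose biomarkers
    probes = ["LEP", "ADIPOQ", "ALB", "GAPDH", "ACTB", "TP53"]
    values = [900.0, 1100.0, 40.0, 200.0, 180.0, 30.0]
    score = contamination_score(values, probes, biomarkers={"LEP", "ADIPOQ"})
    print("possible adipose admixture" if score > 2.0 else "clean",
          round(score, 1))
    ```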

  19. ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses

    PubMed Central

    Stokes, Todd H; Torrance, JT; Li, Henry; Wang, May D

    2008-01-01

    Background A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers), and that the repositories provide only basic biological keywords linking to PubMed. As a result, it is difficult to find datasets using research context or analysis parameter information beyond a few keywords. For example, to reduce the "curse-of-dimension" problem in microarray analysis, the number of samples is often increased by merging array data from different datasets. Knowing chip data parameters such as pre-processing steps (e.g., normalization, artefact removal, etc.), and knowing any previous biological validation of the dataset, is essential due to the heterogeneity of the data. However, most microarray repositories do not have meta-data information in the first place, and do not have a mechanism to add or insert this information. Thus, there is a critical need to create "intelligent" microarray repositories that (1) enable update of meta-data with the raw array data, and (2) provide standardized archiving protocols to minimize bias from the raw data sources. Results To address the problems discussed, we have developed a community-maintained system called ArrayWiki that unites disparate meta-data of microarray meta-experiments from multiple primary sources with four key features. First, ArrayWiki provides a user-friendly knowledge management interface in addition to a programmable interface using standards developed by Wikipedia. Second, ArrayWiki includes automated quality control processes (caCORRECT) and novel visualization methods (BioPNG, Gel Plots), which provide extra information about data quality unavailable in other microarray repositories. Third, it provides a user-curation capability through the familiar Wiki interface. Fourth, ArrayWiki provides users with simple text-based searches across all experiment meta-data, and exposes data to search engine crawlers

  20. Gene Expression Browser: large-scale and cross-experiment microarray data integration, management, search & visualization

    PubMed Central

    2010-01-01

    Background In the last decade, a large amount of microarray gene expression data has accumulated in public repositories. Integrating and analyzing high-throughput gene expression data have become key activities for exploring gene functions, gene networks and biological pathways. Effectively utilizing these invaluable microarray data remains challenging due to a lack of powerful tools to integrate large-scale gene-expression information across diverse experiments and to search and visualize a large number of gene-expression data points. Results Gene Expression Browser is a microarray data integration, management and processing system with web-based search and visualization functions. An innovative method has been developed to define a treatment over a control for every microarray experiment, to standardize and make microarray data from different experiments homogeneous. In the browser, data are pre-processed offline and the resulting data points are visualized online with a two-layer dynamic web display. Users can view all treatments over control that affect the expression of a selected gene via Gene View, and all genes that change in a selected treatment over control via Treatment over Control View. Users can also check the changes in expression profiles of a set of either treatments over control or genes via Slide View. In addition, the relationships between genes and treatments over control are computed according to gene expression ratios and are shown as co-responsive genes and co-regulating treatments over control. Conclusion Gene Expression Browser is composed of a set of software tools, including a data extraction tool, a microarray data-management system, a data-annotation tool, a microarray data-processing pipeline, and a data search & visualization tool. The browser is deployed as a free public web service (http://www.ExpressionBrowser.com) that integrates 301 ATH1 gene microarray experiments from public data repositories (viz. the Gene

  1. ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses.

    PubMed

    Stokes, Todd H; Torrance, J T; Li, Henry; Wang, May D

    2008-05-28

    A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers), and that the repositories provide only basic biological keywords linking to PubMed. As a result, it is difficult to find datasets using research context or analysis parameter information beyond a few keywords. For example, to reduce the "curse-of-dimension" problem in microarray analysis, the number of samples is often increased by merging array data from different datasets. Knowing chip data parameters such as pre-processing steps (e.g., normalization, artefact removal, etc.), and knowing any previous biological validation of the dataset, is essential due to the heterogeneity of the data. However, most microarray repositories do not have meta-data information in the first place, and do not have a mechanism to add or insert this information. Thus, there is a critical need to create "intelligent" microarray repositories that (1) enable update of meta-data with the raw array data, and (2) provide standardized archiving protocols to minimize bias from the raw data sources. To address the problems discussed, we have developed a community-maintained system called ArrayWiki that unites disparate meta-data of microarray meta-experiments from multiple primary sources with four key features. First, ArrayWiki provides a user-friendly knowledge management interface in addition to a programmable interface using standards developed by Wikipedia. Second, ArrayWiki includes automated quality control processes (caCORRECT) and novel visualization methods (BioPNG, Gel Plots), which provide extra information about data quality unavailable in other microarray repositories. Third, it provides a user-curation capability through the familiar Wiki interface. Fourth, ArrayWiki provides users with simple text-based searches across all experiment meta-data, and exposes data to search engine crawlers (Semantic Agents

  2. Workflows for microarray data processing in the Kepler environment

    PubMed Central

    2012-01-01

    Background Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. Results We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or

  3. Workflows for microarray data processing in the Kepler environment.

    PubMed

    Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark

    2012-05-17

    Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R

  4. FPGA based system for automatic cDNA microarray image processing.

    PubMed

    Belean, Bogdan; Borda, Monica; Le Gal, Bertrand; Terebes, Romulus

    2012-07-01

    Automation is an open problem in DNA microarray image processing, where the goal is reliable gene expression estimation. The paper presents a novel shock-filter-based approach for automatic microarray grid alignment. The proposed method significantly reduces computational complexity compared with state-of-the-art approaches, while achieving similar accuracy. Based on this approach, we also propose an FPGA-based system for microarray image analysis that eliminates the shortcomings of existing software platforms: user intervention, and increased computational time and cost. Our system includes application-specific architectures that parallelize the algorithms, aiming at fast, automated cDNA microarray image processing. The proposed automated image-processing chain is implemented both on a general-purpose processor and using the developed hardware architectures as co-processors in an FPGA-based system. The comparative results included in the last section show that an important gain in computational time is obtained using the hardware-based implementations.
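
    For orientation, a minimal NumPy sketch of the classic Osher-Rudin shock filter is given below; it illustrates the edge-sharpening behaviour that grid-alignment methods of this kind build on, not the paper's FPGA implementation, and the synthetic image and parameter values are illustrative assumptions.

    ```python
    import numpy as np

    def shock_filter(img, n_iter=30, dt=0.1):
        """Osher-Rudin shock filter: dI/dt = -sign(laplacian(I)) * |grad I|.
        Iteratively sharpens edges, which makes the grid lines between
        spots easier to localize."""
        I = img.astype(float).copy()
        for _ in range(n_iter):
            gy, gx = np.gradient(I)
            grad_mag = np.hypot(gx, gy)
            Ip = np.pad(I, 1, mode="edge")  # replicate borders
            lap = (Ip[:-2, 1:-1] + Ip[2:, 1:-1]
                   + Ip[1:-1, :-2] + Ip[1:-1, 2:] - 4.0 * I)
            I -= dt * np.sign(lap) * grad_mag
        return I

    # Toy example: a blurred spot acquires sharp edges after filtering.
    y, x = np.mgrid[:64, :64]
    blurred = np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / 200.0)
    sharpened = shock_filter(blurred)
    ```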

  5. Constructing the gene regulation-level representation of microarray data for cancer classification.

    PubMed

    Wong, Hau-San; Wang, Hong-Qiang

    2008-02-01

    In this paper, we propose a regulation-level representation for microarray data and optimize it using genetic algorithms (GAs) for cancer classification. Compared with the traditional expression-level features, this representation can greatly reduce the dimensionality of microarray data and accommodate noise and variability, such that many statistical machine-learning methods become applicable and efficient for cancer classification. Experimental results on real-world microarray datasets show that the regulation-level representation consistently converges to a solution with three regulation levels. This verifies the existence of the three regulation levels (up-regulation, down-regulation and non-significant regulation) associated with a particular biological phenotype. The ternary regulation-level representation not only improves the cancer classification capability but also facilitates the visualization of microarray data.
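
    As an illustration of the core idea, mapping continuous expression values onto three regulation levels can be sketched as a simple thresholding step; the per-gene z-score rule and cutoff below are hypothetical stand-ins for the GA-optimized levels described in the paper.

    ```python
    import numpy as np

    def regulation_levels(expr, z_cut=1.0):
        """Map a genes x samples expression matrix to {-1, 0, +1}:
        down-regulated, non-significant, up-regulated, per gene."""
        mu = expr.mean(axis=1, keepdims=True)
        sd = expr.std(axis=1, keepdims=True) + 1e-12
        z = (expr - mu) / sd
        levels = np.zeros_like(z, dtype=int)
        levels[z > z_cut] = 1    # up-regulation
        levels[z < -z_cut] = -1  # down-regulation
        return levels

    expr = np.random.default_rng(0).normal(size=(100, 20))
    print(np.unique(regulation_levels(expr)))  # [-1  0  1]
    ```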

  6. Reconfiguration-based implementation of SVM classifier on FPGA for classifying microarray data.

    PubMed

    Hussain, Hanaa M; Benkrid, Khaled; Seker, Huseyin

    2013-01-01

    Classifying microarray data, which are of a high-dimensional nature, requires high computational power. The Support Vector Machine (SVM) is among the most common and successful classifiers used in the analysis of microarray data, but it also requires high computational power due to its complex mathematical architecture. Implementing the SVM on hardware exploits the parallelism available within the algorithm kernels to accelerate the classification of microarray data. In this work, a flexible, dynamically and partially reconfigurable implementation of the SVM classifier on a Field Programmable Gate Array (FPGA) is presented. The SVM architecture achieved up to an 85× speed-up over an equivalent general-purpose processor (GPP) implementation, showing the capability of FPGAs to enhance the performance of SVM-based analysis of microarray data as well as future bioinformatics applications.

  7. Multi-task feature selection in microarray data by binary integer programming

    PubMed Central

    2013-01-01

    A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods. PMID:24564944
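
    A hedged sketch of the general approach follows: a quadratic max-relevance/min-redundancy objective with the binary constraints relaxed to the unit box and solved by projected gradient ascent. The relevance and redundancy terms and the solver are illustrative assumptions, not the authors' low-rank formulation.

    ```python
    import numpy as np

    def qp_feature_scores(X, y, lam=0.5, n_iter=200, lr=0.05):
        """Relaxed quadratic feature selection: maximize
        relevance(x) - lam/2 * x' R x over x in [0,1]^p by projected
        gradient ascent; rank features by the final relaxed weights."""
        Xs = (X - X.mean(0)) / (X.std(0) + 1e-12)
        c = np.abs(Xs.T @ (y - y.mean())) / len(y)   # per-feature relevance
        R = np.abs(np.corrcoef(Xs, rowvar=False))    # pairwise redundancy
        x = np.full(X.shape[1], 0.5)
        for _ in range(n_iter):
            x += lr * (c - lam * (R @ x))            # gradient step
            x = np.clip(x, 0.0, 1.0)                 # project onto the box
        return x

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 500))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)        # features 0,1 informative
    scores = qp_feature_scores(X, y)
    top = np.argsort(scores)[::-1][:10]              # selected feature subset
    ```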

  8. Generation of attributes for learning algorithms

    SciTech Connect

    Hu, Yuh-Jyh; Kibler, D.

    1996-12-31

    Inductive algorithms rely strongly on their representational biases. Constructive induction can mitigate representational inadequacies. This paper introduces the notion of a relative gain measure and describes a new constructive induction algorithm (GALA) which is independent of the learning algorithm. Unlike most previous research on constructive induction, our methods are designed as a preprocessing step before standard machine learning algorithms are applied. We present results that demonstrate the effectiveness of GALA on artificial and real domains for several learners: C4.5, CN2, perceptron and backpropagation.

  9. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.

    PubMed

    Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi

    2013-01-01

    The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable to disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces the fuzzy support vector machine, a learning algorithm based on a combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that the fuzzy support vector machine, applied in combination with filter or wrapper feature selection methods, develops a robust model with higher accuracy than conventional microarray classification models such as the support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule base inferred from the fuzzy support vector machine helps extract biological knowledge from microarray data. The fuzzy support vector machine, as a new classification model with high generalization power, robustness, and good interpretability, seems to be a promising tool for gene expression microarray classification.
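
    One well-known way to realize a fuzzy SVM (in the Lin-and-Wang sense) is to down-weight each training example's slack penalty by a fuzzy membership. The sketch below approximates this with scikit-learn's per-sample weights and a centroid-distance membership; this is an assumption for illustration, not the authors' exact formulation.

    ```python
    import numpy as np
    from sklearn.svm import SVC

    def fuzzy_memberships(X, y, eps=1e-6):
        """Membership decays with distance from the class centroid, so
        probable outliers or noisy profiles get less influence."""
        m = np.empty(len(y))
        for c in np.unique(y):
            idx = np.where(y == c)[0]
            d = np.linalg.norm(X[idx] - X[idx].mean(0), axis=1)
            m[idx] = 1.0 - d / (d.max() + eps)
        return np.clip(m, 0.05, 1.0)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 200))          # 80 samples, 200 genes
    y = (X[:, 0] > 0).astype(int)
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, y, sample_weight=fuzzy_memberships(X, y))
    ```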

  10. Surface characterization of carbohydrate microarrays.

    PubMed

    Scurr, David J; Horlacher, Tim; Oberli, Matthias A; Werz, Daniel B; Kroeck, Lenz; Bufali, Simone; Seeberger, Peter H; Shard, Alexander G; Alexander, Morgan R

    2010-11-16

    Carbohydrate microarrays are essential tools to determine the biological function of glycans. Here, we analyze a glycan array by time-of-flight secondary ion mass spectrometry (ToF-SIMS) to gain a better understanding of the physicochemical properties of the individual spots and to improve carbohydrate microarray quality. The carbohydrate microarray is prepared by piezo printing of thiol-terminated sugars onto a maleimide functionalized glass slide. The hyperspectral ToF-SIMS imaging data are analyzed by multivariate curve resolution (MCR) to discern secondary ions from regions of the array containing saccharide, linker, salts from the printing buffer, and the background linker chemistry. Analysis of secondary ions from the linker common to all of the sugar molecules employed reveals a relatively uniform distribution of the sugars within the spots formed from solutions with saccharide concentrations of 0.4 mM or less, whereas a doughnut shape is often formed from solutions of higher concentration. A detailed analysis of individual spots reveals that in the larger spots the phosphate buffered saline (PBS) salts are heterogeneously distributed, apparently resulting in saccharide concentrated at the rim of the spots. A model of spot formation from the evaporating sessile drop is proposed to explain these observations. Saccharide spot diameters increase with saccharide concentration due to a reduction in surface tension of the saccharide solution compared to PBS. The multivariate analytical partial least squares (PLS) technique identifies ions from the sugars that in the complex ToF-SIMS spectra correlate with the binding of galectin proteins.

  11. Spotting effect in microarray experiments

    PubMed Central

    Mary-Huard, Tristan; Daudin, Jean-Jacques; Robin, Stéphane; Bitton, Frédérique; Cabannes, Eric; Hilson, Pierre

    2004-01-01

    Background Microarray data must be normalized because they suffer from multiple biases. We have identified a source of spatial experimental variability that significantly affects data obtained with Cy3/Cy5 spotted glass arrays. It yields a periodic pattern altering both signal (Cy3/Cy5 ratio) and intensity across the array. Results Using the variogram, a geostatistical tool, we characterized the observed variability, called here the spotting effect because it most probably arises during steps in the array printing procedure. Conclusions The spotting effect is not appropriately corrected by current normalization methods, even by those addressing spatial variability. Importantly, the spotting effect may alter differential and clustering analysis. PMID:15151695
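
    A minimal empirical variogram over array grid coordinates, the diagnostic used above, might look like the following; the binning choices and the synthetic row-periodic bias are illustrative.

    ```python
    import numpy as np

    def empirical_variogram(coords, values, n_bins=20):
        """gamma(h) = 0.5 * mean[(z_i - z_j)^2] over pairs at distance ~ h.
        A periodic bump pattern in gamma(h) is the signature of a
        print-tip / spotting effect."""
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        sq = 0.5 * (values[:, None] - values[None, :]) ** 2
        iu = np.triu_indices(len(values), k=1)      # each pair once
        dist, semi = d[iu], sq[iu]
        edges = np.linspace(0, dist.max(), n_bins + 1)
        which = np.digitize(dist, edges) - 1
        gamma = np.array([semi[which == b].mean() if np.any(which == b)
                          else np.nan for b in range(n_bins)])
        return 0.5 * (edges[:-1] + edges[1:]), gamma

    rng = np.random.default_rng(2)
    coords = np.argwhere(np.ones((20, 20))).astype(float)   # 20x20 spot grid
    vals = rng.normal(size=400) + 0.5 * np.sin(coords[:, 0])  # periodic bias
    h, g = empirical_variogram(coords, vals)
    ```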

  12. Preprocessing Inconsistent Linear System for a Meaningful Least Squares Solution

    NASA Technical Reports Server (NTRS)

    Sen, Syamal K.; Shaykhian, Gholam Ali

    2011-01-01

    Mathematical models of many physical/statistical problems are systems of linear equations. Due to measurement and possible human errors/mistakes in modeling/data, as well as certain assumptions made to reduce complexity, inconsistency (contradiction) is injected into the model, viz. the linear system. While any inconsistent system, irrespective of the degree of inconsistency, always has a least-squares solution, one needs to check whether an equation is too inconsistent or, equivalently, too contradictory. Such an equation will affect/distort the least-squares solution to such an extent that it is rendered unacceptable/unfit for use in a real-world application. We propose an algorithm which (i) prunes numerically redundant linear equations from the system, as these do not add any new information to the model, (ii) detects contradictory linear equations along with their degree of contradiction (inconsistency index), (iii) removes those equations presumed to be too contradictory, and then (iv) obtains the minimum-norm least-squares solution of the acceptably inconsistent reduced linear system. The algorithm, presented in Matlab, reduces the computational and storage complexities and also improves the accuracy of the solution. It also provides the necessary warning about the existence of too much contradiction in the model. In addition, we suggest a thorough relook into the mathematical modeling to determine why unacceptable contradiction has occurred, thus prompting us to make necessary corrections/modifications to the models - both mathematical and, if necessary, physical.
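
    A rough NumPy sketch of the described four-step workflow follows; the duplicate-based redundancy pruning, the residual-based inconsistency index, and the threshold tau are illustrative assumptions, not the authors' Matlab algorithm.

    ```python
    import numpy as np

    def prune_and_solve(A, b, tau=3.0):
        """Sketch of the described pipeline: (i) drop exact-duplicate
        equations (the simplest kind of redundancy), (ii) rate each
        remaining equation's inconsistency by its robust-standardized
        residual from a pilot fit, (iii) remove equations rated above
        tau, (iv) return the minimum-norm least-squares solution."""
        Ab = np.column_stack([A, b])
        _, first = np.unique(Ab, axis=0, return_index=True)  # (i)
        keep = np.sort(first)
        A1, b1 = A[keep], b[keep]
        x0 = np.linalg.lstsq(A1, b1, rcond=None)[0]          # pilot fit
        r = np.abs(A1 @ x0 - b1)
        mad = np.median(np.abs(r - np.median(r))) + 1e-12
        index = (r - np.median(r)) / (1.4826 * mad)          # (ii)
        ok = index < tau                                     # (iii)
        x = np.linalg.pinv(A1[ok]) @ b1[ok]                  # (iv)
        return x, keep[~ok]

    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 3))
    x_true = np.array([1.0, -2.0, 0.5])
    b = A @ x_true + 0.01 * rng.normal(size=20)
    b[7] += 25.0                          # inject a contradictory equation
    x, dropped = prune_and_solve(A, b)    # dropped should contain row 7
    ```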

  13. A hybrid preprocessing method using geometry based diffusion and selective enhancement filtering for pulmonary nodule detection

    NASA Astrophysics Data System (ADS)

    Dhara, Ashis K.; Mukhopadhyay, Sudipta

    2012-03-01

    Computer-aided diagnostic (CAD) systems have been developed to assist radiologists in the early detection and analysis of lung nodules. For pulmonary nodule detection, image preprocessing is required to remove the anatomical structure of the lung parenchyma and to enhance the visibility of pulmonary nodules. In this paper a hybrid preprocessing technique using geometry-based diffusion and selective enhancement filtering is proposed. This technique provides a unified preprocessing framework for solid nodules as well as ground-glass opacity (GGO) nodules. Geometry-based diffusion is applied to smooth the images while preserving boundaries. In order to improve the sensitivity of pulmonary nodule detection, a selective enhancement filter is used to highlight blob-like structures. However, the selective enhancement filter sometimes enhances structures other than nodules, such as blood vessels and airways, resulting in a large number of false positives. In the first step, geometry-based diffusion (GBD) is applied for the reduction of false positives, and in the second step, selective enhancement filtering is used for further reduction of false negatives. Geometry-based diffusion and selective enhancement filtering have each been used separately as preprocessing steps, but their combined effect had not been investigated earlier. This hybrid preprocessing approach is suitable for accurate calculation of voxel-based features. The proposed method has been validated on one public database, the Lung Image Database Consortium (LIDC), containing 50 nodules (30 solid and 20 GGO nodules) from 30 subjects, and one private database containing 40 nodules (25 solid and 15 GGO nodules) from 30 subjects.
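
    A hedged two-step sketch follows, substituting a Perona-Malik-style diffusion for the geometry-based diffusion and a multiscale Laplacian-of-Gaussian blob response for the selective enhancement filter; both are common stand-ins, not the paper's exact filters, and the synthetic image is illustrative.

    ```python
    import numpy as np
    from scipy import ndimage as ndi

    def perona_malik(img, n_iter=20, kappa=0.1, dt=0.2):
        """Edge-preserving diffusion: smooth inside regions while
        keeping boundaries, a stand-in for geometry-based diffusion."""
        I = img.astype(float).copy()
        for _ in range(n_iter):
            gy, gx = np.gradient(I)
            c = np.exp(-(gx**2 + gy**2) / kappa**2)  # low flow at edges
            I += dt * (np.gradient(c * gx, axis=1)
                       + np.gradient(c * gy, axis=0))
        return I

    def blob_response(img, sigmas=(1.0, 2.0, 4.0)):
        """Multiscale scale-normalized -LoG: responds more strongly to
        blob-like (nodule-like) structures than to elongated ones."""
        stack = [-(s**2) * ndi.gaussian_laplace(img, s) for s in sigmas]
        return np.max(stack, axis=0)

    y, x = np.mgrid[:128, :128].astype(float)
    img = (np.exp(-((x - 40)**2 + (y - 40)**2) / 30)     # blob
           + 0.5 * np.exp(-((x - 90)**2) / 8))           # vessel-like ridge
    enhanced = blob_response(perona_malik(img))
    ```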

  14. On the Development of Parafoveal Preprocessing: Evidence from the Incremental Boundary Paradigm

    PubMed Central

    Marx, Christina; Hutzler, Florian; Schuster, Sarah; Hawelka, Stefan

    2016-01-01

    Parafoveal preprocessing of upcoming words and the resultant preview benefit are key aspects of fluent reading. Evidence regarding the development of parafoveal preprocessing during reading acquisition, however, is scarce. The present developmental (cross-sectional) eye tracking study estimated the magnitude of parafoveal preprocessing of beginning readers with a novel variant of the classical boundary paradigm. Additionally, we assessed the association of parafoveal preprocessing with several reading-related psychometric measures. The participants were children learning to read the regular German orthography with about 1, 3, and 5 years of formal reading instruction (Grade 2, 4, and 6, respectively). We found evidence of parafoveal preprocessing in each Grade. However, an effective use of parafoveal information was related to the individual reading fluency of the participants (i.e., the reading rate expressed as words-per-minute) which substantially overlapped between the Grades. The size of the preview benefit was furthermore associated with the children’s performance in rapid naming tasks and with their performance in a pseudoword reading task. The latter task assessed the children’s efficiency in phonological decoding and our findings show that the best decoders exhibited the largest preview benefit. PMID:27148123

  15. A distribution-free convolution model for background correction of oligonucleotide microarray data

    PubMed Central

    Chen, Zhongxue; McGee, Monnie; Liu, Qingzhong; Kong, Megan; Deng, Youping; Scheuermann, Richard H

    2009-01-01

    Introduction Affymetrix GeneChip® high-density oligonucleotide arrays are widely used in biological and medical research because of production reproducibility, which facilitates the comparison of results between experiment runs. In order to obtain high-level classification and cluster analysis that can be trusted, it is important to perform various pre-processing steps on the probe-level data to control for variability in sample processing and array hybridization. Many proposed preprocessing methods are parametric, in that they assume that the background noise generated by microarray data is a random sample from a statistical distribution, typically a normal distribution. The quality of the final results depends on the validity of such assumptions. Results We propose a Distribution Free Convolution Model (DFCM) to circumvent observed deficiencies in meeting and validating distribution assumptions of parametric methods. Knowledge of array structure and the biological function of the probes indicate that the intensities of mismatched (MM) probes that correspond to the smallest perfect match (PM) intensities can be used to estimate the background noise. Specifically, we obtain the smallest q2 percent of the MM intensities that are associated with the lowest q1 percent PM intensities, and use these intensities to estimate background. Conclusion Using the Affymetrix Latin Square spike-in experiments, we show that the background noise generated by microarray experiments typically is not well modeled by a single overall normal distribution. We further show that the signal is not exponentially distributed, as is also commonly assumed. Therefore, DFCM has better sensitivity and specificity, as measured by ROC curves and area under the curve (AUC) than MAS 5.0, RMA, RMA with no background correction (RMA-noBG), GCRMA, PLIER, and dChip (MBEI) for preprocessing of Affymetrix microarray data. These results hold for two spike-in data sets and one real data set that were
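
    The background-estimation rule quoted above is concrete enough to sketch directly; the values of q1 and q2 and the simulated intensities below are illustrative assumptions, not those tuned in the paper.

    ```python
    import numpy as np

    def dfcm_background(pm, mm, q1=5.0, q2=50.0):
        """Distribution-free background estimate in the spirit of DFCM:
        take the MM probes paired with the lowest q1% of PM intensities,
        keep the smallest q2% of those MM values, and use their mean as
        the background level (q1, q2 here are illustrative)."""
        low_pm = pm <= np.percentile(pm, q1)
        mm_low = mm[low_pm]
        mm_keep = mm_low[mm_low <= np.percentile(mm_low, q2)]
        return mm_keep.mean()

    rng = np.random.default_rng(3)
    bg_true, signal = 50.0, rng.gamma(2.0, 200.0, size=10_000)
    pm = bg_true + signal + rng.normal(0, 5, 10_000)
    mm = bg_true + 0.05 * signal + rng.normal(0, 5, 10_000)
    bg = dfcm_background(pm, mm)
    corrected = np.maximum(pm - bg, 1.0)   # background-corrected PM
    ```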

  16. THE ABRF MARG MICROARRAY SURVEY 2005: TAKING THE PULSE ON THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years microarray technology has evolved into a critical component of any discovery based program. Since 1999, the Association of Biomolecular Resource Facilities (ABRF) Microarray Research Group (MARG) has conducted biennial surveys designed to generate a pr...

  18. Living Cell Microarrays: An Overview of Concepts

    PubMed Central

    Jonczyk, Rebecca; Kurth, Tracy; Lavrentieva, Antonina; Walter, Johanna-Gabriela; Scheper, Thomas; Stahl, Frank

    2016-01-01

    Living cell microarrays are a highly efficient cellular screening system. Due to the low number of cells required per spot, cell microarrays enable the use of primary and stem cells and provide resolution close to the single-cell level. Apart from a variety of conventional static designs, microfluidic microarray systems have also been established. An alternative format is a microarray consisting of three-dimensional cell constructs ranging from cell spheroids to cells encapsulated in hydrogel. These systems provide an in vivo-like microenvironment and are preferably used for the investigation of cellular physiology, cytotoxicity, and drug screening. Thus, many different high-tech microarray platforms are currently available. Disadvantages of many systems include their high cost, the requirement of specialized equipment for their manufacture, and the poor comparability of results between different platforms. In this article, we provide an overview of static, microfluidic, and 3D cell microarrays. In addition, we describe a simple method for the printing of living cell microarrays on modified microscope glass slides using standard DNA microarray equipment available in most laboratories. Applications in research and diagnostics are discussed, e.g., the selective and sensitive detection of biomarkers. Finally, we highlight current limitations and the future prospects of living cell microarrays. PMID:27600077

  19. Transcriptome Analysis of Zebrafish Embryogenesis Using Microarrays

    PubMed Central

    Mathavan, Sinnakaruppan; Lee, Serene G. P; Mak, Alicia; Miller, Lance D; Murthy, Karuturi Radha Krishna; Govindarajan, Kunde R; Tong, Yan; Wu, Yi Lian; Lam, Siew Hong; Yang, Henry; Ruan, Yijun; Korzh, Vladimir; Gong, Zhiyuan; Liu, Edison T; Lufkin, Thomas

    2005-01-01

    Zebrafish (Danio rerio) is a well-recognized model for the study of vertebrate developmental genetics, yet at the same time little is known about the transcriptional events that underlie zebrafish embryogenesis. Here we have employed microarray analysis to study the temporal activity of developmentally regulated genes during zebrafish embryogenesis. Transcriptome analysis at 12 different embryonic time points covering five different developmental stages (maternal, blastula, gastrula, segmentation, and pharyngula) revealed a highly dynamic transcriptional profile. Hierarchical clustering, stage-specific clustering, and algorithms to detect onset and peak of gene expression revealed clearly demarcated transcript clusters with maximum gene activity at distinct developmental stages as well as co-regulated expression of gene groups involved in dedicated functions such as organogenesis. Our study also revealed a previously unidentified cohort of genes that are transcribed prior to the mid-blastula transition, a time point earlier than when the zygotic genome was traditionally thought to become active. Here we provide, for the first time to our knowledge, a comprehensive list of developmentally regulated zebrafish genes and their expression profiles during embryogenesis, including novel information on the temporal expression of several thousand previously uncharacterized genes. The expression data generated from this study are accessible to all interested scientists from our institute resource database (http://giscompute.gis.a-star.edu.sg/~govind/zebrafish/data_download.html). PMID:16132083

  20. "Harshlighting" small blemishes on microarrays

    PubMed Central

    Suárez-Fariñas, Mayte; Haider, Asifa; Wittkowski, Knut M

    2005-01-01

    Background Microscopists are familiar with many blemishes that fluorescence images can have due to dust and debris, glass flaws, uneven distribution of fluids or surface coatings, etc. Microarray scans show similar artefacts, which affect the analysis, particularly when one tries to detect subtle changes. However, most blemishes are hard to find by the unaided eye, particularly in high-density oligonucleotide arrays (HDONAs). Results We present a method that harnesses the statistical power provided by having several HDONAs available, which are obtained under similar conditions except for the experimental factor. This method "harshlights" blemishes and renders them evident. We find empirically that about 25% of our chips are blemished, and we analyze the impact of masking them on screening for differentially expressed genes. Conclusion Experiments attempting to assess subtle expression changes should be carefully screened for blemishes on the chips. The proposed method provides investigators with a novel robust approach to improve the sensitivity of microarray analyses. By utilizing topological information to identify and mask blemishes prior to model based analyses, the method prevents artefacts from confounding the process of background correction, normalization, and summarization. PMID:15784152

  1. "Harshlighting" small blemishes on microarrays.

    PubMed

    Suárez-Fariñas, Mayte; Haider, Asifa; Wittkowski, Knut M

    2005-03-22

    Microscopists are familiar with many blemishes that fluorescence images can have due to dust and debris, glass flaws, uneven distribution of fluids or surface coatings, etc. Microarray scans show similar artefacts, which affect the analysis, particularly when one tries to detect subtle changes. However, most blemishes are hard to find by the unaided eye, particularly in high-density oligonucleotide arrays (HDONAs). We present a method that harnesses the statistical power provided by having several HDONAs available, which are obtained under similar conditions except for the experimental factor. This method "harshlights" blemishes and renders them evident. We find empirically that about 25% of our chips are blemished, and we analyze the impact of masking them on screening for differentially expressed genes. Experiments attempting to assess subtle expression changes should be carefully screened for blemishes on the chips. The proposed method provides investigators with a novel robust approach to improve the sensitivity of microarray analyses. By utilizing topological information to identify and mask blemishes prior to model based analyses, the method prevents artefacts from confounding the process of background correction, normalization, and summarization.
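
    Harshlight itself works from image-analysis heuristics; a minimal sketch of the underlying idea (use several chips hybridized under similar conditions to flag probes that deviate strongly from the cross-chip consensus, and mask only spatial clusters of such probes) might look like this, with illustrative thresholds.

    ```python
    import numpy as np
    from scipy import ndimage as ndi

    def blemish_mask(chips, z_cut=4.0, min_size=5):
        """chips: array (n_chips, rows, cols) of log-intensities from
        hybridizations obtained under similar conditions. Flag probes
        deviating strongly from the cross-chip median, then keep only
        spatially connected clusters (isolated flags are more likely
        biology or noise than physical blemishes)."""
        med = np.median(chips, axis=0)
        mad = np.median(np.abs(chips - med), axis=0) * 1.4826 + 1e-9
        masks = []
        for chip in chips:
            flag = np.abs(chip - med) / mad > z_cut
            lab, n = ndi.label(flag)
            sizes = ndi.sum(flag, lab, index=np.arange(1, n + 1))
            big = 1 + np.flatnonzero(sizes >= min_size)
            masks.append(np.isin(lab, big))
        return np.array(masks)   # True where a probe should be masked

    rng = np.random.default_rng(4)
    chips = rng.normal(8.0, 0.2, size=(6, 100, 100))
    chips[2, 40:50, 40:50] += 3.0      # synthetic scratch on one chip
    mask = blemish_mask(chips)         # mask[2] highlights the blemish
    ```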

  2. Automatic image analysis and spot classification for detection of pathogenic Escherichia coli on glass slide DNA microarrays

    USDA-ARS?s Scientific Manuscript database

    A computer algorithm was created to inspect scanned images from DNA microarray slides developed to rapidly detect and genotype E. coli O157 virulent strains. The algorithm computes centroid locations for signal and background pixels in RGB space and defines a plane perpendicular to the line connect...
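
    Based on the fragment of the description available, a toy version of the rule (a separating plane perpendicular to the line joining the signal and background centroids, placed at its midpoint) could be sketched as follows; the colors and data are invented for illustration.

    ```python
    import numpy as np

    def rgb_plane_classifier(signal_px, background_px):
        """Fit the plane perpendicular to the centroid-connecting line,
        placed at its midpoint; returns a pixel classifier."""
        c_sig = signal_px.mean(axis=0)
        c_bg = background_px.mean(axis=0)
        normal = c_sig - c_bg                 # plane normal
        midpoint = (c_sig + c_bg) / 2.0
        def is_signal(px):
            return (px - midpoint) @ normal > 0.0
        return is_signal

    rng = np.random.default_rng(5)
    sig = rng.normal([180, 40, 40], 15, size=(500, 3))  # reddish spots
    bg = rng.normal([30, 30, 30], 10, size=(500, 3))    # dark background
    classify = rgb_plane_classifier(sig, bg)
    print(classify(np.array([170.0, 45.0, 50.0])))      # True -> signal
    ```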

  3. 2008 Microarray Research Group (MARG Survey): Sensing the State of Microarray Technology

    EPA Science Inventory

    Over the past several years, the field of microarrays has grown and evolved drastically. In its continued efforts to track this evolution and transformation, the ABRF-MARG has once again conducted a survey of international microarray facilities and individual microarray users. Th...

  4. THE ABRF-MARG MICROARRAY SURVEY 2004: TAKING THE PULSE OF THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years, the field of microarrays has grown and evolved drastically. In its continued efforts to track this evolution, the ABRF-MARG has once again conducted a survey of international microarray facilities and individual microarray users. The goal of the surve...

  6. Preprocessed barley, rye, and triticale as a feedstock for an integrated fuel ethanol-feedlot plant

    SciTech Connect

    Sosulski, K.; Wang, Sunmin; Ingledew, W.M.

    1997-12-31

    Rye, triticale, and barley were evaluated as starch feedstocks to replace wheat for ethanol production. Preprocessing of the grain by abrasion on a Satake mill reduced fiber and increased starch concentrations in the feedstock for fermentation. Higher concentrations of starch in flours from preprocessed cereal grains would increase plant throughput by 8-23%, since more starch is processed in the same weight of feedstock. Increased concentrations of starch for fermentation resulted in higher concentrations of ethanol in beer. Energy requirements to produce one L of ethanol from preprocessed grains were reduced: natural gas by 3.5-11.4% and power consumption by 5.2-15.6%. 7 refs., 7 figs., 4 tabs.

  7. MAGMA: analysis of two-channel microarrays made easy.

    PubMed

    Rehrauer, Hubert; Zoller, Stefan; Schlapbach, Ralph

    2007-07-01

    The web application MAGMA provides a simple and intuitive interface to identify differentially expressed genes from two-channel microarray data. While the underlying algorithms are not superior to those of similar web applications, MAGMA is particularly user friendly and can be used without prior training. The user interface guides the novice user through the most typical microarray analysis workflow consisting of data upload, annotation, normalization and statistical analysis. It automatically generates R-scripts that document MAGMA's entire data processing steps, thereby allowing the user to regenerate all results in his local R installation. The implementation of MAGMA follows the model-view-controller design pattern that strictly separates the R-based statistical data processing, the web-representation and the application logic. This modular design makes the application flexible and easily extendible by experts in one of the fields: statistical microarray analysis, web design or software development. State-of-the-art Java Server Faces technology was used to generate the web interface and to perform user input processing. MAGMA's object-oriented modular framework makes it easily extendible and applicable to other fields and demonstrates that modern Java technology is also suitable for rather small and concise academic projects. MAGMA is freely available at www.magma-fgcz.uzh.ch.

  8. Classification of Microarray Data Using Kernel Fuzzy Inference System.

    PubMed

    Kumar, Mukesh; Kumar Rath, Santanu

    2014-01-01

    The DNA microarray classification technique has gained popularity in both research and practice. In real data analysis, such as microarray data, the dataset contains a huge number of insignificant and irrelevant features, which can obscure useful information. Feature selection retains those features with high significance and relevance to the classes, and the selected features determine the classification of samples into their respective classes. In this paper, the kernel fuzzy inference system (K-FIS) algorithm is applied to classify microarray data (leukemia), using the t-test as a feature selection method. Kernel functions are used to map the original data points into a higher-dimensional (possibly infinite-dimensional) feature space defined by a (usually nonlinear) function ϕ through a mathematical process called the kernel trick. This paper also presents a comparative study of classification using K-FIS along with the support vector machine (SVM) for different sets of features (genes). Performance parameters available in the literature, such as precision, recall, specificity, F-measure, ROC curve, and accuracy, are considered to analyze the efficiency of the classification model. From the proposed approach, it is apparent that the K-FIS model obtains results similar to those of the SVM model. This is an indication that the proposed approach relies on the kernel function.
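
    The t-test filtering step is straightforward to sketch; the snippet below pairs it with an SVM for comparison (scikit-learn has no K-FIS implementation), on simulated data. The caveat about nesting selection inside cross-validation is a general one, not part of the paper.

    ```python
    import numpy as np
    from scipy.stats import ttest_ind
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def top_genes_by_ttest(X, y, k=50):
        """Two-sample t-test per gene; keep the k smallest p-values."""
        _, p = ttest_ind(X[y == 0], X[y == 1], axis=0)
        return np.argsort(p)[:k]

    rng = np.random.default_rng(6)
    X = rng.normal(size=(72, 5000))       # leukemia-sized toy data
    y = rng.integers(0, 2, size=72)
    X[y == 1, :20] += 1.0                 # plant 20 informative genes
    genes = top_genes_by_ttest(X, y)
    # Caveat: for honest error estimates, the t-test selection should be
    # re-run inside each cross-validation fold, not once on all samples.
    acc = cross_val_score(SVC(kernel="rbf"), X[:, genes], y, cv=5).mean()
    ```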

  9. MAGMA: analysis of two-channel microarrays made easy

    PubMed Central

    Rehrauer, Hubert; Zoller, Stefan; Schlapbach, Ralph

    2007-01-01

    The web application MAGMA provides a simple and intuitive interface to identify differentially expressed genes from two-channel microarray data. While the underlying algorithms are not superior to those of similar web applications, MAGMA is particularly user friendly and can be used without prior training. The user interface guides the novice user through the most typical microarray analysis workflow consisting of data upload, annotation, normalization and statistical analysis. It automatically generates R-scripts that document MAGMA's entire data processing steps, thereby allowing the user to regenerate all results in his local R installation. The implementation of MAGMA follows the model-view-controller design pattern that strictly separates the R-based statistical data processing, the web-representation and the application logic. This modular design makes the application flexible and easily extendible by experts in one of the fields: statistical microarray analysis, web design or software development. State-of-the-art Java Server Faces technology was used to generate the web interface and to perform user input processing. MAGMA's object-oriented modular framework makes it easily extendible and applicable to other fields and demonstrates that modern Java technology is also suitable for rather small and concise academic projects. MAGMA is freely available at www.magma-fgcz.uzh.ch. PMID:17517778

  10. Optimization of Preprocessing and Densification of Sorghum Stover at Full-scale Operation

    SciTech Connect

    Neal A. Yancey; Jaya Shankar Tumuluru; Craig C. Conner; Christopher T. Wright

    2011-08-01

    Transportation costs can be a prohibitive step in bringing biomass to a preprocessing location or biofuel refinery. One alternative to transporting biomass in baled or loose format to a preprocessing location is to utilize a mobile preprocessing system that can be relocated to the various locations where biomass is stored, preprocess and densify the biomass, then ship it to the refinery as needed. The Idaho National Laboratory has a full-scale Process Demonstration Unit (PDU), which includes a stage 1 grinder, hammer mill, drier, pellet mill, and cooler with the associated conveyance system components. Testing at bench and pilot scale has been conducted to determine the effects of moisture and crop variety on preprocessing efficiency and product quality. The INL's PDU provides an opportunity to test the conclusions made at bench and pilot scale on full industrial-scale systems. Each component of the PDU is operated from a central operating station, where data are collected to determine power consumption rates for each step in the process. The power for each electrical motor in the system is monitored from the control station to check for problems and determine optimal conditions for system performance. The data can then be reviewed to observe how changes in biomass input parameters (moisture and crop type, for example), mechanical changes (screen size, biomass drying, pellet size, grinding speed, etc.), or other variations affect the power consumption of the system. Sorghum in four-foot round bales was tested in the system using a series of six different screen sizes: 3/16 in., 1 in., 2 in., 3 in., 4 in., and 6 in. The effects on power consumption, product quality, and production rate were measured to determine optimal conditions.

  11. Effect of data normalization on fuzzy clustering of DNA microarray data

    PubMed Central

    Kim, Seo Young; Lee, Jae Won; Bae, Jong Sung

    2006-01-01

    Background Microarray technology has made it possible to simultaneously measure the expression levels of large numbers of genes in a short time. Gene expression data is information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. Clustering is an important tool for finding groups of genes with similar expression patterns in microarray data analysis. However, hard clustering methods, which assign each gene exactly to one cluster, are poorly suited to the analysis of microarray datasets because in such datasets the clusters of genes frequently overlap. Results In this study we applied the fuzzy partitional clustering method known as Fuzzy C-Means (FCM) to overcome the limitations of hard clustering. To identify the effect of data normalization, we used three normalization methods, the two common scale and location transformations and Lowess normalization methods, to normalize three microarray datasets and three simulated datasets. First we determined the optimal parameters for FCM clustering. We found that the optimal fuzzification parameter in the FCM analysis of a microarray dataset depended on the normalization method applied to the dataset during preprocessing. We additionally evaluated the effect of normalization of noisy datasets on the results obtained when hard clustering or FCM clustering was applied to those datasets. The effects of normalization were evaluated using both simulated datasets and microarray datasets. A comparative analysis showed that the clustering results depended on the normalization method used and the noisiness of the data. In particular, the selection of the fuzzification parameter value for the FCM method was sensitive to the normalization method used for datasets with large variations across samples. Conclusion Lowess normalization is more robust for clustering of genes from general microarray data than the two common scale and location adjustment methods
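
    A compact NumPy implementation of fuzzy c-means, with one of the compared normalizations (a per-profile location/scale adjustment) applied beforehand, is sketched below; the cluster count, the fuzzification parameter m, and the simulated profiles are illustrative.

    ```python
    import numpy as np

    def fcm(X, c=2, m=2.0, n_iter=100, seed=0):
        """Plain fuzzy C-means: returns soft memberships U (n x c) and
        centroids V. Per the study, the best fuzzification parameter m
        depends on how the data were normalized beforehand."""
        rng = np.random.default_rng(seed)
        U = rng.dirichlet(np.ones(c), size=len(X))    # rows sum to 1
        for _ in range(n_iter):
            W = U ** m
            V = (W.T @ X) / W.sum(axis=0)[:, None]    # weighted centroids
            d = np.linalg.norm(X[:, None, :] - V[None], axis=2) + 1e-12
            U = d ** (-2.0 / (m - 1.0))
            U /= U.sum(axis=1, keepdims=True)         # membership update
        return U, V

    rng = np.random.default_rng(7)
    pattern = np.r_[np.ones(5), -np.ones(5)]          # two expression shapes
    X = np.vstack([rng.normal(pattern, 0.5, size=(100, 10)),
                   rng.normal(-pattern, 0.5, size=(100, 10))])
    # Location/scale normalization per gene profile before clustering.
    Xn = (X - X.mean(1, keepdims=True)) / (X.std(1, keepdims=True) + 1e-12)
    U, V = fcm(Xn, c=2)
    ```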

  12. An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era.

    PubMed

    Su, Zhenqiang; Fang, Hong; Hong, Huixiao; Shi, Leming; Zhang, Wenqian; Zhang, Wenwei; Zhang, Yanyan; Dong, Zirui; Lancashire, Lee J; Bessarabova, Marina; Yang, Xi; Ning, Baitang; Gong, Binsheng; Meehan, Joe; Xu, Joshua; Ge, Weigong; Perkins, Roger; Fischer, Matthias; Tong, Weida

    2014-12-03

    Gene expression microarray has been the primary biomarker platform ubiquitously applied in biomedical research, resulting in enormous data, predictive models, and biomarkers accrued. Recently, RNA-seq has looked likely to replace microarrays, but there will be a period where both technologies co-exist. This raises two important questions: Can microarray-based models and biomarkers be directly applied to RNA-seq data? Can future RNA-seq-based predictive models and biomarkers be applied to microarray data to leverage past investment? We systematically evaluated the transferability of predictive models and signature genes between microarray and RNA-seq using two large clinical data sets. The complexity of cross-platform sequence correspondence was considered in the analysis and examined using three human and two rat data sets, and three levels of mapping complexity were revealed. Three algorithms representing different modeling complexity were applied to the three levels of mappings for each of the eight binary endpoints and Cox regression was used to model survival times with expression data. In total, 240,096 predictive models were examined. Signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development, and microarray-based models can accurately predict RNA-seq-profiled samples; while RNA-seq-based models are less accurate in predicting microarray-profiled samples and are affected both by the choice of modeling algorithm and the gene mapping complexity. The results suggest continued usefulness of legacy microarray data and established microarray biomarkers and predictive models in the forthcoming RNA-seq era.

  13. ACTS (Advanced Communications Technology Satellite) Propagation Experiment: Preprocessing Software User's Manual

    NASA Technical Reports Server (NTRS)

    Crane, Robert K.; Wang, Xuhe; Westenhaver, David

    1996-01-01

    The preprocessing software manual describes the Actspp program, originally developed to observe and diagnose Advanced Communications Technology Satellite (ACTS) propagation terminal/receiver problems. However, it has been quite useful for automating the preprocessing functions needed to convert the terminal output to useful attenuation estimates. Prior to having data acceptable for archival functions, the individual receiver system must be calibrated and the power level shifts caused by ranging tone modulation must be removed. Actspp provides three output files: the daylog, the diurnal coefficient file, and the file that contains calibration information.

  14. Boosting model performance and interpretation by entangling preprocessing selection and variable selection.

    PubMed

    Gerretzen, Jan; Szymańska, Ewa; Bart, Jacob; Davies, Antony N; van Manen, Henk-Jan; van den Heuvel, Edwin R; Jansen, Jeroen J; Buydens, Lutgarde M C

    2016-09-28

    The aim of data preprocessing is to remove data artifacts-such as a baseline, scatter effects or noise-and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but which method or combination of methods should be used for the specific data being analyzed is difficult to select. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames. In that approach, the focus was solely on improving the predictive performance of the chemometric model. This is, however, only one of the two relevant criteria in modeling: interpretation of the model results can be just as important. Variable selection is often used to achieve such interpretation. Data artifacts, however, may hamper proper variable selection by masking the true relevant variables. The choice of preprocessing therefore has a huge impact on the outcome of variable selection methods and may thus hamper an objective interpretation of the final model. To enhance such objective interpretation, we here integrate variable selection into the preprocessing selection approach that is based on DoE. We show that the entanglement of preprocessing selection and variable selection not only improves the interpretation, but also the predictive performance of the model. This is achieved by analyzing several experimental data sets of which the true relevant variables are available as prior knowledge. We show that a selection of variables is provided that complies more with the true informative variables compared to individual optimization of both model aspects. Importantly, the approach presented in this work is generic. Different types of models (e.g. PCR, PLS, …) can be incorporated into it, as well as different variable selection methods and different preprocessing methods, according to the taste and experience of
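
    In scikit-learn terms, entangling the two choices amounts to searching preprocessing and variable selection jointly against cross-validated performance, as in the hedged sketch below; an exhaustive grid stands in for the paper's Design-of-Experiments screen, and the particular steps and data are illustrative.

    ```python
    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import GridSearchCV

    # Choose preprocessing and variable selection together, scored by
    # cross-validation, instead of fixing preprocessing first.
    pipe = Pipeline([("prep", "passthrough"),
                     ("select", SelectKBest(f_regression)),
                     ("model", PLSRegression())])
    grid = {"prep": ["passthrough", StandardScaler()],
            "select__k": [10, 25, 50],
            "model__n_components": [2, 3, 5]}

    rng = np.random.default_rng(8)
    X = rng.normal(size=(100, 200))
    y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=100)
    search = GridSearchCV(pipe, grid, cv=5).fit(X, y)
    chosen = search.best_params_   # entangled prep + variable choice
    ```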

  15. Influence of Hemp Fibers Pre-processing on Low Density Polyethylene Matrix Composites Properties

    NASA Astrophysics Data System (ADS)

    Kukle, S.; Vidzickis, R.; Zelca, Z.; Belakova, D.; Kajaks, J.

    2016-04-01

    In the present research, LLDPE matrix composites reinforced with short hemp fibres subjected to four different pre-processing technologies were produced, with fibre contents ranging from 30 to 50 wt%, and properties such as tensile strength and elongation at break, tensile modulus, melt flow index, microhardness and water absorption dynamics were investigated. Capillary viscosimetry was used for fluidity evaluation, and the melt flow index (MFI) was evaluated for all variants. The MFI of fibres from two of the pre-processing variants was high enough to allow the hemp fibre content to be increased from 30 to 50 wt% with only a moderate increase in water sorption capability.

  16. EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

    PubMed Central

    Forment, Javier; Gilabert, Francisco; Robles, Antonio; Conejero, Vicente; Nuez, Fernando; Blanca, Jose M

    2008-01-01

    Background Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation. Results We have created EST2uni, an integrated, highly-configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses standard EST analysis tools and the software has a modular design to facilitate the addition of new analytical methods and their configuration. Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. It can be run in parallel in a PC cluster in order to reduce the time necessary for the analysis. It also creates a web site linked to the database, showing collection statistics, with complex query capabilities and tools for data mining and retrieval. Conclusion The software package presented here provides an efficient and complete bioinformatics tool for the management of EST collections which is very easy to adapt to the local needs of different EST projects. The code is freely available under the GPL license and can be obtained at . This site also provides detailed instructions for

  17. Microarrays Made Simple: "DNA Chips" Paper Activity

    ERIC Educational Resources Information Center

    Barnard, Betsy

    2006-01-01

    DNA microarray technology is revolutionizing biological science. DNA microarrays (also called DNA chips) allow simultaneous screening of many genes for changes in expression between different cells. Now researchers can obtain information about genes in days or weeks that used to take months or years. The paper activity described in this article…

  19. Tissue Microarrays in Clinical Oncology

    PubMed Central

    Voduc, David; Kenney, Challayne; Nielsen, Torsten O.

    2008-01-01

    The tissue microarray (TMA) is a recently implemented, high-throughput technology for the analysis of molecular markers in oncology. This research tool permits the rapid assessment of a biomarker in thousands of tumor samples, using commonly available laboratory assays such as immunohistochemistry and in-situ hybridization. Although introduced less than a decade ago, the TMA has proven to be invaluable in the study of tumor biology, the development of diagnostic tests, and the investigation of oncological biomarkers. This review describes the impact of TMA-based research in clinical oncology and its potential future applications. Technical aspects of TMA construction, and the advantages and disadvantages inherent to this technology, are also discussed. PMID:18314063

  20. In control: systematic assessment of microarray performance.

    PubMed

    van Bakel, Harm; Holstege, Frank C P

    2004-10-01

    Expression profiling using DNA microarrays is a powerful technique that is widely used in the life sciences. How reliable are microarray-derived measurements? The assessment of performance is challenging because of the complicated nature of microarray experiments and the many different technology platforms. There is a mounting call for standards to be introduced, and this review addresses some of the issues that are involved. Two important characteristics of performance are accuracy and precision. The assessment of these factors can be either for the purpose of technology optimization or for the evaluation of individual microarray hybridizations. Microarray performance has been evaluated by at least four approaches in the past. Here, we argue that external RNA controls offer the most versatile system for determining performance and describe how such standards could be implemented. Other uses of external controls are discussed, along with the importance of probe sequence availability and the quantification of labelled material.

  1. Analysis of DNA microarray expression data.

    PubMed

    Simon, Richard

    2009-06-01

    DNA microarrays are powerful tools for studying biological mechanisms and for developing prognostic and predictive classifiers for identifying the patients who require treatment and are the best candidates for specific treatments. Because microarrays produce so much data from each specimen, they offer great opportunities for discovery and great dangers of producing misleading claims. Microarray-based studies require clear objectives for selecting cases and appropriate analysis methods. Effective analysis of microarray data, where the number of measured variables is orders of magnitude greater than the number of cases, requires specialized statistical methods which have recently been developed. Recent literature reviews indicate that serious problems of analysis exist in a substantial proportion of publications. This manuscript attempts to provide a non-technical summary of the key principles of statistical design and analysis for studies that utilize microarray expression profiling.

  2. Microarray Applications in Microbial Ecology Research.

    SciTech Connect

    Gentry, T.; Schadt, C.; Zhou, J.

    2006-04-06

    Microarray technology has the unparalleled potential to simultaneously determine the dynamics and/or activities of most, if not all, of the microbial populations in complex environments such as soils and sediments. Researchers have developed several types of arrays that characterize the microbial populations in these samples based on their phylogenetic relatedness or functional genomic content. Several recent studies have used these microarrays to investigate ecological issues; however, most have only analyzed a limited number of samples, with relatively few experiments utilizing the full high-throughput potential of microarray analysis. This is due in part to the unique analytical challenges that these samples present with regard to sensitivity, specificity, quantitation, and data analysis. This review discusses specific applications of microarrays to microbial ecology research along with some of the latest studies addressing the difficulties encountered during analysis of complex microbial communities within environmental samples. With continued development, microarray technology may ultimately achieve its potential for comprehensive, high-throughput characterization of microbial populations in near real-time.

  3. Chaotic mixer improves microarray hybridization.

    PubMed

    McQuain, Mark K; Seale, Kevin; Peek, Joel; Fisher, Timothy S; Levy, Shawn; Stremler, Mark A; Haselton, Frederick R

    2004-02-15

    Hybridization is an important aspect of microarray experimental design which influences array signal levels and the repeatability of data within an array and across different arrays. Current methods typically require 24 h and use target inefficiently. In these studies, we compare hybridization signals obtained in conventional static hybridization, which depends on diffusional target delivery, with signals obtained in a dynamic hybridization chamber, which employs a fluid mixer based on chaotic advection theory to deliver targets across a conventional glass slide array. Microarrays were printed with a pattern of 102 identical probe spots containing a 65-mer oligonucleotide capture probe. Hybridization of a 725-bp fluorescently labeled target was used to measure average target hybridization levels, local signal-to-noise ratios, and array hybridization uniformity. Dynamic hybridization for 1 h with 1 or 10 ng of target DNA increased hybridization signal intensities approximately threefold over a 24-h static hybridization. Similarly, a 10- or 60-min dynamic hybridization of 10 ng of target DNA increased hybridization signal intensities fourfold over a 24-h static hybridization. In time course studies, static hybridization reached a maximum within 8 to 12 h using either 1 or 10 ng of target. In time course studies using the dynamic hybridization chamber, hybridization using 1 ng of target increased to a maximum at 4 h, and that using 10 ng of target did not vary over the time points tested. In comparison to static hybridization, dynamic hybridization increased the signal-to-noise ratios threefold and reduced spot-to-spot variation twofold. Therefore, we conclude that dynamic hybridization based on a chaotic mixer design improves both the speed of hybridization and the maximum level of hybridization while increasing signal-to-noise ratios and reducing spot-to-spot variation.

  4. PAA: an R/bioconductor package for biomarker discovery with protein microarrays

    PubMed Central

    Turewicz, Michael; Ahrens, Maike; May, Caroline; Marcus, Katrin; Eisenacher, Martin

    2016-01-01

    Summary: The R/Bioconductor package Protein Array Analyzer (PAA) facilitates a flexible analysis of protein microarrays for biomarker discovery (esp., ProtoArrays). It provides a complete data analysis workflow including preprocessing and quality control, uni- and multivariate feature selection as well as several different plots and results tables to outline and evaluate the analysis results. As a main feature, PAA’s multivariate feature selection methods are based on recursive feature elimination (e.g. SVM-recursive feature elimination, SVM-RFE) with stability ensuring strategies such as ensemble feature selection. This enables PAA to detect stable and reliable biomarker candidate panels. Availability and implementation: PAA is freely available (BSD 3-clause license) from http://www.bioconductor.org/packages/PAA/. Contact: michael.turewicz@rub.de or martin.eisenacher@rub.de PMID:26803161
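
    The SVM-RFE core that PAA builds its multivariate selection on is available directly in scikit-learn; the sketch below shows only that core, without PAA's ensemble/stability layer, on simulated data.

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.feature_selection import RFE

    rng = np.random.default_rng(9)
    X = rng.normal(size=(60, 1000))        # 60 arrays, 1000 protein features
    y = rng.integers(0, 2, size=60)
    X[y == 1, :10] += 1.5                  # 10 informative features

    # SVM-RFE: recursively drop the features with the smallest |weight|
    # of a linear SVM, refitting after each elimination round.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=10, step=0.1)
    rfe.fit(X, y)
    panel = np.flatnonzero(rfe.support_)   # candidate biomarker panel
    ```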

  5. Assessing the Clinical Utility of SNP Microarray for Prader-Willi Syndrome due to Uniparental Disomy.

    PubMed

    Santoro, Stephanie L; Hashimoto, Sayaka; McKinney, Aimee; Mihalic Mosher, Theresa; Pyatt, Robert; Reshmi, Shalini C; Astbury, Caroline; Hickey, Scott E

    2017-01-01

    Maternal uniparental disomy (UPD) 15 is one of the molecular causes of Prader-Willi syndrome (PWS), a multisystem disorder which presents with neonatal hypotonia and feeding difficulty. Current diagnostic algorithms differ regarding the use of SNP microarray to detect PWS. We retrospectively examined the frequency with which SNP microarray could identify regions of homozygosity (ROH) in patients with PWS. We determined that 7/12 (58%) patients with previously confirmed PWS by methylation analysis and microsatellite-positive UPD studies had ROH (>10 Mb) by SNP microarray. Additional assessment of 5,000 clinical microarrays, performed from 2013 to present, determined that only a single case of ROH for chromosome 15 was not caused by an imprinting disorder or identity by descent. We observed that ROH for chromosome 15 is rarely incidental and strongly associated with hypotonic infants having features of PWS. Although UPD microsatellite studies remain essential to definitively establish the presence of UPD, SNP microarray has important utility in the timely diagnostic algorithm for PWS. © 2017 S. Karger AG, Basel.
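
    A simplified sketch of calling ROH from SNP genotypes (a run of homozygous calls spanning at least 10 Mb, the threshold used above) is given below; a production caller additionally tolerates occasional genotyping errors, which this sketch does not.

    ```python
    import numpy as np

    def roh_segments(positions, is_het, min_mb=10.0):
        """Scan sorted SNP positions (bp) for runs of homozygous calls
        and report runs spanning at least min_mb megabases. Simplified:
        here any heterozygous call ends the current run."""
        segments, start, last_hom = [], None, None
        for pos, het in zip(positions, is_het):
            if het:
                if start is not None and last_hom - start >= min_mb * 1e6:
                    segments.append((start, last_hom))
                start = None
            else:
                if start is None:
                    start = pos
                last_hom = pos
        if start is not None and last_hom - start >= min_mb * 1e6:
            segments.append((start, last_hom))
        return segments

    pos = np.arange(0, 40_000_000, 20_000)       # SNPs every 20 kb
    het = np.random.default_rng(10).random(len(pos)) < 0.3
    het[500:1500] = False                        # force a ~20 Mb stretch
    print(roh_segments(pos, het))                # one ~20 Mb ROH
    ```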

  6. Algorithms for optimal dyadic decision trees

    SciTech Connect

    Hush, Don; Porter, Reid

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low-dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  7. Probe Region Expression Estimation for RNA-Seq Data for Improved Microarray Comparability

    PubMed Central

    Uziela, Karolis; Honkela, Antti

    2015-01-01

    Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms that make integrating it all difficult. Although RNA-sequencing (RNA-seq) is attracting the most attention, at present, the rate of new microarray studies submitted to public databases far exceeds the rate of new RNA-seq studies. There is clearly a need for methods that make it easier to combine data from different technologies. In this paper, we propose a new method for processing RNA-seq data that yields gene expression estimates that are much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method we call PREBS is based on estimating the expression from RNA-seq reads overlapping the microarray probe regions, and processing these estimates with standard microarray summarisation algorithms. Using paired microarray and RNA-seq samples from TCGA LAML data set we show that PREBS expression estimates derived from RNA-seq are more similar to microarray-based expression estimates than those from other RNA-seq processing methods. In an experiment to retrieve paired microarray samples from a database using an RNA-seq query sample, gene signatures defined based on PREBS expression estimates were found to be much more accurate than those from other methods. PREBS also allows new ways of using RNA-seq data, such as expression estimation for microarray probe sets. An implementation of the proposed method is available in the Bioconductor package “prebs.” PMID:25966034

  8. Probe Region Expression Estimation for RNA-Seq Data for Improved Microarray Comparability.

    PubMed

    Uziela, Karolis; Honkela, Antti

    2015-01-01

    Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms that make integrating it all difficult. Although RNA-sequencing (RNA-seq) is attracting the most attention, at present, the rate of new microarray studies submitted to public databases far exceeds the rate of new RNA-seq studies. There is clearly a need for methods that make it easier to combine data from different technologies. In this paper, we propose a new method for processing RNA-seq data that yields gene expression estimates that are much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method we call PREBS is based on estimating the expression from RNA-seq reads overlapping the microarray probe regions, and processing these estimates with standard microarray summarisation algorithms. Using paired microarray and RNA-seq samples from the TCGA LAML data set, we show that PREBS expression estimates derived from RNA-seq are more similar to microarray-based expression estimates than those from other RNA-seq processing methods. In an experiment to retrieve paired microarray samples from a database using an RNA-seq query sample, gene signatures defined based on PREBS expression estimates were found to be much more accurate than those from other methods. PREBS also allows new ways of using RNA-seq data, such as expression estimation for microarray probe sets. An implementation of the proposed method is available in the Bioconductor package "prebs."

  9. The application of SVD in stored grain pest image pre-processing

    NASA Astrophysics Data System (ADS)

    Mou, Yi; Zhou, Long

    2009-07-01

    The principle of singular value decomposition (SVD) is introduced. Then, procedures for the restoration and compression of pest images based on SVD are proposed. The experiments demonstrate that SVD is an effective method for stored-grain pest image pre-processing.
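
    A minimal numpy sketch of the rank-k SVD approximation underlying such compression and restoration, assuming a grayscale image matrix:

```python
# Minimal sketch of SVD-based image compression: keep only the k largest
# singular values/vectors, which gives the best rank-k approximation in the
# least-squares sense (Eckart-Young theorem).
import numpy as np

def svd_compress(image, k):
    U, s, Vt = np.linalg.svd(image.astype(float), full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k, :]    # rank-k reconstruction

rng = np.random.default_rng(0)
img = rng.random((64, 64))                  # stand-in for a pest image
approx = svd_compress(img, 8)
err = np.linalg.norm(img - approx) / np.linalg.norm(img)
print(f"relative reconstruction error at rank 8: {err:.3f}")
```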

  10. Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

    PubMed Central

    Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy

    2016-01-01

    This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156–186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by using plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory. PMID:27861637
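
    For reference, the baseline SAM-to-BAM conversion that sam2bam accelerates can be written in a few lines with pysam; this single-threaded sketch (with placeholder file paths) is the operation the framework parallelizes across cores, memory, and compression hardware.

```python
# Baseline SAM-to-BAM conversion with pysam, for reference: this is the
# (single-threaded) operation that frameworks like sam2bam speed up.
import pysam

def sam_to_bam(sam_path, bam_path):
    with pysam.AlignmentFile(sam_path, "r") as sam, \
         pysam.AlignmentFile(bam_path, "wb", template=sam) as bam:
        for read in sam:        # stream records; header copied via template
            bam.write(read)

sam_to_bam("sample.sam", "sample.bam")  # paths are placeholders
```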

  11. Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools.

    PubMed

    Ogasawara, Takeshi; Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy

    2016-01-01

    This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156-186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by using plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory.

  12. Pre-processing SAR image stream to facilitate compression for transport on bandwidth-limited-link

    DOEpatents

    Rush, Bobby G.; Riley, Robert

    2015-09-29

    Pre-processing is applied to a raw VideoSAR (or similar near-video-rate) product to transform the image frame sequence into one that more closely resembles the type of product for which conventional video codecs are designed, while sufficiently maintaining the utility and visual quality of the product delivered by the codec.

  13. Parafoveal Preprocessing in Reading Revisited: Evidence from a Novel Preview Manipulation

    ERIC Educational Resources Information Center

    Gagl, Benjamin; Hawelka, Stefan; Richlan, Fabio; Schuster, Sarah; Hutzler, Florian

    2014-01-01

    The study investigated parafoveal preprocessing by the means of the classical invisible boundary paradigm and a novel manipulation of the parafoveal previews (i.e., visual degradation). Eye movements were investigated on 5-letter target words with constraining (i.e., highly informative) initial letters or similarly constraining final letters.…

  14. Strategies for identifying statistically significant dense regions in microarray data.

    PubMed

    Yip, Andy M; Ng, Michael K; Wu, Edmond H; Chan, Tony F

    2007-01-01

    We propose and study the notion of dense regions for the analysis of categorized gene expression data and present some searching algorithms for discovering them. The algorithms can be applied to any categorical data matrices derived from gene expression level matrices. We demonstrate that dense regions are simple but useful and statistically significant patterns that can be used to 1) identify genes and/or samples of interest and 2) eliminate genes and/or samples corresponding to outliers, noise, or abnormalities. Some theoretical studies on the properties of dense regions are presented which allow us to categorize dense regions into several classes and to derive tailor-made algorithms for different classes of regions. Moreover, an empirical simulation study on the distribution of the size of dense regions is carried out which is then used to assess the significance of dense regions and to derive effective pruning methods to speed up the searching algorithms. Real microarray data sets are employed to test our methods. Comparisons with six other well-known clustering algorithms using synthetic and real data are also conducted which confirm the superiority of our methods in discovering dense regions. The DRIFT code and a tutorial are available as supplemental material, which can be found on the Computer Society Digital Library at http://computer.org/tcbb/archives.htm.
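
    A toy greedy sketch of the dense-region notion (a submatrix whose entries are mostly one category) is given below; it illustrates the idea only and is not the DRIFT code.

```python
# Toy sketch of the dense-region idea: greedily drop the row or column with
# the lowest fraction of the target category until the remaining submatrix
# is "dense" (fraction >= tau). An illustration only, not the DRIFT algorithms.
import numpy as np

def greedy_dense_region(M, tau=0.9):
    rows = list(range(M.shape[0]))
    cols = list(range(M.shape[1]))
    while rows and cols:
        sub = M[np.ix_(rows, cols)]
        if sub.mean() >= tau:                  # dense enough: stop
            return rows, cols
        r_density = sub.mean(axis=1)           # per-row fraction of 1s
        c_density = sub.mean(axis=0)           # per-column fraction of 1s
        if r_density.min() <= c_density.min():
            rows.pop(int(r_density.argmin()))  # drop the sparsest row
        else:
            cols.pop(int(c_density.argmin()))  # drop the sparsest column
    return rows, cols

rng = np.random.default_rng(1)
M = (rng.random((20, 15)) < 0.3).astype(int)   # sparse categorical background
M[2:8, 3:9] = 1                                # planted dense block
r, c = greedy_dense_region(M)
print(f"found a {len(r)} x {len(c)} region with density "
      f"{M[np.ix_(r, c)].mean():.2f}")
```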

  15. MARS: Microarray analysis, retrieval, and storage system

    PubMed Central

    Maurer, Michael; Molidor, Robert; Sturn, Alexander; Hartler, Juergen; Hackl, Hubert; Stocker, Gernot; Prokesch, Andreas; Scheideler, Marcel; Trajanoski, Zlatko

    2005-01-01

    Background: Microarray analysis has become a widely used technique for the study of gene-expression patterns on a genomic scale. As more and more laboratories are adopting microarray technology, there is a need for powerful and easy to use microarray databases facilitating array fabrication, labeling, hybridization, and data analysis. The wealth of data generated by this high throughput approach renders adequate database and analysis tools crucial for the pursuit of insights into the transcriptomic behavior of cells. Results: MARS (Microarray Analysis and Retrieval System) provides a comprehensive MIAME-supportive suite for storing, retrieving, and analyzing multi-color microarray data. The system comprises a laboratory information management system (LIMS), a quality control management component, as well as a sophisticated user management system. MARS is fully integrated into an analytical pipeline of microarray image analysis, normalization, gene expression clustering, and mapping of gene expression data onto biological pathways. The incorporation of ontologies and the use of MAGE-ML enable the export of studies stored in MARS to public repositories and other databases accepting these documents. Conclusion: We have developed an integrated system tailored to serve the specific needs of microarray-based research projects using a unique fusion of Web-based and standalone applications connected to the latest J2EE application server technology. The presented system is freely available for academic and non-profit institutions. More information can be found at the project web site. PMID:15836795

  16. Microarray-integrated optoelectrofluidic immunoassay system.

    PubMed

    Han, Dongsik; Park, Je-Kyun

    2016-05-01

    A microarray-based analytical platform has been utilized as a powerful tool in biological assay fields. However, an analyte depletion problem due to the slow mass transport based on molecular diffusion causes low reaction efficiency, resulting in a limitation for practical applications. This paper presents a novel method to improve the efficiency of microarray-based immunoassay via an optically induced electrokinetic phenomenon by integrating an optoelectrofluidic device with a conventional glass slide-based microarray format. A sample droplet was loaded between the microarray slide and the optoelectrofluidic device on which a photoconductive layer was deposited. Under the application of an AC voltage, optically induced AC electroosmotic flows caused by a microarray-patterned light actively enhanced the mass transport of target molecules at the multiple assay spots of the microarray simultaneously, which reduced the tedious reaction time from more than 30 min to 10 min. Based on this enhancing effect, a heterogeneous immunoassay with a tiny volume of sample (5 μl) was successfully performed in the microarray-integrated optoelectrofluidic system using immunoglobulin G (IgG) and anti-IgG, resulting in improved efficiency compared to the static environment. Furthermore, the application of multiplex assays was also demonstrated by multiple protein detection.

  17. Value of Distributed Preprocessing of Biomass Feedstocks to a Bioenergy Industry

    SciTech Connect

    Christopher T Wright

    2006-07-01

    Biomass preprocessing is one of the primary operations in the feedstock assembly system and the front-end of a biorefinery. Its purpose is to chop, grind, or otherwise format the biomass into a suitable feedstock for conversion to ethanol and other bioproducts. Many variables such as equipment cost and efficiency, and feedstock moisture content, particle size, bulk density, compressibility, and flowability affect the location and implementation of this unit operation. Previous conceptual designs show this operation to be located at the front-end of the biorefinery. However, data are presented that show distributed preprocessing at the field-side or in a fixed preprocessing facility can provide significant cost benefits by producing a higher value feedstock with improved handling, transporting, and merchandising potential. In addition, data supporting the preferential deconstruction of feedstock materials due to their bio-composite structure identify the potential for significant improvements in equipment efficiencies and compositional quality upgrades. These data were collected from full-scale low- and high-capacity hammermill grinders with various screen sizes. Multiple feedstock varieties with a range of moisture values were used in the preprocessing tests. The comparative values of the different grinding configurations, feedstock varieties, and moisture levels are assessed through post-grinding analysis of the different particle fractions separated with a medium-scale forage particle separator and a Rototap separator. The results show that distributed preprocessing produces a material that has bulk flowable properties and fractionation benefits that can improve the ease of transporting, handling and conveying the material to the biorefinery and improve the biochemical and thermochemical conversion processes.

  18. Preprocessing strategy influences graph-based exploration of altered functional networks in major depression.

    PubMed

    Borchardt, Viola; Lord, Anton Richard; Li, Meng; van der Meer, Johan; Heinze, Hans-Jochen; Bogerts, Bernhard; Breakspear, Michael; Walter, Martin

    2016-04-01

    Resting-state fMRI studies have gained widespread use in exploratory studies of neuropsychiatric disorders. Graph metrics derived from whole brain functional connectivity studies have been used to reveal disease-related variations in many neuropsychiatric disorders including major depression (MDD). These techniques show promise in developing diagnostics for these often difficult to identify disorders. However, the analysis of resting-state datasets is increasingly beset by a myriad of approaches and methods, each with underlying assumptions. Choosing the most appropriate preprocessing parameters a priori is difficult. Nevertheless, the specific methodological choice influences graph-theoretical network topologies as well as regional metrics. The aim of this study was to systematically compare different preprocessing strategies by evaluating their influence on group differences between healthy participants (HC) and depressive patients. We thus investigated the effects of common preprocessing variants, including global mean-signal regression (GMR), temporal filtering, detrending, and network sparsity on group differences between brain networks of HC and MDD patients measured by global and nodal graph theoretical metrics. Group differences in global metrics were absent in the majority of tested preprocessing variants, whereas in local graph metrics they were sparse, variable, and highly dependent on the combination of preprocessing variant and sparsity threshold. Sparsity thresholds between 16 and 22% were shown to have the greatest potential to reveal differences between HC and MDD patients in global and local network metrics. Our study offers an overview of the consequences of methodological decisions and which neurobiological characteristics of MDD they implicate, adding further caution to this rapidly growing field.
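
    One of the preprocessing choices the study varies, thresholding a connectivity matrix to a fixed sparsity before computing graph metrics, can be sketched as follows; the random symmetric matrix stands in for an empirical fMRI correlation matrix, and the networkx calls are standard library functions.

```python
# Sketch of one preprocessing choice the study varies: binarize a functional
# connectivity matrix at a fixed sparsity (edge density) before computing
# graph metrics. A random symmetric matrix stands in for real fMRI data.
import numpy as np
import networkx as nx

def threshold_to_sparsity(conn, sparsity):
    n = conn.shape[0]
    upper = conn[np.triu_indices(n, k=1)]
    k = int(round(sparsity * upper.size))      # number of edges to keep
    cutoff = np.sort(upper)[-k]                # k-th largest connection weight
    A = (conn >= cutoff).astype(int)
    np.fill_diagonal(A, 0)
    return nx.from_numpy_array(A)

rng = np.random.default_rng(0)
C = rng.random((30, 30))
C = (C + C.T) / 2                              # symmetric "connectivity" matrix
G = threshold_to_sparsity(C, 0.18)             # 18%, inside the 16-22% band
print(f"density={nx.density(G):.2f}, "
      f"global efficiency={nx.global_efficiency(G):.2f}")
```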

  19. Gene microarray data analysis using parallel point-symmetry-based clustering.

    PubMed

    Sarkar, Anasua; Maulik, Ujjwal

    2015-01-01

    Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed time-efficient scalable approach for the point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data using a symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also satisfies linear speedup in timing without sacrificing the quality of the clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and existing parallel K-Means over eight artificial and benchmark microarray data sets, to demonstrate its superiority in both timing and validity. A statistical analysis is also performed to establish the significance of this message-passing-interface-based point-symmetry K-Means implementation. We also analysed the biological relevance of the clustering solutions.
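
    The point-symmetry distance introduced by Su and Chou, on which such clustering is typically based, can be sketched as below; the MPI parallelization that is the paper's contribution is omitted.

```python
# Sketch of the point-symmetry distance (Su & Chou) that symmetry-based
# K-Means variants use in place of Euclidean distance: a point is close to a
# centre if some other point lies roughly at its mirror image through it.
import numpy as np

def point_symmetry_distance(x, centre, points):
    d1 = x - centre
    d2 = points - centre                     # candidate mirror partners
    num = np.linalg.norm(d1 + d2, axis=1)    # 0 for a perfect mirror image
    den = np.linalg.norm(d1) + np.linalg.norm(d2, axis=1) + 1e-12
    # x's own term yields exactly 1.0, so it never dominates the minimum.
    return (num / den).min()

pts = np.array([[1.0, 0.0], [-1.0, 0.05], [3.0, 3.0]])
# Small value: a near-mirror partner of pts[0] exists through the origin.
print(point_symmetry_distance(pts[0], np.zeros(2), pts))
```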

  20. A Java-based tool for the design of classification microarrays

    PubMed Central

    Meng, Da; Broschat, Shira L; Call, Douglas R

    2008-01-01

    Background: Classification microarrays are used for purposes such as identifying strains of bacteria and determining genetic relationships to understand the epidemiology of an infectious disease. For these cases, mixed microarrays, which are composed of DNA from more than one organism, are more effective than conventional microarrays composed of DNA from a single organism. Selection of probes is a key factor in designing successful mixed microarrays because redundant sequences are inefficient and limited representation of diversity can restrict application of the microarray. We have developed a Java-based software tool, called PLASMID, for use in selecting the minimum set of probe sequences needed to classify different groups of plasmids or bacteria. Results: The software program was successfully applied to several different sets of data. The utility of PLASMID was illustrated using existing mixed-plasmid microarray data as well as data from a virtual mixed-genome microarray constructed from different strains of Streptococcus. Moreover, use of data from expression microarray experiments demonstrated the generality of PLASMID. Conclusion: In this paper we describe a new software tool for selecting a set of probes for a classification microarray. While the tool was developed for the design of mixed microarrays (and mixed-plasmid microarrays in particular), it can also be used to design expression arrays. The user can choose from several clustering methods (including hierarchical, non-hierarchical, and a model-based genetic algorithm), several probe ranking methods, and several different display methods. A novel approach is used for probe redundancy reduction, and probe selection is accomplished via stepwise discriminant analysis. Data can be entered in different formats (including Excel and comma-delimited text), and dendrogram, heat map, and scatter plot images can be saved in several different formats (including jpeg and tiff). Weights generated using stepwise

  1. Progress in the application of DNA microarrays.

    PubMed Central

    Lobenhofer, E K; Bushel, P R; Afshari, C A; Hamadeh, H K

    2001-01-01

    Microarray technology has been applied to a variety of different fields to address fundamental research questions. The use of microarrays, or DNA chips, to study the gene expression profiles of biologic samples began in 1995. Since that time, the fundamental concepts behind the chip, the technology required for making and using these chips, and the multitude of statistical tools for analyzing the data have been extensively reviewed. For this reason, the focus of this review will be not on the technology itself but on the application of microarrays as a research tool and the future challenges of the field. PMID:11673116

  2. DNA Microarrays in Herbal Drug Research

    PubMed Central

    Chavan, Preeti; Joshi, Kalpana; Patwardhan, Bhushan

    2006-01-01

    Natural products are gaining increased applications in drug discovery and development. Being chemically diverse, they are able to modulate several targets simultaneously in a complex system. Analysis of gene expression becomes necessary for a better understanding of molecular mechanisms. Conventional strategies for expression profiling are optimized for single-gene analysis. DNA microarrays serve as a suitable high-throughput tool for the simultaneous analysis of multiple genes. The major practical applicability of DNA microarrays remains in DNA mutation and polymorphism analysis. This review highlights applications of DNA microarrays in pharmacodynamics, pharmacogenomics, toxicogenomics and quality control of herbal drugs and extracts. PMID:17173108

  3. Terrain matching image pre-process and its format transform in autonomous underwater navigation

    NASA Astrophysics Data System (ADS)

    Cao, Xuejun; Zhang, Feizhou; Yang, Dongkai; Yang, Bogang

    2007-06-01

    Image matching precision directly influences the final precision of an integrated navigation system. Image-matching-aided navigation spatially registers two underwater scene images of the same area, acquired by two different sensors, in order to determine the relative displacement between them. In this way, the vehicle's location can be obtained within a reference image of known geographic coordinates, and the precise location information produced by image matching is transmitted to the INS to eliminate its accumulated location error, greatly enhancing the navigation precision of the vehicle. The analysis and processing of digital image data for image matching in underwater passive navigation is therefore important. For underwater geographic data analysis, we focus on the acquisition, processing, analysis, representation and measurement of database information. These analysis items constitute one of the important components of underwater terrain matching, and they help characterize the seabed terrain configuration of the navigation areas so that the most advantageous seabed terrain district and a dependable navigation algorithm can be selected. In this way, we can improve the precision and reliability of the terrain-aided navigation system. The pre-processing and format transformation of digital images for underwater image matching are described in detail in this paper. The terrain of the navigation areas needs further study to provide reliable terrain characteristics and underwater cover data for navigation. By supporting sea route selection, danger district prediction and navigation algorithm analysis, TAN can obtain higher location precision and probability, hence providing technological support for image matching in underwater passive navigation.

  4. Demystified...tissue microarray technology.

    PubMed

    Packeisen, J; Korsching, E; Herbst, H; Boecker, W; Buerger, H

    2003-08-01

    Several "high throughput methods" have been introduced into research and routine laboratories during the past decade. Providing a new approach to the analysis of genomic alterations and RNA or protein expression patterns, these new techniques generate a plethora of new data in a relatively short time, and promise to deliver clues to the diagnosis and treatment of human cancer. Along with these revolutionary developments, new tools for the interpretation of these large sets of data became necessary and are now widely available. Tissue microarray (TMA) technology is one of these new tools. It is based on the idea of applying miniaturisation and a high throughput approach to the analysis of intact tissues. The potential and the scientific value of TMAs in modern research have been demonstrated in a logarithmically increasing number of studies. The spectrum for additional applications is widening rapidly, and comprises quality control in histotechnology, longterm tissue banking, and the continuing education of pathologists. This review covers the basic technical aspects of TMA production and discusses the current and potential future applications of TMA technology.

  5. Integrating Microarray Data and GRNs.

    PubMed

    Koumakis, L; Potamias, G; Tsiknakis, M; Zervakis, M; Moustakis, V

    2016-01-01

    With the completion of the Human Genome Project and the emergence of high-throughput technologies, a vast amount of molecular and biological data are being produced. Two of the most important and significant data sources come from microarray gene-expression experiments and respective databanks (e.g., Gene Expression Omnibus-GEO (http://www.ncbi.nlm.nih.gov/geo)), and from molecular pathways and Gene Regulatory Networks (GRNs) stored and curated in public (e.g., Kyoto Encyclopedia of Genes and Genomes-KEGG (http://www.genome.jp/kegg/pathway.html), Reactome (http://www.reactome.org/ReactomeGWT/entrypoint.html)) as well as in commercial repositories (e.g., Ingenuity IPA (http://www.ingenuity.com/products/ipa)). The association of these two sources aims to give new insight in disease understanding and reveal new molecular targets in the treatment of specific phenotypes. Three major research lines and respective efforts that try to utilize and combine data from both of these sources can be identified, namely: (1) de novo reconstruction of GRNs, (2) identification of gene signatures, and (3) identification of differentially expressed GRN functional paths (i.e., sub-GRN paths that distinguish between different phenotypes). In this chapter, we give an overview of the existing methods that support the different types of gene-expression and GRN integration, with a focus on methodologies that aim to identify phenotype-discriminant GRNs or subnetworks, and we also present our methodology.

  6. Mutational analysis using oligonucleotide microarrays

    PubMed Central

    Hacia, J.; Collins, F.

    1999-01-01

    The development of inexpensive high throughput methods to identify individual DNA sequence differences is important to the future growth of medical genetics. This has become increasingly apparent as epidemiologists, pathologists, and clinical geneticists focus more attention on the molecular basis of complex multifactorial diseases. Such undertakings will rely upon genetic maps based upon newly discovered, common, single nucleotide polymorphisms. Furthermore, candidate gene approaches used in identifying disease associated genes necessitate screening large sequence blocks for changes tracking with the disease state. Even after such genes are isolated, large scale mutational analyses will often be needed for risk assessment studies to define the likely medical consequences of carrying a mutated gene.
This review concentrates on the use of oligonucleotide arrays for hybridisation based comparative sequence analysis. Technological advances within the past decade have made it possible to apply this technology to many different aspects of medical genetics. These applications range from the detection and scoring of single nucleotide polymorphisms to mutational analysis of large genes. Although we discuss published scientific reports, unpublished work from the private sector [12] could also significantly affect the future of this technology.


Keywords: mutational analysis; oligonucleotide microarrays; DNA chips PMID:10528850

  7. An image-data-compression algorithm

    NASA Technical Reports Server (NTRS)

    Hilbert, E. E.; Rice, R. F.

    1981-01-01

    Cluster Compression Algorithm (CCA) preprocesses Landsat image data immediately following the satellite data sensor (receiver). Data are reduced by extracting pertinent image features and compressing the result into a concise format for transmission to the ground station. This results in narrower transmission bandwidth, increased data-communication efficiency, and reduced computer time in reconstructing and analyzing the image. A similar technique could be applied to other types of recorded data to cut the costs of transmitting, storing, distributing, and interpreting complex information.
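
    The CCA details are not reproduced here, but a common modern analogue of cluster-based image compression is k-means vector quantization of pixel blocks; the sketch below transmits one small codebook plus one cluster index per block instead of raw pixels.

```python
# Illustrative analogue of cluster-based image compression (not the CCA
# itself): vector-quantize 4x4 pixel blocks with k-means, then transmit one
# codebook plus one cluster index per block instead of raw pixels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
img = rng.random((64, 64))                 # stand-in for a Landsat band

# Split the image into 256 non-overlapping 4x4 blocks (one row per block).
blocks = img.reshape(16, 4, 16, 4).transpose(0, 2, 1, 3).reshape(-1, 16)
km = KMeans(n_clusters=32, n_init=4, random_state=0).fit(blocks)

# Decoder side: look up each block's codeword and reassemble the image.
decoded = km.cluster_centers_[km.labels_]
restored = decoded.reshape(16, 16, 4, 4).transpose(0, 2, 1, 3).reshape(64, 64)
ratio = blocks.size / (km.labels_.size + km.cluster_centers_.size)
print(f"approximate compression ratio: {ratio:.1f}x")
```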

  8. Joint Adaptive Pre-processing Resilience and Post-processing Concealment Schemes for 3D Video Transmission

    NASA Astrophysics Data System (ADS)

    El-Shafai, Walid

    2015-03-01

    3D video transmission over erroneous networks is still a considerable issue due to restricted resources and the presence of severe channel errors. Efficiently compressing 3D video at a low transmission rate, while maintaining a high quality of the received 3D video, is very challenging. Since it is not plausible to re-transmit all the corrupted macro-blocks (MBs) in real-time applications with limited resources, it is mandatory to retrieve the lost MBs at the decoder side using sufficient post-processing schemes, such as error concealment (EC). In this paper, we propose an adaptive multi-mode EC (AMMEC) algorithm at the decoder based on utilizing the pre-processing flexible macro-block ordering error resilience (FMO-ER) technique at the encoder to efficiently conceal the erroneous MBs of intra- and inter-coded frames of 3D video. Experimental simulation results show that the proposed FMO-ER/AMMEC schemes can significantly improve the objective and subjective 3D video quality.
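
    A toy baseline for the temporal error concealment such schemes improve upon: copy the co-located macro-block from the previous decoded frame (AMMEC adaptively selects among several richer concealment modes).

```python
# Toy baseline for temporal error concealment (far simpler than AMMEC):
# replace each lost 16x16 macro-block with the co-located block from the
# previous decoded frame.
import numpy as np

def conceal_lost_blocks(frame, prev_frame, lost, mb=16):
    out = frame.copy()
    for r, c in lost:                        # (row, col) indices of lost MBs
        out[r*mb:(r+1)*mb, c*mb:(c+1)*mb] = \
            prev_frame[r*mb:(r+1)*mb, c*mb:(c+1)*mb]
    return out

prev = np.full((64, 64), 100, dtype=np.uint8)
cur = prev.copy()
cur[16:32, 16:32] = 0                        # macro-block (1, 1) corrupted
fixed = conceal_lost_blocks(cur, prev, [(1, 1)])
print(fixed[20, 20])                         # 100 again after concealment
```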

  9. Data preprocessing for determining outer/inner parallelization in the nested loop problem using OpenMP

    NASA Astrophysics Data System (ADS)

    Handhika, T.; Bustamam, A.; Ernastuti; Kerami, D.

    2017-07-01

    Multi-threaded programming using OpenMP on a shared-memory architecture with hyperthreading technology allows a resource to be accessed by multiple processors simultaneously, and each processor can execute more than one thread over a given period of time. However, the achievable speedup depends on the processor's ability to execute only a limited number of threads, particularly for sequential algorithms containing a nested loop in which the number of outer-loop iterations is greater than the maximum number of threads the processor can execute. The thread distribution techniques found previously can only be applied by high-level programmers. This paper derives a parallelization procedure for low-level programmers dealing with 2-level nested loop problems in which the maximum number of threads that can be executed by a processor is smaller than the number of outer-loop iterations. Data preprocessing based on the numbers of outer-loop and inner-loop iterations, the computational time required to execute each iteration, and the maximum number of threads that can be executed by a processor is used as a strategy to determine which parallel region (outer or inner) will produce the optimal speedup.
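
    A toy cost model for the outer-versus-inner decision, assuming uniform per-iteration cost (the paper's preprocessing also accounts for measured iteration times): with T threads, a loop of n iterations takes ceil(n/T) rounds, so the two schedules can be compared directly.

```python
# Toy cost model for choosing which loop level to parallelize, assuming
# uniform per-iteration cost. With T threads, a loop of n iterations takes
# ceil(n/T) rounds; compare the two schedules and pick the cheaper one.
from math import ceil

def choose_parallel_level(n_outer, n_inner, n_threads, t_iter=1.0):
    t_outer = ceil(n_outer / n_threads) * n_inner * t_iter   # parallel outer
    t_inner = n_outer * ceil(n_inner / n_threads) * t_iter   # parallel inner
    return ("outer", t_outer) if t_outer <= t_inner else ("inner", t_inner)

# 6 outer iterations on 4 threads leaves 2 thread slots idle; the inner loop
# of 100 iterations load-balances better, so "inner" wins here.
print(choose_parallel_level(n_outer=6, n_inner=100, n_threads=4))
```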

  10. Advanced Recording and Preprocessing of Physiological Signals. [data processing equipment for flow measurement of blood flow by ultrasonics

    NASA Technical Reports Server (NTRS)

    Bentley, P. B.

    1975-01-01

    The measurement of the volume flow-rate of blood in an artery or vein requires both an estimate of the flow velocity and its spatial distribution and the corresponding cross-sectional area. Transcutaneous measurements of these parameters can be performed using ultrasonic techniques that are analogous to the measurement of moving objects by use of a radar. Modern digital data recording and preprocessing methods were applied to the measurement of blood-flow velocity by means of the CW Doppler ultrasonic technique. Only the average flow velocity was measured and no distribution or size information was obtained. Evaluations of current flowmeter design and performance, ultrasonic transducer fabrication methods, and other related items are given. The main thrust was the development of effective data-handling and processing methods by application of modern digital techniques. The evaluation resulted in useful improvements in both the flowmeter instrumentation and the ultrasonic transducers. Effective digital processing algorithms that provided enhanced blood-flow measurement accuracy and sensitivity were developed. Block diagrams illustrative of the equipment setup are included.
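
    For reference, the velocity estimate in CW Doppler flowmetry follows the standard textbook relation (general ultrasound physics, not a formula specific to this report's processing chain):

```latex
% CW Doppler relation for blood-flow velocity:
%   f_d   : measured Doppler frequency shift
%   f_0   : transmitted ultrasonic frequency
%   c     : speed of sound in tissue
%   theta : angle between the ultrasound beam and the flow direction
\[
  v = \frac{c \, f_d}{2 f_0 \cos\theta}
\]
```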

  11. [DNA microarrays in parasitology and medical sciences].

    PubMed

    Jaros, Sławomir

    2006-01-01

    The article presents the current knowledge on the microarray technique and its applications in medical sciences and parasitology. The first part of the article is focused on the technical aspects (microarray preparation, different microarray platforms, probe preparation, hybridization and signal detection). The article also describes possible ways of proceeding during laboratory work on organisms whose genome sequence is not known or has been only partially sequenced. The second part of the review describes how the microarray technique has been, or may be, used for better understanding parasite life cycles and development, host-parasite relationships, comparative genomics of virulent organisms, development of vaccines against the most virulent parasites, and host responses to infection.

  12. Protein Microarrays: Novel Developments and Applications

    PubMed Central

    Berrade, Luis; Garcia, Angie E.

    2011-01-01

    Protein microarray technology possesses some of the greatest potential for providing direct information on protein function and potential drug targets. For example, functional protein microarrays are ideal tools suited for the mapping of biological pathways. They can be used to study most major types of interactions and enzymatic activities that take place in biochemical pathways and have been used for the analysis of simultaneous multiple biomolecular interactions involving protein-protein, protein-lipid, protein-DNA and protein-small molecule interactions. Because of this unique ability to analyze many kinds of molecular interactions en masse, the requirement of very small sample amount and the potential to be miniaturized and automated, protein microarrays are extremely well suited for protein profiling, drug discovery, drug target identification and clinical prognosis and diagnosis. The aim of this review is to summarize the most recent developments in the production, applications and analysis of protein microarrays. PMID:21116694

  13. Construction of citrus gene coexpression networks from microarray data using random matrix theory.

    PubMed

    Du, Dongliang; Rawat, Nidhi; Deng, Zhanao; Gmitter, Fred G

    2015-01-01

    After the sequencing of citrus genomes, gene function annotation is becoming a new challenge. Gene coexpression analysis can be employed for function annotation using publicly available microarray data sets. In this study, 230 sweet orange (Citrus sinensis) microarrays were used to construct seven coexpression networks, including one condition-independent and six condition-dependent (Citrus canker, Huanglongbing, leaves, flavedo, albedo, and flesh) networks. In total, these networks contain 37 633 edges among 6256 nodes (genes), which accounts for 52.11% of the measurable genes of the citrus microarray. Then, these networks were partitioned into functional modules using the Markov Cluster Algorithm. Significantly enriched Gene Ontology biological process terms and KEGG pathway terms were detected for 343 and 60 modules, respectively. Finally, independent verification of these networks was performed using independent expression data for 371 genes. This study provides new targets for further functional analyses in citrus.
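
    The construction step can be sketched as correlation thresholding; the fixed cutoff below stands in for the random-matrix-theory threshold selection that the study actually uses.

```python
# Sketch of coexpression-network construction: correlate genes across arrays
# and connect pairs above a cutoff. The paper selects the cutoff with random
# matrix theory; a fixed threshold stands in for that step here.
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 30))         # 200 genes x 30 arrays (toy data)

corr = np.corrcoef(expr)                  # gene-gene Pearson correlations
np.fill_diagonal(corr, 0)
adjacency = np.abs(corr) >= 0.7           # RMT would choose this threshold
edges = np.argwhere(np.triu(adjacency, 1))
print(f"{len(edges)} edges among {expr.shape[0]} genes")
```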

  14. A renewed approach to the nonparametric analysis of replicated microarray experiments.

    PubMed

    Jung, Klaus; Quast, Karsten; Gannoun, Ali; Urfer, Wolfgang

    2006-04-01

    DNA-microarrays find broad employment in biochemical research. This technology allows the monitoring of the expression levels of thousands of genes at the same time. Often, the goal of a microarray study is to find differentially expressed genes in two different types of tissue, for example normal and cancerous. Multiple hypothesis testing is a useful statistical tool for such studies. One approach using multiple hypothesis testing is nonparametric analysis for replicated microarray experiments. In this paper we present an improved version of this method. We also show how p-values are calculated for all significant genes detected with this testing procedure. All algorithms were implemented in an R-package, and instructions on its use are included. The package can be downloaded at http://www.statistik.unidortmund.de/de/content/einrichtungen/lehrstuehle/personen/jung.html

  15. Protein Microarray-Based IgE Immunoassay for Allergy Diagnosis.

    PubMed

    Jambari, Nuzul N; Wang, XiaoWei; Alcocer, Marcos

    2017-01-01

    Protein microarray is a miniaturized multi-analyte, solid-phase immunoassay in which thousands of immobilized individual protein spots on a microscope slide are bound by specific antibodies (immunoglobulins) from serum samples and then detected by fluorescent labeling. The resulting images are processed and the recognized patterns quantitatively analyzed using advanced algorithms. Here, we describe the use of an in-house-produced complex protein microarray containing extracts and pure proteins that has been probed with antibodies present in horse sera, with detection by a fluorophore-conjugated antibody, followed by data analysis. The flexibility in the number and types of proteins that can be printed on the microarray allows different sets of specific IgE immunoassay analyses to be carried out.

  16. Construction of citrus gene coexpression networks from microarray data using random matrix theory

    PubMed Central

    Du, Dongliang; Rawat, Nidhi; Deng, Zhanao; Gmitter, Fred G.

    2015-01-01

    After the sequencing of citrus genomes, gene function annotation is becoming a new challenge. Gene coexpression analysis can be employed for function annotation using publicly available microarray data sets. In this study, 230 sweet orange (Citrus sinensis) microarrays were used to construct seven coexpression networks, including one condition-independent and six condition-dependent (Citrus canker, Huanglongbing, leaves, flavedo, albedo, and flesh) networks. In total, these networks contain 37 633 edges among 6256 nodes (genes), which accounts for 52.11% of the measurable genes of the citrus microarray. Then, these networks were partitioned into functional modules using the Markov Cluster Algorithm. Significantly enriched Gene Ontology biological process terms and KEGG pathway terms were detected for 343 and 60 modules, respectively. Finally, independent verification of these networks was performed using independent expression data for 371 genes. This study provides new targets for further functional analyses in citrus. PMID:26504573

  17. PATMA: parser of archival tissue microarray.

    PubMed

    Roszkowiak, Lukasz; Lopez, Carlos

    2016-01-01

    Tissue microarrays are commonly used in modern pathology for cancer tissue evaluation, as it is a very potent technique. Tissue microarray slides are often scanned to perform computer-aided histopathological analysis of the tissue cores. For processing the image, splitting the whole virtual slide into images of individual cores is required. The only way to distinguish cores corresponding to specimens in the tissue microarray is through their arrangement. Unfortunately, distinguishing the correct order of cores is not a trivial task as they are not labelled directly on the slide. The main aim of this study was to create a procedure capable of automatically finding and extracting cores from archival images of the tissue microarrays. This software supports the work of scientists who want to perform further image processing on single cores. The proposed method is an efficient and fast procedure, working in fully automatic or semi-automatic mode. A total of 89% of punches were correctly extracted with automatic selection. With an addition of manual correction, it is possible to fully prepare the whole slide image for extraction in 2 min per tissue microarray. The proposed technique requires minimum skill and time to parse a big array of cores from a tissue microarray whole-slide image into individual core images.
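
    The core-extraction step can be sketched with scikit-image connected components, as below; PATMA's real contribution, ordering the extracted cores to match the array design, is not reproduced here.

```python
# Sketch of the core-extraction step: threshold the slide image, label
# connected components, and crop each sufficiently large region. PATMA's
# core-ordering logic is not reproduced here.
import numpy as np
from skimage import filters, measure

rng = np.random.default_rng(0)
slide = rng.random((300, 300))                # stand-in for a TMA slide scan
slide[40:90, 50:100] += 2.0                   # two bright fake "cores"
slide[40:90, 150:200] += 2.0

mask = slide > filters.threshold_otsu(slide)
labels = measure.label(mask)
for region in measure.regionprops(labels):
    if region.area >= 500:                    # ignore small specks
        r0, c0, r1, c1 = region.bbox
        core = slide[r0:r1, c0:c1]            # individual core image to save
        print("core at", region.bbox, "size", core.shape)
```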

  18. PATMA: parser of archival tissue microarray

    PubMed Central

    2016-01-01

    Tissue microarrays are commonly used in modern pathology for cancer tissue evaluation, as it is a very potent technique. Tissue microarray slides are often scanned to perform computer-aided histopathological analysis of the tissue cores. For processing the image, splitting the whole virtual slide into images of individual cores is required. The only way to distinguish cores corresponding to specimens in the tissue microarray is through their arrangement. Unfortunately, distinguishing the correct order of cores is not a trivial task as they are not labelled directly on the slide. The main aim of this study was to create a procedure capable of automatically finding and extracting cores from archival images of the tissue microarrays. This software supports the work of scientists who want to perform further image processing on single cores. The proposed method is an efficient and fast procedure, working in fully automatic or semi-automatic mode. A total of 89% of punches were correctly extracted with automatic selection. With an addition of manual correction, it is possible to fully prepare the whole slide image for extraction in 2 min per tissue microarray. The proposed technique requires minimum skill and time to parse a big array of cores from a tissue microarray whole-slide image into individual core images. PMID:27920955

  19. Evaluation of Surface Chemistries for Antibody Microarrays

    SciTech Connect

    Seurynck-Servoss, Shannon L.; White, Amanda M.; Baird, Cheryl L.; Rodland, Karin D.; Zangar, Richard C.

    2007-12-01

    Antibody microarrays are an emerging technology that promises to be a powerful tool for the detection of disease biomarkers. The current technology for protein microarrays has been primarily derived from DNA microarrays and is not fully characterized for use with proteins. For example, there are a myriad of surface chemistries that are commercially available for antibody microarrays, but no rigorous studies that compare these different surfaces. Therefore, we have used an enzyme-linked immunosorbent assay (ELISA) microarray platform to analyze 16 different commercially available slide types. Full standard curves were generated for 24 different assays. We found that this approach provides a rigorous and quantitative system for comparing the different slide types based on spot size and morphology, slide noise, spot background, lower limit of detection, and reproducibility. These studies demonstrate that the properties of the slide surface affect the activity of immobilized antibodies and the quality of data produced. Although many slide types can produce useful data, glass slides coated with poly-L-lysine or aminosilane, with or without activation with a crosslinker, consistently produce superior results in the ELISA microarray analyses we performed.

  20. The Impact of Photobleaching on Microarray Analysis

    PubMed Central

    von der Haar, Marcel; Preuß, John-Alexander; von der Haar, Kathrin; Lindner, Patrick; Scheper, Thomas; Stahl, Frank

    2015-01-01

    DNA-Microarrays have become a potent technology for high-throughput analysis of genetic regulation. However, the wide dynamic range of signal intensities of fluorophore-based microarrays exceeds the dynamic range of a single array scan by far, thus limiting the key benefit of microarray technology: parallelization. The implementation of multi-scan techniques represents a promising approach to overcome these limitations. These techniques are, in turn, limited by the fluorophores’ susceptibility to photobleaching when exposed to the scanner’s laser light. In this paper the photobleaching characteristics of cyanine-3 and cyanine-5 as part of solid state DNA microarrays are studied. The effects of initial fluorophore intensity as well as laser scanner dependent variables such as the photomultiplier tube’s voltage on bleaching and imaging are investigated. The resulting data is used to develop a model capable of simulating the expected degree of signal intensity reduction caused by photobleaching for each fluorophore individually, allowing for the removal of photobleaching-induced, systematic bias in multi-scan procedures. Single-scan applications also benefit as they rely on pre-scans to determine the optimal scanner settings. These findings constitute a step towards standardization of microarray experiments and analysis and may help to increase the lab-to-lab comparability of microarray experiment results. PMID:26378589
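
    A sketch assuming first-order bleaching kinetics per scan, I(n) = I0·exp(-kn); the paper's fluorophore-specific model may differ in form, but the correction logic is the same: estimate the rate, then rescale the later scans.

```python
# Sketch assuming first-order bleaching kinetics: signal after n scans decays
# as I(n) = I0 * exp(-k*n). Fitting k per fluorophore allows multi-scan data
# to be corrected back toward the unbleached intensity.
import numpy as np
from scipy.optimize import curve_fit

def decay(n, i0, k):
    return i0 * np.exp(-k * n)

scans = np.arange(8)
noise = 1 + 0.01 * np.random.default_rng(0).normal(size=8)
signal = 1000 * np.exp(-0.05 * scans) * noise      # simulated bleaching

(i0, k), _ = curve_fit(decay, scans, signal, p0=(signal[0], 0.1))
corrected = signal * np.exp(k * scans)             # undo estimated bleaching
print(f"estimated bleaching rate per scan: {k:.3f}")
```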

  1. Evaluation of high-resolution microarray platforms for genomic profiling of bone tumours

    PubMed Central

    2010-01-01

    Background: Several high-density oligonucleotide microarray platforms are available for genome-wide single nucleotide polymorphism (SNP) detection and microarray-based comparative genomic hybridisation (array CGH), which may be used to detect copy number aberrations in human tumours. As part of the EuroBoNeT network of excellence for research on bone tumours (eurobonet.eu), we have evaluated four different commercial high-resolution microarray platforms in order to identify the most appropriate technology for mapping DNA copy number aberrations in such tumours. Findings: DNA from two different cytogenetically well-characterized bone sarcoma cell lines, representing a simple and a complex karyotype, respectively, was tested in duplicate on four high-resolution microarray platforms; Affymetrix Genome-Wide Human SNP Array 6.0, Agilent Human Genome CGH 244A, Illumina HumanExon510s-duo and Nimblegen HG18 CGH 385 k WG tiling v1.0. The data was analysed using the platform-specific analysis software, as well as a platform-independent analysis algorithm. DNA copy number was measured at six specific chromosomes or chromosomal regions, and compared with the expected ratio based on available cytogenetic information. All platforms performed well in terms of reproducibility and were able to delimit and score small amplifications and deletions at similar resolution, but Agilent microarrays showed better linearity and dynamic range. The platform-specific analysis software provided with each platform identified in general correct copy numbers, whereas using a platform-independent analysis algorithm, correct copy numbers were determined mainly for Agilent and Affymetrix microarrays. Conclusions: All platforms performed reasonably well, but Agilent microarrays showed better dynamic range, and like Affymetrix microarrays performed well with the platform-independent analysis software, implying more robust data. Bone tumours like osteosarcomas are heterogeneous tumours with complex

  2. Contributions to Statistical Problems Related to Microarray Data

    ERIC Educational Resources Information Center

    Hong, Feng

    2009-01-01

    Microarray is a high throughput technology to measure gene expression. Analysis of microarray data brings many interesting and challenging problems. This thesis consists of three studies related to microarray data. First, we propose a Bayesian model for microarray data and use Bayes Factors to identify differentially expressed genes. Second, we…

  3. Contributions to Statistical Problems Related to Microarray Data

    ERIC Educational Resources Information Center

    Hong, Feng

    2009-01-01

    Microarray is a high throughput technology to measure gene expression. Analysis of microarray data brings many interesting and challenging problems. This thesis consists of three studies related to microarray data. First, we propose a Bayesian model for microarray data and use Bayes Factors to identify differentially expressed genes. Second, we…

  4. Functional analysis of differentially expressed genes associated with glaucoma from DNA microarray data.

    PubMed

    Wu, Y; Zang, W D; Jiang, W

    2014-11-11

    Microarray data of astrocytes extracted from the optic nerves of donors with and without glaucoma were analyzed to screen for differentially expressed genes (DEGs). Functional exploration with bioinformatic tools was then used to understand the roles of the identified DEGs in glaucoma. The microarray data, downloaded from the Gene Expression Omnibus (GEO) database, comprise 13 astrocyte samples: 6 from healthy subjects and 7 from patients suffering from glaucoma. Data were pre-processed, and DEGs were screened out using R software packages. Interactions between DEGs were identified, and networks were built using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING). GENECODIS was utilized for the functional analysis of the DEGs, and GOTM was used for module division, for which functional annotation was conducted with the Database for Annotation, Visualization, and Integrated Discovery (DAVID). A total of 371 DEGs were identified between glaucoma-associated samples and normal samples. Three modules included in the PPID database were generated, with 11, 12, and 2 significant functional annotations, respectively, including immune system processes, inflammatory responses, and synaptic vesicle endocytosis. We found that the most significantly enriched functions for each module were associated with immune function. Several genes that play interesting roles in the development of glaucoma are described; these genes may be potential biomarkers for glaucoma diagnosis or treatment.
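
    The screening step (performed with R packages in the study) corresponds generically to per-gene tests with multiple-testing correction, sketched here in Python on simulated data with the study's 6-versus-7 sample split.

```python
# Generic sketch of the DEG screening step (the study used R packages):
# per-gene two-sample t-tests between glaucoma and control samples, with
# Benjamini-Hochberg correction for multiple testing.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
control = rng.normal(0, 1, size=(5000, 6))      # 5000 genes x 6 healthy
disease = rng.normal(0, 1, size=(5000, 7))      # 5000 genes x 7 glaucoma
disease[:50] += 1.5                              # 50 genes truly shifted

_, pvals = stats.ttest_ind(disease, control, axis=1)
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"{reject.sum()} DEGs at FDR 0.05")
```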

  5. Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning

    PubMed Central

    Maulik, Ujjwal

    2014-01-01

    Microarrays have now gone from obscurity to being almost ubiquitous in biological research. At the same time, the statistical methodology for microarray analysis has progressed from simple visual assessments of results to novel algorithms for analyzing changes in expression profiles. In a micro-RNA (miRNA) or gene-expression profiling experiment, the expression levels of thousands of genes/miRNAs are simultaneously monitored to study the effects of certain treatments, diseases, and developmental stages on their expressions. Microarray-based gene expression profiling can be used to identify genes, whose expressions are changed in response to pathogens or other organisms by comparing gene expression in infected to that in uninfected cells or tissues. Recent studies have revealed that patterns of altered microarray expression profiles in cancer can serve as molecular biomarkers for tumor diagnosis, prognosis of disease-specific outcomes, and prediction of therapeutic responses. Microarray data sets containing expression profiles of a number of miRNAs or genes are used to identify biomarkers, which have dysregulation in normal and malignant tissues. However, small sample size remains a bottleneck in designing successful classification methods. On the other hand, an adequate amount of microarray data lacking clinical annotation can be employed as an additional source of information. In this paper, a combination of kernelized fuzzy rough set (KFRS) and semisupervised support vector machine (S3VM) is proposed for predicting cancer biomarkers from one miRNA and three gene expression data sets. Biomarkers are discovered employing three feature selection methods, including KFRS. The effectiveness of the proposed KFRS and S3VM combination on the microarray data sets is demonstrated, and the cancer biomarkers identified from miRNA data are reported. Furthermore, biological significance tests are conducted for miRNA cancer biomarkers. PMID:27170887
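
    scikit-learn ships no S3VM, so the sketch below uses self-training around an SVM as a loose stand-in for the semisupervised component; it is explicitly not the paper's KFRS+S3VM method.

```python
# Loose stand-in for the semisupervised component (scikit-learn has no S3VM):
# self-training around an SVM exploits unlabeled expression profiles in
# addition to the few labeled ones. This is NOT the paper's KFRS+S3VM method.
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
y_partial = y.copy()
y_partial[30:] = -1              # only 30 labeled samples; -1 marks unlabeled

model = SelfTrainingClassifier(SVC(probability=True), threshold=0.9)
model.fit(X, y_partial)
print("accuracy on all samples:", model.score(X, y))
```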

  6. Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning.

    PubMed

    Chakraborty, Debasis; Maulik, Ujjwal

    2014-01-01

    Microarrays have now gone from obscurity to being almost ubiquitous in biological research. At the same time, the statistical methodology for microarray analysis has progressed from simple visual assessments of results to novel algorithms for analyzing changes in expression profiles. In a micro-RNA (miRNA) or gene-expression profiling experiment, the expression levels of thousands of genes/miRNAs are simultaneously monitored to study the effects of certain treatments, diseases, and developmental stages on their expressions. Microarray-based gene expression profiling can be used to identify genes, whose expressions are changed in response to pathogens or other organisms by comparing gene expression in infected to that in uninfected cells or tissues. Recent studies have revealed that patterns of altered microarray expression profiles in cancer can serve as molecular biomarkers for tumor diagnosis, prognosis of disease-specific outcomes, and prediction of therapeutic responses. Microarray data sets containing expression profiles of a number of miRNAs or genes are used to identify biomarkers, which have dysregulation in normal and malignant tissues. However, small sample size remains a bottleneck in designing successful classification methods. On the other hand, an adequate amount of microarray data lacking clinical annotation can be employed as an additional source of information. In this paper, a combination of kernelized fuzzy rough set (KFRS) and semisupervised support vector machine (S(3)VM) is proposed for predicting cancer biomarkers from one miRNA and three gene expression data sets. Biomarkers are discovered employing three feature selection methods, including KFRS. The effectiveness of the proposed KFRS and S(3)VM combination on the microarray data sets is demonstrated, and the cancer biomarkers identified from miRNA data are reported. Furthermore, biological significance tests are conducted for miRNA cancer biomarkers.

  7. Data preprocessing and preliminary results of the moon-based ultraviolet telescope on CE-3 lander

    NASA Astrophysics Data System (ADS)

    Wang, F.

    2015-10-01

    The moon-based ultraviolet telescope (MUVT) is one of the payloads on the Chang'e-3 (CE-3) lunar lander. Because of the advantages of having no atmospheric disturbances and the slow rotation of the Moon, we can make long-term continuous observations of a series of important remote celestial objects in the near ultraviolet band, and perform a sky survey of selected areas. We can find characteristic changes in celestial brightness with time by analyzing image data from the MUVT, and deduce the radiation mechanism and physical properties of these celestial objects after comparing with a physical model. In order to explain the scientific purposes of the MUVT, this article analyzes the preprocessing of MUVT image data and makes a preliminary evaluation of data quality. The results demonstrate that the methods used for data collection and preprocessing are effective, and the Level 2A and 2B image data satisfy the requirements of follow-up scientific research.

  8. KONFIG and REKONFIG: Two interactive preprocessing programs for the Navy/NASA Engine Program (NNEP)

    NASA Technical Reports Server (NTRS)

    Fishbach, L. H.

    1981-01-01

    The NNEP is a computer program that is currently being used to simulate the thermodynamic cycle performance of almost all types of turbine engines by many government, industry, and university personnel. The NNEP uses arrays of input data to set up the engine simulation and component matching method as well as to describe the characteristics of the components. A preprocessing program (KONFIG) is described in which the user at a terminal on a time-shared computer can interactively prepare the arrays of data required. It is intended to make it easier for the occasional or new user to operate NNEP. Another preprocessing program (REKONFIG), in which the user can modify the component specifications of a previously configured NNEP dataset, is also described. It is intended to aid in preparing data for parametric studies and/or studies of similar engines such as mixed-flow turbofans, turboshafts, etc.

  9. The Role of GRAIL Orbit Determination in Preprocessing of Gravity Science Measurements

    NASA Technical Reports Server (NTRS)

    Kruizinga, Gerhard; Asmar, Sami; Fahnestock, Eugene; Harvey, Nate; Kahan, Daniel; Konopliv, Alex; Oudrhiri, Kamal; Paik, Meegyeong; Park, Ryan; Strekalov, Dmitry; Watkins, Michael; Yuan, Dah-Ning

    2013-01-01

    The Gravity Recovery And Interior Laboratory (GRAIL) mission has constructed a lunar gravity field with unprecedented uniform accuracy on the farside and nearside of the Moon. GRAIL lunar gravity field determination begins with preprocessing of the gravity science measurements by applying corrections for time tag error, general relativity, measurement noise and biases. Gravity field determination requires the generation of spacecraft ephemerides of an accuracy not attainable with the pre-GRAIL lunar gravity fields. Therefore, a bootstrapping strategy was developed, iterating between science data preprocessing and lunar gravity field estimation in order to construct sufficiently accurate orbit ephemerides. This paper describes the GRAIL measurements, their dependence on the spacecraft ephemerides and the role of orbit determination in the bootstrapping strategy. Simulation results will be presented that validate the bootstrapping strategy followed by bootstrapping results for flight data, which have led to the latest GRAIL lunar gravity fields.

  9. Preprocessing, classification modeling and feature selection using flow injection electrospray mass spectrometry metabolite fingerprint data.

    PubMed

    Enot, David P; Lin, Wanchang; Beckmann, Manfred; Parker, David; Overy, David P; Draper, John

    2008-01-01

    Metabolome analysis by flow injection electrospray mass spectrometry (FIE-MS) fingerprinting generates measurements relating to large numbers of m/z signals. Such data sets often exhibit high variance with a paucity of replicates, thus providing a challenge for data mining. We describe data preprocessing and modeling methods that have proved reliable in projects involving samples from a range of organisms. The protocols interact with software resources specifically for metabolomics provided in a Web-accessible data analysis package FIEmspro (http://users.aber.ac.uk/jhd) written in the R environment and requiring a moderate knowledge of R command-line usage. Specific emphasis is placed on describing the outcome of modeling experiments using FIE-MS data that require further preprocessing to improve quality. The salient features of both poor and robust (i.e., highly generalizable) multivariate models are outlined together with advice on validating classifiers and avoiding false discovery when seeking explanatory variables.

  10. Radar signal pre-processing to suppress surface bounce and multipath

    SciTech Connect

    Paglieroni, David W; Mast, Jeffrey E; Beer, N. Reginald

    2013-12-31

    A method and system for detecting the presence of subsurface objects within a medium is provided. In some embodiments, the imaging and detection system operates in a multistatic mode to collect radar return signals generated by an array of transceiver antenna pairs that is positioned across the surface and that travels down the surface. The imaging and detection system pre-processes the return signals to suppress certain undesirable effects. The imaging and detection system then generates synthetic aperture radar images from real aperture radar images generated from the pre-processed return signals. The imaging and detection system then post-processes the synthetic aperture radar images to improve detection of subsurface objects. The imaging and detection system identifies peaks in the energy levels of the post-processed image frame, which indicate the presence of a subsurface object.

  11. The Role of GRAIL Orbit Determination in Preprocessing of Gravity Science Measurements

    NASA Technical Reports Server (NTRS)

    Kruizinga, Gerhard; Asmar, Sami; Fahnestock, Eugene; Harvey, Nate; Kahan, Daniel; Konopliv, Alex; Oudrhiri, Kamal; Paik, Meegyeong; Park, Ryan; Strekalov, Dmitry; Watkins, Michael; Yuan, Dah-Ning

    2013-01-01

    The Gravity Recovery And Interior Laboratory (GRAIL) mission has constructed a lunar gravity field with unprecedented uniform accuracy on the farside and nearside of the Moon. GRAIL lunar gravity field determination begins with preprocessing of the gravity science measurements by applying corrections for time tag error, general relativity, measurement noise and biases. Gravity field determination requires the generation of spacecraft ephemerides of an accuracy not attainable with the pre-GRAIL lunar gravity fields. Therefore, a bootstrapping strategy was developed, iterating between science data preprocessing and lunar gravity field estimation in order to construct sufficiently accurate orbit ephemerides. This paper describes the GRAIL measurements, their dependence on the spacecraft ephemerides, and the role of orbit determination in the bootstrapping strategy. Simulation results will be presented that validate the bootstrapping strategy, followed by bootstrapping results for flight data, which have led to the latest GRAIL lunar gravity fields.
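
    The bootstrapping strategy described above is essentially a fixed-point iteration between orbit determination, measurement preprocessing and field estimation. The toy sketch below illustrates only that control flow; the scalar "field", the measurement model and the convergence tolerance are invented stand-ins, not the GRAIL software.

    ```python
    import numpy as np

    # Toy sketch of the bootstrapping loop: alternate between (a) orbit
    # determination with the current gravity-field estimate, (b) preprocessing
    # the measurements with those ephemerides, and (c) re-estimating the field,
    # until the estimate stops changing. The "physics" is a scalar stand-in.

    rng = np.random.default_rng(0)
    TRUE_FIELD = 2.0
    raw = TRUE_FIELD + 0.01 * rng.standard_normal(1000)   # noisy measurements

    def determine_orbit(field_estimate):
        # toy orbit: its error is proportional to the field-estimate error
        return field_estimate - TRUE_FIELD                 # orbit error term

    def preprocess(measurements, orbit_error):
        # toy correction: a residual bias remains while the orbit is still off
        return measurements - 0.05 * orbit_error

    field = 1.0                                            # coarse starting field
    for iteration in range(50):
        orbit_error = determine_orbit(field)
        corrected = preprocess(raw, orbit_error)
        new_field = corrected.mean()                       # field estimation step
        if abs(new_field - field) < 1e-12:
            break
        field = new_field

    print(f"converged in {iteration} iterations, field estimate = {field:.4f}")
    ```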

  12. A clinical evaluation of the RNCA study using Fourier filtering as a preprocessing method

    SciTech Connect

    Robeson, W.; Alcan, K.E.; Graham, M.C.; Palestro, C.; Oliver, F.H.; Benua, R.S.

    1984-06-01

    Forty-one patients (25 male, 16 female) were studied by radionuclide cardioangiography (RNCA) in our institution. There were 42 rest studies and 24 stress studies (66 studies total). Sixteen patients were normal, 15 had ASHD, seven had a cardiomyopathy, and three had left-sided valvular regurgitation. Each study was preprocessed using both the standard nine-point smoothing method and Fourier filtering. Amplitude and phase images were also generated. Both preprocessing methods were compared with respect to image quality, border definition, reliability and reproducibility of the LVEF, and cine wall motion interpretation. Image quality and border definition were judged superior by the consensus of two independent observers in 65 of 66 studies (98%) using Fourier-filtered data. The LVEF differed between the two processes by more than 0.05 in 17 of 66 studies (26%), including five studies in which the LVEF could not be determined using nine-point smoothed data. LV wall motion was normal by both techniques in all control patients by cine analysis. However, cine wall motion analysis using Fourier-filtered data demonstrated additional abnormalities in 17 of 25 studies (68%) in the ASHD group, including three uninterpretable studies using nine-point smoothed data. In the cardiomyopathy/valvular heart disease group, ten of 18 studies (56%) had additional wall motion abnormalities using Fourier-filtered data (including four uninterpretable studies using nine-point smoothed data). We conclude that Fourier filtering is superior to the nine-point smooth preprocessing method now in general use in terms of image quality, border definition, generation of an LVEF, and cine wall motion analysis. The advent of the array processor makes routine preprocessing by Fourier filtering a feasible technologic advance in the development of the RNCA study.
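
    The comparison above is between a nine-point (moving-average) temporal smooth and Fourier filtering of gated time-activity curves. A minimal sketch of the two operations on a synthetic curve, assuming that "Fourier filtering" here means retaining the low-order cardiac harmonics; the frame count, harmonic cutoff and EF figure of merit are illustrative choices:

    ```python
    import numpy as np

    n = 64                                   # frames per cardiac cycle (assumed)
    t = np.arange(n) / n
    rng = np.random.default_rng(1)
    curve = 100 + 20 * np.cos(2 * np.pi * t) + 5 * np.cos(4 * np.pi * t)
    noisy = curve + 4 * rng.standard_normal(n)

    # Nine-point smooth: centered moving average with wrap-around (cyclic data).
    kernel = np.ones(9) / 9
    nine_point = np.convolve(np.concatenate([noisy[-4:], noisy, noisy[:4]]),
                             kernel, mode="valid")

    # Fourier filter: keep the DC term and the first three harmonics.
    spec = np.fft.rfft(noisy)
    spec[4:] = 0
    fourier = np.fft.irfft(spec, n)

    # A simple EF-like figure of merit from max/min counts of the curve.
    def ef(c):
        return (c.max() - c.min()) / c.max()

    print(f"EF raw={ef(noisy):.3f} nine-point={ef(nine_point):.3f} "
          f"Fourier={ef(fourier):.3f} truth={ef(curve):.3f}")
    ```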

  13. Pre-Processing Noise Cross-Correlations with Equalizing the Network Covariance Matrix Eigen-Spectrum

    NASA Astrophysics Data System (ADS)

    Seydoux, L.; de Rosny, J.; Shapiro, N.

    2016-12-01

    Theoretically, the extraction of Green functions from noise cross-correlation requires the ambient seismic wavefield to be generated by uncorrelated sources evenly distributed in the medium. Yet, this condition is often not verified. Strong events such as earthquakes often produce highly coherent transient signals. Also, the microseismic noise is generated at specific places on the Earth's surface with source regions often very localized in space. Different localized and persistent seismic sources may contaminate the cross-correlations of continuous records, resulting in spurious arrivals or asymmetry and, finally, in biased travel-time measurements. Pre-processing techniques therefore must be applied to the seismic data in order to reduce the effect of noise anisotropy and the influence of strong localized events. Here we describe a pre-processing approach that uses the covariance matrix computed from signals recorded by a network of seismographs. We extend the widely used time and spectral equalization pre-processing to the equalization of the covariance matrix spectrum (i.e., its ordered eigenvalues). This approach can be considered as a spatial equalization. This method allows us to correct for the wavefield anisotropy in two ways: (1) the influence of strong directive sources is substantially attenuated, and (2) the weakly excited modes are reinforced, allowing us to partially recover the conditions required for the Green's function retrieval. We also present an eigenvector-based spatial filter used to distinguish between surface and body waves. This last filter is used together with the equalization of the eigenvalue spectrum. We simulate a two-dimensional wavefield in a heterogeneous medium with a strongly dominating source. We show that our method greatly improves the travel-time measurements obtained from the inter-station cross-correlation functions. Also, we apply the developed method to the USArray data and pre-process the continuous records strongly influenced
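
    A minimal sketch of the central operation, equalizing the eigenvalue spectrum of the network covariance matrix: the synthetic single-frequency data, the array size and the flat target spectrum are assumptions, and the windowing and frequency averaging of the actual method are omitted.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n_sta, n_win = 20, 200

    # Synthetic single-frequency Fourier coefficients: one strong coherent
    # (directive) source plus weak diffuse noise at each station.
    steering = np.exp(2j * np.pi * rng.random(n_sta))
    data = (5.0 * steering[:, None] * rng.standard_normal(n_win)
            + 0.5 * (rng.standard_normal((n_sta, n_win))
                     + 1j * rng.standard_normal((n_sta, n_win))))

    # Covariance matrix averaged over time windows.
    cov = data @ data.conj().T / n_win

    # Equalize the eigenvalue spectrum: C = V diag(lam) V^H  ->  V I V^H.
    lam, vec = np.linalg.eigh(cov)
    whitener = vec @ np.diag(1.0 / np.sqrt(np.maximum(lam, 1e-12))) @ vec.conj().T
    equalized = whitener @ data            # dominant source attenuated

    new_cov = equalized @ equalized.conj().T / n_win
    new_lam = np.linalg.eigvalsh(new_cov)
    print("eigenvalue spread before:", lam.max() / lam.min())
    print("eigenvalue spread after :", new_lam.max() / new_lam.min())
    ```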

  14. Optimizing fMRI Preprocessing Pipelines for Block-Design Tasks as a Function of Age.

    PubMed

    Churchill, Nathan W; Raamana, Pradeep; Spring, Robyn; Strother, Stephen C

    2017-02-12

    Functional Magnetic Resonance Imaging (fMRI) is a powerful neuroimaging tool, which is often hampered by significant noise confounds. There is evidence that our ability to detect activations in task fMRI is highly dependent on the preprocessing steps used to control noise and artifact. However, the vast majority of studies examining preprocessing pipelines in fMRI have focused on young adults. Given the widespread use of fMRI for characterizing the neurobiology of aging, it is critical to examine how the impact of preprocessing choices varies as a function of age. In this study, we employ the NPAIRS cross-validation framework, which optimizes pipelines based on metrics of prediction accuracy (P) and spatial reproducibility (R), to compare the effects of pipeline optimization between young (21-33 years) and older (61-82 years) cohorts, for three different block-design contrasts. Motion is shown to be a greater issue in the older cohort, and we introduce new statistical approaches to control for potential biases due to head motion during pipeline optimization. In comparison, data-driven methods of physiological noise correction show comparable benefits for both young and old cohorts. Using our optimization framework, we demonstrate that the optimal pipelines tend to be highly similar across age cohorts. In addition, there is a comparable, significant benefit of pipeline optimization across age cohorts, for (P, R) metrics and independent validation measures of activation overlap (both between-subject, within-session and within-subject, between-session). The choice of task contrast consistently shows a greater impact than the age cohort, for (P, R) metrics and activation overlap. Finally, adaptive pipeline optimization per task run shows improved sensitivity to age-related changes in brain activity, particularly for weaker, more complex cognitive contrasts. The current study provides the first detailed examination of preprocessing pipelines across age cohorts
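
    A minimal illustration of the split-half (P, R) evaluation that NPAIRS builds on, run on synthetic data: reproducibility R is the correlation of activation maps computed from the two halves, and prediction P is the accuracy with which one half's map classifies the other half's scans. The contrast statistic and the classifier below are simplified stand-ins, not the NPAIRS implementation.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n_scans, n_vox = 80, 500
    effect = np.zeros(n_vox)
    effect[:50] = 1.0                                  # "active" voxels
    labels = np.tile([0, 1], n_scans // 2)
    scans = rng.standard_normal((n_scans, n_vox)) + np.outer(labels, effect)

    def spm(x, y):
        # two-sample t-like contrast map: mean difference / pooled std error
        d = x[y == 1].mean(0) - x[y == 0].mean(0)
        s = np.sqrt(x[y == 1].var(0) / (y == 1).sum()
                    + x[y == 0].var(0) / (y == 0).sum())
        return d / s

    half = rng.permutation(n_scans)
    i1, i2 = half[: n_scans // 2], half[n_scans // 2:]
    map1, map2 = spm(scans[i1], labels[i1]), spm(scans[i2], labels[i2])

    R = np.corrcoef(map1, map2)[0, 1]                  # spatial reproducibility

    # Prediction: correlate each held-out scan with the training contrast map.
    scores = scans[i2] @ map1
    P = ((scores > np.median(scores)).astype(int) == labels[i2]).mean()
    print(f"P (prediction accuracy) = {P:.2f}, R (reproducibility) = {R:.2f}")
    ```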

  15. Data preprocessing for a vehicle-based localization system used in road traffic applications

    NASA Astrophysics Data System (ADS)

    Patelczyk, Timo; Löffler, Andreas; Biebl, Erwin

    2016-09-01

    This paper presents a fixed-point implementation of the preprocessing using a field programmable gate array (FPGA), which is required for a multipath joint angle and delay estimation (JADE) used in road traffic applications. This preprocessing lays the foundation for many model-based parameter estimation methods. Here, a simulation of a vehicle-based localization system application for protecting vulnerable road users, who are equipped with appropriate transponders, is considered. For such safety-critical applications, the robustness and real-time capability of the localization is particularly important. An additional motivation for using a fixed-point implementation for the data preprocessing is the limited computing power of a vehicle's head unit. This study aims to process the raw data provided by the localization system. The data preprocessing applied includes a wideband calibration of the physical localization system, separation of relevant information from the received sampled signal, and preparation of the incoming data via further processing. Further, a channel matrix estimation was implemented to complete the data preprocessing, which contains information on channel parameters, e.g., the positions of the objects to be located. In the presented case of a vehicle-based localization system application, we assume an urban environment in which multipath propagation occurs. Since most methods for localization are based on uncorrelated signals, this fact must be addressed; hence, a decorrelation of the incoming data stream is required before further localization. This decorrelation was accomplished by considering several snapshots in different time slots. As a final aspect of the use of fixed-point arithmetic, quantization errors are considered. In addition, the resources and runtime of the presented implementation are discussed; these factors are strongly linked to a practical implementation.
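
    Two of the steps mentioned, estimating a channel/covariance matrix from several snapshots taken in different time slots and quantizing the incoming samples to a fixed-point grid, can be sketched as follows; the array geometry, signal model and word length are illustrative assumptions, not the paper's system.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    n_ant, n_snap = 8, 64
    angles = np.deg2rad([10.0, 35.0])                 # two propagation paths

    # Uniform linear array steering vectors (half-wavelength spacing assumed).
    k = np.arange(n_ant)[:, None]
    A = np.exp(1j * np.pi * k * np.sin(angles))       # n_ant x 2

    s = rng.standard_normal((2, n_snap)) + 1j * rng.standard_normal((2, n_snap))
    x = A @ s + 0.1 * (rng.standard_normal((n_ant, n_snap))
                       + 1j * rng.standard_normal((n_ant, n_snap)))

    def quantize(v, bits=12, full_scale=8.0):
        # round real and imaginary parts to a signed fixed-point grid
        step = full_scale / 2 ** (bits - 1)
        def q(r):
            return np.clip(np.round(r / step),
                           -2 ** (bits - 1), 2 ** (bits - 1) - 1) * step
        return q(v.real) + 1j * q(v.imag)

    xq = quantize(x)

    # Covariance estimate averaged over snapshots from different time slots.
    R = xq @ xq.conj().T / n_snap
    print("largest eigenvalues:", np.round(np.linalg.eigvalsh(R)[-3:], 2))
    print("quantization SNR [dB]:",
          10 * np.log10(np.mean(np.abs(x) ** 2) / np.mean(np.abs(x - xq) ** 2)))
    ```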

  16. Hyperspectral imaging in medicine: image pre-processing problems and solutions in Matlab.

    PubMed

    Koprowski, Robert

    2015-11-01

    The paper presents problems and solutions related to hyperspectral image pre-processing. New methods of preliminary image analysis are proposed. The paper shows problems occurring in Matlab when trying to analyse this type of image. Moreover, new methods are discussed which provide source code in Matlab that can be used in practice without any licensing restrictions. A proposed application and a sample result of hyperspectral image analysis are also presented.

  17. ExpressYourself: a modular platform for processing and visualizing microarray data

    PubMed Central

    Luscombe, Nicholas M.; Royce, Thomas E.; Bertone, Paul; Echols, Nathaniel; Horak, Christine E.; Chang, Joseph T.; Snyder, Michael; Gerstein, Mark

    2003-01-01

    DNA microarrays are widely used in biological research; by analyzing differential hybridization on a single microarray slide, one can detect changes in mRNA expression levels, increases in DNA copy numbers and the location of transcription factor binding sites on a genomic scale. Having performed the experiments, the major challenge is to process large, noisy datasets in order to identify the specific array elements that are significantly differentially hybridized. This normally requires aggregating different, often incompatible programs into a multi-step pipeline. Here we present ExpressYourself, a fully integrated platform for processing microarray data. In completely automated fashion, it will correct the background array signal, normalize the Cy5 and Cy3 signals, score levels of differential hybridization, combine the results of replicate experiments, filter problematic regions of the array and assess the quality of individual and replicate experiments. ExpressYourself is designed with a highly modular architecture so various types of microarray analysis algorithms can readily be incorporated as they are developed; for example, the system currently implements several normalization methods, including those that simultaneously consider signal intensity and slide location. The processed data are presented using a web-based graphical interface to facilitate comparison with the original images of the array slides. In particular, ExpressYourself is able to regenerate images of the original microarray after applying various steps of processing, which greatly facilitates identification of position-specific artifacts. The program is freely available for use at http://bioinfo.mbb.yale.edu/expressyourself. PMID:12824348
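
    One of the normalization families mentioned, those that consider signal intensity, can be illustrated with an intensity-dependent (lowess) normalization on the MA representation of the two channels. This is a generic sketch, not ExpressYourself's code; the simulated dye bias and the lowess span are assumptions.

    ```python
    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(5)
    n = 2000
    true = rng.lognormal(7, 1, n)                       # spot "true" intensity
    cy3 = true * rng.lognormal(0, 0.1, n)
    cy5 = true * rng.lognormal(0, 0.1, n) * (1 + 0.3 * np.log10(true) / 4)

    M = np.log2(cy5 / cy3)                              # log-ratio
    A = 0.5 * np.log2(cy5 * cy3)                        # average log-intensity

    # Fit the intensity-dependent trend of M against A and subtract it.
    trend = lowess(M, A, frac=0.3, return_sorted=False)
    M_norm = M - trend

    print(f"median |M| before: {np.median(np.abs(M)):.3f}, "
          f"after: {np.median(np.abs(M_norm)):.3f}")
    ```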

  18. Foveal processing difficulty does not affect parafoveal preprocessing in young readers

    PubMed Central

    Marx, Christina; Hawelka, Stefan; Schuster, Sarah; Hutzler, Florian

    2017-01-01

    Recent evidence suggested that parafoveal preprocessing develops early during reading acquisition, that is, young readers profit from valid parafoveal information and exhibit a resultant preview benefit. For young readers, however, it is unknown whether the processing demands of the currently fixated word modulate the extent to which the upcoming word is parafoveally preprocessed – as it has been postulated (for adult readers) by the foveal load hypothesis. The present study used the novel incremental boundary technique to assess whether 4th and 6th Graders exhibit an effect of foveal load. Furthermore, we attempted to distinguish the foveal load effect from the spillover effect. These effects are hard to differentiate with respect to the expected pattern of results, but are conceptually different. The foveal load effect is supposed to reflect modulations of the extent of parafoveal preprocessing, whereas the spillover effect reflects the ongoing processing of the previous word whilst the reader’s fixation is already on the next word. The findings revealed that the young readers did not exhibit an effect of foveal load, but a substantial spillover effect. The implications for previous studies with adult readers and for models of eye movement control in reading are discussed. PMID:28139718

  19. Learning-based image preprocessing for robust computer-aided detection

    NASA Astrophysics Data System (ADS)

    Raghupathi, Laks; Devarakota, Pandu R.; Wolf, Matthias

    2013-03-01

    Recent studies have shown that low dose computed tomography (LDCT) can be an effective screening tool to reduce lung cancer mortality. Computer-aided detection (CAD) would be a beneficial second reader for radiologists in such cases. Studies demonstrate that while iterative reconstruction (IR) improves LDCT diagnostic quality, it significantly degrades CAD performance (increased false positives) when applied directly. For improving CAD performance, solutions such as retraining with newer data or applying a standard preprocessing technique may not suffice due to the high prevalence of CT scanners and non-uniform acquisition protocols. Here, we present a learning-based framework that can adaptively transform a wide variety of input data to boost an existing CAD performance. This not only enhances their robustness but also their applicability in clinical workflows. Our solution consists of applying a suitable pre-processing filter automatically on the given image based on its characteristics. This requires the preparation of ground truth (GT) for choosing an appropriate filter that results in improved CAD performance. Accordingly, we propose an efficient consolidation process with a novel metric. Using key anatomical landmarks, we then derive consistent feature descriptors for the classification scheme, which uses a priority mechanism to automatically choose an optimal preprocessing filter. We demonstrate CAD prototype performance improvement using hospital-scale datasets acquired from North America, Europe and Asia. Though we demonstrated our results for a lung nodule CAD, this scheme is straightforward to extend to other post-processing tools dedicated to other organs and modalities.
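
    A minimal sketch of the adaptive idea, deriving descriptors from the incoming image and letting a rule choose a pre-processing filter from a small bank: the descriptor, thresholds and filter bank below are invented placeholders for the paper's trained classifier and priority mechanism.

    ```python
    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(6)
    image = ndimage.gaussian_filter(rng.standard_normal((128, 128)), 4)
    image += 0.8 * rng.standard_normal(image.shape)      # noisy acquisition

    def noise_estimate(img):
        # robust noise sigma from the high-pass residual (median abs. deviation)
        resid = img - ndimage.median_filter(img, size=3)
        return 1.4826 * np.median(np.abs(resid))

    FILTER_BANK = {
        "none":   lambda im: im,
        "gauss":  lambda im: ndimage.gaussian_filter(im, 1.0),
        "median": lambda im: ndimage.median_filter(im, size=5),
    }

    def choose_filter(img):
        sigma = noise_estimate(img)
        if sigma < 0.1:
            return "none"
        return "gauss" if sigma < 0.5 else "median"

    name = choose_filter(image)
    prepared = FILTER_BANK[name](image)
    print(f"noise ~ {noise_estimate(image):.2f} -> applied '{name}' filter")
    ```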

  1. Data pre-processing to improve the mining of large feed databases.

    PubMed

    Maroto-Molina, F; Gómez-Cabrera, A; Guerrero-Ginel, J E; Garrido-Varo, A; Sauvant, D; Tran, G; Heuzé, V; Pérez-Marín, D C

    2013-07-01

    The information stored in animal feed databases is highly variable, in terms of both provenance and quality; therefore, data pre-processing is essential to ensure reliable results. Yet, pre-processing at best tends to be unsystematic; at worst, it may even be wholly ignored. This paper sought to develop a systematic approach to the various stages involved in pre-processing to improve feed database outputs. The database used contained analytical and nutritional data on roughly 20 000 alfalfa samples. A range of techniques were examined for integrating data from different sources, for detecting duplicates and, particularly, for detecting outliers. Special attention was paid to the comparison of univariate and multivariate solutions. Major issues relating to the heterogeneous nature of data contained in this database were explored, the observed outliers were characterized and ad hoc routines were designed for error control. Finally, a heuristic diagram was designed to systematize the various aspects involved in the detection and management of outliers and errors.
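
    The univariate-versus-multivariate comparison mentioned above can be made concrete with a small sketch: a sample whose individual values look unremarkable can still be a clear multivariate outlier once the correlation between variables is taken into account. The variables, cutoffs and synthetic data are illustrative, not the alfalfa database.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    mean = np.array([18.0, 25.0])                    # e.g. protein %, fibre %
    cov = np.array([[4.0, -3.0], [-3.0, 9.0]])       # negatively correlated
    X = rng.multivariate_normal(mean, cov, size=500)
    X = np.vstack([X, [22.0, 33.0]])                 # plausible marginals, odd combo

    # Univariate screen: |z| > 3 on each variable separately.
    z = np.abs((X - X.mean(0)) / X.std(0))
    uni_flags = (z > 3).any(axis=1)

    # Multivariate screen: squared Mahalanobis distance vs chi-square cutoff.
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d = X - X.mean(0)
    d2 = np.einsum("ij,jk,ik->i", d, S_inv, d)
    multi_flags = d2 > 13.8                          # ~chi2(2) 0.999 quantile

    print("last sample flagged univariately:  ", bool(uni_flags[-1]))
    print("last sample flagged multivariately:", bool(multi_flags[-1]))
    ```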

  2. Reducing Uncertainties of Hydrologic Model Predictions Using a New Ensemble Pre-Processing Approach

    NASA Astrophysics Data System (ADS)

    Khajehei, S.; Moradkhani, H.

    2015-12-01

    Ensemble Streamflow Prediction (ESP) was developed to characterize the uncertainty in hydrologic predictions. However, ESP outputs are still prone to bias due to uncertainty in the forcing data, initial conditions, and model structure. Among these, uncertainty in forcing data has a major impact on the reliability of hydrologic simulations/forecasts. Major steps have been taken toward generating less uncertain precipitation forecasts, such as Ensemble Pre-Processing (EPP). EPP is a statistical procedure based on the bivariate joint distribution between observations and forecasts that generates an ensemble climatological forecast from a single-value forecast. The purpose of this study is to evaluate the performance of pre-processed ensemble precipitation forecasts in generating ensemble streamflow predictions. Copula functions, used in EPP, model the multivariate joint distribution between univariate variables with any level of dependency. Accordingly, ESP is generated by employing both the raw ensemble precipitation forecast and the pre-processed ensemble precipitation. The ensemble precipitation forecast is taken from the Climate Forecast System (CFS) generated by the National Weather Service's (NWS) National Centers for Environmental Prediction (NCEP) models. The study is conducted using the Precipitation-Runoff Modeling System (PRMS) over two basins in the Pacific Northwest, USA, for the period 1979 to 2013. Results reveal that applying this new EPP leads to a reduction of uncertainty and an overall improvement in the ESP.
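
    A minimal sketch of copula-based ensemble pre-processing, assuming a Gaussian copula fitted after an empirical normal-quantile transform: given a single-value forecast, an ensemble is drawn from the conditional distribution and mapped back through the observed climatology. Real EPP implementations handle intermittency (zero precipitation) and marginal fitting with far more care.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    n = 3000
    z = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=n)
    obs = np.exp(0.8 * z[:, 0])          # synthetic precipitation climatology
    fcst = np.exp(0.8 * z[:, 1])         # correlated single-value forecasts

    def nqt(sample, values):
        # empirical normal-quantile transform of `values` w.r.t. `sample`
        ranks = np.searchsorted(np.sort(sample), values) / (len(sample) + 1)
        return stats.norm.ppf(np.clip(ranks, 1e-6, 1 - 1e-6))

    z_obs, z_fcst = nqt(obs, obs), nqt(fcst, fcst)
    rho = np.corrcoef(z_obs, z_fcst)[0, 1]

    def ensemble_from(single_forecast, n_members=50):
        zf = nqt(fcst, np.array([single_forecast]))[0]
        # conditional Gaussian: N(rho * zf, 1 - rho^2)
        zc = rho * zf + np.sqrt(1 - rho**2) * rng.standard_normal(n_members)
        # back-transform the conditional samples to observation space
        return np.quantile(obs, stats.norm.cdf(zc))

    members = ensemble_from(3.0)
    print(f"rho={rho:.2f}; ensemble mean={members.mean():.2f}, "
          f"spread={members.std():.2f}")
    ```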

  3. Reporting of Resting-State Functional Magnetic Resonance Imaging Preprocessing Methodologies.

    PubMed

    Waheed, Syed Hamza; Mirbagheri, Saeedeh; Agarwal, Shruti; Kamali, Arash; Yahyavi-Firouz-Abadi, Noushin; Chaudhry, Ammar; DiGianvittorio, Michael; Gujar, Sachin K; Pillai, Jay J; Sair, Haris I

    2016-11-01

    There has been a rapid increase in resting-state functional magnetic resonance imaging (rs-fMRI) literature in the past few years. We aim to highlight the variability in the current reporting practices of rs-fMRI acquisition and preprocessing parameters. The PubMed database was searched for the selection of appropriate articles in the rs-fMRI literature and the most recent 100 articles were selected based on our criteria. These articles were evaluated based on a checklist for reporting of certain preprocessing steps. All of the studies reported the temporal resolution for the scan and the software used for the analysis. Less than half of the studies reported physiologic monitoring, despiking, global signal regression, framewise displacement, and volume censoring. A majority of the studies mentioned the scanning duration, eye status, and smoothing kernel. Overall, we demonstrate the wide variability in reporting of preprocessing methods in rs-fMRI studies. Although there might be potential variability in reporting across studies due to individual requirements for a study, we suggest the need for standardizing reporting guidelines to ensure reproducibility.

  4. Optimization Based Tumor Classification from Microarray Gene Expression Data

    PubMed Central

    Dagliyan, Onur; Uney-Yuksektepe, Fadime; Kavakli, I. Halil; Turkay, Metin

    2011-01-01

    Background An important use of data obtained from microarray measurements is the classification of tumor types with respect to genes that are either up or down regulated in specific cancer types. A number of algorithms have been proposed to obtain such classifications. These algorithms usually require parameter optimization to obtain accurate results depending on the type of data. Additionally, it is highly critical to find an optimal set of markers among those up or down regulated genes that can be clinically utilized to build assays for the diagnosis or to follow progression of specific cancer types. In this paper, we employ a mixed integer programming based classification algorithm named the hyper-box enclosure method (HBE) for the classification of some cancer types with a minimal set of predictor genes. This optimization-based method, which is a user-friendly and efficient classifier, may allow clinicians to diagnose and follow the progression of certain cancer types. Methodology/Principal Findings We apply the HBE algorithm to some well-known data sets such as leukemia, prostate cancer, diffuse large B-cell lymphoma (DLBCL), and small round blue cell tumors (SRBCT) to find predictor genes that can be utilized for diagnosis and prognosis in a robust manner with high accuracy. Our approach does not require any modification or parameter optimization for each data set. Additionally, information gain attribute evaluator, relief attribute evaluator and correlation-based feature selection methods are employed for the gene selection. The results are compared with those from other studies and the biological roles of selected genes in the corresponding cancer types are described. Conclusions/Significance The performance of our algorithm overall was better than the other algorithms reported in the literature and classifiers found in the WEKA data-mining package. Since it does not require parameter optimization and it performs consistently with a very high prediction rate on different types of
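
    A much-simplified illustration of classifying samples with per-class hyper-boxes in gene-expression space; the actual HBE method determines the boxes, and resolves their overlaps, with mixed integer programming, whereas here each box is just the min/max range of a class's training samples.

    ```python
    import numpy as np

    rng = np.random.default_rng(9)
    n_genes = 5
    class_a = rng.normal(0.0, 1.0, (30, n_genes))
    class_b = rng.normal(3.0, 1.0, (30, n_genes))

    boxes = {name: (x.min(0), x.max(0))
             for name, x in [("A", class_a), ("B", class_b)]}

    def distance_to_box(sample, box):
        lo, hi = box
        # zero inside the box; Euclidean distance to the nearest face outside
        return np.linalg.norm(np.maximum(lo - sample, 0)
                              + np.maximum(sample - hi, 0))

    def classify(sample):
        return min(boxes, key=lambda name: distance_to_box(sample, boxes[name]))

    test = rng.normal(3.0, 1.0, n_genes)    # drawn from class B's distribution
    print("predicted class:", classify(test))
    ```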

  5. Chromosomal Microarray versus Karyotyping for Prenatal Diagnosis

    PubMed Central

    Wapner, Ronald J.; Martin, Christa Lese; Levy, Brynn; Ballif, Blake C.; Eng, Christine M.; Zachary, Julia M.; Savage, Melissa; Platt, Lawrence D.; Saltzman, Daniel; Grobman, William A.; Klugman, Susan; Scholl, Thomas; Simpson, Joe Leigh; McCall, Kimberly; Aggarwal, Vimla S.; Bunke, Brian; Nahum, Odelia; Patel, Ankita; Lamb, Allen N.; Thom, Elizabeth A.; Beaudet, Arthur L.; Ledbetter, David H.; Shaffer, Lisa G.; Jackson, Laird

    2013-01-01

    Background Chromosomal microarray analysis has emerged as a primary diagnostic tool for the evaluation of developmental delay and structural malformations in children. We aimed to evaluate the accuracy, efficacy, and incremental yield of chromosomal microarray analysis as compared with karyotyping for routine prenatal diagnosis. Methods Samples from women undergoing prenatal diagnosis at 29 centers were sent to a central karyotyping laboratory. Each sample was split in two; standard karyotyping was performed on one portion and the other was sent to one of four laboratories for chromosomal microarray. Results We enrolled a total of 4406 women. Indications for prenatal diagnosis were advanced maternal age (46.6%), abnormal result on Down’s syndrome screening (18.8%), structural anomalies on ultrasonography (25.2%), and other indications (9.4%). In 4340 (98.8%) of the fetal samples, microarray analysis was successful; 87.9% of samples could be used without tissue culture. Microarray analysis of the 4282 nonmosaic samples identified all the aneuploidies and unbalanced rearrangements identified on karyotyping but did not identify balanced translocations and fetal triploidy. In samples with a normal karyotype, microarray analysis revealed clinically relevant deletions or duplications in 6.0% with a structural anomaly and in 1.7% of those whose indications were advanced maternal age or positive screening results. Conclusions In the context of prenatal diagnostic testing, chromosomal microarray analysis identified additional, clinically significant cytogenetic information as compared with karyotyping and was equally efficacious in identifying aneuploidies and unbalanced rearrangements but did not identify balanced translocations and triploidies. (Funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development and others; ClinicalTrials.gov number, NCT01279733.) PMID:23215555

  6. Chromosomal microarray versus karyotyping for prenatal diagnosis.

    PubMed

    Wapner, Ronald J; Martin, Christa Lese; Levy, Brynn; Ballif, Blake C; Eng, Christine M; Zachary, Julia M; Savage, Melissa; Platt, Lawrence D; Saltzman, Daniel; Grobman, William A; Klugman, Susan; Scholl, Thomas; Simpson, Joe Leigh; McCall, Kimberly; Aggarwal, Vimla S; Bunke, Brian; Nahum, Odelia; Patel, Ankita; Lamb, Allen N; Thom, Elizabeth A; Beaudet, Arthur L; Ledbetter, David H; Shaffer, Lisa G; Jackson, Laird

    2012-12-06

    Chromosomal microarray analysis has emerged as a primary diagnostic tool for the evaluation of developmental delay and structural malformations in children. We aimed to evaluate the accuracy, efficacy, and incremental yield of chromosomal microarray analysis as compared with karyotyping for routine prenatal diagnosis. Samples from women undergoing prenatal diagnosis at 29 centers were sent to a central karyotyping laboratory. Each sample was split in two; standard karyotyping was performed on one portion and the other was sent to one of four laboratories for chromosomal microarray. We enrolled a total of 4406 women. Indications for prenatal diagnosis were advanced maternal age (46.6%), abnormal result on Down's syndrome screening (18.8%), structural anomalies on ultrasonography (25.2%), and other indications (9.4%). In 4340 (98.8%) of the fetal samples, microarray analysis was successful; 87.9% of samples could be used without tissue culture. Microarray analysis of the 4282 nonmosaic samples identified all the aneuploidies and unbalanced rearrangements identified on karyotyping but did not identify balanced translocations and fetal triploidy. In samples with a normal karyotype, microarray analysis revealed clinically relevant deletions or duplications in 6.0% with a structural anomaly and in 1.7% of those whose indications were advanced maternal age or positive screening results. In the context of prenatal diagnostic testing, chromosomal microarray analysis identified additional, clinically significant cytogenetic information as compared with karyotyping and was equally efficacious in identifying aneuploidies and unbalanced rearrangements but did not identify balanced translocations and triploidies. (Funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development and others; ClinicalTrials.gov number, NCT01279733.).

  7. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

    PubMed Central

    2010-01-01

    Background The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245

  8. Immune-Signatures for Lung Cancer Diagnostics: Evaluation of Protein Microarray Data Normalization Strategies

    PubMed Central

    Brezina, Stefanie; Soldo, Regina; Kreuzhuber, Roman; Hofer, Philipp; Gsur, Andrea; Weinhaeusel, Andreas

    2015-01-01

    New minimal invasive diagnostic methods for early detection of lung cancer are urgently needed. It is known that the immune system responds to tumors with production of tumor-autoantibodies. Protein microarrays are a suitable highly multiplexed platform for identification of autoantibody signatures against tumor-associated antigens (TAA). These microarrays can be probed using 0.1 mg immunoglobulin G (IgG), purified from 10 µL of plasma. We used a microarray comprising recombinant proteins derived from 15,417 cDNA clones for the screening of 100 lung cancer samples, including 25 samples of each main histological entity of lung cancer, and 100 controls. Since this number of samples cannot be processed at once, the resulting data showed non-biological variances due to “batch effects”. Our aim was to evaluate quantile normalization, “distance-weighted discrimination” (DWD), and “ComBat” for their effectiveness in data pre-processing for elucidating diagnostic immune-signatures. “ComBat” data adjustment outperformed the other methods and allowed us to identify classifiers for all lung cancer cases versus controls and small-cell, squamous cell, large-cell, and adenocarcinoma of the lung with an accuracy of 85%, 94%, 96%, 92%, and 83% (sensitivity of 0.85, 0.92, 0.96, 0.88, 0.83; specificity of 0.85, 0.96, 0.96, 0.96, 0.83), respectively. These promising data would be the basis for further validation using targeted autoantibody tests. PMID:27600218
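
    A minimal sketch of a location/scale batch adjustment in the spirit of ComBat, though without the empirical Bayes shrinkage that distinguishes ComBat: per feature, each batch is rescaled to the pooled mean and variance. The synthetic matrix stands in for the antigen-reactivity data.

    ```python
    import numpy as np

    rng = np.random.default_rng(10)
    n_feat, n1, n2 = 1000, 40, 60
    batch1 = rng.normal(0.0, 1.0, (n_feat, n1))
    batch2 = rng.normal(0.5, 1.8, (n_feat, n2))        # shifted, noisier batch

    def adjust(batches):
        pooled = np.hstack(batches)
        grand_mean = pooled.mean(axis=1, keepdims=True)
        grand_std = pooled.std(axis=1, keepdims=True)
        out = []
        for b in batches:
            m = b.mean(axis=1, keepdims=True)
            s = b.std(axis=1, keepdims=True)
            out.append((b - m) / s * grand_std + grand_mean)
        return out

    adj1, adj2 = adjust([batch1, batch2])
    print("batch means before:", batch1.mean().round(2), batch2.mean().round(2))
    print("batch means after :", adj1.mean().round(2), adj2.mean().round(2))
    ```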

  9. Feature extraction and signal processing for nylon DNA microarrays

    PubMed Central

    Lopez, F; Rougemont, J; Loriod, B; Bourgeois, A; Loï, L; Bertucci, F; Hingamp, P; Houlgatte, R; Granjeaud, S

    2004-01-01

    Background High-density DNA microarrays require automatic feature extraction methodologies and software. These can be a potential source of non-reproducibility of gene expression measurements. Variation in feature location or in signal integration methodology may be a significant contribution to the observed variance in gene expression levels. Results We explore sources of variability in feature extraction from DNA microarrays on nylon membranes with radioactive detection. We introduce a mathematical model of the signal emission and derive methods for correcting biases such as overshining, saturation or variation in probe amount. We also provide a quality metric which can be used qualitatively to flag weak or untrusted signals or quantitatively to modulate the weight of each experiment or gene in higher level analyses (clustering or discriminant analysis). Conclusions Our novel feature extraction methodology, based on a mathematical model of the radioactive emission, reduces variability due to saturation, neighbourhood effects and variable probe amount. Furthermore, we provide fully automatic feature extraction software, BZScan, which implements the algorithms described in this paper. PMID:15222896

  10. Bayesian decomposition: analyzing microarray data within a biological context.

    PubMed

    Ochs, Michael F; Moloshok, Thomas D; Bidaut, Ghislain; Toby, Garabet

    2004-05-01

    The detection and correct identification of cancer, especially at an early stage, are vitally important for patient survival and quality of life. Since signaling pathways play critical roles in cancer development and metastasis, methods that reliably assess the activity of these pathways are critical to understand cancer and the response to therapy. Bayesian Decomposition (BD) identifies signatures of expression that can be linked directly to signaling pathway activity, allowing the changes in mRNA levels to be used as downstream indicators of pathway activity. Here, we demonstrate this ability by identifying the downstream expression signal associated with the mating response in Saccharomyces cerevisiae and showing that this signal disappears in deletion mutants of genes critical to the MAPK signaling cascade used to trigger the response. We also show the use of BD in the context of supervised learning, by analyzing the Mus musculus tissue-specific data set provided by Project Normal. The algorithm correctly removes routine metabolic processes, allowing tissue-specific signatures of expression to be identified. Gene ontology is used to interpret these signatures. Since a number of modern therapeutics specifically target signaling proteins, it is important to be able to identify changes in signaling pathways in order to use microarray data to interpret cancer response. By removing routine metabolic signatures and linking specific signatures to signaling pathway activity, BD makes it possible to link changes in microarray results to signaling pathways.

  11. Performing custom microRNA microarray experiments.

    PubMed

    Zhang, Xiaoxiao; Zeng, Yan

    2011-10-28

    microRNAs (miRNAs) are a large family of ~22-nucleotide (nt) long RNA molecules that are widely expressed in eukaryotes (1). Complex genomes encode at least hundreds of miRNAs, which primarily inhibit the expression of a vast number of target genes post-transcriptionally (2, 3). miRNAs control a broad range of biological processes (1). In addition, altered miRNA expression has been associated with human diseases such as cancers, and miRNAs may serve as biomarkers for diseases and prognosis (4, 5). It is important, therefore, to understand the expression and functions of miRNAs under many different conditions. Three major approaches have been employed to profile miRNA expression: real-time PCR, microarray, and deep sequencing. The technique of miRNA microarray has the advantage of being high-throughput and generally less expensive, and most of the experimental and analysis steps can be carried out in a molecular biology laboratory at most universities, medical schools and associated hospitals. Here, we describe a method for performing custom miRNA microarray experiments. A miRNA probe set will be printed on glass slides to produce miRNA microarrays. RNA is isolated using a method or reagent that preserves small RNA species, and then labeled with a fluorescent dye. As a control, reference DNA oligonucleotides corresponding to a subset of miRNAs are also labeled with a different fluorescent dye. The reference DNA will serve to demonstrate the quality of the slide and hybridization and will also be used for data normalization. The RNA and DNA are mixed and hybridized to a microarray slide containing probes for most of the miRNAs in the database. After washing, the slide is scanned to obtain images, and intensities of the individual spots quantified. These raw signals will be further processed and analyzed as the expression data of the corresponding miRNAs. Microarray slides can be stripped and regenerated to reduce the cost of microarrays and to enhance the

  12. Validation of affinity reagents using antigen microarrays.

    PubMed

    Sjöberg, Ronald; Sundberg, Mårten; Gundberg, Anna; Sivertsson, Asa; Schwenk, Jochen M; Uhlén, Mathias; Nilsson, Peter

    2012-06-15

    There is a need for standardised validation of affinity reagents to determine their binding selectivity and specificity. This is of particular importance for systematic efforts that aim to cover the human proteome with different types of binding reagents. One such international program is the SH2-consortium, which was formed to generate a complete set of renewable affinity reagents to the SH2-domain containing human proteins. Here, we describe a microarray strategy to validate various affinity reagents, such as recombinant single-chain antibodies, mouse monoclonal antibodies and antigen-purified polyclonal antibodies using a highly multiplexed approach. An SH2-specific antigen microarray was designed and generated, containing more than 6000 spots displayed by 14 identical subarrays each with 406 antigens, where 105 of them represented SH2-domain containing proteins. Approximately 400 different affinity reagents of various types were analysed on these antigen microarrays carrying antigens of different types. The microarrays revealed not only very detailed specificity profiles for all the binders, but also showed that overlapping target sequences of spotted antigens were detected by off-target interactions. The presented study illustrates the feasibility of using antigen microarrays for integrative, high-throughput validation of various types of binders and antigens.

  13. Preprocessing of Holter ECGs for analysis of the dynamic interrelations between heart rate and ventricular repolarization duration variabilities.

    PubMed

    Merri, M; Alberti, M; Benhorin, J; Moss, A J

    1990-01-01

    The use of approved commercial digital Holter systems as equipment for data acquisition in cardiologic research presents several potential advantages. In fact, it may provide a standard data acquisition procedure that allows scientists to concentrate on the data processing phase. As a consequence, it may contribute to the comparability of results among different laboratories. In general, some preprocessing steps are needed before performing the final analysis on the data. In the case of studies on the variability of heart rate and repolarization duration, preprocessing consists of beat-by-beat measurement of time intervals in the 24-hour electrocardiograms. Further preprocessing steps may be required, depending on the particular analysis technique that will be applied to the data. In particular, the present study introduces an operator-free method of preprocessing Holter electrocardiograms for time series analysis of signals related to heart rate variability and variability in the duration of ventricular repolarization.
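
    The kind of preprocessing described, turning beat-by-beat interval measurements into evenly sampled series ready for time-series analysis, can be sketched as follows; R-peak detection and artifact rejection are omitted, and the synthetic RR/QT model and the 4 Hz resampling rate are assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(11)
    n_beats = 500
    rr = (0.8 + 0.05 * np.sin(2 * np.pi * 0.1 * np.arange(n_beats))
          + 0.02 * rng.standard_normal(n_beats))          # RR intervals [s]
    r_times = np.cumsum(rr)
    qt = 0.35 + 0.1 * (rr - 0.8) + 0.005 * rng.standard_normal(n_beats)  # QT [s]

    # Each beat's measurements are attached to its R-peak time; resample both
    # series onto a uniform 4 Hz grid by linear interpolation.
    fs = 4.0
    grid = np.arange(r_times[0], r_times[-1], 1.0 / fs)
    rr_even = np.interp(grid, r_times, rr)
    qt_even = np.interp(grid, r_times, qt)

    # The evenly sampled series can now go to standard spectral estimators.
    coupling = np.corrcoef(rr_even, qt_even)[0, 1]
    print(f"{len(grid)} samples at {fs} Hz; RR-QT correlation = {coupling:.2f}")
    ```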

  14. Estimating the proportion of microarray probes expressed in an RNA sample

    PubMed Central

    Shi, Wei; de Graaf, Carolyn A.; Kinkel, Sarah A.; Achtman, Ariel H.; Baldwin, Tracey; Schofield, Louis; Scott, Hamish S.; Hilton, Douglas J.; Smyth, Gordon K.

    2010-01-01

    A fundamental question in microarray analysis is the estimation of the number of expressed probes in different RNA samples. Negative control probes available in the latest microarray platforms, such as Illumina whole genome expression BeadChips, provide a unique opportunity to estimate the number of expressed probes without setting a threshold. A novel algorithm was proposed in this study to estimate the number of expressed probes in an RNA sample by utilizing these negative controls to measure background noise. The performance of the algorithm was demonstrated by comparing different generations of Illumina BeadChips, comparing the set of probes targeting well-characterized RefSeq NM transcripts with other probes on the array and comparing pure samples with heterogeneous samples. Furthermore, hematopoietic stem cells were found to have a larger transcriptome than progenitor cells. Aire knockout medullary thymic epithelial cells were shown to have significantly fewer expressed probes than matched wild-type cells. PMID:20056656
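
    A sketch of the threshold-free idea: treat the negative controls as an empirical null, convert each regular probe's intensity into a p-value, and estimate the expressed proportion from the p-value distribution. The simple moment estimator below (pi0 roughly 2 x mean p-value) is an illustrative stand-in, not the paper's algorithm.

    ```python
    import numpy as np

    rng = np.random.default_rng(12)
    n_controls, n_probes = 800, 10000
    controls = rng.normal(6.0, 0.5, n_controls)              # background (log2)
    expressed = rng.normal(8.0, 1.0, int(0.4 * n_probes))    # 40% expressed
    unexpressed = rng.normal(6.0, 0.5, n_probes - len(expressed))
    probes = np.concatenate([expressed, unexpressed])

    # Empirical p-value: fraction of negative controls at least as bright.
    sorted_controls = np.sort(controls)
    p = 1.0 - np.searchsorted(sorted_controls, probes, side="right") / n_controls

    # Under the null, p-values are uniform with mean 0.5; a simple moment
    # estimator of the unexpressed fraction is therefore 2 * mean(p).
    pi0 = min(1.0, 2.0 * p.mean())
    print(f"estimated proportion expressed: {1 - pi0:.2%} (truth: 40%)")
    ```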

  15. Posttranslational Modification Assays on Functional Protein Microarrays.

    PubMed

    Neiswinger, Johnathan; Uzoma, Ijeoma; Cox, Eric; Rho, HeeSool; Jeong, Jun Seop; Zhu, Heng

    2016-10-03

    Protein microarray technology provides a straightforward yet powerful strategy for identifying substrates of posttranslational modifications (PTMs) and studying the specificity of the enzymes that catalyze these reactions. Protein microarray assays can be designed for individual enzymes or a mixture to establish connections between enzymes and substrates. Assays for four well-known PTMs (phosphorylation, acetylation, ubiquitylation, and SUMOylation) have been developed and are described here for use on functional protein microarrays. Phosphorylation and acetylation require a single enzyme and are easily adapted for use on an array. The ubiquitylation and SUMOylation cascades are very similar, and the combination of the E1, E2, and E3 enzymes plus ubiquitin or SUMO protein and ATP is sufficient for in vitro modification of many substrates.

  16. Microarray analysis of erythromycin resistance determinants.

    PubMed

    Volokhov, D; Chizhikov, V; Chumakov, K; Rasooly, A

    2003-01-01

    To develop a DNA microarray for analysis of genes encoding resistance determinants to erythromycin and the related macrolide, lincosamide and streptogramin B (MLS) compounds. We developed an oligonucleotide microarray containing seven oligonucleotide probes (oligoprobes) for each of the six genes (ermA, ermB, ermC, ereA, ereB and msrA/B) that account for more than 98% of MLS resistance in Staphylococcus aureus clinical isolates. The microarray was used to test reference and clinical S. aureus and Streptococcus pyogenes strains. Target genes from clinical strains were amplified and fluorescently labelled using multiplex PCR target amplification. The microarray assay correctly identified the MLS resistance genes in the reference strains and clinical isolates of S. aureus, and the results were confirmed by direct DNA sequence analysis. Of 18 S. aureus clinical strains tested, 11 isolates carry MLS determinants. One gene (ermC) was found in all 11 clinical isolates tested, and two others, ermA and msrA/B, were found in five or more isolates. Indeed, eight (72%) of 11 clinical isolate strains contained two or three MLS resistance genes, in one of the three combinations (ermA with ermC, ermC with msrA/B, ermA with ermC and msrA/B). The oligonucleotide microarray can detect and identify the six MLS resistance determinants analysed in this study. Our results suggest that microarray-based detection of microbial antibiotic resistance genes might be a useful tool for identifying antibiotic resistance determinants in a wide range of bacterial strains, given the high homology among microbial MLS resistance genes.

  17. Hybridization and Selective Release of DNA Microarrays

    SciTech Connect

    Beer, N R; Baker, B; Piggott, T; Maberry, S; Hara, C M; DeOtte, J; Benett, W; Mukerjee, E; Dzenitis, J; Wheeler, E K

    2011-11-29

    DNA microarrays contain sequence specific probes arrayed in distinct spots numbering from 10,000 to over 1,000,000, depending on the platform. This tremendous degree of multiplexing gives microarrays great potential for environmental background sampling, broad-spectrum clinical monitoring, and continuous biological threat detection. In practice, their use in these applications is not common due to limited information content, long processing times, and high cost. The work focused on characterizing the phenomena of microarray hybridization and selective release that will allow these limitations to be addressed. This will revolutionize the ways that microarrays can be used for LLNL's Global Security missions. The goals of this project were two-fold: automated faster hybridizations and selective release of hybridized features. The first study area involves hybridization kinetics and mass-transfer effects. The standard hybridization protocol uses an overnight incubation to achieve the best possible signal for any sample type, as well as for convenience in manual processing. There is potential to significantly shorten this time based on better understanding and control of the rate-limiting processes and knowledge of the progress of the hybridization. In the hybridization work, a custom microarray flow cell was used to manipulate the chemical and thermal environment of the array and autonomously image the changes over time during hybridization. The second study area is selective release. Microarrays easily generate hybridization patterns and signatures, but there is still an unmet need for methodologies enabling rapid and selective analysis of these patterns and signatures. Detailed analysis of individual spots by subsequent sequencing could potentially yield significant information for rapidly mutating and emerging (or deliberately engineered) pathogens. In the selective release work, optical energy deposition with coherent light quickly provides the thermal energy to

  18. Characterization and simulation of cDNA microarray spots using a novel mathematical model

    PubMed Central

    Kim, Hye Young; Lee, Seo Eun; Kim, Min Jung; Han, Jin Il; Kim, Bo Kyung; Lee, Yong Sung; Lee, Young Seek; Kim, Jin Hyuk

    2007-01-01

    Background The quality of cDNA microarray data is crucial for expanding its application to other research areas, such as the study of gene regulatory networks. Despite the fact that a number of algorithms have been suggested to increase the accuracy of microarray gene expression data, it is necessary to obtain reliable microarray images by improving wet-lab experiments. As the first step of a cDNA microarray experiment, spotting cDNA probes is critical to determining the quality of spot images. Results We developed a governing equation of cDNA deposition during evaporation of a drop in the microarray spotting process. The governing equation included four parameters: the surface site density on the support, the extrapolated equilibrium constant for the binding of cDNA molecules with surface sites on glass slides, the macromolecular interaction factor, and the volume constant of a drop of cDNA solution. We simulated cDNA deposition from the single model equation by varying the value of the parameters. The morphology of the resulting cDNA deposit can be classified into three types: a doughnut shape, a peak shape, and a volcano shape. The spot morphology can be changed into a flat shape by varying the experimental conditions while considering the parameters of the governing equation of cDNA deposition. The four parameters were estimated by fitting the governing equation to the real microarray images. With the results of the simulation and the parameter estimation, the phenomenon of the formation of cDNA deposits in each type was investigated. Conclusion This study explains how various spot shapes can exist and suggests which parameters are to be adjusted for obtaining a good spot. This system is able to explore the cDNA microarray spotting process in a predictable, manageable and descriptive manner. We hope it can provide a way to predict the incidents that can occur during a real cDNA microarray experiment, and produce useful data for several research applications

  19. Optimal cDNA microarray design using expressed sequence tags for organisms with limited genomic information

    PubMed Central

    Chen, Yian A; Mckillen, David J; Wu, Shuyuan; Jenny, Matthew J; Chapman, Robert; Gross, Paul S; Warr, Gregory W; Almeida, Jonas S

    2004-01-01

    Background Expression microarrays are increasingly used to characterize environmental responses and host-parasite interactions for many different organisms. Probe selection for cDNA microarrays using expressed sequence tags (ESTs) is challenging due to high sequence redundancy and potential cross-hybridization between paralogous genes. In organisms with limited genomic information, like marine organisms, this challenge is even greater due to annotation uncertainty. No general tool is available for cDNA microarray probe selection for these organisms. Therefore, the goal of the design procedure described here is to select a subset of ESTs that will minimize sequence redundancy and characterize potential cross-hybridization while providing functionally representative probes. Results Sequence similarity between ESTs, quantified by the E-value of pair-wise alignment, was used as a surrogate for expected hybridization between corresponding sequences. Using this value as a measure of dissimilarity, sequence redundancy reduction was performed by hierarchical cluster analyses. The choice of how many microarray probes to retain was made based on an index developed for this research: a sequence diversity index (SDI) within a sequence diversity plot (SDP). This index tracked the decreasing within-cluster sequence diversity as the number of clusters increased. For a given stage in the agglomeration procedure, the EST having the highest similarity to all the other sequences within each cluster, the centroid EST, was selected as a microarray probe. A small dataset of ESTs from Atlantic white shrimp (Litopenaeus setiferus) was used to test this algorithm so that the detailed results could be examined. The functional representative level of the selected probes was quantified using Gene Ontology (GO) annotations. Conclusions For organisms with limited genomic information, combining hierarchical clustering methods to analyze ESTs can yield an optimal cDNA microarray design. If
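
    A minimal sketch of the selection scheme: hierarchically cluster ESTs on a pairwise dissimilarity (standing in for the alignment E-values), cut the tree at the chosen probe count, and keep each cluster's centroid EST, the member with the smallest total dissimilarity to its cluster. The synthetic dissimilarities replace real BLAST alignments, and the cluster count is arbitrary rather than SDI-guided.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    rng = np.random.default_rng(13)
    n_est = 30
    coords = rng.standard_normal((n_est, 4))              # proxy sequence space
    diss = np.sqrt(((coords[:, None] - coords[None]) ** 2).sum(-1))

    Z = linkage(squareform(diss, checks=False), method="average")
    clusters = fcluster(Z, t=8, criterion="maxclust")     # e.g. keep 8 probes

    probes = []
    for c in np.unique(clusters):
        members = np.flatnonzero(clusters == c)
        within = diss[np.ix_(members, members)].sum(axis=1)
        probes.append(int(members[np.argmin(within)]))    # centroid EST

    print("selected probe ESTs:", sorted(probes))
    ```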

  20. Unsupervised pattern recognition: an introduction to the whys and wherefores of clustering microarray data.

    PubMed

    Boutros, Paul C; Okey, Allan B

    2005-12-01

    Clustering has become an integral part of microarray data analysis and interpretation. The algorithmic basis of clustering -- the application of unsupervised machine-learning techniques to identify the patterns inherent in a data set -- is well established. This review discusses the biological motivations for and applications of these techniques to integrating gene expression data with other biological information, such as functional annotation, promoter data and proteomic data.

  1. Protein Microarrays for the Detection of Biothreats

    NASA Astrophysics Data System (ADS)

    Herr, Amy E.

    Although protein microarrays have proven to be an important tool in proteomics research, the technology is emerging as useful for public health and defense applications. Recent progress in the measurement and characterization of biothreat agents is reviewed in this chapter. Details concerning validation of various protein microarray formats, from contact-printed sandwich assays to supported lipid bilayers, are presented. The reviewed technologies have important implications for in vitro characterization of toxin-ligand interactions, serotyping of bacteria, screening of potential biothreat inhibitors, and as core components of biosensors, among other research and engineering applications.

  2. The use of microarrays in microbial ecology

    SciTech Connect

    Andersen, G.L.; He, Z.; DeSantis, T.Z.; Brodie, E.L.; Zhou, J.

    2009-09-15

    Microarrays have proven to be a useful and high-throughput method to provide targeted DNA sequence information for up to many thousands of specific genetic regions in a single test. A microarray consists of multiple DNA oligonucleotide probes that, under high stringency conditions, hybridize only to specific complementary nucleic acid sequences (targets). A fluorescent signal indicates the presence and, in many cases, the abundance of genetic regions of interest. In this chapter we will look at how microarrays are used in microbial ecology, especially with the recent increase in microbial community DNA sequence data. Of particular interest to microbial ecologists, phylogenetic microarrays are used for the analysis of phylotypes in a community and functional gene arrays are used for the analysis of functional genes, and, by inference, phylotypes in environmental samples. A phylogenetic microarray that has been developed by the Andersen laboratory, the PhyloChip, will be discussed as an example of a microarray that targets the known diversity within the 16S rRNA gene to determine microbial community composition. Using multiple, confirmatory probes to increase the confidence of detection and a mismatch probe for every perfect match probe to minimize the effect of cross-hybridization by non-target regions, the PhyloChip is able to simultaneously identify any of thousands of taxa present in an environmental sample. The PhyloChip is shown to reveal greater diversity within a community than rRNA gene sequencing due to the placement of the entire gene product on the microarray compared with the analysis of up to thousands of individual molecules by traditional sequencing methods. A functional gene array that has been developed by the Zhou laboratory, the GeoChip, will be discussed as an example of a microarray that dynamically identifies functional activities of multiple members within a community. The recent version of GeoChip contains more than 24,000 50mer

  3. MicroRNA expression profiling using microarrays.

    PubMed

    Love, Cassandra; Dave, Sandeep

    2013-01-01

    MicroRNAs are small noncoding RNAs which are able to regulate gene expression at both the transcriptional and translational levels. There is a growing recognition of the role of microRNAs in nearly every tissue type and cellular process. Thus there is an increasing need for accurate quantitation of microRNA expression in a variety of tissues. Microarrays provide a robust method for the examination of microRNA expression. In this chapter, we describe detailed methods for the use of microarrays to measure microRNA expression and discuss methods for the analysis of microRNA expression data.

  4. Overview of DNA microarrays: types, applications, and their future.

    PubMed

    Bumgarner, Roger

    2013-01-01

    This unit provides an overview of DNA microarrays. Microarrays are a technology in which thousands of nucleic acids are bound to a surface and are used to measure the relative concentration of nucleic acid sequences in a mixture via hybridization and subsequent detection of the hybridization events. This overview first discusses the history of microarrays and the antecedent technologies that led to their development. This is followed by discussion of the methods of manufacture of microarrays and the most common biological applications. The unit ends with a brief description of the limitations of microarrays and discusses how microarrays are being rapidly replaced by DNA sequencing technologies.

  5. Development and application of a microarray meter tool to optimize microarray experiments

    PubMed Central

    Rouse, Richard JD; Field, Katrine; Lapira, Jennifer; Lee, Allen; Wick, Ivan; Eckhardt, Colleen; Bhasker, C Ramana; Soverchia, Laura; Hardiman, Gary

    2008-01-01

    Background Successful microarray experimentation requires a complex interplay between the slide chemistry, the printing pins, the nucleic acid probes and targets, and the hybridization milieu. Optimization of these parameters and a careful evaluation of emerging slide chemistries are a prerequisite to any large scale array fabrication effort. We have developed a 'microarray meter' tool which assesses the inherent variations associated with microarray measurement prior to embarking on large scale projects. Findings The microarray meter consists of nucleic acid targets (reference and dynamic range control) and probe components. Different plate designs containing identical probe material were formulated to accommodate different robotic and pin designs. We examined the variability in probe quality and quantity (as judged by the amount of DNA printed and remaining post-hybridization) using three robots equipped with capillary printing pins. Discussion The generation of microarray data with minimal variation requires consistent quality control of the (DNA microarray) manufacturing and experimental processes. Spot reproducibility is a measure primarily of the variations associated with printing. The microarray meter assesses array quality by measuring the DNA content for every feature. It provides a post-hybridization analysis of array quality by scoring probe performance using three metrics, a) a measure of variability in the signal intensities, b) a measure of the signal dynamic range and c) a measure of variability of the spot morphologies. PMID:18710498
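    A rough illustration of the three post-hybridization quality metrics described above, under assumed definitions (coefficient of variation for signal variability, log2 max/min for dynamic range, and spot-area variation as a stand-in for morphology); the authors' exact formulas are not given in the abstract.

```python
import numpy as np

def probe_metrics(replicate_intensities, spot_areas):
    """Three illustrative post-hybridization quality metrics in the spirit
    of the microarray meter: (a) variability of replicate signal
    intensities, (b) signal dynamic range, (c) variability of spot
    morphology (here proxied by spot area). Definitions are assumptions,
    not the authors' exact formulas."""
    x = np.asarray(replicate_intensities, dtype=float)
    a = np.asarray(spot_areas, dtype=float)
    cv_signal = x.std(ddof=1) / x.mean()    # (a) coefficient of variation
    dyn_range = np.log2(x.max() / x.min())  # (b) dynamic range in log2 units
    cv_morph = a.std(ddof=1) / a.mean()     # (c) spot-morphology variability
    return cv_signal, dyn_range, cv_morph

print(probe_metrics([1200, 1350, 1280, 1175], [80, 84, 79, 82]))
```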

  6. A fast meteor detection algorithm

    NASA Astrophysics Data System (ADS)

    Gural, P.

    2016-01-01

    A low-latency meteor detection algorithm for use with fast steering mirrors was previously developed to track and telescopically follow meteors in real time (Gural, 2007). It has been rewritten as a generic clustering and tracking software module for meteor detection that meets the demanding throughput requirements of a Raspberry Pi while also maintaining a high probability of detection. The software interface is generalized to work with various forms of front-end video pre-processing approaches and provides a rich product set of parameterized line detection metrics. Discussion will include the Maximum Temporal Pixel (MTP) compression technique as a fast thresholding option for feeding the detection module, the detection algorithm trade-offs made for maximum processing throughput, details on the clustering and tracking methodology, processing products, performance metrics, and a general interface description.
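    The MTP compression step lends itself to a one-line array reduction. The sketch below, with made-up frame data, shows the generic idea of collapsing a frame stack by per-pixel temporal maxima; it is not Gural's implementation.

```python
import numpy as np

def maximum_temporal_pixel(frames):
    """Collapse a block of video frames into a single image by keeping,
    per pixel, the maximum value seen over time. A bright moving meteor
    then appears as a streak in one frame-sized image, which can be
    thresholded cheaply before running the full detector. This is a
    generic sketch of the MTP idea, not Gural's implementation."""
    stack = np.asarray(frames)      # shape: (n_frames, height, width)
    return stack.max(axis=0)

frames = np.random.randint(0, 30, size=(64, 480, 640), dtype=np.uint8)
frames[20, 100, 200:260] = 255      # synthetic meteor streak in frame 20
mtp = maximum_temporal_pixel(frames)
print((mtp == 255).sum())           # 60 streak pixels survive compression
```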

  7. Evolutionary approach for relative gene expression algorithms.

    PubMed

    Czajkowski, Marcin; Kretowski, Marek

    2014-01-01

    A Relative Expression Analysis (RXA) uses ordering relationships in a small collection of genes and is successfully applied to classification using microarray data. As checking all possible subsets of genes is computationally infeasible, RXA algorithms require feature selection and multiple restrictive assumptions. Our main contribution is a specialized evolutionary algorithm (EA) for top-scoring pairs, called EvoTSP, which allows finding more advanced gene relations. We managed to unify the major variants of relative expression algorithms through the EA and introduce weights to the top-scoring pairs. Experimental validation of EvoTSP on publicly available microarray datasets showed that the proposed solution significantly outperforms other relative expression algorithms in terms of accuracy and allows exploring a much larger solution space.
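    For readers unfamiliar with relative expression analysis, the following sketch computes the classic top-scoring-pair criterion that EvoTSP builds on: the gene pair whose ordering relationship differs most between two classes. The exhaustive search shown here is exactly the cost that motivates the evolutionary approach; data and names are illustrative.

```python
import numpy as np
from itertools import combinations

def top_scoring_pair(X, y):
    """Classic TSP score: for each gene pair (i, j), compare the frequency
    of the ordering X[:, i] < X[:, j] between the two classes and keep the
    pair where the difference is largest. This sketches the baseline that
    EvoTSP generalizes; the evolutionary search itself is not shown."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    best_score, best_pair = -1.0, None
    for i, j in combinations(range(X.shape[1]), 2):
        less = X[:, i] < X[:, j]
        score = abs(less[y == 0].mean() - less[y == 1].mean())
        if score > best_score:
            best_score, best_pair = score, (i, j)
    return best_pair, best_score

X = np.random.rand(40, 20)   # 40 samples, 20 genes (toy data)
X[:20, 3] += 1.0             # boost gene 3 in class 0 so its orderings separate the classes
y = np.array([0] * 20 + [1] * 20)
print(top_scoring_pair(X, y))
```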

  8. Evolutionary Approach for Relative Gene Expression Algorithms

    PubMed Central

    Czajkowski, Marcin

    2014-01-01

    A Relative Expression Analysis (RXA) uses ordering relationships in a small collection of genes and is successfully applied to classification using microarray data. As checking all possible subsets of genes is computationally infeasible, RXA algorithms require feature selection and multiple restrictive assumptions. Our main contribution is a specialized evolutionary algorithm (EA) for top-scoring pairs, called EvoTSP, which allows finding more advanced gene relations. We managed to unify the major variants of relative expression algorithms through the EA and introduce weights to the top-scoring pairs. Experimental validation of EvoTSP on publicly available microarray datasets showed that the proposed solution significantly outperforms other relative expression algorithms in terms of accuracy and allows exploring a much larger solution space. PMID:24790574

  9. Acquisition, preprocessing, and reconstruction of ultralow dose volumetric CT scout for organ-based CT scan planning

    SciTech Connect

    Yin, Zhye; De Man, Bruno; Yao, Yangyang; Wu, Mingye; Montillo, Albert; Edic, Peter M.; Kalra, Mannudeep

    2015-05-15

    Purpose: Traditionally, 2D radiographic preparatory scan images (scout scans) are used to plan diagnostic CT scans. However, a 3D CT volume with a full 3D organ segmentation map could provide superior information for customized scan planning and other purposes. A practical challenge is to design the volumetric scout acquisition and processing steps to provide good image quality (at least good enough to enable 3D organ segmentation) while delivering a radiation dose similar to that of the conventional 2D scout. Methods: The authors explored various acquisition methods, scan parameters, postprocessing methods, and reconstruction methods through simulation and cadaver data studies to achieve an ultralow dose 3D scout while simultaneously reducing the noise and maintaining the edge strength around the target organ. Results: In a simulation study, the 3D scout with the proposed acquisition, preprocessing, and reconstruction strategy provided a similar level of organ segmentation capability as a traditional 240 mAs diagnostic scan, based on noise and normalized edge strength metrics. At the same time, the proposed approach delivers only 1.25% of the dose of a traditional scan. In a cadaver study, the authors’ pictorial-structures based organ localization algorithm successfully located the major abdominal-thoracic organs from the ultralow dose 3D scout obtained with the proposed strategy. Conclusions: The authors demonstrated that images with a similar degree of segmentation capability (interpretability) as conventional dose CT scans can be achieved with an ultralow dose 3D scout acquisition and suitable postprocessing. Furthermore, the authors applied these techniques to real cadaver CT scans with a CTDI dose level of less than 0.1 mGy and successfully generated a 3D organ localization map.

  10. Acquisition, preprocessing, and reconstruction of ultralow dose volumetric CT scout for organ-based CT scan planning.

    PubMed

    Yin, Zhye; Yao, Yangyang; Montillo, Albert; Wu, Mingye; Edic, Peter M; Kalra, Mannudeep; De Man, Bruno

    2015-05-01

    Traditionally, 2D radiographic preparatory scan images (scout scans) are used to plan diagnostic CT scans. However, a 3D CT volume with a full 3D organ segmentation map could provide superior information for customized scan planning and other purposes. A practical challenge is to design the volumetric scout acquisition and processing steps to provide good image quality (at least good enough to enable 3D organ segmentation) while delivering a radiation dose similar to that of the conventional 2D scout. The authors explored various acquisition methods, scan parameters, postprocessing methods, and reconstruction methods through simulation and cadaver data studies to achieve an ultralow dose 3D scout while simultaneously reducing the noise and maintaining the edge strength around the target organ. In a simulation study, the 3D scout with the proposed acquisition, preprocessing, and reconstruction strategy provided a similar level of organ segmentation capability as a traditional 240 mAs diagnostic scan, based on noise and normalized edge strength metrics. At the same time, the proposed approach delivers only 1.25% of the dose of a traditional scan. In a cadaver study, the authors' pictorial-structures based organ localization algorithm successfully located the major abdominal-thoracic organs from the ultralow dose 3D scout obtained with the proposed strategy. The authors demonstrated that images with a similar degree of segmentation capability (interpretability) as conventional dose CT scans can be achieved with an ultralow dose 3D scout acquisition and suitable postprocessing. Furthermore, the authors applied these techniques to real cadaver CT scans with a CTDI dose level of less than 0.1 mGy and successfully generated a 3D organ localization map.

  11. A Combinational Clustering Based Method for cDNA Microarray Image Segmentation.

    PubMed

    Shao, Guifang; Li, Tiejun; Zuo, Wangda; Wu, Shunxiang; Liu, Tundong

    2015-01-01

    Microarray technology plays an important role in drawing useful biological conclusions by analyzing thousands of gene expressions simultaneously. Especially, image analysis is a key step in microarray analysis and its accuracy strongly depends on segmentation. The pioneering works of clustering based segmentation have shown that k-means clustering algorithm and moving k-means clustering algorithm are two commonly used methods in microarray image processing. However, they usually face unsatisfactory results because the real microarray image contains noise, artifacts and spots that vary in size, shape and contrast. To improve the segmentation accuracy, in this article we present a combination clustering based segmentation approach that may be more reliable and able to segment spots automatically. First, this new method starts with a very simple but effective contrast enhancement operation to improve the image quality. Then, an automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, among each spot region, the moving k-means clustering is first conducted to separate the spot from background and then the k-means clustering algorithms are combined for those spots failing to obtain the entire boundary. Finally, a refinement step is used to replace the false segmentation and the inseparable ones of missing spots. In addition, quantitative comparisons between the improved method and the other four segmentation algorithms--edge detection, thresholding, k-means clustering and moving k-means clustering--are carried out on cDNA microarray images from six different data sets. Experiments on six different data sets, 1) Stanford Microarray Database (SMD), 2) Gene Expression Omnibus (GEO), 3) Baylor College of Medicine (BCM), 4) Swiss Institute of Bioinformatics (SIB), 5) Joe DeRisi's individual tiff files (DeRisi), and 6) University of California, San Francisco (UCSF), indicate that the improved approach is
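    As an illustration of the clustering core of such methods, here is a minimal two-cluster k-means segmentation of a single gridded spot region; the paper's contrast enhancement, automatic gridding, moving k-means, and refinement steps are omitted, and all names and data are hypothetical.

```python
import numpy as np

def kmeans_spot_segmentation(patch, n_iter=20):
    """Two-cluster k-means on pixel intensities of one gridded spot region:
    the brighter cluster is taken as the spot, the darker as background.
    A minimal sketch of the clustering step only."""
    pixels = np.asarray(patch, dtype=float).ravel()
    centers = np.array([pixels.min(), pixels.max()])  # init: dark vs bright
    for _ in range(n_iter):
        labels = np.abs(pixels[:, None] - centers).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean()
    mask = labels.reshape(np.asarray(patch).shape) == centers.argmax()
    return mask                                       # True = spot pixel

patch = np.full((16, 16), 40.0)
patch[5:11, 5:11] = 200.0                             # synthetic bright spot
print(kmeans_spot_segmentation(patch).sum())          # 36 spot pixels
```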

  12. A Combinational Clustering Based Method for cDNA Microarray Image Segmentation

    PubMed Central

    Shao, Guifang; Li, Tiejun; Zuo, Wangda; Wu, Shunxiang; Liu, Tundong

    2015-01-01

    Microarray technology plays an important role in drawing useful biological conclusions by analyzing thousands of gene expressions simultaneously. Especially, image analysis is a key step in microarray analysis and its accuracy strongly depends on segmentation. The pioneering works of clustering based segmentation have shown that k-means clustering algorithm and moving k-means clustering algorithm are two commonly used methods in microarray image processing. However, they usually face unsatisfactory results because the real microarray image contains noise, artifacts and spots that vary in size, shape and contrast. To improve the segmentation accuracy, in this article we present a combination clustering based segmentation approach that may be more reliable and able to segment spots automatically. First, this new method starts with a very simple but effective contrast enhancement operation to improve the image quality. Then, an automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, among each spot region, the moving k-means clustering is first conducted to separate the spot from background and then the k-means clustering algorithms are combined for those spots failing to obtain the entire boundary. Finally, a refinement step is used to replace the false segmentation and the inseparable ones of missing spots. In addition, quantitative comparisons between the improved method and the other four segmentation algorithms--edge detection, thresholding, k-means clustering and moving k-means clustering--are carried out on cDNA microarray images from six different data sets. Experiments on six different data sets, 1) Stanford Microarray Database (SMD), 2) Gene Expression Omnibus (GEO), 3) Baylor College of Medicine (BCM), 4) Swiss Institute of Bioinformatics (SIB), 5) Joe DeRisi’s individual tiff files (DeRisi), and 6) University of California, San Francisco (UCSF), indicate that the improved approach is

  13. ArrayQuest: a web resource for the analysis of DNA microarray data

    PubMed Central

    Argraves, Gary L; Jani, Saurin; Barth, Jeremy L; Argraves, W Scott

    2005-01-01

    Background Numerous microarray analysis programs have been created through the efforts of Open Source software development projects. Providing browser-based interfaces that allow these programs to be executed over the Internet enhances the applicability and utility of these analytic software tools. Results Here we present ArrayQuest, a web-based DNA microarray analysis process controller. Key features of ArrayQuest are that (1) it is capable of executing numerous analysis programs such as those written in R, BioPerl and C++; (2) new analysis programs can be added to ArrayQuest Methods Library at the request of users or developers; (3) input DNA microarray data can be selected from public databases (i.e., the Medical University of South Carolina (MUSC) DNA Microarray Database or Gene Expression Omnibus (GEO)) or it can be uploaded to the ArrayQuest center-point web server into a password-protected area; and (4) analysis jobs are distributed across computers configured in a backend cluster. To demonstrate the utility of ArrayQuest we have populated the methods library with methods for analysis of Affymetrix DNA microarray data. Conclusion ArrayQuest enables browser-based implementation of DNA microarray data analysis programs that can be executed on a Linux-based platform. Importantly, ArrayQuest is a platform that will facilitate the distribution and implementation of new analysis algorithms and is therefore of use to both developers of analysis applications as well as users. ArrayQuest is freely available for use at http://proteogenomics.musc.edu/arrayquest.html. PMID:16321157

  14. ArrayQuest: a web resource for the analysis of DNA microarray data.

    PubMed

    Argraves, Gary L; Jani, Saurin; Barth, Jeremy L; Argraves, W Scott

    2005-12-01

    Numerous microarray analysis programs have been created through the efforts of Open Source software development projects. Providing browser-based interfaces that allow these programs to be executed over the Internet enhances the applicability and utility of these analytic software tools. Here we present ArrayQuest, a web-based DNA microarray analysis process controller. Key features of ArrayQuest are that (1) it is capable of executing numerous analysis programs such as those written in R, BioPerl and C++; (2) new analysis programs can be added to ArrayQuest Methods Library at the request of users or developers; (3) input DNA microarray data can be selected from public databases (i.e., the Medical University of South Carolina (MUSC) DNA Microarray Database or Gene Expression Omnibus (GEO)) or it can be uploaded to the ArrayQuest center-point web server into a password-protected area; and (4) analysis jobs are distributed across computers configured in a backend cluster. To demonstrate the utility of ArrayQuest we have populated the methods library with methods for analysis of Affymetrix DNA microarray data. ArrayQuest enables browser-based implementation of DNA microarray data analysis programs that can be executed on a Linux-based platform. Importantly, ArrayQuest is a platform that will facilitate the distribution and implementation of new analysis algorithms and is therefore of use to both developers of analysis applications as well as users. ArrayQuest is freely available for use at http://proteogenomics.musc.edu/arrayquest.html.

  15. Preprocessing with Photoshop Software on Microscopic Images of A549 Cells in Epithelial-Mesenchymal Transition.

    PubMed

    Ren, Zhou-Xin; Yu, Hai-Bin; Shen, Jun-Ling; Li, Ya; Li, Jian-Sheng

    2015-06-01

    To establish a preprocessing method for cell morphometry in microscopic images of A549 cells in epithelial-mesenchymal transition (EMT), Adobe Photoshop CS2 (Adobe Systems, Inc.) was used to preprocess the images. First, all images were processed for size uniformity and high distinguishability between the cell and background areas. Then, a blank image of the same size, bearing a grid whose cross points were marked in a distinct color, was created and merged into each processed image. In the merged images, the cells touching one or more cross points were chosen, and their areas were enclosed and recolored in a distinct color; all areas other than the chosen cellular areas were changed to a single hue. Three observers quantified the roundness of the cells in images with the image preprocessing (IPP) method or without it (Controls). Furthermore, one observer measured the roundness three times with each of the two methods. The results from IPPs and Controls were compared for repeatability and reproducibility. Compared with the Control method, the IPP method gave, across the three observers, a higher number and percentage of identically chosen cells per image. The relative average deviation values of roundness, whether for the three observers or the single observer, were significantly higher in Controls than in IPPs (p < 0.01 or 0.001). The intraclass correlation coefficient values, both Single Type and Average, were higher in IPPs than in Controls, for both the three observers and the single observer. Preprocessed with Adobe Photoshop, the choice of cells from an image was more objective, regular, and accurate, increasing the reproducibility and repeatability of the morphometry of A549 cells in epithelial-mesenchymal transition.

  16. Cloudy Solar Software - Enhanced Capabilities for Finding, Pre-processing, and Visualizing Solar Data

    NASA Astrophysics Data System (ADS)

    Istvan Etesi, Laszlo; Tolbert, K.; Schwartz, R.; Zarro, D.; Dennis, B.; Csillaghy, A.

    2010-05-01

    In our project "Extending the Virtual Solar Observatory (VSO)" we have combined some of the features available in Solar Software (SSW) to produce an integrated environment for data analysis, supporting the complete workflow from data location, retrieval, preparation, and analysis to creating publication-quality figures. Our goal is an integrated analysis experience in IDL, easy-to-use but flexible enough to allow more sophisticated procedures such as multi-instrument analysis. To that end, we have made the transition from a locally oriented setting where all the analysis is done on the user's computer, to an extended analysis environment where IDL has access to services available on the Internet. We have implemented a form of Cloud Computing that uses the VSO search and a new data retrieval and pre-processing server (PrepServer) that provides remote execution of instrument-specific data preparation. We have incorporated the interfaces to the VSO search and the PrepServer into an IDL widget (SHOW_SYNOP) that provides user-friendly searching and downloading of raw solar data and optionally sends search results for pre-processing to the PrepServer prior to downloading the data. The raw and pre-processed data can be displayed with our plotting suite, PLOTMAN, which can handle different data types (light curves, images, and spectra) and perform basic data operations such as zooming, image overlays, solar rotation, etc. PLOTMAN is highly configurable and suited for visual data analysis and for creating publishable figures. PLOTMAN and SHOW_SYNOP work hand-in-hand for a convenient working environment. Our environment supports a growing number of solar instruments that currently includes RHESSI, SOHO/EIT, TRACE, SECCHI/EUVI, HINODE/XRT, and HINODE/EIS.

  17. A Two Dimensional Overlapped Subaperture Polar Format Algorithm Based on Stepped-chirp Signal

    PubMed Central

    Mao, Xinhua; Zhu, Daiyin; Nie, Xin; Zhu, Zhaoda

    2008-01-01

    In this work, a 2-D subaperture polar format algorithm (PFA) based on stepped-chirp signal is proposed. Instead of traditional pulse synthesis preprocessing, the presented method integrates the pulse synthesis process into the range subaperture processing. Meanwhile, due to the multi-resolution property of subaperture processing, this algorithm is able to compensate the space-variant phase error caused by the radar motion during the period of a pulse cluster. Point target simulation has validated the presented algorithm. PMID:27879887

  18. Interest rate prediction: a neuro-hybrid approach with data preprocessing

    NASA Astrophysics Data System (ADS)

    Mehdiyev, Nijat; Enke, David

    2014-07-01

    The following research implements a differential evolution-based fuzzy-type clustering method with a fuzzy inference neural network, after input preprocessing with regression analysis, in order to predict future interest rates, particularly 3-month T-bill rates. The empirical results of the proposed model are compared against those of nonparametric models, such as locally weighted regression and least squares support vector machines, along with two linear benchmark models, the autoregressive model and the random walk model. The root mean square error is reported for comparison.

  19. Comparative Evaluation of Preprocessing Freeware on Chromatography/Mass Spectrometry Data for Signature Discovery

    SciTech Connect

    Coble, Jamie B.; Fraga, Carlos G.

    2014-07-07

    Preprocessing software is crucial for the discovery of chemical signatures in metabolomics, chemical forensics, and other signature-focused disciplines that involve analyzing large data sets from chemical instruments. Here, four freely available and published preprocessing tools known as metAlign, MZmine, SpectConnect, and XCMS were evaluated for impurity profiling using nominal mass GC/MS data and accurate mass LC/MS data. Both data sets were previously collected from the analysis of replicate samples from multiple stocks of a nerve-agent precursor. Each of the four tools had their parameters set for the untargeted detection of chromatographic peaks from impurities present in the stocks. The peak table generated by each preprocessing tool was analyzed to determine the number of impurity components detected in all replicate samples per stock. A cumulative set of impurity components was then generated using all available peak tables and used as a reference to calculate the percent of component detections for each tool, in which 100% indicated the detection of every component. For the nominal mass GC/MS data, metAlign performed the best followed by MZmine, SpectConnect, and XCMS with detection percentages of 83, 60, 47, and 42%, respectively. For the accurate mass LC/MS data, the order was metAlign, XCMS, and MZmine with detection percentages of 80, 45, and 35%, respectively. SpectConnect did not function for the accurate mass LC/MS data. Larger detection percentages were obtained by combining the top performer with at least one of the other tools such as 96% by combining metAlign with MZmine for the GC/MS data and 93% by combining metAlign with XCMS for the LC/MS data. In terms of quantitative performance, the reported peak intensities had average absolute biases of 41, 4.4, 1.3 and 1.3% for SpectConnect, metAlign, XCMS, and MZmine, respectively, for the GC/MS data. For the LC/MS data, the average absolute biases were 22, 4.5, and 3.1% for metAlign, MZmine, and XCMS
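    The detection-percentage metric used in this comparison reduces to simple set arithmetic over the tools' peak tables, as sketched below with made-up component IDs.

```python
def detection_percentages(peak_tables):
    """Score each preprocessing tool against the union of impurity
    components found by all tools, as in the comparison above: 100%
    means a tool detected every component in the cumulative set.
    `peak_tables` maps tool name -> set of detected component IDs;
    the component IDs below are illustrative."""
    reference = set().union(*peak_tables.values())
    return {tool: 100.0 * len(found & reference) / len(reference)
            for tool, found in peak_tables.items()}

tables = {
    "metAlign":     {"c1", "c2", "c3", "c4", "c5"},
    "MZmine":       {"c1", "c2", "c4"},
    "SpectConnect": {"c2", "c5"},
    "XCMS":         {"c1", "c3"},
}
print(detection_percentages(tables))
# {'metAlign': 100.0, 'MZmine': 60.0, 'SpectConnect': 40.0, 'XCMS': 40.0}
```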

  20. Computer-assisted bone age assessment: image preprocessing and epiphyseal/metaphyseal ROI extraction.

    PubMed

    Pietka, E; Gertych, A; Pospiech, S; Cao, F; Huang, H K; Gilsanz, V

    2001-08-01

    Clinical assessment of skeletal maturity is based on a visual comparison of a left-hand wrist radiograph with atlas patterns. Using a new digital hand atlas, an image analysis methodology is being developed to assist radiologists in bone age estimation. The analysis starts with a preprocessing function yielding epiphyseal/metaphyseal regions of interest (EMROIs). These regions are then subjected to a feature extraction function. Accuracy has been measured independently at three stages of the image analysis: detection of the phalangeal tip, extraction of the EMROIs, and location of the diameters and lower edge of the EMROIs. The extracted features describe the stage of skeletal development more objectively than visual comparison.

  1. Reservoir computing with a slowly modulated mask signal for preprocessing using a mutually coupled optoelectronic system

    NASA Astrophysics Data System (ADS)

    Tezuka, Miwa; Kanno, Kazutaka; Bunsen, Masatoshi

    2016-08-01

    Reservoir computing is a machine-learning paradigm based on information processing in the human brain. We numerically demonstrate reservoir computing with a slowly modulated mask signal for preprocessing by using a mutually coupled optoelectronic system. The performance of our system is quantitatively evaluated by a chaotic time series prediction task. Our system can produce performance comparable to that of reservoir computing with a single feedback system and a fast modulated mask signal. We show that it is possible to slow down the modulation speed of the mask signal by using the mutually coupled system in reservoir computing.

  2. Experimental examination of similarity measures and preprocessing methods used for image registration

    NASA Technical Reports Server (NTRS)

    Svedlow, M.; Mcgillem, C. D.; Anuta, P. E.

    1976-01-01

    The criterion used to measure the similarity between images and thus find the position where the images are registered is examined. The three similarity measures considered are the correlation coefficient, the sum of the absolute differences, and the correlation function. Three basic types of preprocessing are then discussed: taking the magnitude of the gradient of the images, thresholding the images at their medians, and thresholding the magnitude of the gradient of the images at an arbitrary level to be determined experimentally. These multitemporal registration techniques are applied to remote imagery of agricultural areas.
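    The similarity measures and gradient preprocessing discussed above can be stated compactly in code. The sketch below, on synthetic patches, shows the correlation coefficient, the sum of absolute differences, and registration of gradient-magnitude images; it follows textbook definitions rather than the paper's exact experimental setup.

```python
import numpy as np

def correlation_coefficient(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / np.sqrt((a**2).sum() * (b**2).sum())

def sum_abs_diff(a, b):
    """Sum of absolute differences; smaller means more similar."""
    return np.abs(a - b).sum()

def gradient_magnitude(img):
    """One of the preprocessing options discussed above: registering
    gradient-magnitude images instead of raw intensities."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

a = np.random.rand(32, 32)
b = a + 0.05 * np.random.rand(32, 32)   # nearly registered pair
print(correlation_coefficient(a, b), sum_abs_diff(a, b))
print(correlation_coefficient(gradient_magnitude(a), gradient_magnitude(b)))
```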

  3. Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

    PubMed

    Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar

    2016-04-01

    Microarray-based gene expression profiling has emerged as an efficient technique for the classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfy both the veracity and velocity properties of big data, as they keep changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. These datasets often contain a large number of expression values, but only a fraction of them correspond to significantly expressed genes. The precise identification of the genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process, such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest-neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data.
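    Although the paper targets Hadoop, the map/reduce decomposition of KNN is easy to mimic on one machine: each mapper returns the k nearest training points in its partition, and the reducer merges the candidates and votes. The sketch below uses toy data and hypothetical names, not the authors' code.

```python
import heapq
import numpy as np

def map_phase(chunk, labels, query, k):
    """Mapper: each data partition returns its k locally nearest
    training points as (distance, label) pairs for the query."""
    d = np.linalg.norm(chunk - query, axis=1)
    idx = np.argsort(d)[:k]
    return list(zip(d[idx], labels[idx]))

def reduce_phase(partials, k):
    """Reducer: merge per-partition candidates, keep the global k nearest,
    and vote. A single-machine sketch of the mrKNN idea, not Hadoop code."""
    nearest = heapq.nsmallest(k, (p for part in partials for p in part))
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))              # toy "microarray" matrix
y = (X[:, 0] > 0).astype(int)                # toy class labels
chunks = np.array_split(np.arange(1000), 4)  # 4 partitions
query = rng.normal(size=50)
partials = [map_phase(X[c], y[c], query, k=5) for c in chunks]
print(reduce_phase(partials, k=5))
```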

  4. Diagnostic Oligonucleotide Microarray Fingerprinting of Bacillus Isolates

    SciTech Connect

    Chandler, Darrell P.; Alferov, Oleg; Chernov, Boris; Daly, Don S.; Golova, Julia; Perov, Alexander N.; Protic, Miroslava; Robison, Richard; Shipma, Matthew; White, Amanda M.; Willse, Alan R.

    2006-01-01

    A diagnostic, genome-independent microbial fingerprinting method using DNA oligonucleotide microarrays was used for high-resolution differentiation between closely related Bacillus strains, including two strains of Bacillus anthracis that are monomorphic (indistinguishable) via amplified fragment length polymorphism fingerprinting techniques. Replicated hybridizations on 391-probe nonamer arrays were used to construct a prototype fingerprint library for quantitative comparisons. Descriptive analysis of the fingerprints, including phylogenetic reconstruction, is consistent with previous taxonomic organization of the genus. Newly developed statistical analysis methods were used to quantitatively compare and objectively confirm apparent differences in microarray fingerprints with the statistical rigor required for microbial forensics and clinical diagnostics. These data suggest that a relatively simple fingerprinting microarray and statistical analysis method can differentiate between species in the Bacillus cereus complex, and between strains of B. anthracis. A synthetic DNA standard was used to understand underlying microarray and process-level variability, leading to specific recommendations for the development of a standard operating procedure and/or continued technology enhancements for microbial forensics and diagnostics.

  5. Analytical Protein Microarrays: Advancements Towards Clinical Applications.

    PubMed

    Sauer, Ursula

    2017-01-29

    Protein microarrays represent a powerful technology with the potential to serve as tools for the detection of a broad range of analytes in numerous applications such as diagnostics, drug development, food safety, and environmental monitoring. Key features of analytical protein microarrays include high throughput and relatively low costs due to minimal reagent consumption, multiplexing, fast kinetics and hence fast measurements, and the possibility of functional integration. So far, mainly fundamental studies in molecular and cell biology have been conducted using protein microarrays, while the potential for clinical, notably point-of-care, applications is not yet fully utilized. The question arises which features have to be implemented and which improvements have to be made in order to fully exploit the technology. In the past we have identified various obstacles that have to be overcome in order to promote protein microarray technology in the diagnostic field. Issues that need significant improvement to make the technology more attractive for the diagnostic market include low sensitivity and poor reproducibility, inadequate analysis times, a lack of high-quality antibodies and validated reagents, a lack of automation and portable instruments, and the cost of the instruments necessary for chip production and read-out. The scope of the paper at hand is to review approaches to solving these problems.

  6. Analytical Protein Microarrays: Advancements Towards Clinical Applications

    PubMed Central

    Sauer, Ursula

    2017-01-01

    Protein microarrays represent a powerful technology with the potential to serve as tools for the detection of a broad range of analytes in numerous applications such as diagnostics, drug development, food safety, and environmental monitoring. Key features of analytical protein microarrays include high throughput and relatively low costs due to minimal reagent consumption, multiplexing, fast kinetics and hence fast measurements, and the possibility of functional integration. So far, mainly fundamental studies in molecular and cell biology have been conducted using protein microarrays, while the potential for clinical, notably point-of-care, applications is not yet fully utilized. The question arises which features have to be implemented and which improvements have to be made in order to fully exploit the technology. In the past we have identified various obstacles that have to be overcome in order to promote protein microarray technology in the diagnostic field. Issues that need significant improvement to make the technology more attractive for the diagnostic market include low sensitivity and poor reproducibility, inadequate analysis times, a lack of high-quality antibodies and validated reagents, a lack of automation and portable instruments, and the cost of the instruments necessary for chip production and read-out. The scope of the paper at hand is to review approaches to solving these problems. PMID:28146048

  7. Microarrays (DNA Chips) for the Classroom Laboratory

    ERIC Educational Resources Information Center

    Barnard, Betsy; Sussman, Michael; BonDurant, Sandra Splinter; Nienhuis, James; Krysan, Patrick

    2006-01-01

    We have developed and optimized the necessary laboratory materials to make DNA microarray technology accessible to all high school students at a fraction of both cost and data size. The primary component is a DNA chip/array that students "print" by hand and then analyze using research tools that have been adapted for classroom use. The…

  8. MICROARRAY DATA ANALYSIS USING MULTIPLE STATISTICAL MODELS

    EPA Science Inventory

    Microarray Data Analysis Using Multiple Statistical Models

    Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2,Andrew I. Brooks3,4, ChiYi Chu3,Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...

  9. Raman-based microarray readout: a review.

    PubMed

    Haisch, Christoph

    2016-07-01

    For a quarter of a century, microarrays have been part of the routine analytical toolbox. Label-based fluorescence detection is still the commonest optical readout strategy. Since the 1990s, a continuously increasing number of label-based as well as label-free experiments on Raman-based microarray readout concepts have been reported. This review summarizes the possible concepts and methods and their advantages and challenges. A common label-based strategy is based on the binding of selective receptors as well as Raman reporter molecules to plasmonic nanoparticles in a sandwich immunoassay, which results in surface-enhanced Raman scattering signals of the reporter molecule. Alternatively, capture of the analytes can be performed by receptors on a microarray surface. Addition of plasmonic nanoparticles again leads to a surface-enhanced Raman scattering signal, not of a label but directly of the analyte. This approach is mostly proposed for bacteria and cell detection. However, although many promising readout strategies have been discussed in numerous publications, rarely have any of them made the step from proof of concept to a practical application, let alone routine use. Graphical Abstract: Possible realization of a SERS (Surface-Enhanced Raman Scattering) system for microarray readout.

  10. DISC-BASED IMMUNOASSAY MICROARRAYS. (R825433)

    EPA Science Inventory

    Microarray technology as applied to areas that include genomics, diagnostics, environmental, and drug discovery, is an interesting research topic for which different chip-based devices have been developed. As an alternative, we have explored the principle of compact disc-based...

  11. Microarrays (DNA Chips) for the Classroom Laboratory

    ERIC Educational Resources Information Center

    Barnard, Betsy; Sussman, Michael; BonDurant, Sandra Splinter; Nienhuis, James; Krysan, Patrick

    2006-01-01

    We have developed and optimized the necessary laboratory materials to make DNA microarray technology accessible to all high school students at a fraction of both cost and data size. The primary component is a DNA chip/array that students "print" by hand and then analyze using research tools that have been adapted for classroom use. The…

  12. Annotating nonspecific SAGE tags with microarray data.

    PubMed

    Ge, Xijin; Jung, Yong-Chul; Wu, Qingfa; Kibbe, Warren A; Wang, San Ming

    2006-01-01

    SAGE (serial analysis of gene expression) detects transcripts by extracting short tags from the transcripts. Because of the limited length, many SAGE tags are shared by transcripts from different genes. Relying on sequence information in the general gene expression database has limited power to solve this problem due to the highly heterogeneous nature of the deposited sequences. Considering that the complexity of gene expression at a single tissue level should be much simpler than that in the general expression database, we reasoned that by restricting gene expression to tissue level, the accuracy of gene annotation for the nonspecific SAGE tags should be significantly improved. To test the idea, we developed a tissue-specific SAGE annotation database based on microarray data. This database contains microarray expression information represented as UniGene clusters for 73 normal human tissues and 18 cancer tissues and cell lines. The nonspecific SAGE tag is first matched to the database by the same tissue type used by both SAGE and microarray analysis; then the multiple UniGene clusters assigned to the nonspecific SAGE tag are searched in the database under the matched tissue type. The UniGene cluster presented solely or at higher expression levels in the database is annotated to represent the specific gene for the nonspecific SAGE tags. The accuracy of gene annotation by this database was largely confirmed by experimental data. Our study shows that microarray data provide a useful source for annotating the nonspecific SAGE tags.
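    The annotation rule described above, restricting the candidate UniGene clusters to those expressed in the matching tissue and preferring the highest expression, can be sketched as a small lookup; the database keys and values below are illustrative.

```python
def annotate_tag(candidate_clusters, tissue, expression_db):
    """Resolve a nonspecific SAGE tag by tissue-matched microarray
    expression: among the UniGene clusters the tag maps to, pick the one
    expressed highest in the same tissue. `expression_db` maps
    (tissue, cluster) -> expression level; all data are illustrative."""
    levels = {c: expression_db.get((tissue, c), 0.0) for c in candidate_clusters}
    best = max(levels, key=levels.get)
    return best if levels[best] > 0 else None

db = {("liver", "Hs.1234"): 850.0, ("liver", "Hs.5678"): 12.0}
print(annotate_tag(["Hs.1234", "Hs.5678"], "liver", db))  # Hs.1234
```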

  13. Microarray technology: a promising tool in nutrigenomics.

    PubMed

    Masotti, Andrea; Da Sacco, Letizia; Bottazzo, Gian Franco; Alisi, Anna

    2010-08-01

    Microarray technology is a powerful tool for the global evaluation of gene expression profiles in tissues and for understanding many of the factors controlling the regulation of gene transcription. This technique not only provides a considerable amount of information on markers and predictive factors that may potentially characterize a specific clinical picture, but also promises new applications for therapy. One of the most recent applications of microarrays concerns nutritional genomics. Nutritional genomics, known as nutrigenomics, aims to identify and understand mechanisms of molecular interaction between nutrients and/or other dietary bioactive compounds and the genome. Actually, many nutrigenomic studies utilize new approaches such as microarrays, genomics, and bioinformatics to understand how nutrients influence gene expression. The coupling of these new technologies with nutrigenomics promises to lead to improvements in diet and health. In fact, it may provide new information which can be used to ameliorate dietary regimens and to discover novel natural agents for the treatment of important diseases such as diabetes and cancer. This critical review gives an overview of the clinical relevance of a nutritional approach to several important diseases, and proposes the use of microarray for nutrigenomic studies.

  14. DISC-BASED IMMUNOASSAY MICROARRAYS. (R825433)

    EPA Science Inventory

    Microarray technology as applied to areas that include genomics, diagnostics, environmental, and drug discovery, is an interesting research topic for which different chip-based devices have been developed. As an alternative, we have explored the principle of compact disc-based...

  15. MICROARRAY DATA ANALYSIS USING MULTIPLE STATISTICAL MODELS

    EPA Science Inventory

    Microarray Data Analysis Using Multiple Statistical Models

    Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2,Andrew I. Brooks3,4, ChiYi Chu3,Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...

  16. Shrinkage covariance matrix approach for microarray data

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Aripin, Rasimah

    2013-04-01

    Microarray technology was developed for the purpose of monitoring the expression levels of thousands of genes. A microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples, due to various constraints including the high cost of producing microarray chips. As a result, the widely used standard covariance estimator is not appropriate for this purpose. One technique that relies on this estimator is Hotelling's T2 statistic, a multivariate test statistic for comparing means between two groups. It requires that the number of observations (n) exceed the number of genes (p), but in microarray studies it is common that n < p, which leads to a biased estimate of the covariance matrix. In this study, Hotelling's T2 statistic with a shrinkage approach is proposed to estimate the covariance matrix for testing differential gene expression. The performance of this approach is then compared with other commonly used multivariate tests using a widely analysed diabetes data set as an illustration. The results across the methods are consistent, implying that this approach provides an alternative to existing techniques.
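    A minimal sketch of the general idea, assuming a Ledoit-Wolf shrinkage estimator (one common choice; the paper's exact shrinkage target and pooling scheme may differ) plugged into the two-sample Hotelling's T2 statistic:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def hotelling_t2_shrinkage(X1, X2):
    """Two-sample Hotelling's T^2 with a Ledoit-Wolf shrinkage estimate of
    the pooled covariance, usable when genes (p) outnumber samples (n).
    A sketch of the general idea, not the paper's exact estimator."""
    n1, n2 = len(X1), len(X2)
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    pooled = np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])
    cov = LedoitWolf().fit(pooled).covariance_   # shrunken, hence invertible
    return (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(cov, diff)

rng = np.random.default_rng(1)
X1 = rng.normal(0.0, 1.0, size=(15, 200))        # n=15 samples, p=200 genes
X2 = rng.normal(0.3, 1.0, size=(12, 200))
print(hotelling_t2_shrinkage(X1, X2))
```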

  17. Strategy for the design of custom cDNA microarrays.

    PubMed

    Lorenz, Matthias G O; Cortes, Lizette M; Lorenz, Juergen J; Liu, Edison T

    2003-06-01

    DNA microarrays are valuable but expensive tools for the expression profiling of cells, tissues, and organs. The design of custom microarrays leads to cost reduction without necessarily compromising their biological value. Here we present a strategy for designing custom cDNA microarrays, which we used to construct a microarray for mouse immunology research (ImmunoChip). The strategy interrogates expressed sequence tag databases available in the public domain but overcomes many of the problems encountered with them. Immunologically relevant clusters were selected based on the expression of expressed sequence tags in relevant libraries. Selected clusters were organized in modules, and the best representative clones were identified. When tested, this microarray was found to have minimal clone identity errors or phage contamination and identified molecular signatures of lymphoid cell lines. Our proposed design of custom microarrays avoids probe redundancy, allows the organization of the chip to optimize chip production, and reduces microarray production costs. The strategy described is also useful for the design of oligonucleotide microarrays.

  18. PRACTICAL STRATEGIES FOR PROCESSING AND ANALYZING SPOTTED OLIGONUCLEOTIDE MICROARRAY DATA

    EPA Science Inventory

    Thoughtful data analysis is as important as experimental design, biological sample quality, and appropriate experimental procedures for making microarrays a useful supplement to traditional toxicology. In the present study, spotted oligonucleotide microarrays were used to profile...

  19. A Method of Microarray Data Storage Using Array Data Type

    PubMed Central

    Tsoi, Lam C.; Zheng, W. Jim

    2009-01-01

    A well-designed microarray database can provide valuable information on gene expression levels. However, designing an efficient microarray database with minimum space usage is not an easy task, since designers need to integrate the microarray data with information on genes, probe annotations, and the descriptions of each microarray experiment. Developing better methods to store microarray data can greatly improve the efficiency and usefulness of such data. A new schema is proposed to store microarray data by using the array data type in an object-relational database management system, PostgreSQL. The implemented database stores all the microarray data from the same chip in PostgreSQL's variable-length array data structure. This implementation of our schema helps to increase data retrieval and space efficiency. PMID:17392028
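    A minimal sketch of the array-column idea using psycopg2; the table, column names, and connection details are hypothetical and not taken from the paper.

```python
import psycopg2  # assumes a reachable PostgreSQL server and psycopg2 installed

# One row per chip, with all expression values held in a variable-length
# array column in probe order. Schema and connection string are hypothetical.
conn = psycopg2.connect("dbname=microarray user=postgres")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS chip_expression (
        chip_id     TEXT PRIMARY KEY,
        experiment  TEXT,
        intensities DOUBLE PRECISION[]   -- one slot per probe on the chip
    )
""")
cur.execute(
    "INSERT INTO chip_expression VALUES (%s, %s, %s)",
    ("chip_001", "liver_vs_control", [1234.5, 987.1, 455.0]),
)
# Array subscripts retrieve single probes without unpacking the whole row
# (PostgreSQL arrays are 1-indexed).
cur.execute("SELECT intensities[2] FROM chip_expression WHERE chip_id = %s",
            ("chip_001",))
print(cur.fetchone())   # (987.1,)
conn.commit()
```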

  20. PRACTICAL STRATEGIES FOR PROCESSING AND ANALYZING SPOTTED OLIGONUCLEOTIDE MICROARRAY DATA

    EPA Science Inventory

    Thoughtful data analysis is as important as experimental design, biological sample quality, and appropriate experimental procedures for making microarrays a useful supplement to traditional toxicology. In the present study, spotted oligonucleotide microarrays were used to profile...

  1. Contour Error Map Algorithm

    NASA Technical Reports Server (NTRS)

    Merceret, Francis; Lane, John; Immer, Christopher; Case, Jonathan; Manobianco, John

    2005-01-01

    The contour error map (CEM) algorithm and the software that implements the algorithm are means of quantifying correlations between sets of time-varying data that are binarized and registered on spatial grids. The present version of the software is intended for use in evaluating numerical weather forecasts against observational sea-breeze data. In cases in which observational data come from off-grid stations, it is necessary to preprocess the observational data to transform them into gridded data. First, the wind direction is gridded and binarized so that D(i,j;n) is the input to CEM based on forecast data and d(i,j;n) is the input to CEM based on gridded observational data. Here, i and j are spatial indices representing 1.25-km intervals along the west-to-east and south-to-north directions, respectively; and n is a time index representing 5-minute intervals. A binary value of D or d = 0 corresponds to an offshore wind, whereas a value of D or d = 1 corresponds to an onshore wind. CEM includes two notable subalgorithms: One identifies and verifies sea-breeze boundaries; the other, which can be invoked optionally, performs an image-erosion function for the purpose of attempting to eliminate river-breeze contributions in the wind fields.
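    The binarization and grid-comparison input step can be illustrated briefly; the coastline geometry below is a hypothetical simplification (a single onshore normal for the whole grid), and the agreement score shown is only a stand-in, not the CEM statistic itself.

```python
import numpy as np

def binarize_onshore(wind_dir_deg, coast_normal_deg=90.0):
    """Binarize gridded wind direction as in the CEM input step:
    1 = onshore, 0 = offshore. A wind is called onshore here when its
    direction is within 90 degrees of an assumed single onshore normal;
    real coastline geometry is more involved."""
    delta = np.abs((wind_dir_deg - coast_normal_deg + 180) % 360 - 180)
    return (delta < 90).astype(int)

# D: forecast grid, d: observation grid, at one 5-minute time index.
D = binarize_onshore(np.random.uniform(0, 360, size=(40, 40)))
d = binarize_onshore(np.random.uniform(0, 360, size=(40, 40)))
print("grid agreement:", (D == d).mean())   # fraction of matching cells
```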

  2. Examining microarray slide quality for the EPA using SNL's hyperspectral microarray scanner.

    SciTech Connect

    Rohde, Rachel M.; Timlin, Jerilyn Ann

    2005-11-01

    This report summarizes research performed at Sandia National Laboratories (SNL) in collaboration with the Environmental Protection Agency (EPA) to assess microarray quality on arrays from two platforms of interest to the EPA. Custom microarrays from two novel, commercially produced array platforms were imaged with SNL's unique hyperspectral imaging technology, and multivariate data analysis was performed to investigate sources of emission on the arrays. No extraneous sources of emission were evident in any of the array areas scanned. This led to the conclusion that either of these array platforms could produce high-quality, reliable microarray data for the EPA toxicology programs. Hyperspectral imaging results are presented and recommendations for microarray analyses using these platforms are detailed within the report.

  3. Identifying Fishes through DNA Barcodes and Microarrays

    PubMed Central

    Kochzius, Marc; Seidel, Christian; Antoniou, Aglaia; Botla, Sandeep Kumar; Campo, Daniel; Cariani, Alessia; Vazquez, Eva Garcia; Hauschild, Janet; Hervet, Caroline; Hjörleifsdottir, Sigridur; Hreggvidsson, Gudmundur; Kappel, Kristina; Landi, Monica; Magoulas, Antonios; Marteinsson, Viggo; Nölte, Manfred; Planes, Serge; Tinti, Fausto; Turan, Cemal; Venugopal, Moleyur N.; Weber, Hannes; Blohm, Dietmar

    2010-01-01

    Background International fish trade reached an import value of 62.8 billion Euro in 2006, of which 44.6% is covered by the European Union. Species identification is a key problem throughout the life cycle of fishes: from eggs and larvae to adults in fisheries research and control, as well as processed fish products in consumer protection. Methodology/Principal Findings This study aims to evaluate the applicability of the three mitochondrial genes 16S rRNA (16S), cytochrome b (cyt b), and cytochrome oxidase subunit I (COI) for the identification of 50 European marine fish species by combining techniques of “DNA barcoding” and microarrays. In a DNA barcoding approach, Neighbour-Joining (NJ) phylogenetic trees of 369 16S, 212 cyt b, and 447 COI sequences indicated that cyt b and COI are suitable for unambiguous identification, whereas 16S failed to discriminate closely related flatfish and gurnard species. In the course of probe design for DNA microarray development, each of the markers yielded a high number of potentially species-specific probes in silico, although many of them were rejected based on microarray hybridisation experiments. None of the markers provided probes to discriminate the sibling flatfish and gurnard species. However, since 16S probes were less negatively influenced by the “position of label” effect and showed the lowest rejection rate and the highest mean signal intensity, 16S is more suitable for DNA microarray probe design than cyt b and COI. The large portion of rejected COI probes after hybridisation experiments (>90%) renders the DNA barcoding marker rather unsuitable for this high-throughput technology. Conclusions/Significance Based on these data, a DNA microarray containing 64 functional oligonucleotide probes for the identification of 30 out of the 50 fish species investigated was developed. It represents the next step towards an automated and easy-to-handle method to identify fish, ichthyoplankton, and fish products. PMID

  4. Design of a covalently bonded glycosphingolipid microarray.

    PubMed

    Arigi, Emma; Blixt, Ola; Buschard, Karsten; Clausen, Henrik; Levery, Steven B

    2012-01-01

    Glycosphingolipids (GSLs) are well known ubiquitous constituents of all eukaryotic cell membranes, yet their normal biological functions are not fully understood. As with other glycoconjugates and saccharides, solid phase display on microarrays potentially provides an effective platform for in vitro study of their functional interactions. However, with few exceptions, the most widely used microarray platforms display only the glycan moiety of GSLs, which not only ignores potential modulating effects of the lipid aglycone, but inherently limits the scope of application, excluding, for example, the major classes of plant and fungal GSLs. In this work, a prototype "universal" GSL-based covalent microarray has been designed, and preliminary evaluation of its potential utility in assaying protein-GSL binding interactions investigated. An essential step in development involved the enzymatic release of the fatty acyl moiety of the ceramide aglycone of selected mammalian GSLs with sphingolipid N-deacylase (SCDase). Derivatization of the free amino group of a typical lyso-GSL, lyso-G(M1), with a prototype linker assembled from succinimidyl-[(N-maleimidopropionamido)-diethyleneglycol] ester and 2-mercaptoethylamine, was also tested. Underivatized or linker-derivatized lyso-GSL were then immobilized on N-hydroxysuccinimide- or epoxide-activated glass microarray slides and probed with carbohydrate binding proteins of known or partially known specificities (i.e., cholera toxin B-chain; peanut agglutinin, a monoclonal antibody to sulfatide, Sulph 1; and a polyclonal antiserum reactive to asialo-G(M2)). Preliminary evaluation of the method indicated successful immobilization of the GSLs, and selective binding of test probes. The potential utility of this methodology for designing covalent microarrays that incorporate GSLs for serodiagnosis is discussed.

  5. Methods in molecular cardiology: microarray technology

    PubMed Central

    van den Bosch, B.; Doevendans, P.A.; Lips, D.; Smeets, H.J.M.

    2003-01-01

    It has become more and more evident that changes in expression levels of genes can play an important role in cardiovascular diseases. Specific gene expression profiles may explain, for example, the pathophysiology of myocardial hypertrophy and pump failure and may provide clues for therapeutic interventions. Knowledge of gene expression patterns can also be applied for diagnostic and prognostic purposes, in which differences in gene activity can be used for classification. DNA microarray technology has become the method of choice to simultaneously study the expression of many different genes in a single assay. Each microarray contains many thousands of different DNA sequences attached to a glass slide. The amount of messenger RNA, which is a measure of gene activity, is compared for each gene on the microarray by labelling the mRNA with different fluorescently labelled nucleotides (Cy3 or Cy5) for the test and reference samples. After hybridisation to the microarray the relative amounts of a particular gene transcript in the two samples can be determined by measuring the signal intensities for the fluorescent groups (Cy3 and Cy5) and calculating signal ratios. This paper describes the development of in-house microarray technology, using commercially available cDNA collections. Several technical approaches will be compared and an overview of the pitfalls and possibilities will be presented. The technology will be explained in the context of our project to determine gene expression differences between normal, hypertrophic and failing heart. PMID:25696214
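    The Cy3/Cy5 ratio computation at the heart of the technique reduces to a per-gene log ratio, sketched below with made-up intensities; the channel normalisation that a full protocol requires is omitted.

```python
import numpy as np

def log2_ratios(cy5_test, cy3_reference, floor=1.0):
    """Per-gene relative expression from a two-colour hybridisation:
    the log2 ratio of test (Cy5) to reference (Cy3) signal intensities,
    as described above. The floor guards against zero backgrounds;
    between-channel normalisation is omitted for brevity."""
    cy5 = np.maximum(np.asarray(cy5_test, dtype=float), floor)
    cy3 = np.maximum(np.asarray(cy3_reference, dtype=float), floor)
    return np.log2(cy5 / cy3)

print(log2_ratios([2000, 500, 1000], [1000, 1000, 1000]))  # [ 1. -1.  0.]
```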

  6. Density based pruning for identification of differentially expressed genes from microarray data

    PubMed Central

    2010-01-01

    Motivation Identification of differentially expressed genes from microarray datasets is one of the most important analyses in microarray data mining. Popular algorithms such as the statistical t-test rank genes based on a single statistic. The false positive rate of these methods can be improved by considering other features of differentially expressed genes. Results We propose a pattern recognition strategy for identifying differentially expressed genes. Genes are mapped to a two-dimensional feature space composed of the average difference of gene expression and the average expression level. A density based pruning algorithm (DB pruning) is developed to screen out potential differentially expressed genes, which are usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from the Gene Expression Omnibus (GEO) database with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test, rank product, and fold change. Conclusions Density based pruning of non-differentially expressed genes is an effective method for enhancing statistical testing based algorithms for identifying differentially expressed genes. It improves t-test, rank product, and fold change by 11% to 50% in the numbers of identified true differentially expressed genes. The source code of DB pruning is freely available on our website http://mleg.cse.sc.edu/degprune PMID:21047384
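    A rough sketch of the density-based pruning idea, assuming a simple grid-count density estimate (the paper's estimator and cutoffs may differ):

```python
import numpy as np

def db_prune(expr_a, expr_b, grid=20, max_density=5):
    """Map each gene to the 2-D feature space used above (average
    difference vs average expression), bin the plane into a coarse grid,
    and keep only genes falling in sparse cells, where differentially
    expressed genes tend to live. Grid size and density cutoff are
    illustrative, not the paper's tuned values."""
    avg_diff = expr_a.mean(axis=1) - expr_b.mean(axis=1)
    avg_expr = np.hstack([expr_a, expr_b]).mean(axis=1)
    ix = np.digitize(avg_diff, np.linspace(avg_diff.min(), avg_diff.max(), grid))
    iy = np.digitize(avg_expr, np.linspace(avg_expr.min(), avg_expr.max(), grid))
    cells, counts = np.unique(np.stack([ix, iy], axis=1), axis=0,
                              return_counts=True)
    density = {tuple(c): n for c, n in zip(cells, counts)}
    keep = np.array([density[(x, y)] <= max_density for x, y in zip(ix, iy)])
    return np.flatnonzero(keep)        # candidate DE gene indices

rng = np.random.default_rng(2)
A = rng.normal(5, 1, size=(1000, 6))   # 1000 genes, 6 arrays per group
B = rng.normal(5, 1, size=(1000, 6))
B[:10] += 3                            # 10 true DE genes in the sparse fringe
print(db_prune(A, B)[:20])
```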

  7. Robust symmetrical number system preprocessing for minimizing encoding errors in photonic analog-to-digital converters

    NASA Astrophysics Data System (ADS)

    Arvizo, Mylene R.; Calusdian, James; Hollinger, Kenneth B.; Pace, Phillip E.

    2011-08-01

    A photonic analog-to-digital converter (ADC) preprocessing architecture based on the robust symmetrical number system (RSNS) is presented. The RSNS preprocessing architecture is a modular scheme in which a number of comparators equal to the channel modulus is used at the output of each Mach-Zehnder modulator channel. The number of comparators with a logic 1 in each channel represents the integer value within that channel's RSNS modulus sequence. Taken together, the sequences change by exactly one integer from one code position to the next, resulting in an integer Gray code property. The RSNS ADC has the feature that the maximum nonlinearity is less than a least significant bit (LSB). Although the observed dynamic range (greatest length of combined sequences that contains no ambiguities) of the RSNS ADC is less than that of the optimum symmetrical number system ADC, the integer Gray code properties make it attractive for error control. A prototype is presented to demonstrate the feasibility of the concept and to show the important RSNS property that the largest nonlinearity is always less than an LSB. Also discussed are practical considerations related to multi-gigahertz implementations.
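
    Assuming the standard RSNS construction (the symmetric residue sequence 0, 1, ..., m, m-1, ..., 1 of each modulus, with each element effectively repeated N times and a one-position shift per channel — a definition used in the RSNS literature, though not spelled out in this abstract), the integer Gray code property can be checked numerically:

    ```python
    # Sketch of the RSNS code construction; the moduli below are an assumption.
    moduli = [3, 4, 5]
    N = len(moduli)

    def rsns_digit(h, m, shift):
        """Folded (symmetrical) residue of code position h for one channel."""
        q = ((h - shift) // N) % (2 * m)
        return q if q <= m else 2 * m - q

    codes = [tuple(rsns_digit(h, m, i) for i, m in enumerate(moduli))
             for h in range(100)]

    # Integer Gray code property: consecutive code vectors differ in exactly
    # one channel, and in that channel by exactly one comparator level.
    for prev, cur in zip(codes, codes[1:]):
        deltas = [abs(p - c) for p, c in zip(prev, cur)]
        assert sum(d != 0 for d in deltas) == 1 and max(deltas) == 1
    print("Gray code property holds for the first 100 code positions")
    ```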

  8. ITSG-Grace2016 data preprocessing methodologies revisited: impact of using Level-1A data products

    NASA Astrophysics Data System (ADS)

    Klinger, Beate; Mayer-Gürr, Torsten

    2017-04-01

    For the ITSG-Grace2016 release, the gravity field recovery is based on the use of official GRACE (Gravity Recovery and Climate Experiment) Level-1B data products, generated by the Jet Propulsion Laboratory (JPL). Before gravity field recovery, the Level-1B instrument data are preprocessed. This data preprocessing step includes the combination of Level-1B star camera (SCA1B) and angular acceleration (ACC1B) data for an improved attitude determination (sensor fusion), instrument data screening and ACC1B data calibration. Based on a Level-1A test dataset, provided for individual months throughout the GRACE period by the Center for Space Research at the University of Texas at Austin (UTCSR), the impact of using Level-1A instead of Level-1B data products within the ITSG-Grace2016 processing chain is analyzed. We discuss (1) the attitude determination through an optimal combination of SCA1A and ACC1A data using our sensor fusion approach, (2) the impact of the new attitude product on temporal gravity field solutions, and (3) possible benefits of using Level-1A data for instrument data screening and calibration. As the GRACE mission is currently reaching its end of life, the presented work aims not only at a better understanding of GRACE science data to reduce the impact of possible error sources on the gravity field recovery, but also at preparing Level-1A data handling capabilities for the GRACE Follow-On mission.
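
    The actual sensor-fusion scheme is not detailed in the abstract. The following deliberately simplified one-dimensional complementary filter only illustrates the general principle of blending accurate-but-noisy star-camera angles with smooth-but-drifting angular-rate data (all numbers are invented; this is not the ITSG-Grace2016 algorithm):

    ```python
    import numpy as np

    # Simplified 1-D illustration of attitude sensor fusion: star-camera
    # angles are accurate at low frequencies but noisy sample-to-sample,
    # while integrating angular-rate data is smooth but drifts.
    dt, n = 1.0, 600
    t = np.arange(n) * dt
    true_angle = 0.01 * np.sin(2 * np.pi * t / 300.0)

    rng = np.random.default_rng(1)
    sca = true_angle + rng.normal(0.0, 1e-3, n)   # noisy star-camera angles
    rate = np.gradient(true_angle, dt) + 2e-6     # slightly biased rate data

    alpha = 0.98                                  # blending factor
    fused = np.empty(n)
    fused[0] = sca[0]
    for k in range(1, n):
        # propagate with the (drifting) rate, correct with the star camera
        fused[k] = alpha * (fused[k - 1] + rate[k] * dt) + (1 - alpha) * sca[k]

    print("star camera RMS error:", np.std(sca - true_angle))
    print("fused RMS error:      ", np.std(fused - true_angle))
    ```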

  9. A Technical Review on Biomass Processing: Densification, Preprocessing, Modeling and Optimization

    SciTech Connect

    Jaya Shankar Tumuluru; Christopher T. Wright

    2010-06-01

    It is now a well-acclaimed fact that burning fossil fuels and deforestation are major contributors to climate change. Biomass from plants can serve as an alternative renewable and carbon-neutral raw material for the production of bioenergy. Low densities of 40–60 kg/m3 for lignocellulosic and 200–400 kg/m3 for woody biomass limit their application for energy purposes. Prior to use in energy applications, these materials need to be densified. The densified biomass can have bulk densities over 10 times that of the raw material, helping to significantly reduce technical limitations associated with storage, loading and transportation. Pelleting, briquetting, and extrusion processing are commonly used methods for densification. The aim of the present research is to develop a comprehensive review of biomass processing that includes densification, preprocessing, modeling and optimization. The specific objectives include carrying out a technical review on (a) mechanisms of particle bonding during densification; (b) methods of densification including extrusion, briquetting, pelleting, and agglomeration; (c) effects of process and feedstock variables and biomass biochemical composition on densification; (d) effects of preprocessing such as grinding, preheating, steam explosion, and torrefaction on biomass quality and binding characteristics; (e) models for understanding the compression characteristics; and (f) procedures for response surface modeling and optimization.

  10. MODIStsp: An R package for automatic preprocessing of MODIS Land Products time series

    NASA Astrophysics Data System (ADS)

    Busetto, L.; Ranghetti, L.

    2016-12-01

    MODIStsp is a new R package that automates the creation of raster time series derived from MODIS Land Products. It performs several preprocessing steps (e.g. download, mosaicking, reprojection and resizing) on MODIS products for a selected time period and area. All processing parameters can be set with a user-friendly GUI, allowing users to select which specific layers of the original MODIS HDF files have to be processed and which Quality Indicators have to be extracted from the aggregated MODIS Quality Assurance layers. Moreover, the tool allows on-the-fly computation of time series of spectral indexes (either standard or custom-specified by the user through the GUI) from surface reflectance bands. Outputs are saved as single-band rasters corresponding to each available acquisition date and output layer. Virtual files, allowing easy access to the entire time series as a single file using common image processing/GIS software or R scripts, can also be created. Non-interactive execution within an R script and stand-alone execution outside an R environment exploiting a previously created Options File are also possible; the latter allows scheduling MODIStsp runs to automatically update a time series when a new image is available. The proposed software constitutes a very useful tool for the remote sensing community, since it performs all the main preprocessing steps required for the creation of MODIS time series within a common framework, without requiring any particular programming skills of its users.

  11. Star sensor image acquisition and preprocessing hardware system based on CMOS image sensor and FPGA

    NASA Astrophysics Data System (ADS)

    Hao, Xuetao; Jiang, Jie; Zhang, Guangjun

    2003-09-01

    A star sensor is an avionics instrument used to provide the absolute 3-axis attitude of a spacecraft utilizing star observations. It consists of an electronic camera and associated processing electronics. As an outcome of the advancing state of the art, new-generation star sensors feature higher speed and lower cost, power dissipation and size than first-generation star sensors. This paper describes a front-end image acquisition and pre-processing hardware system for a star sensor, based on CMOS image-sensor and FPGA technology. In practice, star images are produced by a simple simulator on a PC, acquired by the CMOS image sensor, pre-processed by the FPGA, saved in SRAM, read out over the EPP protocol and validated by image-processing software on the PC. The hardware part of the system acquires images through the CMOS image sensor under FPGA control, processes the image data in an FPGA circuit module, and saves the images to SRAM for testing. It provides the basic image data for star recognition and spacecraft attitude determination. As an important reference for developing a star sensor prototype, the system validates the performance advantages of the new-generation star sensor.
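
    As a software model of the kind of pre-processing such hardware performs (the paper does not give its algorithm; thresholding plus intensity-weighted centroiding is a common choice for star detection, not necessarily the authors'), consider:

    ```python
    import numpy as np
    from scipy import ndimage

    def star_centroids(frame, threshold):
        """Threshold the frame, label connected bright regions and return
        the intensity-weighted centroid of each candidate star -- the basic
        quantities needed downstream for star identification and attitude
        determination. (Illustrative software model of an FPGA pipeline.)"""
        mask = frame > threshold
        labels, n = ndimage.label(mask)
        return ndimage.center_of_mass(frame * mask, labels, range(1, n + 1))

    frame = np.zeros((64, 64))
    frame[20:23, 40:43] = [[10, 40, 12], [38, 90, 35], [11, 36, 9]]  # one star
    frame[50, 10] = 25                                               # another
    print(star_centroids(frame, threshold=5))   # [(~21, ~41), (50.0, 10.0)]
    ```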

  12. Applying Enhancement Filters in the Pre-processing of Images of Lymphoma

    NASA Astrophysics Data System (ADS)

    Henrique Silva, Sérgio; Zanchetta do Nascimento, Marcelo; Alves Neves, Leandro; Ramos Batista, Valério

    2015-01-01

    Lymphoma is a type of cancer that affects the immune system, and is classified as Hodgkin or non-Hodgkin. It is among the ten most common cancers worldwide, accounting for three to four percent of all malignant neoplasms diagnosed. Our work presents a study of filters for enhancing images of lymphoma at the pre-processing step, where enhancement is useful for removing noise from the digital images. We have analysed noise caused by different sources, such as room vibration, debris and defocusing, in the following classes of lymphoma: follicular, mantle cell and B-cell chronic lymphocytic leukemia. Gaussian, median and mean-shift filters were applied in different colour models (RGB, Lab and HSV). Afterwards, we performed a quantitative analysis of the images by means of the Structural Similarity Index in order to evaluate the similarity between the original and filtered images. In all cases we obtained a similarity of at least 75%, which rises to 99% if one considers only HSV. We therefore conclude that HSV is an important choice of colour model for pre-processing histological images of lymphoma, because in this case the resulting image receives the best enhancement.
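
    A sketch of this kind of evaluation using scikit-image for two of the three filters plus SSIM scoring (the mean-shift filter, available e.g. as OpenCV's pyrMeanShiftFiltering, is omitted here; the input is a random stand-in for a slide image, and all parameters are invented):

    ```python
    import numpy as np
    from skimage import color, filters, metrics
    from skimage.morphology import disk

    def enhance_and_score(rgb):
        """Denoise the V channel of an HSV image with two of the studied
        filters and score each result against the original channel with
        the Structural Similarity Index (SSIM)."""
        v = color.rgb2hsv(rgb)[..., 2]
        filtered = {
            "gaussian": filters.gaussian(v, sigma=1.0),
            "median": filters.median((v * 255).astype(np.uint8), disk(2)) / 255.0,
        }
        return {name: metrics.structural_similarity(v, f, data_range=1.0)
                for name, f in filtered.items()}

    # Random stand-in for a histological slide image
    rgb = np.random.default_rng(0).random((128, 128, 3))
    print(enhance_and_score(rgb))
    ```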

  13. Preprocessing of Hinode/SOT Vector Magnetograms for Nonlinear Force-Free Coronal Magnetic Field Modeling

    NASA Astrophysics Data System (ADS)

    Wiegelmann, T.; Thalmann, J. K.; Schrijver, C. J.; De Rosa, M. L.; Metcalf, T. R.

    2008-09-01

    The solar magnetic field is key to understanding the physical processes in the solar atmosphere. Nonlinear force-free codes have been shown to be useful in extrapolating the coronal field from underlying vector boundary data (for an overview see Schrijver et al. (2006)). However, we can only measure the magnetic field vector routinely with high accuracy in the photosphere with, e.g., Hinode/SOT, and unfortunately these data do not fulfill the force-free consistency condition as defined by Aly (1989). We must therefore apply some transformations to these data before nonlinear force-free extrapolation codes can be legitimately applied. To this end, we have developed a minimization procedure that uses the measured photospheric field vectors as input to approximate a more chromosphere-like field (a method dubbed preprocessing; see Wiegelmann et al. (2006) for details). The procedure includes force-free consistency integrals and spatial smoothing. The method has been intensively tested with model active regions (see Metcalf et al. 2008) and has previously been applied to several ground-based vector magnetograms. Here we apply the preprocessing program to photospheric magnetic field measurements from the Hinode/SOT instrument.
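
    The force-free consistency integrals referred to above are usually written as vanishing net-force and net-torque surface integrals over the magnetogram. A sketch of how the corresponding normalized error measures might be evaluated (following the commonly used formulation of these criteria, not the authors' code; the normalizations are an assumption):

    ```python
    import numpy as np

    def aly_criteria(bx, by, bz, x, y):
        """Normalized net-force and net-torque integrals over a vector
        magnetogram. Preprocessing drives these toward zero; raw
        photospheric measurements typically violate them."""
        norm_f = np.sum(bx**2 + by**2 + bz**2)
        norm_t = np.sum(np.sqrt(x**2 + y**2) * (bx**2 + by**2 + bz**2))
        eps_force = (abs(np.sum(bx * bz)) + abs(np.sum(by * bz))
                     + abs(np.sum(bz**2 - bx**2 - by**2))) / norm_f
        eps_torque = (abs(np.sum(x * (bz**2 - bx**2 - by**2)))
                      + abs(np.sum(y * (bz**2 - bx**2 - by**2)))
                      + abs(np.sum(y * bx * bz - x * by * bz))) / norm_t
        return eps_force, eps_torque

    g = np.linspace(-1.0, 1.0, 64)
    x, y = np.meshgrid(g, g)
    rng = np.random.default_rng(0)
    bx, by, bz = rng.normal(size=(3, 64, 64))   # stand-in for a magnetogram
    print(aly_criteria(bx, by, bz, x, y))
    ```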

  14. Preprocessing of A-scan GPR data based on energy features

    NASA Astrophysics Data System (ADS)

    Dogan, Mesut; Turhan-Sayan, Gonul

    2016-05-01

    There is an increasing demand for noninvasive real-time detection and classification of buried objects in various civil and military applications. The problem of detection and annihilation of landmines is particularly important due to strong safety concerns. The requirement for a fast real-time decision process is as important as the requirements for high detection rates and low false alarm rates. In this paper, we introduce and demonstrate a computationally simple, time-efficient, energy-based preprocessing approach that can be used in ground penetrating radar (GPR) applications to eliminate reflections from the air-ground boundary and to locate buried objects, simultaneously, in one simple step. The instantaneous power signals, the total energy values and the cumulative energy curves are extracted from the A-scan GPR data. The cumulative energy curves, in particular, are shown to be useful for detecting the presence and location of buried objects in a fast and simple way while preserving the spectral content of the original A-scan data for further steps of physics-based target classification. The proposed method is demonstrated using GPR data collected at outdoor test lanes at the facilities of IPA Defense, Ankara. Cylindrically shaped plastic containers were buried in fine-medium sand to simulate buried landmines. These plastic containers were half-filled with ammonium nitrate containing metal pins. The results of this pilot study are highly promising and motivate further research on the use of energy-based preprocessing features for the landmine detection problem.
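
    The energy features themselves are straightforward to compute. A sketch with a synthetic A-scan (the sampling rate and all waveform parameters below are invented) in which the buried-object echo appears as a second step in the cumulative energy curve:

    ```python
    import numpy as np

    def energy_features(ascan, fs):
        """Instantaneous power, total energy and cumulative energy curve
        of a single A-scan, as described above."""
        power = ascan.astype(float) ** 2        # instantaneous power signal
        cum = np.cumsum(power) / fs             # cumulative energy curve
        return power, cum[-1], cum

    fs = 1e9                                    # hypothetical 1 GS/s sampling
    t = np.arange(2048) / fs
    trace = np.exp(-((t - 3e-7) / 1e-8) ** 2)          # air-ground reflection
    trace += 0.3 * np.exp(-((t - 9e-7) / 1e-8) ** 2)   # buried-object echo
    power, total_energy, cum = energy_features(trace, fs)
    # The buried object appears as a second, smaller step in 'cum' well
    # after the large step caused by the air-ground boundary.
    print(total_energy)
    ```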

  15. Fast data preprocessing with Graphics Processing Units for inverse problem solving in light-scattering measurements

    NASA Astrophysics Data System (ADS)

    Derkachov, G.; Jakubczyk, T.; Jakubczyk, D.; Archer, J.; Woźniak, M.

    2017-07-01

    Utilising the Compute Unified Device Architecture (CUDA) platform for Graphics Processing Units (GPUs) enables a significant reduction of computation time at a moderate cost, by means of parallel computing. In the paper [Jakubczyk et al., Opto-Electron. Rev., 2016] we reported using a GPU for Mie-scattering inverse problem solving (up to an 800-fold speed-up). Here we report the development of two subroutines utilising the GPU at the data preprocessing stages of the inversion procedure: (i) a subroutine, based on ray tracing, for finding the spherical aberration correction function; (ii) a subroutine performing the conversion of an image to a 1D distribution of light intensity versus azimuth angle (i.e. a scattering diagram), fed from a movie-reading CPU subroutine running in parallel. All subroutines are incorporated in the PikeReader application, which we make available in a GitHub repository. PikeReader returns a sequence of intensity distributions versus a common azimuth-angle vector, corresponding to the recorded movie. We obtained an overall ∼400-fold speed-up of calculations at the data preprocessing stages using CUDA code running on the GPU in comparison to single-threaded MATLAB-only code running on the CPU.
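
    A plain-CPU sketch of the second preprocessing stage, the image-to-scattering-diagram conversion by azimuthal binning (the spherical-aberration correction and the GPU implementation are omitted; the beam-centre coordinate is an assumed input):

    ```python
    import numpy as np

    def scattering_diagram(img, center, n_bins=360):
        """Convert a 2-D scattering image into a 1-D distribution of mean
        intensity versus azimuth angle around a given centre pixel."""
        h, w = img.shape
        yy, xx = np.mgrid[0:h, 0:w]
        phi = np.arctan2(yy - center[1], xx - center[0])      # in [-pi, pi]
        bins = ((phi + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
        sums = np.bincount(bins.ravel(), weights=img.ravel(), minlength=n_bins)
        counts = np.bincount(bins.ravel(), minlength=n_bins)
        return sums / np.maximum(counts, 1)

    img = np.random.default_rng(0).random((480, 640))   # stand-in frame
    diagram = scattering_diagram(img, center=(320, 240))
    print(diagram.shape)   # (360,) mean intensities versus azimuth angle
    ```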

  16. Pre-Processing and Cross-Correlation Techniques for Time-Distance Helioseismology

    NASA Astrophysics Data System (ADS)

    Wang, N.; de Ridder, S.; Zhao, J.

    2014-12-01

    In chaotic wave fields excited by a random distribution of noise sources, a cross-correlation of the recordings made at two stations yields the interstation wave-field response. After early successes in helioseismology, laboratory studies and earth seismology, this technique found broad application in global and regional seismology. This development came with an increasing understanding of the pre-processing and cross-correlation workflows that yield an optimal signal-to-noise ratio (SNR). Helioseismologists, however, have until now not studied different spectral-whitening and cross-correlation workflows, and rely heavily on stacking to increase the SNR. The recordings vary considerably between sunspots and regular portions of the Sun. Within sunspots, the periodic effects of the observing satellite's orbit are difficult to remove. We remove a running alpha-mean from the data and apply a soft clip to deal with data glitches. The recordings contain energy of both flows and waves; a frequency-domain filter selects the wave energy. The data are then input to several pre-processing and cross-correlation techniques common in earth seismology. We anticipate that spectral whitening will flatten the energy spectrum of the cross-correlations. We also expect that the cross-correlations converge faster to their expected value when the data are processed in overlapping windows. The results of this study are expected to aid in decreasing the stacking while maintaining a good SNR.
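
    A generic sketch of spectral whitening plus windowed cross-correlation and stacking, in the spirit of the earth-seismology workflows mentioned (not any specific helioseismology pipeline; the window sizes and the synthetic delay are invented):

    ```python
    import numpy as np

    def whiten(trace, eps=1e-8):
        """Spectral whitening: keep the phase, flatten the amplitude spectrum."""
        spec = np.fft.rfft(trace)
        return np.fft.irfft(spec / (np.abs(spec) + eps), n=len(trace))

    def noise_correlation(a, b, win, step):
        """Stack cross-correlations of whitened, overlapping windows of two
        recordings to estimate the interstation response."""
        acc = np.zeros(2 * win - 1)
        for start in range(0, len(a) - win + 1, step):
            acc += np.correlate(whiten(a[start:start + win]),
                                whiten(b[start:start + win]), mode="full")
        return acc

    rng = np.random.default_rng(0)
    src = rng.normal(size=50_000)
    a, b = src[:-40], src[40:]      # station b leads station a by 40 samples
    cc = noise_correlation(a, b, win=1024, step=512)
    print("recovered lag:", np.argmax(cc) - 1023)   # expect about +40
    ```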

  17. CMS Preprocessing Subsystem user's guide. Software version 1.2

    SciTech Connect

    Didier, B.T.; Gash, J.D.; Greitzer, F.L.; Havre, S.L.; Ramsdell, J.V.; Turney, C.R.

    1993-10-01

    The Common Mapping Standard (CMS) Data Production System (CDPS) produces and distributes CMS data in compliance with the Common Mapping Standard Interface Control Document, Revision 2.2. Historically, tactical mission planning systems have been the primary clients of CMS data. CDPS is composed of two subsystems, the CMS Preprocessing Subsystem (CPS) and the CMS Distribution Subsystem (CDS). This guide describes the operation of CPS, which is responsible for the management of source data and the production of CMS data from source data. The CPS system was developed for use on a workstation running Ultrix 4.2, the X Window System Version X11R4, and Motif Version 1.1. The subsystem is organized into four major functional groups: CPS Executive; Manage Source Data; Manage CMS Data Preprocessing; and CPS System Utilities. CPS supports the production of CMS data from the following source chart, image, and elevation data products: Global Navigation Chart; Jet Navigation Chart; Operational Navigation Chart; Tactical Pilotage Chart; Joint Operations Graphics-Air; Topographic Line Map; ARC Digital Raster Imagery; Digital Terrain Elevation Data (Level 1); and Low Flying Chart.

  18. The Image Registration of Fourier-Mellin Based on the Combination of Projection and Gradient Preprocessing

    NASA Astrophysics Data System (ADS)

    Gao, D.; Zhao, X.; Pan, X.

    2017-09-01

    Image registration is one of the most important applications in the field of image processing. The Fourier-Mellin transform method is widely used, as it offers high precision and good robustness to changes in illumination, partial occlusion, noise and so on. However, for non-parallel image pairs this method cannot obtain a unique cross-power-spectrum pulse, and for some image pairs no pulse can be obtained at all. In this paper, an image registration method based on the Fourier-Mellin transformation with projection-gradient preprocessing is proposed. According to the projection conformational equation, the method calculates the image projection transformation matrix to correct the tilted image; then, gradient preprocessing and the Fourier-Mellin transformation are performed on the corrected image to obtain the registration parameters. Eventually,