Science.gov

Sample records for microarray preprocessing algorithms

  1. Micro-Analyzer: automatic preprocessing of Affymetrix microarray data.

    PubMed

    Guzzi, Pietro Hiram; Cannataro, Mario

    2013-08-01

    A current trend in genomics is the investigation of cell mechanisms using different technologies, in order to explain the relationships among genes, molecular processes and diseases. For instance, the combined use of gene-expression arrays and genomic arrays has been demonstrated to be an effective instrument in clinical practice. Consequently, different kinds of microarrays may be used in a single experiment, resulting in the production of different types of binary data (images and textual raw data). The analysis of microarray data requires an initial preprocessing phase that makes raw data suitable for use on existing analysis platforms, such as the TIGR M4 (TM4) Suite. An additional challenge for emerging data analysis platforms is the ability to handle these different microarray formats in a combined way, together with clinical data. In fact, the resulting integrated data may include both numerical and symbolic values (e.g. gene expression and SNPs among molecular data), as well as temporal values (e.g. the response to a drug, time to progression and survival rate) among clinical data. Raw data preprocessing is a crucial step in the analysis but is often performed in a manual and error-prone way using different software tools. Thus novel, platform-independent, and possibly open-source tools enabling the semi-automatic preprocessing and annotation of different microarray data are needed. This paper presents Micro-Analyzer (Microarray Analyzer), a cross-platform tool for the automatic normalization, summarization and annotation of Affymetrix gene expression and SNP binary data. It represents the evolution of the μ-CS tool, extending the preprocessing to SNP arrays, which were not supported in μ-CS. Micro-Analyzer is provided as a standalone Java tool and enables users to read, preprocess and analyse binary microarray data (gene expression and SNPs) by invoking the TM4 platform. It avoids: (i) the manual invocation of external tools (e.g. the Affymetrix Power
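
    As a hedged illustration of the kind of normalization and summarization steps such a tool automates, the sketch below applies quantile normalization and a simplified median polish to a toy probes-by-arrays intensity matrix. These are standard RMA-style choices assumed here for illustration; they are not a description of Micro-Analyzer's internals.

```python
# Hedged sketch of RMA-style preprocessing steps (quantile normalization and
# a simplified median polish); illustrative only, not Micro-Analyzer's code.
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Give every array (column) the same empirical intensity distribution."""
    ranks = x.argsort(axis=0).argsort(axis=0)
    target = np.sort(x, axis=0).mean(axis=1)
    return target[ranks]


def median_polish_summary(probes: np.ndarray, n_iter: int = 10) -> np.ndarray:
    """Summarize a (probes x arrays) probe-set block into one value per array
    using a simplified median polish (array effects plus the grand median)."""
    resid = probes.copy()
    array_effect = np.zeros(probes.shape[1])
    for _ in range(n_iter):
        resid -= np.median(resid, axis=1, keepdims=True)  # remove probe effects
        col_med = np.median(resid, axis=0)
        resid -= col_med
        array_effect += col_med                           # accumulate array effects
    return array_effect + np.median(probes)


rng = np.random.default_rng(0)
raw = rng.lognormal(6, 1, size=(12, 4))        # 12 probes measured on 4 arrays
normalized = quantile_normalize(np.log2(raw))
print(median_polish_summary(normalized))       # one expression value per array
```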

  2. Impact of Microarray Preprocessing Techniques in Unraveling Biological Pathways.

    PubMed

    Deandrés-Galiana, Enrique J; Fernández-Martínez, Juan Luis; Saligan, Leorey N; Sonis, Stephen T

    2016-12-01

    To better understand the impact of microarray preprocessing normalization techniques on the analysis of biological pathways in the prediction of chronic fatigue (CF) following radiation therapy, this study compared the lists of predictive genes found using Robust Multiarray Averaging (RMA) and the Affymetrix MAS5 method with the list obtained from raw data (without any preprocessing). First, we modeled a spiked-in data set in which differentially expressed genes were known and spiked in at different known concentrations, showing that the precision achieved by different gene ranking methods was higher than that obtained with raw data. The results from the spiked-in experiment were extrapolated to the CF data set to run learning and blind validation. RMA and MAS5 provided different sets of discriminatory genes that have higher predictive accuracy in the learning phase but lower predictive accuracy during the blind validation phase, suggesting that the genetic signatures generated by both preprocessing techniques are not generalizable. The pathways found using the raw data set better described what is a priori known about CF. In addition, RMA produced more reliable pathways than MAS5. Understanding the strengths of these two preprocessing techniques in phenotype prediction is critical for precision medicine. In particular, this article concludes that biological pathways might be better unraveled by working with raw expression data, and that the predictive gene profiles generated by RMA and MAS5 should be interpreted with caution. This is an important conclusion with a high translational impact that should be confirmed in other disease data sets.
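
    The core comparison, how a preprocessing choice changes which genes rank as predictive, can be mimicked on simulated data. The sketch below is an illustrative assumption, not the paper's pipeline: it ranks genes by mean log fold change on raw versus quantile-normalized intensities and reports how many simulated spiked-in genes each ranking recovers.

```python
# Toy re-creation of the spiked-in comparison on simulated data; the spike
# size, per-array distortion, and ranking statistic are all assumptions.
import numpy as np

rng = np.random.default_rng(1)
genes, per_group = 500, 10
raw = rng.lognormal(7, 1, size=(genes, 2 * per_group))
raw[:20, per_group:] *= 3.0                        # 20 truly "spiked-in" genes
raw **= rng.uniform(0.8, 1.2, size=2 * per_group)  # nonlinear per-array distortion


def quantile_normalize(x):
    ranks = x.argsort(axis=0).argsort(axis=0)
    return np.sort(x, axis=0).mean(axis=1)[ranks]


def top_genes(x, k=20):
    """Rank genes by absolute mean log2 fold change between the groups."""
    logx = np.log2(x)
    score = logx[:, per_group:].mean(axis=1) - logx[:, :per_group].mean(axis=1)
    return set(np.argsort(-np.abs(score))[:k])


spiked = set(range(20))
print("recovered from raw data:      ", len(top_genes(raw) & spiked))
print("recovered after normalization:", len(top_genes(quantile_normalize(raw)) & spiked))
```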

  3. Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm.

    PubMed

    Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein

    2015-01-01

    DNA microarrays are a powerful approach for simultaneously studying the expression of thousands of genes in a single experiment. The average fluorescent intensity of each spot can be calculated in a microarray experiment, and the calculated intensity values closely track the expression level of the corresponding gene. However, determining the appropriate position of every spot in microarray images is a major challenge, one whose solution underpins the accurate classification of normal and abnormal (cancer) cells. In this paper, a preprocessing step is first performed to eliminate the noise and artifacts present in microarray cells using the nonlinear anisotropic diffusion filtering method. Then, the coordinate center of each spot is located utilizing mathematical morphology operations. Finally, the position of each spot is exactly determined by applying a novel hybrid model based on principal component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in the SFCM algorithm improves the quality of complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on real microarray images available in the Stanford Microarray Database. Results illustrate that the accuracy of microarray cell segmentation by the proposed algorithm reaches 100% and 98% for noiseless and noisy cells, respectively.
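
    To make the clustering step concrete, here is a minimal fuzzy c-means sketch on one-dimensional pixel intensities; the membership and center updates are the standard FCM equations, while the spatial regularization and Gaussian kernel that distinguish SFCM are deliberately omitted.

```python
# Standard (non-spatial) fuzzy c-means on 1-D intensities; a hedged sketch of
# the membership machinery underlying SFCM, not the paper's full algorithm.
import numpy as np


def fuzzy_c_means(x, c=2, m=2.0, n_iter=50, seed=0):
    """Standard FCM on a 1-D intensity array: returns memberships and centers."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=c, replace=False)
    for _ in range(n_iter):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12  # pixel-center distance
        u = d ** (-2.0 / (m - 1.0))                        # unnormalized memberships
        u /= u.sum(axis=1, keepdims=True)
        centers = (u ** m * x[:, None]).sum(axis=0) / (u ** m).sum(axis=0)
    return u, centers


rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(0.2, 0.05, 300),    # background pixels
                         rng.normal(0.8, 0.05, 60)])    # bright spot pixels
u, centers = fuzzy_c_means(pixels)
print("estimated centers (background, spot):", np.sort(centers).round(2))
```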

  4. Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm

    PubMed Central

    Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein

    2015-01-01

    DNA microarrays are a powerful approach for simultaneously studying the expression of thousands of genes in a single experiment. The average fluorescent intensity of each spot can be calculated in a microarray experiment, and the calculated intensity values closely track the expression level of the corresponding gene. However, determining the appropriate position of every spot in microarray images is a major challenge, one whose solution underpins the accurate classification of normal and abnormal (cancer) cells. In this paper, a preprocessing step is first performed to eliminate the noise and artifacts present in microarray cells using the nonlinear anisotropic diffusion filtering method. Then, the coordinate center of each spot is located utilizing mathematical morphology operations. Finally, the position of each spot is exactly determined by applying a novel hybrid model based on principal component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in the SFCM algorithm improves the quality of complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on real microarray images available in the Stanford Microarray Database. Results illustrate that the accuracy of microarray cell segmentation by the proposed algorithm reaches 100% and 98% for noiseless and noisy cells, respectively. PMID:26284175

  5. Genetic Algorithm for Optimization: Preprocessing with n Dimensional Bisection and Error Estimation

    NASA Technical Reports Server (NTRS)

    Sen, S. K.; Shaykhian, Gholam Ali

    2006-01-01

    Knowledge of appropriate values for the parameters of a genetic algorithm (GA), such as the population size, the shrunken search space containing the solution, and the crossover and mutation probabilities, is not available a priori for a general optimization problem. Recommended here is a polynomial-time preprocessing scheme, based on an n-dimensional bisection, that determines the foregoing parameters before an appropriate GA is chosen for all problems of a similar nature and type. Such preprocessing is not only fast but also enables us to obtain the global optimal solution, with reasonably narrow error bounds, with a high degree of confidence.
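
    A rough sketch of the bisection idea follows: split an n-dimensional box into its 2^n sub-boxes, keep the sub-box whose centre scores best, and repeat, handing the shrunken box to a GA as its search space. The centre-sampling scoring rule is an illustrative assumption, not the scheme from the paper.

```python
# Hedged sketch: shrink the GA search space by repeated 2^n-way bisection,
# keeping the sub-box whose centre scores best under the objective.
import itertools

import numpy as np


def bisect_search_space(f, lo, hi, rounds=8):
    """Return a small box believed to contain the minimizer of f."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    for _ in range(rounds):
        mid = (lo + hi) / 2.0
        best = None
        for corner in itertools.product((0, 1), repeat=len(lo)):
            new_lo = np.where(corner, mid, lo)       # pick one of 2^n sub-boxes
            new_hi = np.where(corner, hi, mid)
            score = f((new_lo + new_hi) / 2.0)
            if best is None or score < best[0]:
                best = (score, new_lo, new_hi)
        _, lo, hi = best
    return lo, hi


sphere = lambda x: float(np.sum((x - 0.3) ** 2))
print(bisect_search_space(sphere, [-5.0, -5.0], [5.0, 5.0]))  # tightens around 0.3
```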

  6. Cancer microarray data feature selection using multi-objective binary particle swarm optimization algorithm.

    PubMed

    Annavarapu, Chandra Sekhara Rao; Dara, Suresh; Banka, Haider

    2016-01-01

    Cancer investigations using microarray data play a major role in cancer analysis and treatment. Cancer microarray data consist of the complex gene expression patterns of cancer. In this article, a Multi-Objective Binary Particle Swarm Optimization (MOBPSO) algorithm is proposed for analyzing cancer gene expression data. Because of the data's high dimensionality, a fast heuristic-based pre-processing technique is first employed to remove some of the crude domain features from the initial feature set. Since the pre-processed, reduced features are still high dimensional, the proposed MOBPSO algorithm is used to find further feature subsets. The objective functions are modeled by optimizing two conflicting objectives: the cardinality of the feature subsets and the discriminative capability of the selected subsets. As these two objective functions conflict, they are well suited to multi-objective modeling. Experiments are carried out on benchmark gene expression datasets from the literature: Colon, Lymphoma, and Leukaemia. The performance of the selected feature subsets is assessed by their classification accuracy, validated using 10-fold cross-validation. A detailed comparative study is also presented to show the superiority or competitiveness of the proposed algorithm.
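
    The sketch below shows the binary PSO mechanics such a method builds on: each particle is a 0/1 gene mask, and a sigmoid of the velocity gives each bit's probability of being set. The scalar fitness used here (informative genes rewarded, subset size penalized) is a toy stand-in for the paper's two conflicting objectives.

```python
# Minimal binary PSO for feature selection; a hedged single-objective sketch,
# not the multi-objective MOBPSO of the paper.
import numpy as np

rng = np.random.default_rng(0)
n_particles, n_genes = 20, 100
pos = rng.integers(0, 2, size=(n_particles, n_genes)).astype(float)
vel = np.zeros_like(pos)


def fitness(mask):
    """Toy scalarized objective: reward the 10 'informative' genes, penalize size."""
    return mask[:10].sum() - 0.05 * mask.sum()


pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(100):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    flip_prob = 1.0 / (1.0 + np.exp(-vel))           # sigmoid of velocity
    pos = (rng.random(pos.shape) < flip_prob).astype(float)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print("genes selected:", int(gbest.sum()), " informative kept:", int(gbest[:10].sum()))
```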

  7. Cancer Classification in Microarray Data using a Hybrid Selective Independent Component Analysis and υ-Support Vector Machine Algorithm

    PubMed Central

    Saberkari, Hamidreza; Shamsi, Mousa; Joroughi, Mahsa; Golabi, Faegheh; Sedaaghi, Mohammad Hossein

    2014-01-01

    Microarray data play an important role in the identification and classification of cancer tissues. The small number of samples typically available in cancer microarray studies is a persistent concern, as it complicates classifier design. For this reason, gene selection should be applied as a preprocessing step before classification, to remove non-informative genes from the microarray data. An appropriate gene selection method can significantly improve the performance of cancer classification. In this paper, we use selective independent component analysis (SICA) to reduce the dimensionality of microarray data. This selective algorithm overcomes the instability problem that arises when conventional independent component analysis (ICA) methods are employed. First, the reconstruction error is analyzed and a selective set of independent components is retained, namely those that contribute little error when reconstructing a new sample. Then, several sub-classifiers based on a modified support vector machine (υ-SVM) algorithm are trained simultaneously. Eventually, the sub-classifier with the highest recognition rate is selected. The proposed algorithm is applied to three cancer datasets (leukemia, breast cancer, and lung cancer), and its results are compared with other existing methods. The results illustrate that the proposed algorithm (SICA + υ-SVM) achieves higher accuracy and validity; in particular, it exhibits a relative improvement of 3.3% in correctness rate over the ICA + SVM and SVM algorithms on the lung cancer dataset. PMID:25426433
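
    As a hedged sketch of the pipeline's overall shape, the snippet below pairs scikit-learn's FastICA (a generic stand-in for the selective ICA step) with a ν-SVM classifier on synthetic data; the component-selection and sub-classifier-voting logic of the paper is not reproduced.

```python
# Hedged pipeline sketch: FastICA as a generic stand-in for the selective ICA
# step, followed by a nu-SVM classifier; synthetic data, toy dimensions.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.model_selection import train_test_split
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2000))                 # 80 samples, 2000 genes
y = (X[:, :5].sum(axis=1) > 0).astype(int)      # labels driven by 5 genes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
ica = FastICA(n_components=10, random_state=0).fit(X_tr)   # dimension reduction
clf = NuSVC(nu=0.3).fit(ica.transform(X_tr), y_tr)         # nu-SVM classifier
print("held-out accuracy:", clf.score(ica.transform(X_te), y_te))
```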

  8. Cancer Classification in Microarray Data using a Hybrid Selective Independent Component Analysis and υ-Support Vector Machine Algorithm.

    PubMed

    Saberkari, Hamidreza; Shamsi, Mousa; Joroughi, Mahsa; Golabi, Faegheh; Sedaaghi, Mohammad Hossein

    2014-10-01

    Microarray data play an important role in the identification and classification of cancer tissues. The small number of samples typically available in cancer microarray studies is a persistent concern, as it complicates classifier design. For this reason, gene selection should be applied as a preprocessing step before classification, to remove non-informative genes from the microarray data. An appropriate gene selection method can significantly improve the performance of cancer classification. In this paper, we use selective independent component analysis (SICA) to reduce the dimensionality of microarray data. This selective algorithm overcomes the instability problem that arises when conventional independent component analysis (ICA) methods are employed. First, the reconstruction error is analyzed and a selective set of independent components is retained, namely those that contribute little error when reconstructing a new sample. Then, several sub-classifiers based on a modified support vector machine (υ-SVM) algorithm are trained simultaneously. Eventually, the sub-classifier with the highest recognition rate is selected. The proposed algorithm is applied to three cancer datasets (leukemia, breast cancer, and lung cancer), and its results are compared with other existing methods. The results illustrate that the proposed algorithm (SICA + υ-SVM) achieves higher accuracy and validity; in particular, it exhibits a relative improvement of 3.3% in correctness rate over the ICA + SVM and SVM algorithms on the lung cancer dataset.

  9. Experimental evaluation of video preprocessing algorithms for automatic target hand-off

    NASA Astrophysics Data System (ADS)

    McIngvale, P. H.; Guyton, R. D.

    It is pointed out that the Automatic Target Hand-Off Correlator (ATHOC) hardware has been modified to permit operation in a non-real-time mode as a programmable laboratory test unit, using video recordings as inputs and allowing several preprocessing algorithms to be software programmable. In parallel with this hardware modification effort, an analysis and simulation effort has been underway to help determine which of the many available preprocessing algorithms should be implemented in the ATHOC software. It is noted that videotapes from a current-technology airborne target acquisition system and an imaging infrared missile seeker were recorded and used in the laboratory experiments. These experiments are described and the results are presented. A set of standard parameters is found for each case. Consideration of the background in the target scene is found to be important. Analog filter cutoff frequencies of 2.5 MHz for low pass and 300 kHz for high pass are found to give the best results. EPNC = 1 is found to be slightly better than EPNC = 0. It is also shown that trilevel gives better results than bilevel.

  10. A Data Preprocessing Algorithm for Classification Model Based On Rough Sets

    NASA Astrophysics Data System (ADS)

    Xiang-wei, Li; Yian-fang, Qi

    To address the limitation that over-abundant data imposes on constructing classification models in data mining, this paper proposes a novel, effective preprocessing algorithm based on rough sets. First, we construct a relational information system from the original data sets. Second, we use the attribute reduction theory of rough sets to produce the core of the information system; the core is the most important and necessary information in the original information system and cannot be reduced further, so it yields the same analytical effect as the original data sets and can be used to construct classification models. Third, we construct the indiscernibility matrix from the reduced information system and, finally, obtain the classification of the original data sets. Compared with existing techniques, the developed algorithm enjoys the following advantages: (1) it avoids carrying abundant data into follow-up processing; (2) it avoids a large amount of computation in the overall data mining process; and (3) the results are more effective because the attribute reduction theory of rough sets is introduced.
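
    The decisive notions here, indiscernibility classes and the core, fit in a few lines of Python. In the hedged sketch below (toy data, simplified definitions), an attribute belongs to the core when dropping it merges objects that the decision attribute distinguishes.

```python
# Toy rough-set sketch: indiscernibility partition, consistency check, and
# core computation; data and definitions are simplified for illustration.
from collections import defaultdict


def partition(rows, attrs):
    """Group object indices into indiscernibility classes under `attrs`."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return groups.values()


def is_consistent(rows, attrs, decisions):
    """True if objects indiscernible under `attrs` always share a decision."""
    return all(len({decisions[i] for i in block}) == 1
               for block in partition(rows, attrs))


rows = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 1, 0)]    # condition attributes
decisions = ["yes", "no", "yes", "no"]                  # decision attribute
attrs = [0, 1, 2]
core = [a for a in attrs
        if not is_consistent(rows, [b for b in attrs if b != a], decisions)]
print("core attributes:", core)                         # attribute 2 is indispensable
```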

  11. 3-D image pre-processing algorithms for improved automated tracing of neuronal arbors.

    PubMed

    Narayanaswamy, Arunachalam; Wang, Yu; Roysam, Badrinath

    2011-09-01

    The accuracy and reliability of automated neurite tracing systems is ultimately limited by image quality as reflected in the signal-to-noise ratio, contrast, and image variability. This paper describes a novel combination of image processing methods that operate on images of neurites captured by confocal and widefield microscopy, and produce synthetic images that are better suited to automated tracing. The algorithms are based on the curvelet transform (for denoising curvilinear structures and local orientation estimation), perceptual grouping by scalar voting (for elimination of non-tubular structures and improvement of neurite continuity while preserving branch points), adaptive focus detection, and depth estimation (for handling widefield images without deconvolution). The proposed methods are fast, and capable of handling large images. Their ability to handle images of unlimited size derives from automated tiling of large images along the lateral dimension, and processing of 3-D images one optical slice at a time. Their speed derives in part from the fact that the core computations are formulated in terms of the Fast Fourier Transform (FFT), and in part from parallel computation on multi-core computers. The methods are simple to apply to new images since they require very few adjustable parameters, all of which are intuitive. Examples of pre-processing DIADEM Challenge images are used to illustrate improved automated tracing resulting from our pre-processing methods.

  12. DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach

    NASA Astrophysics Data System (ADS)

    Tchagang, Alain B.; Tewfik, Ahmed H.

    2006-12-01

    Biclustering algorithms refer to a distinct class of clustering algorithms that perform simultaneous row-column clustering. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. When dealing with DNA microarray experimental data for example, the goal of biclustering algorithms is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed biclustering algorithms can be used to search for all biclusters with constant values, biclusters with constant values on rows, biclusters with constant values on columns, and biclusters with coherent values from a set of data in a timely manner and without solving any optimization problem. We also show how one of the proposed biclustering algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, while almost all previous biclustering approaches will miss some.
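
    A toy version of one bicluster type the paper enumerates, submatrices with constant values on rows, can be found by brute force on a small matrix, as below; the paper's linear-algebra formulation avoids this exhaustive search over column subsets.

```python
# Toy brute-force search for the "constant values on rows" bicluster type;
# exponential in columns, so illustrative only.
import itertools

import numpy as np

data = np.array([[1, 1, 1, 5],
                 [3, 3, 3, 2],
                 [4, 0, 4, 4],
                 [7, 7, 7, 7]])

best_area, best = 0, None
n_cols = data.shape[1]
for k in range(n_cols, 1, -1):                     # prefer wider condition sets
    for cols in itertools.combinations(range(n_cols), k):
        sub = data[:, cols]
        rows = np.where((sub == sub[:, :1]).all(axis=1))[0]  # rows constant on cols
        if len(rows) >= 2 and len(rows) * k > best_area:
            best_area, best = len(rows) * k, (rows.tolist(), list(cols))

print("largest constant-row bicluster (rows, cols):", best)
```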

  13. Clustering Short Time-Series Microarray

    NASA Astrophysics Data System (ADS)

    Ping, Loh Wei; Hasan, Yahya Abu

    2008-01-01

    Most microarray analyses are carried out on static gene expression data. Lately, however, the dynamical study of microarrays has gained more attention. Most research on time-series microarrays emphasizes the bioscience and medical aspects; few studies approach the topic from the numerical side. This study attempts to analyze short time-series microarray data mathematically using the STEM clustering tool, which formally preprocesses the data before clustering. We next introduce the Circular Mould Distance (CMD) algorithm, which combines both preprocessing and clustering analysis. The two methods are subsequently compared in terms of efficiency.

  14. Microarrays

    ERIC Educational Resources Information Center

    Plomin, Robert; Schalkwyk, Leonard C.

    2007-01-01

    Microarrays are revolutionizing genetics by making it possible to genotype hundreds of thousands of DNA markers and to assess the expression (RNA transcripts) of all of the genes in the genome. Microarrays are slides the size of a postage stamp that contain millions of DNA sequences to which single-stranded DNA or RNA can hybridize. This…

  15. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    NASA Astrophysics Data System (ADS)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU

  16. Fast-SNP: a fast matrix pre-processing algorithm for efficient loopless flux optimization of metabolic models

    PubMed Central

    Saa, Pedro A.; Nielsen, Lars K.

    2016-01-01

    Motivation: Computation of steady-state flux solutions in large metabolic models is routinely performed using flux balance analysis based on a simple LP (Linear Programming) formulation. A minimal requirement for thermodynamic feasibility of the flux solution is the absence of internal loops, which are enforced using 'loopless constraints'. The resulting loopless flux problem is a substantially harder MILP (Mixed Integer Linear Programming) problem, which is computationally expensive for large metabolic models. Results: We developed a pre-processing algorithm that significantly reduces the size of the original loopless problem into an easier and equivalent MILP problem. The pre-processing step employs a fast matrix sparsification algorithm, Fast-SNP (fast sparse null-space pursuit), inspired by recent results on SNP. By finding a reduced feasible 'loop-law' matrix subject to known directionalities, Fast-SNP considerably improves the computational efficiency in several metabolic models running different loopless optimization problems. Furthermore, analysis of the topology encoded in the reduced loop matrix enabled identification of key directional constraints for the potential permanent elimination of infeasible loops in the underlying model. Overall, Fast-SNP is an effective and simple algorithm for efficient formulation of loop-law constraints, making loopless flux optimization feasible and numerically tractable at large scale. Availability and Implementation: Source code for MATLAB including examples is freely available for download at http://www.aibn.uq.edu.au/cssb-resources under Software. Optimization uses Gurobi, CPLEX or GLPK (the latter is included with the algorithm). Contact: lars.nielsen@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27559155
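
    The central object here, the null space of the internal stoichiometric matrix whose basis vectors encode the loop laws, can be computed directly for a toy network; SciPy's dense helper below stands in for the sparse null-space pursuit the paper develops.

```python
# Null space of a toy internal stoichiometric matrix; a hedged illustration of
# the loop-law matrix Fast-SNP sparsifies, not the paper's algorithm.
import numpy as np
from scipy.linalg import null_space

# Toy network with one internal loop: R1: A -> B, R2: B -> C, R3: C -> A.
S_internal = np.array([[-1.0,  0.0,  1.0],   # metabolite A
                       [ 1.0, -1.0,  0.0],   # metabolite B
                       [ 0.0,  1.0, -1.0]])  # metabolite C

N = null_space(S_internal)          # basis of flux modes with zero net production
print(N / np.abs(N).max())          # one loop vector, proportional to (1, 1, 1)
```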

  17. LANDSAT data preprocessing

    NASA Technical Reports Server (NTRS)

    Austin, W. W.

    1983-01-01

    The effects on LANDSAT data of a Sun angle correction, an intersatellite LANDSAT-2 and LANDSAT-3 data range adjustment, and the atmospheric correction algorithm were evaluated. Fourteen 1978 crop-year LACIE sites were used as the site data set. The preprocessing techniques were applied to multispectral scanner channel data, and the transformed data were plotted and used to analyze the effectiveness of the preprocessing techniques. Ratio transformations effectively reduce the need for preprocessing techniques to be applied directly to the data. Subtractive transformations are more sensitive to Sun angle and atmospheric corrections than ratios. Preprocessing techniques other than those applied at the Goddard Space Flight Center should only be applied as an option of the user. While performed on LANDSAT data, the study results are also applicable to meteorological satellite data.

  18. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    PubMed

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Nature-inspired evolutionary algorithms prove effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm; the goal is to integrate the advantages of both. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. To test the accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes. This demonstrates that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.

  19. Improved Data Preprocessing Algorithm for Time-Domain Induced Polarization Method with Digital Notch Filter

    NASA Astrophysics Data System (ADS)

    Ge, Shuang-Chao; Deng, Ming; Chen, Kai; Li, Bin; Li, Yuan

    2016-12-01

    Time-domain induced polarization (TDIP) measurement is seriously affected by power-line interference and other field noise. Moreover, existing TDIP instruments generally output only the apparent chargeability, without providing complete secondary-field information. To increase the robustness of the TDIP method against interference and to obtain more detailed secondary-field information, an improved data-preprocessing algorithm is proposed here. The method includes an efficient digital notch filter that can effectively eliminate all the main components of the power-line interference. A hardware model of this filter was constructed, and VHSIC Hardware Description Language (VHDL) code for it was generated using the DSP (Digital Signal Processor) Builder tool. In addition, a time-location method was proposed to extract secondary-field information in case of unexpected data loss or failure of the synchronization technologies. Finally, the validity and accuracy of the method and the notch filter were verified using the Cole-Cole model implemented in Simulink. Moreover, indoor and field tests confirmed the effectiveness of the algorithm in fieldwork.
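
    A brief SciPy sketch of the record's central idea follows: cascade IIR notch filters at the power-line fundamental and its odd harmonics to clean a TDIP-like decay curve. The 50 Hz mains frequency, sampling rate, and Q factor are illustrative assumptions, not the instrument's actual parameters.

```python
# Cascaded IIR notches at the mains fundamental and odd harmonics; a hedged
# sketch of power-line removal for an idealized secondary-field decay.
import numpy as np
from scipy.signal import filtfilt, iirnotch

fs = 1000.0                                    # assumed sampling rate, Hz
t = np.arange(0, 2, 1 / fs)
decay = np.exp(-t / 0.5)                       # idealized secondary-field decay
noisy = (decay
         + 0.20 * np.sin(2 * np.pi * 50 * t)   # power-line fundamental
         + 0.10 * np.sin(2 * np.pi * 150 * t)) # third harmonic

clean = noisy
for f0 in (50.0, 150.0, 250.0):                # notch mains and odd harmonics
    b, a = iirnotch(f0, Q=30.0, fs=fs)
    clean = filtfilt(b, a, clean)              # zero-phase filtering

print("RMS error vs. true decay:", np.sqrt(np.mean((clean - decay) ** 2)).round(4))
```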

  20. LS-CAP: an algorithm for identifying cytogenetic aberrations in hepatocellular carcinoma using microarray data.

    PubMed

    He, Xianmin; Wei, Qing; Sun, Meiqian; Fu, Xuping; Fan, Sichang; Li, Yao

    2006-05-01

    Biological techniques such as array comparative genomic hybridization (CGH), fluorescence in situ hybridization (FISH), and Affymetrix single nucleotide polymorphism (SNP) arrays have been used to detect cytogenetic aberrations. However, on a genomic scale, these techniques are labor intensive and time consuming. Comparative genomic microarray analysis (CGMA) has been used to identify cytogenetic changes in hepatocellular carcinoma (HCC) using gene expression microarray data. However, the CGMA algorithm cannot precisely localize aberrations, fails to identify small cytogenetic changes, and exhibits false negatives and positives. Locally un-weighted smoothing cytogenetic aberration prediction (LS-CAP), based on local smoothing and the binomial distribution, can be expected to address these problems. The LS-CAP algorithm was built and applied to HCC microarray profiles. Eighteen cytogenetic abnormalities were identified; among them, 5 had been reported previously and 12 were confirmed by CGH studies. LS-CAP effectively reduced the false negatives and positives, and precisely located small fragments with cytogenetic aberrations.
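
    The two ingredients named for LS-CAP, local smoothing along the genome and a binomial model, can be mimicked as below: slide a window over per-gene up/down calls and use a binomial test to flag windows with implausibly many up-regulated genes. Window size, threshold, and the simulated amplified region are illustrative assumptions, not the published algorithm.

```python
# Hedged mimic of LS-CAP's ingredients: local windows over per-gene up/down
# calls plus a binomial test on each window.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)
n_genes, half_window = 300, 15
up = rng.random(n_genes) < 0.5                # null: up/down equally likely
up[100:140] = rng.random(40) < 0.9            # simulated amplified region

for center in range(half_window, n_genes - half_window):
    window = up[center - half_window: center + half_window + 1]
    p = binomtest(int(window.sum()), len(window), 0.5).pvalue
    if p < 1e-4:                               # flag the first suspicious window
        print(f"candidate aberration near gene {center}: p = {p:.2e}")
        break
```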

  1. Simultaneous data pre-processing and SVM classification model selection based on a parallel genetic algorithm applied to spectroscopic data of olive oils.

    PubMed

    Devos, Olivier; Downey, Gerard; Duponchel, Ludovic

    2014-04-01

    Classification is an important task in chemometrics. For several years now, support vector machines (SVMs) have proven powerful for infrared spectral data classification. However, such methods require the optimisation of parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, it is established that the prediction ability of classification models can be improved by using pre-processing to remove unwanted variance from the spectra. In this paper we propose a new methodology, based on a genetic algorithm (GA), for the simultaneous optimisation of SVM parameters and pre-processing (GENOPT-SVM). The method has been tested on the discrimination of the geographical origin of Italian olive oil (Ligurian and non-Ligurian) on the basis of near-infrared (NIR) or mid-infrared (FTIR) spectra. Different classification models (PLS-DA, SVM with mean-centred data, GENOPT-SVM) were tested and statistically compared using McNemar's test. For both datasets, SVM with optimised pre-processing gave models with higher accuracy than those obtained with PLS-DA on pre-processed data. In the case of the NIR dataset, most of this accuracy improvement (86.3% compared with 82.8% for PLS-DA) was achieved using only a single pre-processing step. For the FTIR dataset, three optimised pre-processing steps were required to obtain an SVM model with a significant accuracy improvement (82.2%) over PLS-DA (78.6%). Furthermore, this study demonstrates that even SVM models have to be developed on the basis of well-corrected spectral data in order to reach higher classification rates.

  2. Toward 'smart' DNA microarrays: algorithms for improving data quality and statistical inference

    NASA Astrophysics Data System (ADS)

    Bakewell, David J. G.; Wit, Ernst

    2007-12-01

    DNA microarrays are a laboratory tool for understanding biological processes at the molecular scale and future applications of this technology include healthcare, agriculture, and environment. Despite their usefulness, however, the information microarrays make available to the end-user is not used optimally, and the data is often noisy and of variable quality. This paper describes the use of hierarchical Maximum Likelihood Estimation (MLE) for generating algorithms that improve the quality of microarray data and enhance statistical inference about gene behavior. The paper describes examples of recent work that improves microarray performance, demonstrated using data from both Monte Carlo simulations and published experiments. One example looks at the variable quality of cDNA spots on a typical microarray surface. It is shown how algorithms, derived using MLE, are used to "weight" these spots according to their morphological quality, and subsequently lead to improved detection of gene activity. Another example, briefly discussed, addresses the "noisy data about too many genes" issue confronting many analysts who are also interested in the collective action of a group of genes, often organized as a pathway or complex. Preliminary work is described where MLE is used to "share" variance information across a pre-assigned group of genes of interest, leading to improved detection of gene activity.

  3. Stable feature selection and classification algorithms for multiclass microarray data

    PubMed Central

    2012-01-01

    Background: Recent studies suggest that gene expression profiles are a promising alternative for clinical cancer classification. One major problem in applying DNA microarrays for classification is the dimensionality of the obtained data sets. In this paper we propose a multiclass gene selection method based on Partial Least Squares (PLS) for selecting genes for classification. The new idea is to solve the multiclass selection problem with the PLS method and a decomposition into a set of two-class sub-problems: one versus rest (OvR) and one versus one (OvO). We also apply OvR and OvO two-class decomposition to another recently published gene selection method. Ranked gene lists are highly unstable, in the sense that a small change in the data set often leads to big changes in the obtained ordered lists. In this paper, we also assess the stability of the proposed methods. We use the linear support vector machine (SVM) technique in different variants (one versus one, one versus rest, multiclass SVM (MSVM)) and linear discriminant analysis (LDA) as classifiers. We use balanced bootstrap to estimate the prediction error and to test the variability of the obtained ordered lists. Results: This paper focuses on the effective identification of informative genes. As a result, a new strategy to find a small subset of significant genes is designed. Our results on real multiclass cancer data show that our method has a very high accuracy rate for different combinations of classification methods, while giving very stable feature rankings. Conclusions: This paper shows that the proposed strategies can improve the performance of selected gene sets substantially. OvR and OvO techniques applied to existing gene selection methods improve results as well. The presented method allows one to obtain a more reliable classifier with a lower classification error, and at the same time it generates more stable ordered feature lists than existing methods. Reviewers: This article was reviewed
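
    The core move, one PLS fit per one-versus-rest sub-problem with genes ranked by PLS weight magnitude, looks roughly like the scikit-learn sketch below; the synthetic data, component count, and weight-based ranking rule are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of PLS-based one-versus-rest gene ranking with scikit-learn.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))                  # 60 samples, 500 genes
y = np.repeat([0, 1, 2], 20)                    # three classes
X[y == 1, :10] += 2.0                           # genes 0-9 mark class 1

rankings = {}
for cls in np.unique(y):
    target = (y == cls).astype(float)           # one-versus-rest response
    pls = PLSRegression(n_components=2).fit(X, target)
    importance = np.abs(pls.x_weights_).sum(axis=1)
    rankings[cls] = np.argsort(-importance)[:10]

print("top-ranked genes for class 1 vs rest:", sorted(rankings[1].tolist()))
```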

  4. An algorithm for finding biologically significant features in microarray data based on a priori manifold learning.

    PubMed

    Hira, Zena M; Trigeorgis, George; Gillies, Duncan F

    2014-01-01

    Microarray databases are a large source of genetic data which, upon proper analysis, could enhance our understanding of biology and medicine. Many microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied to classify different types of cancer or to distinguish between cancerous and non-cancerous tissue. However, microarrays are high-dimensional datasets with high levels of noise, which causes problems when using machine learning methods. A popular approach to this problem is to search for a set of features that will simplify the structure and to some degree remove the noise from the data. The most widely used approach to feature extraction is principal component analysis (PCA), which assumes a multivariate Gaussian model of the data. More recently, non-linear methods have been investigated. Among these, manifold learning algorithms, for example Isomap, aim to project the data from a higher-dimensional space onto a lower-dimensional one. We have proposed a priori manifold learning for finding a manifold in which a representative set of microarray data is fused with relevant data taken from the KEGG pathway database. Once the manifold has been constructed, the raw microarray data are projected onto it, and clustering and classification can take place. In contrast to earlier fusion-based methods, the prior knowledge from the KEGG database is not used in, and does not bias, the classification process; it merely acts as an aid to find the best space in which to search the data. In our experiments we have found that our new manifold method gives better classification results than either PCA or conventional Isomap.
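
    For reference, the conventional-Isomap baseline that the paper compares against can be set up in a few lines of scikit-learn, as in the hedged sketch below on synthetic expression-like data; the KEGG-guided manifold construction itself is not reproduced.

```python
# Conventional-Isomap baseline sketch: embed, then classify in the embedded
# space; data, neighbor count, and classifier are illustrative assumptions.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))               # 100 samples, 1000 genes
y = (rng.random(100) < 0.5).astype(int)
X[y == 1, :20] += 1.5                          # class signal in 20 genes

Z = Isomap(n_neighbors=10, n_components=5).fit_transform(X)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), Z, y, cv=5)
print("cross-validated accuracy in the Isomap space:", scores.mean().round(3))
```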

  5. SPACE: an algorithm to predict and quantify alternatively spliced isoforms using microarrays

    PubMed Central

    Anton, Miguel A; Gorostiaga, Dorleta; Guruceaga, Elizabeth; Segura, Victor; Carmona-Saez, Pedro; Pascual-Montano, Alberto; Pio, Ruben; Montuenga, Luis M; Rubio, Angel

    2008-01-01

    Exon and exon+junction microarrays are promising tools for studying alternative splicing. Current analytical tools applied to these arrays lack two relevant features: the ability to predict unknown spliced forms and the ability to quantify the concentration of known and unknown isoforms. SPACE is an algorithm that has been developed to (1) estimate the number of different transcripts expressed under several conditions, (2) predict the precursor mRNA splicing structure and (3) quantify the transcript concentrations including unknown forms. The results presented here show its robustness and accuracy for real and simulated data. PMID:18312629

  6. Robust gene signatures from microarray data using genetic algorithms enriched with biological pathway keywords.

    PubMed

    Luque-Baena, R M; Urda, D; Gonzalo Claros, M; Franco, L; Jerez, J M

    2014-06-01

    Genetic algorithms are widely used in the estimation of expression profiles from microarray data. However, these techniques are unable to produce stable and robust solutions suitable for use in clinical and biomedical studies. This paper presents a novel two-stage evolutionary strategy for gene feature selection that combines a genetic algorithm with biological information extracted from the KEGG database. A comparative study is carried out on public data from three different types of cancer (leukemia, lung cancer, and prostate cancer). Even though the analyses use only features having KEGG information, the results demonstrate that this two-stage evolutionary strategy increases the consistency, robustness, and accuracy of a blind discrimination between relapsed and healthy individuals. Therefore, this approach could facilitate the definition of gene signatures for the clinical prognosis and diagnosis of cancer in the near future. Additionally, it could also be used for biological knowledge discovery about the studied disease.

  7. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling.

    PubMed

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    The artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying the ABC algorithm to the analysis of microarray gene expression profiles. In addition, we propose an innovative feature selection approach that combines minimum redundancy maximum relevance (mRMR) with the ABC algorithm, mRMR-ABC, to select informative genes from microarray profiles. The approach uses a support vector machine (SVM) to measure the classification accuracy of the selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm through extensive experiments on six binary and multiclass gene expression microarray datasets, and we compare it with previously known techniques. We reimplemented two of these techniques, mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO), for the sake of a fair comparison using the same parameters. The experimental results show that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes, when tested on both dataset types and compared with previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  8. Development of a Physical Model-Based Algorithm for the Detection of Single-Nucleotide Substitutions by Using Tiling Microarrays

    PubMed Central

    Ono, Naoaki; Suzuki, Shingo; Furusawa, Chikara; Shimizu, Hiroshi; Yomo, Tetsuya

    2013-01-01

    High-density DNA microarrays are useful tools for analyzing sequence changes in DNA samples. Although microarray analysis provides informative signals from a large number of probes, the analysis and interpretation of these signals have certain inherent limitations, namely, complex dependency of signals on the probe sequences and the existence of false signals arising from non-specific binding between probe and target. In this study, we have developed a novel algorithm to detect the single-base substitutions by using microarray data based on a thermodynamic model of hybridization. We modified the thermodynamic model by introducing a penalty for mismatches that represent the effects of substitutions on hybridization affinity. This penalty results in significantly higher detection accuracy than other methods, indicating that the incorporation of hybridization free energy can improve the analysis of sequence variants by using microarray data. PMID:23382915

  9. Classifier dependent feature preprocessing methods

    NASA Astrophysics Data System (ADS)

    Rodriguez, Benjamin M., II; Peterson, Gilbert L.

    2008-04-01

    In mobile applications, computational complexity is an issue that limits sophisticated algorithms from being implemented on these devices. This paper provides an initial solution to applying pattern recognition systems on mobile devices by combining existing preprocessing algorithms for recognition. In pattern recognition systems, it is essential to apply feature preprocessing tools properly before training classification models, in an attempt to reduce computational complexity and improve the overall classification accuracy. The feature preprocessing tools extended for the mobile environment are feature ranking, feature extraction, data preparation, and outlier removal. Most desktop systems today are capable of running a majority of the available classification algorithms without processing-time concerns, while the same is not true of mobile platforms. As an application of pattern recognition for mobile devices, the recognition system targets the problem of steganalysis: determining whether an image contains hidden information. The performance measurements show that feature preprocessing increases the overall steganalysis classification accuracy by an average of 22%. The methods in this paper are tested on a workstation and on a Nokia 6620 (Symbian operating system) camera phone, with similar results.

  10. Forward-Masked Frequency Selectivity Improvements in Simulated and Actual Cochlear Implant Users Using a Preprocessing Algorithm

    PubMed Central

    Jürgens, Tim

    2016-01-01

    Frequency selectivity can be quantified using masking paradigms, such as psychophysical tuning curves (PTCs). Normal-hearing (NH) listeners show sharp PTCs that are level- and frequency-dependent, whereas frequency selectivity is strongly reduced in cochlear implant (CI) users. This study aims at (a) assessing individual shapes of PTCs in CI users, (b) comparing these shapes to those of simulated CI listeners (NH listeners hearing through a CI simulation), and (c) increasing the sharpness of PTCs using a biologically inspired dynamic compression algorithm, BioAid, which has been shown to sharpen the PTC shape in hearing-impaired listeners. A three-alternative-forced-choice forward-masking technique was used to assess PTCs in 8 CI users (with their own speech processor) and 11 NH listeners (with and without listening through a vocoder to simulate electric hearing). CI users showed flat PTCs with large interindividual variability in shape, whereas simulated CI listeners had PTCs of the same average flatness, but more homogeneous shapes across listeners. The algorithm BioAid was used to process the stimuli before entering the CI users’ speech processor or the vocoder simulation. This algorithm was able to partially restore frequency selectivity in both groups, particularly in seven out of eight CI users, meaning significantly sharper PTCs than in the unprocessed condition. The results indicate that algorithms can improve the large-scale sharpness of frequency selectivity in some CI users. This finding may be useful for the design of sound coding strategies particularly for situations in which high frequency selectivity is desired, such as for music perception. PMID:27604785

  11. Evaluation of multivariate calibration models with different pre-processing and processing algorithms for a novel resolution and quantitation of spectrally overlapped quaternary mixture in syrup

    NASA Astrophysics Data System (ADS)

    Moustafa, Azza A.; Hegazy, Maha A.; Mohamed, Dalia; Ali, Omnia

    2016-02-01

    A novel approach for the resolution and quantitation of a severely overlapped quaternary mixture of carbinoxamine maleate (CAR), pholcodine (PHL), ephedrine hydrochloride (EPH) and sunset yellow (SUN) in syrup was demonstrated utilizing different spectrophotometric-assisted multivariate calibration methods. The applied methods used different processing and pre-processing algorithms. The proposed methods were partial least squares (PLS), concentration residuals augmented classical least squares (CRACLS), and a novel method: continuous wavelet transform coupled with partial least squares (CWT-PLS). These methods were applied to a training set with concentration ranges of 40-100 μg/mL, 40-160 μg/mL, 100-500 μg/mL and 8-24 μg/mL for the four components, respectively. The methods did not require any preliminary separation step or chemical pretreatment. Their validity was evaluated with an external validation set. The selectivity of the developed methods was demonstrated by analyzing the drugs in their combined pharmaceutical formulation without any interference from additives. The obtained results were statistically compared with the official and reported methods, and no significant difference was observed regarding either accuracy or precision.

  12. Preprocessing of compressed digital video

    NASA Astrophysics Data System (ADS)

    Segall, C. Andrew; Karunaratne, Passant V.; Katsaggelos, Aggelos K.

    2000-12-01

    Pre-processing algorithms improve the performance of a video compression system by removing spurious noise and insignificant features from the original images. This increases compression efficiency and attenuates coding artifacts. Unfortunately, determining the appropriate amount of pre-filtering is a difficult problem, as it depends both on the content of an image and on the target bit-rate of the compression algorithm. In this paper, we explore a pre-processing technique that is loosely coupled to the quantization decisions of a rate control mechanism. This technique results in a pre-processing system that operates directly on the Displaced Frame Difference (DFD) and is applicable to any standard-compatible compression system. Results explore the effect of several standard filters on the DFD. An adaptive technique is then considered.

  13. A Regression-Based Differential Expression Detection Algorithm for Microarray Studies with Ultra-Low Sample Size

    PubMed Central

    McDonough, Molly; Rabe, Brian; Saha, Margaret

    2015-01-01

    Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely-used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality. PMID:25738861

  14. Biomarker Discovery Based on Hybrid Optimization Algorithm and Artificial Neural Networks on Microarray Data for Cancer Classification.

    PubMed

    Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Pirhadi, Shiva; Garshasbi, Masoud

    2015-01-01

    Advances in high-throughput, microarray-based gene profiling technology have made it possible to monitor the expression values of thousands of genes simultaneously. Detailed examination of changes in gene expression levels can help physicians diagnose efficiently, classify tumors and cancer types, and treat effectively. The main purpose of this paper is to find genes that can correctly classify groups of cancers, using hybrid optimization algorithms. A hybrid particle swarm optimization and genetic algorithm method is used for gene selection, and an artificial neural network (ANN) is adopted as the classifier. In this work, we improve the ability of the algorithm on the classification problem by finding a small group of biomarkers and also the best parameters of the classifier. The proposed approach is tested on three benchmark gene expression data sets: blood (acute myeloid leukemia, acute lymphoblastic leukemia), colon, and breast. We used 10-fold cross-validation to assess accuracy, and a decision tree algorithm to find relations between the biomarkers from a biological point of view. To test the ability of the trained ANN models to categorize the cancers, we analyzed additional blinded samples that were not previously used for training. Experimental results show that the proposed method can reduce the dimensionality of the data sets, identify the most informative gene subset, and improve classification accuracy with the best parameters for each dataset.

  15. The preprocessed doacross loop

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi

    1990-01-01

    Dependencies between loop iterations cannot always be characterized during program compilation. Doacross loops typically make use of a priori knowledge of inter-iteration dependencies to carry out the required synchronizations. A type of doacross loop is proposed that allows the iterations of a loop to be scheduled among processors without advance knowledge of inter-iteration dependencies. The proposed method requires that parallelizable preprocessing and postprocessing steps be carried out during program execution.

  16. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm

    PubMed Central

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu

    2016-01-01

    Among non-small cell lung cancers (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is indeed a feature selection algorithm. Additionally, we applied SAMGSR to the AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The small overlap between these two resulting gene signatures illustrates that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed. PMID:27446945

  17. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm.

    PubMed

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu; Tian, Suyan

    2016-01-01

    Among non-small cell lung cancers (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is indeed a feature selection algorithm. Additionally, we applied SAMGSR to the AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The small overlap between these two resulting gene signatures illustrates that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed.

  18. Comparing Binaural Pre-processing Strategies III

    PubMed Central

    Warzybok, Anna; Ernst, Stephan M. A.

    2015-01-01

    A comprehensive evaluation of eight signal pre-processing strategies, including directional microphones, coherence filters, single-channel noise reduction, binaural beamformers, and their combinations, was undertaken with normal-hearing (NH) and hearing-impaired (HI) listeners. Speech reception thresholds (SRTs) were measured in three noise scenarios (multitalker babble, cafeteria noise, and a single competing talker). Predictions of three common instrumental measures were compared with the general perceptual benefit caused by the algorithms. The individual SRTs measured without pre-processing and the individual benefits were objectively estimated using the binaural speech intelligibility model. Ten listeners with NH and 12 HI listeners participated. The participants varied in age and pure-tone threshold levels. Although HI listeners required a better signal-to-noise ratio to obtain 50% intelligibility than listeners with NH, no differences in SRT benefit from the different algorithms were found between the two groups. With the exception of single-channel noise reduction, all algorithms showed an improvement in SRT of between 2.1 dB (in cafeteria noise) and 4.8 dB (in the single competing talker condition). Predictions with the binaural speech intelligibility model explained 83% of the measured variance of the individual SRTs in the no pre-processing condition. Regarding the benefit from the algorithms, the instrumental measures were not able to predict the perceptual data in all tested noise conditions. The comparable benefit observed for both groups suggests a possible application of noise reduction schemes for listeners with different hearing status. Although the model can predict the individual SRTs without pre-processing, further development is necessary to predict the benefits obtained from the algorithms at an individual level. PMID:26721922

  19. Retinex Preprocessing for Improved Multi-Spectral Image Classification

    NASA Technical Reports Server (NTRS)

    Thompson, B.; Rahman, Z.; Park, S.

    2000-01-01

    The goal of multi-image classification is to identify and label "similar regions" within a scene. The ability to correctly classify a remotely sensed multi-image of a scene is affected by the ability of the classification process to adequately compensate for the effects of atmospheric variations and sensor anomalies. Better classification may be obtained if the multi-image is preprocessed before classification, so as to reduce the adverse effects of image formation. In this paper, we discuss the overall impact on multi-spectral image classification when the retinex image enhancement algorithm is used to preprocess multi-spectral images. The retinex is a multi-purpose image enhancement algorithm that performs dynamic range compression, reduces the dependence on lighting conditions, and generally enhances apparent spatial resolution. The retinex has been successfully applied to the enhancement of many different types of grayscale and color images. We show in this paper that retinex preprocessing improves the spatial structure of multi-spectral images and thus provides better within-class variations than would otherwise be obtained without the preprocessing. For a series of multi-spectral images obtained with diffuse and direct lighting, we show that without retinex preprocessing the class spectral signatures vary substantially with the lighting conditions. Whereas multi-dimensional clustering without preprocessing produced one-class homogeneous regions, the classification on the preprocessed images produced multi-class non-homogeneous regions. This lack of homogeneity is explained by the interaction between the different agronomic treatments applied to the regions: the preprocessed images are closer to ground truth. The principal advantage that the retinex offers is that for different lighting conditions classifications derived from the retinex preprocessed images look remarkably "similar", and thus more consistent, whereas classifications derived from the original images vary with the lighting conditions.
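
    A single-band multi-scale retinex of the kind described can be sketched as below; the Gaussian surround scales and equal weighting are illustrative assumptions, not the exact configuration used in the paper.

      import numpy as np
      from scipy.ndimage import gaussian_filter

      def multiscale_retinex(band, sigmas=(15, 80, 250), eps=1e-6):
          """Average of log(image) - log(Gaussian surround) over several scales:
          dynamic range compression plus reduced dependence on illumination."""
          band = band.astype(float) + eps
          out = np.zeros_like(band)
          for s in sigmas:
              out += np.log(band) - np.log(gaussian_filter(band, s) + eps)
          return out / len(sigmas)

      print(multiscale_retinex(np.random.rand(128, 128)).shape)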

  20. Preprocessing for classification of thermograms in breast cancer detection

    NASA Astrophysics Data System (ADS)

    Neumann, Łukasz; Nowak, Robert M.; Okuniewski, Rafał; Oleszkiewicz, Witold; Cichosz, Paweł; Jagodziński, Dariusz; Matysiewicz, Mateusz

    2016-09-01

    Performance of binary classification of breast cancer suffers from a high imbalance between classes. In this article we present a preprocessing module designed to counter the class imbalance in the training examples. The preprocessing module is based on standardization, the Synthetic Minority Oversampling Technique (SMOTE), and undersampling. We show how each algorithm influences classification accuracy. Results indicate that the described module improves the overall Area Under the Curve (AUC) by up to 10% on the tested dataset. Furthermore, we propose other methods of dealing with imbalanced datasets in breast cancer classification.
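
    A module of this shape can be assembled with scikit-learn and imbalanced-learn; the classifier and all parameters below are placeholders, not the authors' configuration.

      import numpy as np
      from imblearn.over_sampling import SMOTE
      from imblearn.under_sampling import RandomUnderSampler
      from imblearn.pipeline import Pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 20))               # thermogram features (placeholder)
      y = (rng.random(300) < 0.1).astype(int)      # ~10% positives: imbalanced classes

      pipe = Pipeline([
          ("scale", StandardScaler()),                     # standardization
          ("smote", SMOTE(random_state=0)),                # oversample the minority class
          ("under", RandomUnderSampler(random_state=0)),   # then trim the majority class
          ("clf", LogisticRegression(max_iter=1000)),
      ])
      pipe.fit(X, y)
      print(pipe.score(X, y))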

  1. Rapid large-scale oligonucleotide selection for microarrays.

    PubMed

    Rahmann, Sven

    2002-01-01

    We present the first algorithm that selects oligonucleotide probes (e.g. 25-mers) for microarray experiments on a large scale. For example, oligos for human genes can be found within 50 hours. This becomes possible by using the longest common substring as a specificity measure for candidate oligos. We present an algorithm based on a suffix array with additional information that is efficient both in terms of memory usage and running time to rank all candidate oligos according to their specificity. We also introduce the concept of master sequences to describe the sequences from which oligos are to be selected. Constraints such as oligo length, melting temperature, and self-complementarity are incorporated in the master sequence at a preprocessing stage and thus kept separate from the main selection problem. As a result, custom oligos can now be designed for any sequenced genome, just as the technology for on-site chip synthesis is becoming increasingly mature.
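
    The specificity measure can be illustrated with a plain dynamic-programming longest-common-substring routine; the paper achieves genome scale with a suffix array instead, so this sketch is only the definition, not the algorithm.

      def longest_common_substring(a, b):
          """Length of the longest common substring (O(len(a)*len(b)) DP;
          the paper's suffix-array approach computes this at scale)."""
          prev = [0] * (len(b) + 1)
          best = 0
          for ca in a:
              cur = [0] * (len(b) + 1)
              for j, cb in enumerate(b, 1):
                  if ca == cb:
                      cur[j] = prev[j - 1] + 1
                      best = max(best, cur[j])
              prev = cur
          return best

      # an oligo is more specific the shorter its longest match elsewhere
      print(longest_common_substring("ACGTACGTACGT", "TTACGTAA"))  # -> 5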

  2. Wavelet Preprocessing of Acoustic Signals

    DTIC Science & Technology

    1991-12-01

    This paper describes results of using the wavelet transform to preprocess acoustic broadband signals in a system that discriminates between different classes of acoustic bursts. This is motivated by the similarity between the proportional bandwidth filters provided by the wavelet transform and those found in biological hearing systems. The experiment involves comparing the effects of wavelet and FFT preprocessing on a statistical pattern classifier. The data used was from the DARPA Phase I database, which consists of artificially generated signals with real ocean background. The results show that the wavelet transform provided improved performance when classifying on a frame-by-frame basis.

  3. Context-based preprocessing of molecular docking data

    PubMed Central

    2013-01-01

    Background Data preprocessing is a major step in data mining. In data preprocessing, several known techniques can be applied, or new ones developed, to improve data quality such that the mining results become more accurate and intelligible. Bioinformatics is one area with a high demand for generation of comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach to mine data from molecular docking simulation results. The test cases used a fully-flexible receptor (FFR) model of Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands. Results We generated an initial set of attributes as well as their respective instances. To improve this initial set, we applied two selection strategies. The first was based on our context-based approach while the second used the CFS (Correlation-based Feature Selection) machine learning algorithm. Additionally, we produced an extra dataset containing features selected by combining our context strategy and the CFS algorithm. To demonstrate the effectiveness of the proposed method, we evaluated its performance based on various predictive (RMSE, MAE, Correlation, and Nodes) and context (Precision, Recall and FScore) measures. Conclusions Statistical analysis of the results shows that the proposed context-based data preprocessing approach significantly improves predictive and context measures and outperforms the CFS algorithm. Context-based data preprocessing improves mining results by producing superior interpretable models, which makes it well-suited for practical applications in molecular docking simulations using FFR models. PMID:24564276
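
    As a rough illustration of the CFS baseline (not the authors' implementation), a greedy correlation-based filter keeps features relevant to the target and drops redundant ones; the thresholds and data below are placeholders.

      import numpy as np

      def cfs_like_select(X, y, k=10, redundancy=0.9):
          """Greedy filter in the spirit of CFS: rank features by |correlation
          with the target|, skip features too correlated with ones already kept."""
          rel = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]))
          chosen = []
          for j in np.argsort(rel)[::-1]:
              if all(abs(np.corrcoef(X[:, j], X[:, c])[0, 1]) < redundancy for c in chosen):
                  chosen.append(int(j))
              if len(chosen) == k:
                  break
          return chosen

      X, y = np.random.rand(100, 50), np.random.rand(100)
      print(cfs_like_select(X, y, k=5))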

  4. Optimal Preprocessing Of GPS Data

    NASA Technical Reports Server (NTRS)

    Wu, Sien-Chong; Melbourne, William G.

    1994-01-01

    Improved technique for preprocessing data from Global Positioning System (GPS) receivers reduces processing time and the number of data points to be stored. The technique is optimal in the sense that it maintains the strength of the data. It also sometimes increases the ability to resolve ambiguities in the numbers of cycles of received GPS carrier signals.

  5. Optimal Preprocessing Of GPS Data

    NASA Technical Reports Server (NTRS)

    Wu, Sien-Chong; Melbourne, William G.

    1994-01-01

    Improved technique for preprocessing data from Global Positioning System receivers reduces processing time and the number of data points to be stored. The technique is optimal in the sense that it maintains the strength of the data. It also increases the ability to resolve ambiguities in the numbers of cycles of received GPS carrier signals.

  6. Arabic handwritten: pre-processing and segmentation

    NASA Astrophysics Data System (ADS)

    Maliki, Makki; Jassim, Sabah; Al-Jawad, Naseer; Sellahewa, Harin

    2012-06-01

    This paper is concerned with pre-processing and segmentation tasks that influence the performance of Optical Character Recognition (OCR) systems and handwritten/printed text recognition. In Arabic, these tasks are adversely affected by the fact that many words are made up of sub-words, that many sub-words have one or more associated diacritics that are not connected to the sub-word's body, and that there can be multiple instances of overlapping sub-words. To overcome these problems we investigate and develop segmentation techniques that first segment a document into sub-words, link the diacritics with their sub-words, and remove possible overlap between words and sub-words. We shall also investigate two approaches to pre-processing tasks: estimating sub-word baselines, and determining parameters that yield appropriate slope correction and slant removal. We shall investigate the use of linear regression on sub-word pixels to determine their central x and y coordinates, as well as their high-density part. We also develop a new incremental rotation procedure, performed on sub-words, that determines the best rotation angle needed to realign baselines. We shall demonstrate the benefits of these proposals by conducting extensive experiments on publicly available and in-house created databases. These algorithms help improve character segmentation accuracy by transforming handwritten Arabic text into a form that can benefit from analysis techniques developed for printed text.
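
    The linear-regression baseline estimate mentioned above can be sketched in a few lines; the input convention (binary image, 1 = ink) is an assumption.

      import numpy as np

      def estimate_baseline(binary_subword):
          """Least-squares line y = slope*x + intercept through the ink pixels
          (value 1) of a sub-word; the slope gives the skew to correct."""
          ys, xs = np.nonzero(binary_subword)
          slope, intercept = np.polyfit(xs, ys, 1)
          return slope, intercept

      img = np.zeros((20, 50), dtype=int)
      img[10, 5:45] = 1                       # synthetic horizontal stroke
      print(estimate_baseline(img))           # slope ~ 0.0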

  7. MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach

    PubMed Central

    Abduallah, Yasser; Byron, Kevin; Du, Zongxuan; Cervantes-Cervantes, Miguel

    2017-01-01

    Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool. PMID:28243601
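
    The information-theoretic core, pairwise mutual information between expression profiles, can be sketched as follows; the MapReduce distribution across a Hadoop cluster is not shown, and the histogram estimator and bin count are assumptions.

      import numpy as np

      def mutual_information(x, y, bins=8):
          """Histogram estimate of MI between two expression time series."""
          joint, _, _ = np.histogram2d(x, y, bins=bins)
          p = joint / joint.sum()
          px, py = p.sum(axis=1), p.sum(axis=0)
          nz = p > 0
          return float((p[nz] * np.log(p[nz] / np.outer(px, py)[nz])).sum())

      x = np.random.rand(100)
      print(mutual_information(x, x), mutual_information(x, np.random.rand(100)))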

  8. MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach.

    PubMed

    Abduallah, Yasser; Turki, Turki; Byron, Kevin; Du, Zongxuan; Cervantes-Cervantes, Miguel; Wang, Jason T L

    2017-01-01

    Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool.

  9. Wavelet preprocessing of acoustic signals

    NASA Astrophysics Data System (ADS)

    Huang, W. Y.; Solorzano, M. R.

    1991-12-01

    This paper describes results of using the wavelet transform to preprocess acoustic broadband signals in a system that discriminates between different classes of acoustic bursts. This is motivated by the similarity between the proportional bandwidth filters provided by the wavelet transform and those found in biological hearing systems. The experiment involves comparing the effects of wavelet and FFT preprocessing on a statistical pattern classifier. The data used was from the DARPA Phase 1 database, which consists of artificially generated signals with real ocean background. The results show that the wavelet transform did provide improved performance when classifying on a frame-by-frame basis. The DARPA Phase 1 database is well matched to proportional bandwidth filtering; i.e., signal classes that contain high frequencies do tend to have shorter duration in this database. It is also noted that the decreasing background levels at high frequencies compensate for the poor match of the wavelet transform for long duration (high frequency) signals.
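
    A proportional-bandwidth decomposition of this kind is available in PyWavelets; the wavelet choice, depth, and per-band energy features below are illustrative, not the study's actual front end.

      import numpy as np
      import pywt

      rng = np.random.default_rng(1)
      signal = rng.normal(size=1024)                      # stand-in for one acoustic burst frame
      coeffs = pywt.wavedec(signal, "db4", level=5)       # proportional-bandwidth sub-bands
      features = [float(np.sum(c ** 2)) for c in coeffs]  # per-band energies for the classifier
      print(features)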

  10. Image preprocessing study on KPCA-based face recognition

    NASA Astrophysics Data System (ADS)

    Li, Xuan; Li, Dehua

    2015-12-01

    Face recognition, as an important biometric identification method with friendly, natural, and convenient advantages, has received more and more attention. This paper investigates a face recognition system comprising face detection, feature extraction, and face recognition, mainly by researching the related theory and key technology of various preprocessing methods in the face detection process and, using the KPCA method, focusing on the different recognition results obtained with different preprocessing methods. We choose the YCbCr color space for skin segmentation and integral projection for face location. We preprocess face images using erosion and dilation (the opening and closing operations) and an illumination compensation method, and then analyze them with a face recognition method based on kernel principal component analysis (KPCA); the experiments were carried out on a typical face database, with the algorithms implemented on the MATLAB platform. Experimental results show that, under certain conditions, the kernel-based extension of the PCA algorithm, as a nonlinear feature extraction method, makes the extracted features represent the original image information better and can obtain a higher recognition rate. In the image preprocessing stage, we found that different operations on the images may produce different results, and thus different recognition rates in the recognition stage. At the same time, in kernel principal component analysis, the degree of the polynomial kernel function can affect the recognition result.
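
    The KPCA feature-extraction step can be sketched with scikit-learn; the polynomial kernel echoes the abstract's remark that the polynomial degree affects recognition, while the data and dimensions are placeholders.

      import numpy as np
      from sklearn.decomposition import KernelPCA

      rng = np.random.default_rng(0)
      faces = rng.random((40, 64 * 64))      # 40 preprocessed face images, flattened (placeholder)
      kpca = KernelPCA(n_components=20, kernel="poly", degree=2)  # degree is the tunable "power"
      features = kpca.fit_transform(faces)   # nonlinear features fed to the recognizer
      print(features.shape)                  # (40, 20)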

  11. Analysis of High-Throughput ELISA Microarray Data

    SciTech Connect

    White, Amanda M.; Daly, Don S.; Zangar, Richard C.

    2011-02-23

    Our research group develops analytical methods and software for the high-throughput analysis of quantitative enzyme-linked immunosorbent assay (ELISA) microarrays. ELISA microarrays differ from DNA microarrays in several fundamental aspects and most algorithms for analysis of DNA microarray data are not applicable to ELISA microarrays. In this review, we provide an overview of the steps involved in ELISA microarray data analysis and how the statistically sound algorithms we have developed provide an integrated software suite to address the needs of each data-processing step. The algorithms discussed are available in a set of open-source software tools (http://www.pnl.gov/statistics/ProMAT).

  12. A comparative analytical assay of gene regulatory networks inferred using microarray and RNA-seq datasets

    PubMed Central

    Izadi, Fereshteh; Zarrini, Hamid Najafi; Kiani, Ghaffar; Jelodar, Nadali Babaeian

    2016-01-01

    A Gene Regulatory Network (GRN) is a collection of interactions between molecular regulators and their targets that governs gene expression levels in cells. The explosion of omics data generated by high-throughput genomic assays such as microarray and RNA-Seq technologies, together with the emergence of a number of pre-processing methods, demands suitable guidelines to determine the impact of transcript data platforms and normalization procedures on describing associations in GRNs. In this study, exploiting publicly available microarray and RNA-Seq datasets and a gold standard of transcriptional interactions in Arabidopsis, we performed a comparison between six GRNs derived from RNA-Seq and microarray data with different normalization procedures. We observed that the compared algorithms were highly data-specific, and networks reconstructed from RNA-Seq data showed considerable accuracy relative to the corresponding networks captured by microarrays. Topological analysis showed that the GRNs inferred from the two platforms were similar in several topological features, although we observed more connectivity in the network of RNA-Seq derived genes. Taken together, transcriptional regulatory networks obtained from Robust Multiarray Averaging (RMA) and Variance-Stabilizing Transformed (VST) normalized data predicted a higher rate of true edges than the rest of the methods used in this comparison. PMID:28293077

  13. Celsius: a community resource for Affymetrix microarray data.

    PubMed

    Day, Allen; Carlson, Marc R J; Dong, Jun; O'Connor, Brian D; Nelson, Stanley F

    2007-01-01

    Celsius is a data warehousing system to aggregate Affymetrix CEL files and associated metadata. It provides mechanisms for importing, storing, querying, and exporting large volumes of primary and pre-processed microarray data. Celsius contains ten billion assay measurements and affiliated metadata. It is the largest publicly available source of Affymetrix microarray data, and through sheer volume it allows a sophisticated, broad view of transcription that has not previously been possible.

  14. Chromosome Microarray.

    PubMed

    Anderson, Sharon

    2016-01-01

    Over the last half century, knowledge about genetics, genetic testing, and its complexity has flourished. Completion of the Human Genome Project provided a foundation upon which the accuracy of genetics, genomics, and integration of bioinformatics knowledge and testing has grown exponentially. What is lagging, however, are efforts to reach and engage nurses about this rapidly changing field. The purpose of this article is to familiarize nurses with several frequently ordered genetic tests including chromosomes and fluorescence in situ hybridization followed by a comprehensive review of chromosome microarray. It shares the complexity of microarray including how testing is performed and results analyzed. A case report demonstrates how this technology is applied in clinical practice and reveals benefits and limitations of this scientific and bioinformatics genetic technology. Clinical implications for maternal-child nurses across practice levels are discussed.

  15. Linear model for fast background subtraction in oligonucleotide microarrays

    PubMed Central

    2009-01-01

    Background One important preprocessing step in the analysis of microarray data is background subtraction. In high-density oligonucleotide arrays this is recognized as a crucial step for the global performance of the data analysis from raw intensities to expression values. Results We propose here an algorithm for background estimation based on a model in which the cost function is quadratic in a set of fitting parameters such that minimization can be performed through linear algebra. The model incorporates two effects: 1) Correlated intensities between neighboring features in the chip and 2) sequence-dependent affinities for non-specific hybridization fitted by an extended nearest-neighbor model. Conclusion The algorithm has been tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values for different experiments as well as between the free-energy parameters and their counterparts in aqueous solution indicate that the model captures a significant part of the underlying physical chemistry. PMID:19917117
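
    The key point, a quadratic cost minimized by linear algebra, can be shown schematically: background is modeled as a linear combination of per-probe features and fitted with one least-squares solve. The two features and synthetic weights are placeholders, not the paper's nearest-neighbor model.

      import numpy as np

      rng = np.random.default_rng(0)
      n = 1000
      neighbor_mean = rng.random(n)          # mean intensity of neighboring features (placeholder)
      affinity = rng.random(n)               # sequence-dependent affinity score (placeholder)
      background = 0.7 * neighbor_mean + 0.2 * affinity + rng.normal(0, 0.01, n)

      A = np.column_stack([neighbor_mean, affinity, np.ones(n)])
      params, *_ = np.linalg.lstsq(A, background, rcond=None)  # quadratic cost -> one linear solve
      print(params)                                            # recovers ~[0.7, 0.2, 0.0]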

  16. Research on pre-processing of QR Code

    NASA Astrophysics Data System (ADS)

    Sun, Haixing; Xia, Haojie; Dong, Ning

    2013-10-01

    QR code encodes many kinds of information thanks to its advantages: large storage capacity, high reliability, full-range ultra-high-speed reading, small printout size, and highly efficient representation of Chinese characters. In order to obtain a clearer binarized image from a complex background and improve the recognition rate of QR code, this paper investigates pre-processing methods for QR code (Quick Response Code) and presents algorithms and results of image pre-processing for QR code recognition. The conventional approach is improved by modifying Sauvola's adaptive binarization method. Additionally, we introduce QR code extraction that adapts to different image sizes and a flexible image correction approach, improving the efficiency and accuracy of QR code image processing.
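
    Sauvola's adaptive binarization, the method the paper modifies, is available in scikit-image; the window size and k below are common defaults, not the paper's tuned values.

      import numpy as np
      from skimage.filters import threshold_sauvola

      def binarize_qr(gray, window_size=25, k=0.2):
          """Adaptive Sauvola threshold per pixel; returns a binary image."""
          return gray > threshold_sauvola(gray, window_size=window_size, k=k)

      print(binarize_qr(np.random.rand(128, 128)).mean())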

  17. Adaptive fingerprint image enhancement with emphasis on preprocessing of data.

    PubMed

    Bartůnek, Josef Ström; Nilsson, Mikael; Sällberg, Benny; Claesson, Ingvar

    2013-02-01

    This article proposes several improvements to an adaptive fingerprint enhancement method that is based on contextual filtering. The term adaptive implies that parameters of the method are automatically adjusted based on the input fingerprint image. Five processing blocks comprise the adaptive fingerprint enhancement method, four of which are updated in our proposed system. Hence, the proposed overall system is novel. The four updated processing blocks are: 1) preprocessing; 2) global analysis; 3) local analysis; and 4) matched filtering. In the preprocessing and local analysis blocks, a nonlinear dynamic range adjustment method is used. In the global analysis and matched filtering blocks, different forms of order statistical filters are applied. These processing blocks yield an improved and new adaptive fingerprint image processing method. The performance of the updated processing blocks is presented in the evaluation part of this paper. The algorithm is evaluated against the NIST-developed NBIS software for fingerprint recognition on FVC databases.

  18. Feature Detection Techniques for Preprocessing Proteomic Data

    PubMed Central

    Sellers, Kimberly F.; Miecznikowski, Jeffrey C.

    2010-01-01

    Numerous gel-based and nongel-based technologies are used to detect protein changes potentially associated with disease. The raw data, however, are abundant with technical and structural complexities, making statistical analysis a difficult task. Low-level analysis issues (including normalization, background correction, gel and/or spectral alignment, feature detection, and image registration) are substantial problems that need to be addressed, because any large-level data analyses are contingent on appropriate and statistically sound low-level procedures. Feature detection approaches are particularly interesting due to the increased computational speed associated with subsequent calculations. Such summary data corresponding to image features provide a significant reduction in overall data size and structure while retaining key information. In this paper, we focus on recent advances in feature detection as a tool for preprocessing proteomic data. This work highlights existing and newly developed feature detection algorithms for proteomic datasets, particularly relating to time-of-flight mass spectrometry, and two-dimensional gel electrophoresis. Note, however, that the associated data structures (i.e., spectral data, and images containing spots) used as input for these methods are obtained via all gel-based and nongel-based methods discussed in this manuscript, and thus the discussed methods are likewise applicable. PMID:20467457

  19. Diagnostic challenges for multiplexed protein microarrays.

    PubMed

    Master, Stephen R; Bierl, Charlene; Kricka, Larry J

    2006-11-01

    Multiplexed protein analysis using planar microarrays or microbeads is growing in popularity for simultaneous assays of antibodies, cytokines, allergens, drugs and hormones. However, this new assay format presents several new operational issues for the clinical laboratory, such as the quality control of protein-microarray-based assays, the release of unrequested test data and the use of diagnostic algorithms to transform microarray data into diagnostic results.

  20. Automated Microarray Image Analysis Toolbox for MATLAB

    SciTech Connect

    White, Amanda M.; Daly, Don S.; Willse, Alan R.; Protic, Miroslava; Chandler, Darrell P.

    2005-09-01

    The Automated Microarray Image Analysis (AMIA) Toolbox for MATLAB is a flexible, open-source microarray image analysis tool that allows the user to customize analysis of sets of microarray images. This tool provides several methods of identifying and quantifying spot statistics, as well as extensive diagnostic statistics and images to identify poor data quality or processing. The open nature of this software allows researchers to understand the algorithms used to provide intensity estimates and to modify them easily if desired.

  1. Optimized LOWESS normalization parameter selection for DNA microarray data

    PubMed Central

    Berger, John A; Hautaniemi, Sampsa; Järvinen, Anna-Kaarina; Edgren, Henrik; Mitra, Sanjit K; Astola, Jaakko

    2004-01-01

    Background Microarray data normalization is an important step for obtaining data that are reliable and usable for subsequent analysis. One of the most commonly utilized normalization techniques is the locally weighted scatterplot smoothing (LOWESS) algorithm. However, a much overlooked concern with the LOWESS normalization strategy deals with choosing the appropriate parameters. Parameters are usually chosen arbitrarily, which may reduce the efficiency of the normalization and result in non-optimally normalized data. Thus, there is a need to explore LOWESS parameter selection in greater detail. Results and discussion In this work, we discuss how to choose parameters for the LOWESS method. Moreover, we present an optimization approach for obtaining the fraction of data points utilized in the local regression and analyze results for local print-tip normalization. The optimization procedure determines the bandwidth parameter for the local regression by minimizing a cost function that represents the mean-squared difference between the LOWESS estimates and the normalization reference level. We demonstrate the utility of the systematic parameter selection using two publicly available data sets. The first data set consists of three self versus self hybridizations, which allow for a quantitative study of the optimization method. The second data set contains a collection of DNA microarray data from a breast cancer study utilizing four breast cancer cell lines. Our results show that different parameter choices for the bandwidth window yield dramatically different calibration results in both studies. Conclusions Results derived from the self versus self experiment indicate that the proposed optimization approach is a plausible solution for estimating the LOWESS parameters, while results from the breast cancer experiment show that the optimization procedure is readily applicable to real-life microarray data normalization. In summary, the systematic approach to obtaining the critical LOWESS parameters yields more reliably normalized microarray data.
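
    The role of the bandwidth (frac) parameter can be seen in a small sketch with statsmodels: the fitted intensity-dependent trend is subtracted from the log-ratios, and different frac values give visibly different calibrations. The data and values are synthetic.

      import numpy as np
      from statsmodels.nonparametric.smoothers_lowess import lowess

      rng = np.random.default_rng(0)
      A = rng.uniform(6, 14, 2000)                       # mean log-intensity per spot
      M = 0.3 * np.sin(A) + rng.normal(0, 0.2, 2000)     # log-ratio with an intensity-dependent bias

      def lowess_normalize(M, A, frac):
          trend = lowess(M, A, frac=frac, return_sorted=False)  # frac is the bandwidth parameter
          return M - trend                                      # center the ratios on the fitted trend

      for frac in (0.1, 0.3, 0.6):                       # the choice the paper optimizes
          print(frac, round(float(np.abs(lowess_normalize(M, A, frac)).mean()), 4))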

  2. Evaluation of the efficiency of continuous wavelet transform as processing and preprocessing algorithm for resolution of overlapped signals in univariate and multivariate regression analyses; an application to ternary and quaternary mixtures.

    PubMed

    Hegazy, Maha A; Lotfy, Hayam M; Mowaka, Shereen; Mohamed, Ekram Hany

    2016-07-05

    Wavelets have been adapted for a vast number of signal-processing applications due to the amount of information that can be extracted from a signal. In this work, a comparative study was conducted on the efficiency of the continuous wavelet transform (CWT) as a signal-processing tool in univariate regression and as a pre-processing tool in multivariate analysis using partial least squares (CWT-PLS). These were applied to complex spectral signals of ternary and quaternary mixtures. The CWT-PLS method succeeded in the simultaneous determination of a quaternary mixture of drotaverine (DRO), caffeine (CAF), paracetamol (PAR), and p-aminophenol (PAP, the major impurity of paracetamol). The univariate CWT, in contrast, failed to determine the quaternary mixture components simultaneously; it was able to determine only PAR and PAP, and the ternary mixtures of DRO, CAF, and PAR and of CAF, PAR, and PAP. During the CWT calculations, different wavelet families were tested. The univariate CWT method was validated according to the ICH guidelines. For the development of the CWT-PLS model, a calibration set was prepared by means of an orthogonal experimental design, and the absorption spectra were recorded and processed by CWT. The CWT-PLS model was constructed by regression between the wavelet coefficients and concentration matrices, and validation was performed with both cross-validation and external validation sets. Both methods were successfully applied to the determination of the studied drugs in pharmaceutical formulations.

  3. Evaluation of the efficiency of continuous wavelet transform as processing and preprocessing algorithm for resolution of overlapped signals in univariate and multivariate regression analyses; an application to ternary and quaternary mixtures

    NASA Astrophysics Data System (ADS)

    Hegazy, Maha A.; Lotfy, Hayam M.; Mowaka, Shereen; Mohamed, Ekram Hany

    2016-07-01

    Wavelets have been adapted for a vast number of signal-processing applications due to the amount of information that can be extracted from a signal. In this work, a comparative study was conducted on the efficiency of the continuous wavelet transform (CWT) as a signal-processing tool in univariate regression and as a pre-processing tool in multivariate analysis using partial least squares (CWT-PLS). These were applied to complex spectral signals of ternary and quaternary mixtures. The CWT-PLS method succeeded in the simultaneous determination of a quaternary mixture of drotaverine (DRO), caffeine (CAF), paracetamol (PAR), and p-aminophenol (PAP, the major impurity of paracetamol). The univariate CWT, in contrast, failed to determine the quaternary mixture components simultaneously; it was able to determine only PAR and PAP, and the ternary mixtures of DRO, CAF, and PAR and of CAF, PAR, and PAP. During the CWT calculations, different wavelet families were tested. The univariate CWT method was validated according to the ICH guidelines. For the development of the CWT-PLS model, a calibration set was prepared by means of an orthogonal experimental design, and the absorption spectra were recorded and processed by CWT. The CWT-PLS model was constructed by regression between the wavelet coefficients and concentration matrices, and validation was performed with both cross-validation and external validation sets. Both methods were successfully applied to the determination of the studied drugs in pharmaceutical formulations.
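
    A CWT-then-PLS pipeline of this shape can be sketched with PyWavelets and scikit-learn; the wavelet, scales, component count, and data are placeholders rather than the published model.

      import numpy as np
      import pywt
      from sklearn.cross_decomposition import PLSRegression

      rng = np.random.default_rng(0)
      spectra = rng.random((25, 200))        # calibration absorption spectra (placeholder)
      conc = rng.random((25, 4))             # DRO, CAF, PAR, PAP concentrations (placeholder)

      scales = np.arange(1, 31)
      def cwt_features(s):
          coef, _ = pywt.cwt(s, scales, "mexh")   # CWT of one spectrum
          return coef.ravel()                     # wavelet coefficients as regressors

      X = np.array([cwt_features(s) for s in spectra])
      pls = PLSRegression(n_components=5).fit(X, conc)
      print(pls.predict(X[:2]))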

  4. hemaClass.org: Online One-By-One Microarray Normalization and Classification of Hematological Cancers for Precision Medicine

    PubMed Central

    Falgreen, Steffen; Ellern Bilgrau, Anders; Brøndum, Rasmus Froberg; Hjort Jakobsen, Lasse; Have, Jonas; Lindblad Nielsen, Kasper; El-Galaly, Tarec Christoffer; Bødker, Julie Støve; Schmitz, Alexander; H. Young, Ken; Johnsen, Hans Erik; Dybkær, Karen; Bøgsted, Martin

    2016-01-01

    Background Dozens of omics-based cancer classification systems have been introduced with prognostic, diagnostic, and predictive capabilities. However, they often employ complex algorithms and are only applicable to whole cohorts of patients, making them difficult to apply in a personalized clinical setting. Results This prompted us to create hemaClass.org, an online web application providing an easy interface to one-by-one RMA normalization of microarrays and subsequent risk classification of diffuse large B-cell lymphoma (DLBCL) into cell-of-origin and chemotherapeutic sensitivity classes. Classification results for one-by-one array pre-processing with and without a laboratory-specific RMA reference dataset were compared to cohort-based classifiers in 4 publicly available datasets. Classifications showed high agreement between one-by-one and whole-cohort pre-processed data when a laboratory-specific reference set was supplied. The website is essentially the R package hemaClass accompanied by a Shiny web application. The well-documented package can be used to run the website locally or to use the developed methods programmatically. Conclusions The website and R package are relevant for biological and clinical lymphoma researchers using Affymetrix U-133 Plus 2 arrays, as they provide reliable and swift methods for calculation of disease subclasses. The proposed one-by-one pre-processing method is relevant for all researchers using microarrays. PMID:27701436
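
    The one-by-one idea can be illustrated with a frozen-reference quantile mapping: a single new array is normalized against a stored reference distribution instead of against a cohort. This is a sketch of the concept, not the hemaClass RMA implementation.

      import numpy as np

      def normalize_one_by_one(new_array, reference_quantiles):
          """Map a single array's ranked values onto a frozen reference
          distribution, so no cohort is needed at classification time."""
          order = np.argsort(new_array)
          out = np.empty_like(reference_quantiles)
          out[order] = reference_quantiles
          return out

      ref = np.sort(np.random.rand(1000))     # stored laboratory reference distribution
      print(normalize_one_by_one(np.random.rand(1000), ref)[:5])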

  5. Computational biology of genome expression and regulation--a review of microarray bioinformatics.

    PubMed

    Wang, Junbai

    2008-01-01

    Microarray technology is widely used in various biomedical research areas; the corresponding microarray data analysis is an essential step toward making the best use of array technologies. Here we review two components of microarray data analysis: low-level analysis, which emphasizes the design, quality control, and preprocessing of microarray experiments, and high-level analysis, which focuses on domain-specific microarray applications such as tumor classification, biomarker prediction, analysis of array CGH experiments, and reverse engineering of gene expression networks. Additionally, we review recent developments in building predictive models for genome expression and regulation studies. This review may help biologists grasp a basic knowledge of microarray bioinformatics as well as its potential impact on the future evolution of biomedical research fields.

  6. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  7. Efficient Preprocessing technique using Web log mining

    NASA Astrophysics Data System (ADS)

    Raiyani, Sheetal A.; jain, Shailendra

    2012-11-01

    Web Usage Mining can be described as the discovery and analysis of user access patterns through mining of log files and associated data from a particular website. Numerous visitors interact daily with web sites around the world; enormous amounts of data are generated, and this information can be very valuable to a company for understanding customer behavior. This paper presents a complete preprocessing methodology comprising data cleaning and user and session identification activities to improve the quality of the data. User identification, a key issue in the preprocessing phase, aims to identify unique web users. Traditional user identification is based on the site structure, supported by heuristic rules, which reduces its efficiency. To address this difficulty we introduce the proposed technique DUI (Distinct User Identification), based on IP address, agent, session time, and the pages referred within the desired session time. It can be applied in counter-terrorism, fraud detection, and detection of unusual access to secure data, while detection of users' regular access behavior can improve the overall design and performance of the site and of subsequent preprocessing.
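
    A DUI-style keying on IP address and agent, plus a session split on time gaps, can be sketched with pandas; the 30-minute threshold and the toy log are assumptions.

      import pandas as pd

      log = pd.DataFrame({
          "ip":    ["1.1.1.1", "1.1.1.1", "1.1.1.1", "2.2.2.2"],
          "agent": ["Firefox", "Chrome",  "Firefox", "Firefox"],
          "time":  pd.to_datetime(["2012-11-01 10:00", "2012-11-01 10:05",
                                   "2012-11-01 11:00", "2012-11-01 10:07"]),
      })

      # DUI-style key: the same IP with a different agent is treated as a distinct user
      log["user_id"] = log.groupby(["ip", "agent"]).ngroup()

      # start a new session when a user's gap between requests exceeds 30 minutes
      log = log.sort_values(["user_id", "time"])
      gap = log.groupby("user_id")["time"].diff()
      log["new_session"] = gap.isna() | (gap > pd.Timedelta(minutes=30))
      print(log)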

  8. Preprocessing Moist Lignocellulosic Biomass for Biorefinery Feedstocks

    SciTech Connect

    Neal Yancey; Christopher T. Wright; Craig Conner; J. Richard Hess

    2009-06-01

    Biomass preprocessing is one of the primary operations in the feedstock assembly system of a lignocellulosic biorefinery. Preprocessing is generally accomplished using industrial grinders to format biomass materials into a suitable biorefinery feedstock for conversion to ethanol and other bioproducts. Many factors affect machine efficiency and the physical characteristics of preprocessed biomass. For example, moisture content of the biomass as received from the point of production has a significant impact on overall system efficiency and can significantly affect the characteristics (particle size distribution, flowability, storability, etc.) of the size-reduced biomass. Many different grinder configurations are available on the market, each with advantages under specific conditions. Ultimately, the capacity and/or efficiency of the grinding process can be enhanced by selecting the grinder configuration that optimizes grinder performance based on moisture content and screen size. This paper discusses the relationships of biomass moisture with respect to preprocessing system performance and product physical characteristics and compares data obtained on corn stover, switchgrass, and wheat straw as model feedstocks during Vermeer HG 200 grinder testing. During the tests, grinder screen configuration and biomass moisture content were varied and tested to provide a better understanding of their relative impact on machine performance and the resulting feedstock physical characteristics and uniformity relative to each crop tested.

  9. Groundtruth approach to accurate quantitation of fluorescence microarrays

    SciTech Connect

    Mascio-Kegelmeyer, L; Tomascik-Cheeseman, L; Burnett, M S; van Hummelen, P; Wyrobek, A J

    2000-12-01

    To more accurately measure fluorescent signals from microarrays, we calibrated our acquisition and analysis systems by using groundtruth samples comprised of known quantities of red and green gene-specific DNA probes hybridized to cDNA targets. We imaged the slides with a full-field, white light CCD imager and analyzed them with our custom analysis software. Here we compare, for multiple genes, results obtained with and without preprocessing (alignment, color crosstalk compensation, dark field subtraction, and integration time). We also evaluate the accuracy of various image processing and analysis techniques (background subtraction, segmentation, quantitation and normalization). This methodology calibrates and validates our system for accurate quantitative measurement of microarrays. Specifically, we show that preprocessing the images produces results significantly closer to the known ground-truth for these samples.
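
    Of the preprocessing steps listed, color crosstalk compensation is the most compact to illustrate: with a known bleed-through matrix, the true two-channel signal is one linear solve away. The coefficients below are invented for the example.

      import numpy as np

      # measured = C @ true, with off-diagonal bleed-through between channels
      C = np.array([[1.00, 0.08],     # green leaking into the red channel (assumed)
                    [0.05, 1.00]])    # red leaking into the green channel (assumed)
      measured = np.array([1200.0, 900.0])
      true_signal = np.linalg.solve(C, measured)
      print(true_signal)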

  10. Image preprocessing for vehicle panoramic system

    NASA Astrophysics Data System (ADS)

    Wang, Ting; Chen, Liguo

    2016-10-01

    Distortion is a problem for panoramic image stitching in vehicle panoramic systems, so research on image preprocessing for panoramic systems was carried out. Firstly, the principle of vehicle panoramic vision was analyzed and the fundamental role that the image preprocessing procedure plays in panoramic vision was identified. The camera was then calibrated with a hyperboloid model and correction of the distorted images was realized. With the panoramic system as the goal, the effect of mounting position on image correction was taken into consideration and suggested camera installation angles for different mounting positions are given. After analyzing the existing problems of the bird's-eye image, a method of transforming twice with calibration plates is proposed. Experimental results indicate that the proposed method can effectively weaken the existing problems of the bird's-eye image and can contribute to reducing the distortion in image stitching for the panoramic system.

  11. Acquisition and preprocessing of LANDSAT data

    NASA Technical Reports Server (NTRS)

    Horn, T. N.; Brown, L. E.; Anonsen, W. H. (Principal Investigator)

    1979-01-01

    The original configuration of the GSFC data acquisition, preprocessing, and transmission subsystem, designed to provide LANDSAT data inputs to the LACIE system at JSC, is described. Enhancements made to support LANDSAT -2, and modifications for LANDSAT -3 are discussed. Registration performance throughout the 3 year period of LACIE operations satisfied the 1 pixel root-mean-square requirements established in 1974, with more than two of every three attempts at data registration proving successful, notwithstanding cosmetic faults or content inadequacies to which the process is inherently susceptible. The cloud/snow rejection rate experienced throughout the last 3 years has approached 50%, as expected in most LANDSAT data use situations.

  12. Overview of Protein Microarrays

    PubMed Central

    Reymond Sutandy, FX; Qian, Jiang; Chen, Chien-Sheng; Zhu, Heng

    2013-01-01

    Protein microarray is an emerging technology that provides a versatile platform for characterization of hundreds of thousands of proteins in a highly parallel and high-throughput way. Two major classes of protein microarrays are defined to describe their applications: analytical and functional protein microarrays. In addition, tissue or cell lysates can also be fractionated and spotted on a slide to form a reverse-phase protein microarray. While the fabrication technology is maturing, applications of protein microarrays, especially functional protein microarrays, have flourished during the past decade. Here, we will first review recent advances in the protein microarray technologies, and then present a series of examples to illustrate the applications of analytical and functional protein microarrays in both basic and clinical research. The research areas will include detection of various binding properties of proteins, study of protein posttranslational modifications, analysis of host-microbe interactions, profiling antibody specificity, and identification of biomarkers in autoimmune diseases. As a powerful technology platform, it would not be surprising if protein microarrays will become one of the leading technologies in proteomic and diagnostic fields in the next decade. PMID:23546620

  13. Integration of amplified differential gene expression (ADGE) and DNA microarray.

    PubMed

    Chen, Zhijian J; Gaté, Laurent; Davis, Warren; Ile, Kristina E; Tew, Kenneth D

    2002-12-01

    Amplified Differential Gene Expression (ADGE) provides a new concept that the ratios of differentially expressed genes are magnified before detection in order to improve both sensitivity and accuracy. This technology is now implemented with integration of DNA reassociation and PCR. The ADGE technique can be used either as a stand-alone method or in series with DNA microarray. ADGE is used in sample preprocessing and DNA microarray is used as a displaying system in the series combination. These two techniques are mutually synergistic: the quadratic magnification of ratios of differential gene expression achieved by ADGE improves the detection sensitivity and accuracy; the PCR amplification of templates enhances the signal intensity and reduces the requirement for large amounts of starting material; the high throughput for DNA microarray is maintained.

  14. Real-time preprocessing of holographic information

    NASA Astrophysics Data System (ADS)

    Schilling, Bradley W.; Poon, Ting-Chung

    1995-11-01

    Optical scanning holography (OSH) is a holographic recording technique that uses active optical heterodyne scanning to generate holographic information pertaining to an object. The holographic information manifests itself as an electrical signal suitable for real-time image reconstruction using a spatial light modulator. The electrical signal that carries the holographic information can also be digitized for computer storage and processing, allowing the image reconstruction to be performed numerically. In previous experiments with this technique, holographic information has been recorded using the interference pattern of a plane wave and a spherical wave of different temporal frequencies to scan an object. However, proper manipulation of the pupil functions in the recording stage can result in real-time processing of the holographic information. We present the holographic edge extraction technique as an important example of real-time preprocessing of holographic information that utilizes alternate pupils in the OSH recording stage. We investigate the theory of holographic preprocessing using a spatial frequency-domain analysis based on the recording system's optical transfer function. The theory is reinforced through computer simulation.

  15. Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines

    PubMed Central

    del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J.; Raboso, Mariano

    2015-01-01

    Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them, based on Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering, segmentation—based on a Gaussian Mixture Model (GMM) to separate the person from the background, masking—to reduce the dimensions of images—and binarization—to reduce the size of each image. An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements. PMID:26091392

  16. Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines.

    PubMed

    del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J; Raboso, Mariano

    2015-06-17

    Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them, based on Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering, segmentation (based on a Gaussian Mixture Model (GMM) to separate the person from the background), masking (to reduce the dimensions of images) and binarization (to reduce the size of each image). An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements.
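
    The final classification stage can be sketched with scikit-learn; the feature dimensions, label set, and linear-kernel SVM below are placeholders standing in for the parameters extracted from the preprocessed acoustic images.

      import numpy as np
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 64))      # parameters extracted per acoustic image (placeholder)
      y = rng.integers(0, 10, size=200)   # ten enrolled identities (hypothetical)

      clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
      clf.fit(X, y)
      print(clf.score(X, y))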

  17. Preprocessing and compression of Hyperspectral images captured onboard UAVs

    NASA Astrophysics Data System (ADS)

    Herrero, Rolando; Cadirola, Martin; Ingle, Vinay K.

    2015-10-01

    Advancements in image sensors and signal processing have led to the successful development of lightweight hyperspectral imaging systems that are critical to the deployment of Photometry and Remote Sensing (PaRS) capabilities in unmanned aerial vehicles (UAVs). In general, hyperspectral data cubes include a few dozen spectral bands that are extremely useful for remote sensing applications, ranging from detection of land vegetation to monitoring of atmospheric products derived from the processing of lower-level radiance images. Because these data cubes are captured in the challenging environment of UAVs, where resources are limited, source encoding by means of compression is a fundamental mechanism that considerably improves the overall system performance and reliability. In this paper, we focus on the hyperspectral images captured by a state-of-the-art commercial hyperspectral camera and show the results of applying ultraspectral data compression to the obtained data set. Specifically, the compression scheme that we introduce integrates two stages: (1) preprocessing and (2) compression itself. The outcomes of this procedure are linear prediction coefficients and an error signal that, when encoded, results in a compressed version of the original image. The preprocessing and compression algorithms are optimized and have their time complexity analyzed to guarantee their successful deployment using low-power ARM-based embedded processors in the context of UAVs. Lastly, we compare the proposed architecture against other well-known schemes and show how the compression scheme presented in this paper outperforms all of them, delivering both lower compression rates and lower distortion.
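
    The preprocessing stage's output, linear prediction coefficients plus a residual, can be sketched band-by-band with a least-squares predictor; the prediction order and the synthetic cube are assumptions, not the flight configuration.

      import numpy as np

      def band_predict_residual(cube, order=2):
          """Predict each band from the previous `order` bands by least squares;
          the coefficients plus the residual are what get entropy-coded."""
          bands = cube.reshape(cube.shape[0], -1)        # (bands, pixels)
          residual = bands.copy()
          for b in range(order, bands.shape[0]):
              A = bands[b - order:b].T                   # previous bands as predictors
              c, *_ = np.linalg.lstsq(A, bands[b], rcond=None)
              residual[b] = bands[b] - A @ c
          return residual.reshape(cube.shape)

      rng = np.random.default_rng(0)
      base = rng.random((64, 64))                        # spatial structure shared across bands
      cube = np.stack([(1 + 0.02 * b) * base + 0.01 * rng.random((64, 64)) for b in range(30)])
      res = band_predict_residual(cube)
      print(cube.std(), "->", res[2:].std())             # residual has far less energy to encode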

  18. Pre-Processing Effect on the Accuracy of Event-Based Activity Segmentation and Classification through Inertial Sensors.

    PubMed

    Fida, Benish; Bernabucci, Ivan; Bibbo, Daniele; Conforto, Silvia; Schmid, Maurizio

    2015-09-11

    Inertial sensors are increasingly being used to recognize and classify physical activities in a variety of applications. For monitoring and fitness applications, it is crucial to develop methods able to segment each activity cycle, e.g., a gait cycle, so that the successive classification step may be more accurate. To increase detection accuracy, pre-processing is often used, with a concurrent increase in computational cost. In this paper, the effect of pre-processing operations on the detection and classification of locomotion activities was investigated, to check whether the presence of pre-processing significantly contributes to an increase in accuracy. The pre-processing stages evaluated in this study were inclination correction and de-noising. Level walking, step ascending, descending and running were monitored by using a shank-mounted inertial sensor. Raw and filtered segments, obtained from a modified version of a rule-based gait detection algorithm optimized for sequential processing, were processed to extract time and frequency-based features for physical activity classification through a support vector machine classifier. The proposed method accurately detected >99% gait cycles from raw data and produced >98% accuracy on these segmented gait cycles. Pre-processing did not substantially increase classification accuracy, thus highlighting the possibility of reducing the amount of pre-processing for real-time applications.
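
    One of the evaluated pre-processing stages, de-noising, might be realized as a zero-phase low-pass filter; the sampling rate, cutoff, and order below are assumptions.

      import numpy as np
      from scipy.signal import butter, filtfilt

      def denoise(acc, fs=100.0, cutoff=10.0, order=4):
          """Zero-phase low-pass de-noising of one shank accelerometer channel."""
          b, a = butter(order, cutoff / (fs / 2), btype="low")
          return filtfilt(b, a, acc)

      sig = np.random.rand(1000)
      print(denoise(sig)[:3])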

  19. User microprogrammable processors for high data rate telemetry preprocessing

    NASA Technical Reports Server (NTRS)

    Pugsley, J. H.; Ogrady, E. P.

    1973-01-01

    The use of microprogrammable processors for the preprocessing of high data rate satellite telemetry is investigated. The following topics are discussed along with supporting studies: (1) evaluation of commercial microprogrammable minicomputers for telemetry preprocessing tasks; (2) microinstruction sets for telemetry preprocessing; and (3) the use of multiple minicomputers to achieve high data processing. The simulation of small microprogrammed processors is discussed along with examples of microprogrammed processors.

  20. An effective preprocessing method for finger vein recognition

    NASA Astrophysics Data System (ADS)

    Peng, JiaLiang; Li, Qiong; Wang, Ning; Abd El-Latif, Ahmed A.; Niu, Xiamu

    2013-07-01

    Image preprocessing plays an important role in a finger vein recognition system. However, previous preprocessing schemes retain weaknesses that must be resolved to achieve high finger vein recognition performance. In this paper, we propose a new finger vein preprocessing pipeline that includes finger region localization, alignment, finger vein ROI segmentation, and enhancement. The experimental results show that the proposed scheme is capable of enhancing the quality of the finger vein image effectively and reliably.
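
    The enhancement stage could, for instance, use contrast-limited adaptive histogram equalization from scikit-image; this is one plausible choice, not necessarily the scheme proposed in the paper.

      import numpy as np
      from skimage.exposure import equalize_adapthist

      def enhance_vein_roi(roi):
          """CLAHE-style local contrast enhancement of a finger-vein ROI."""
          return equalize_adapthist(roi, clip_limit=0.03)

      print(enhance_vein_roi(np.random.rand(64, 128)).shape)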

  1. Functionally-focused algorithmic analysis of high resolution microarray-CGH genomic landscapes demonstrates comparable genomic copy number aberrations in MSI and MSS sporadic colorectal cancer

    PubMed Central

    Ali, Hamad; Bitar, Milad S.; Al Madhoun, Ashraf; Marafie, Makia; Al-Mulla, Fahd

    2017-01-01

    Array-based comparative genomic hybridization (aCGH) has emerged as a powerful technology for studying copy number variations at high resolution in many cancers, including colorectal cancer. However, the lack of standardized systematic protocols, including bioinformatic algorithms, to obtain and analyze genomic data has resulted in significant variation in the reported copy number aberration (CNA) data. Here, we present genomic aCGH data obtained using highly stringent and functionally relevant statistical algorithms from 116 well-defined microsatellite instable (MSI) and microsatellite stable (MSS) colorectal cancers. We utilized aCGH to characterize genomic CNAs in these 116 well-defined colorectal cancer (CRC) cases. We further applied the significance testing for aberrant copy number (STAC) and Genomic Identification of Significant Targets in Cancer (GISTIC) algorithms to identify functionally relevant (nonrandom) chromosomal aberrations in the analyzed colorectal cancer samples. Our results produced high-resolution genomic landscapes of both MSI and MSS sporadic CRC. We found that CNAs in MSI and MSS CRCs are heterogeneous in nature but may be divided into 3 distinct genomic patterns. Moreover, we show that although CNAs in MSI and MSS CRCs differ with respect to their size, number, and chromosomal distribution, the functional copy number aberrations obtained from MSI and MSS CRCs were in fact comparable, although not identical. These unifying CNAs were verified by an MLPA tumor-loss gene panel, which spans 15 different chromosomal locations and contains 50 probes for at least 20 tumor suppressor genes. Consistently, deletions/amplifications in these frequently altered cancer genes were identical in MSS and MSI CRCs. Our results suggest that MSI and MSS copy number aberrations driving CRC may be functionally comparable. PMID:28231327

  2. A preprocessing tool for removing artifact from cardiac RR interval recordings using three-dimensional spatial distribution mapping.

    PubMed

    Stapelberg, Nicolas J C; Neumann, David L; Shum, David H K; McConnell, Harry; Hamilton-Craig, Ian

    2016-04-01

    Artifact is common in cardiac RR interval data that is recorded for heart rate variability (HRV) analysis. A novel algorithm for artifact detection and interpolation in RR interval data is described. It is based on spatial distribution mapping of RR interval magnitude and relationships to adjacent values in three dimensions. The characteristics of normal physiological RR intervals and artifact intervals were established using 24-h recordings from 20 technician-assessed human cardiac recordings. The algorithm was incorporated into a preprocessing tool and validated using 30 artificial RR (ARR) interval data files, to which known quantities of artifact (0.5%, 1%, 2%, 3%, 5%, 7%, 10%) were added. The impact of preprocessing ARR files with 1% added artifact was also assessed using 10 time domain and frequency domain HRV metrics. The preprocessing tool was also used to preprocess 69 24-h human cardiac recordings. The tool was able to remove artifact from technician-assessed human cardiac recordings (sensitivity 0.84, SD = 0.09; specificity 1.00, SD = 0.01) and artificial data files. The removal of artifact had a low impact on time domain and frequency domain HRV metrics (ranging from 0% to 2.5% change in values). This novel preprocessing tool can be used with human 24-h cardiac recordings to remove artifact while minimally affecting physiological data and therefore having a low impact on HRV measures of that data.
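
    The core idea, relating each interval's magnitude to its adjacent values and interpolating over flagged points, can be sketched as follows. The ratio and magnitude thresholds are illustrative assumptions, not the published parameters, and the full three-dimensional distribution mapping is reduced here to two adjacency ratios.

      import numpy as np

      def clean_rr(rr, lo=300.0, hi=2000.0, max_ratio=1.8):
          """rr: RR intervals in ms; returns an artifact-corrected copy."""
          rr = np.asarray(rr, dtype=float).copy()
          bad = (rr < lo) | (rr > hi)      # physiologically implausible values
          prev_r = rr[1:-1] / rr[:-2]      # relation to the preceding interval
          next_r = rr[1:-1] / rr[2:]       # relation to the following interval
          jump = (np.maximum(prev_r, 1 / prev_r) > max_ratio) & \
                 (np.maximum(next_r, 1 / next_r) > max_ratio)
          bad[1:-1] |= jump
          good = ~bad
          # Repair flagged samples by linear interpolation from good neighbors.
          if bad.any() and good.any():
              rr[bad] = np.interp(np.where(bad)[0], np.where(good)[0], rr[good])
          return rr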

  3. Comparison of planar images and SPECT with Bayesian preprocessing for the demonstration of facial anatomy and craniomandibular disorders

    SciTech Connect

    Kircos, L.T.; Ortendahl, D.A.; Hattner, R.S.; Faulkner, D.; Taylor, R.L.

    1984-01-01

    Craniomandibular disorders involving the facial anatomy may be difficult to demonstrate in planar images. Although bone scanning is generally more sensitive than radiography, facial bone anatomy is complex, and focal areas of increased or decreased radiotracer may become obscured by overlapping structures in planar images. Thus SPECT appears ideally suited to examination of the facial skeleton. A series of patients with craniomandibular disorders of unknown origin were imaged using 20 mCi Tc-99m MDP. Planar and SPECT (Siemens 7500 ZLC Orbiter) images were obtained four hours after injection. The SPECT images were reconstructed with a filtered back-projection algorithm. In order to improve image contrast and resolution in SPECT images, the rotation views were pre-processed with a Bayesian deblurring algorithm which has previously been shown to offer improved contrast and resolution in planar images. SPECT images using the pre-processed rotation views were obtained and compared to the SPECT images without pre-processing and the planar images. TMJ arthropathy involving either the glenoid fossa or the mandibular condyle, orthopedic changes involving the mandible or maxilla, localized dental pathosis, as well as changes in structures peripheral to the facial skeleton were identified. Bayesian pre-processed SPECT depicted the facial skeleton more clearly and provided a more obvious demonstration of the bony changes associated with craniomandibular disorders than either planar images or SPECT without pre-processing.

  4. How cyanobacteria pose new problems to old methods: challenges in microarray time series analysis

    PubMed Central

    2013-01-01

    Background The transcriptomes of several cyanobacterial strains have been shown to exhibit diurnal oscillation patterns, reflecting the diurnal phototrophic lifestyle of the organisms. The analysis of such genome-wide transcriptional oscillations is often facilitated by the use of clustering algorithms in conjunction with a number of pre-processing steps. Biological interpretation is usually focussed on the time and phase of expression of the resulting groups of genes. However, the use of microarray technology in such studies requires normalization of the data during pre-processing, with unclear impact on the qualitative and quantitative features of the derived information on the number of oscillating transcripts and their respective phases. Results A microarray based evaluation of diurnal expression in the cyanobacterium Synechocystis sp. PCC 6803 is presented. As expected, the temporal expression patterns reveal strong oscillations in transcript abundance. We compare the Fourier transformation-based expression phase before and after the application of quantile normalization, median polishing, cyclical LOESS, and least oscillating set (LOS) normalization. Whereas LOS normalization mostly preserves the phases of the raw data, the remaining methods introduce systematic biases. In particular, quantile-normalization is found to introduce a phase-shift of 180°, effectively changing night-expressed genes into day-expressed ones. Comparison of a large number of clustering results of differently normalized data shows that the normalization method determines the result. Subsequent steps, such as the choice of data transformation, similarity measure, and clustering algorithm, only play minor roles. We find that standardization and the DFT transformation are favorable for the clustering of time series, in contrast to the 12 m transformation. We use the cluster-wise functional enrichment of a clustering derived by LOS normalization, clustering using flowClust, and DFT
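
    The phase comparison at the heart of this evaluation reduces to reading off the complex phase of each transcript's once-per-day Fourier component before and after normalization. A minimal sketch, assuming expression sampled every 2 h over one 24-h period:

      import numpy as np

      def diurnal_phase(x):
          """x: 1-D expression series covering exactly one 24-h period."""
          X = np.fft.rfft(x - np.mean(x))
          return np.angle(X[1])        # phase of the once-per-day component

      rng = np.random.default_rng(0)
      t = np.arange(0, 24, 2)          # 12 samples, one every 2 h
      gene = 10 + 3 * np.cos(2 * np.pi * (t - 6) / 24) \
             + 0.2 * rng.standard_normal(t.size)
      # A peak at t = 6 h corresponds to a phase of about -90 degrees.
      print(np.degrees(diurnal_phase(gene)))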

  5. Antibiotic treatment algorithm development based on a microarray nucleic acid assay for rapid bacterial identification and resistance determination from positive blood cultures.

    PubMed

    Rödel, Jürgen; Karrasch, Matthias; Edel, Birgit; Stoll, Sylvia; Bohnert, Jürgen; Löffler, Bettina; Saupe, Angela; Pfister, Wolfgang

    2016-03-01

    Rapid diagnosis of bloodstream infections remains a challenge for the early targeting of an antibiotic therapy in sepsis patients. In recent studies, the reliability of the Nanosphere Verigene Gram-positive and Gram-negative blood culture (BC-GP and BC-GN) assays for the rapid identification of bacteria and resistance genes directly from positive BCs has been demonstrated. In this work, we have developed a model to define treatment recommendations by combining Verigene test results with knowledge on local antibiotic resistance patterns of bacterial pathogens. The data of 275 positive BCs were analyzed. Two hundred sixty-three isolates (95.6%) were included in the Verigene assay panels, and 257 isolates (93.5%) were correctly identified. The agreement of the detection of resistance genes with subsequent phenotypic susceptibility testing was 100%. The hospital antibiogram was used to develop a treatment algorithm on the basis of Verigene results that may contribute to a faster patient management.

  6. Microarray in parasitic infections

    PubMed Central

    Sehgal, Rakesh; Misra, Shubham; Anand, Namrata; Sharma, Monika

    2012-01-01

    Modern biology and genomic sciences are rooted in parasitic disease research. Genome sequencing efforts have provided a wealth of new biological information that promises to have a major impact on our understanding of parasites. Microarrays provide one of the major high-throughput platforms by which this information can be exploited in the laboratory. Many excellent reviews and technique articles have recently been published on applying microarrays to organisms for which fully annotated genomes are at hand. However, many parasitologists work on organisms whose genomes have been only partially sequenced. This review is mainly focused on how to use microarray in these situations. PMID:23508469

  7. Data preprocessing methods of FT-NIR spectral data for the classification of cooking oil

    NASA Astrophysics Data System (ADS)

    Ruah, Mas Ezatul Nadia Mohd; Rasaruddin, Nor Fazila; Fong, Sim Siong; Jaafar, Mohd Zuli

    2014-12-01

    This recent work describes data pre-processing methods for FT-NIR spectroscopy datasets of cooking oil and its quality parameters with chemometric methods. Pre-processing of near-infrared (NIR) spectral data has become an integral part of chemometrics modelling. Hence, this work is dedicated to investigating the utility and effectiveness of pre-processing algorithms, namely row scaling, column scaling and single scaling with Standard Normal Variate (SNV). The combinations of these scaling methods have an impact on exploratory analysis and classification via Principal Component Analysis (PCA) plots. The samples were divided into palm oil and non-palm cooking oil. The classification model was built using FT-NIR cooking oil spectra datasets in absorbance mode over the range 4000-14000 cm(-1). A Savitzky-Golay derivative was applied before developing the classification model. The data were then separated into a training set and a test set using the Duplex method, with the number in each class kept equal to 2/3 of the class with the minimum number of samples. The t-statistic was then employed as the variable selection method to determine which variables are significant for the classification models. The data pre-processing methods were evaluated using the modified silhouette width (mSW), PCA and the percentage correctly classified (%CC). The results show that the different pre-processing strategies, i.e. row scaling, column standardisation and single scaling with SNV, lead to substantial differences in model performance, as indicated by mSW and %CC. With a two-PC model, all five classifiers gave a high %CC except quadratic distance analysis.
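
    Of the pre-processing steps named above, SNV is the simplest to state precisely: each spectrum is centred on its own mean and scaled by its own standard deviation, which suppresses multiplicative scatter effects. A minimal sketch:

      import numpy as np

      def snv(spectra):
          """spectra: (n_samples, n_wavenumbers) array of absorbance values."""
          mu = spectra.mean(axis=1, keepdims=True)       # per-spectrum mean
          sd = spectra.std(axis=1, ddof=1, keepdims=True)  # per-spectrum SD
          return (spectra - mu) / sd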

  8. Measurement data preprocessing in a radar-based system for monitoring of human movements

    NASA Astrophysics Data System (ADS)

    Morawski, Roman Z.; Miȩkina, Andrzej; Bajurko, Paweł R.

    2015-02-01

    The importance of research on new technologies that could be employed in care services for elderly people is highlighted. The need to examine the applicability of various sensor systems for non-invasive monitoring of the movements and vital bodily functions, such as heart beat or breathing rhythm, of elderly persons in their home environment is justified. An extensive overview of the literature concerning existing monitoring techniques is provided. The technological potential of radar sensors is indicated. A new class of algorithms for preprocessing of measurement data from impulse radar sensors, when applied to the monitoring of elderly people, is proposed. Preliminary results of numerical experiments performed on those algorithms are demonstrated.

  9. Effective automated prediction of vertebral column pathologies based on logistic model tree with SMOTE preprocessing.

    PubMed

    Karabulut, Esra Mahsereci; Ibrikci, Turgay

    2014-05-01

    This study develops a logistic model tree based automated system for accurate recognition of types of vertebral column pathologies. Six biomechanical measures are used for this purpose: pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius and grade of spondylolisthesis. A two-phase classification model is employed, in which the first step is preprocessing the data by use of the Synthetic Minority Over-sampling Technique (SMOTE), and the second is feeding the classifier, a Logistic Model Tree (LMT), with the preprocessed data. We achieved an accuracy of 89.73% and an Area Under Curve (AUC) of 0.964 in computer-based automatic detection of the pathology. This was validated via a 10-fold cross-validation experiment conducted on clinical records of 310 patients. The study also presents a comparative analysis of the vertebral column data with the use of several machine learning algorithms.
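
    The two-phase scheme can be sketched in a few lines. Since scikit-learn offers no Logistic Model Tree, plain logistic regression stands in for LMT below, so this illustrates the SMOTE-then-classify structure rather than reproducing the published classifier.

      from imblearn.over_sampling import SMOTE
      from imblearn.pipeline import make_pipeline
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      def evaluate(X, y):
          """X: (n_patients, 6) biomechanical measures; y: pathology labels."""
          model = make_pipeline(SMOTE(random_state=0),
                                LogisticRegression(max_iter=1000))
          # 10-fold cross-validation, as in the study; the imblearn pipeline
          # applies SMOTE to each training fold only, never to the test fold.
          return cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()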

  10. A perceptual preprocessing method for 3D-HEVC

    NASA Astrophysics Data System (ADS)

    Shi, Yawen; Wang, Yongfang; Wang, Yubing

    2015-08-01

    A perceptual preprocessing method for 3D-HEVC coding is proposed in this paper. First, we propose a new just-noticeable-difference (JND) model, which accounts for the luminance contrast masking effect, the spatial and temporal masking effects, saliency characteristics and depth information. We utilize the spectral residual approach to obtain the saliency map and build a visual saliency factor based on it. To distinguish the sensitivity of objects at different depths, we segment each texture frame into foreground and background with an automatic threshold selection algorithm using the corresponding depth information, and then build a depth weighting factor. A JND modulation factor, formed as a linear combination of the visual saliency factor and the depth weighting factor, adjusts the JND threshold. We then apply the proposed JND model to 3D-HEVC for residual filtering and distortion coefficient processing. In the filtering process, the residual value is set to zero if the JND threshold is greater than the residual value; otherwise the JND threshold is subtracted from the residual value. Experimental results demonstrate that the proposed method achieves an average bit rate reduction of 15.11% compared with the original HTM12.1 coding scheme, while maintaining the same subjective quality.
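
    The residual filtering rule itself is a simple soft threshold. In the sketch below, handling negative residuals through sign-symmetric magnitudes is an assumption, since the abstract states the rule only in terms of values:

      import numpy as np

      def jnd_filter(residual, jnd):
          """residual: transform-domain residuals; jnd: thresholds >= 0."""
          mag = np.abs(residual)
          # Zero out sub-threshold residuals; shrink the rest by the threshold.
          return np.where(mag <= jnd, 0.0, np.sign(residual) * (mag - jnd))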

  11. Wavelet-based illumination invariant preprocessing in face recognition

    NASA Astrophysics Data System (ADS)

    Goh, Yi Zheng; Teoh, Andrew Beng Jin; Goh, Kah Ong Michael

    2009-04-01

    The performance of contemporary two-dimensional face-recognition systems remains unsatisfactory under varying illumination. As a result, much work on handling illumination variation in face recognition has been carried out over the past decades. Among the approaches, the illumination-reflectance model is a generic model used to separate the individual reflectance and illumination components of an object. The illumination component can be removed by means of image-processing techniques to recover the intrinsic face features, which are captured by the reflectance component. We present a wavelet-based illumination-invariant algorithm as a preprocessing technique for face recognition. On the basis of the multiresolution nature of wavelet analysis, we decompose a face image into its illumination and reflectance components in a systematic way. The illumination component, which resides in the low-spatial-frequency subband, can be eliminated efficiently. This technique proves highly advantageous, achieving higher recognition performance on the YaleB, CMU PIE, and FRGC face databases.
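
    A minimal sketch of the underlying idea, assuming the illumination component is multiplicative and slowly varying: work in the log domain, attenuate the low-frequency approximation subband of a 2-D wavelet decomposition, and reconstruct. The wavelet, level, and damping factor are illustrative choices, not the published settings.

      import numpy as np
      import pywt

      def illumination_normalize(face, wavelet="db4", level=3, damp=0.1):
          """face: 2-D grayscale image with positive pixel values."""
          # The log turns the multiplicative illumination-reflectance
          # model into an additive one.
          coeffs = pywt.wavedec2(np.log1p(face), wavelet, level=level)
          coeffs[0] = coeffs[0] * damp   # suppress low-frequency illumination
          return np.expm1(pywt.waverec2(coeffs, wavelet))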

  12. SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read

    PubMed Central

    2010-01-01

    Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing exacerbates this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148

  13. Spatial-spectral preprocessing for endmember extraction on GPU's

    NASA Astrophysics Data System (ADS)

    Jimenez, Luis I.; Plaza, Javier; Plaza, Antonio; Li, Jun

    2016-10-01

    Spectral unmixing focuses on the identification of spectrally pure signatures, called endmembers, and their corresponding abundances in each pixel of a hyperspectral image. Although endmember extraction techniques have mainly exploited the spectral information contained in hyperspectral images, they have recently incorporated spatial information to achieve more accurate results. Several algorithms have been developed for automatic or semi-automatic identification of endmembers using spatial and spectral information, including spectral-spatial endmember extraction (SSEE) where, within a preprocessing step of the technique, both sources of information are extracted from the hyperspectral image and used equally for this purpose. Previous works have implemented the SSEE technique in four main steps: 1) local eigenvector calculation in each sub-region into which the original hyperspectral image is divided; 2) computation of the maxima and minima projections of all eigenvectors over the entire hyperspectral image in order to obtain a set of candidate pixels; 3) expansion and averaging of the signatures of the candidate set; 4) ranking based on the spectral angle distance (SAD). The result of this method is a list of candidate signatures from which the endmembers can be extracted using various spectral-based techniques, such as orthogonal subspace projection (OSP), vertex component analysis (VCA) or N-FINDR. Considering the large volume of data and the complexity of the calculations, there is a need for efficient implementations. Latest-generation hardware accelerators such as commodity graphics processing units (GPUs) offer a good chance for improving the computational performance in this context. In this paper, we develop two different implementations of the SSEE algorithm using GPUs. Both are based on the eigenvector computation within each sub-region of the first step, one using the singular value decomposition (SVD) and another one using principal component analysis (PCA). Based
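
    The SAD ranking in step 4 is the simplest kernel in the pipeline and the one both GPU variants share; a reference implementation for a single pair of signatures:

      import numpy as np

      def sad(a, b):
          """Spectral angle (radians) between two spectral signatures."""
          cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
          return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards rounding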

  14. E-Predict: a computational strategy for species identification based on observed DNA microarray hybridization patterns.

    PubMed

    Urisman, Anatoly; Fischer, Kael F; Chiu, Charles Y; Kistler, Amy L; Beck, Shoshannah; Wang, David; DeRisi, Joseph L

    2005-01-01

    DNA microarrays may be used to identify microbial species present in environmental and clinical samples. However, automated tools for reliable species identification based on observed microarray hybridization patterns are lacking. We present an algorithm, E-Predict, for microarray-based species identification. E-Predict compares observed hybridization patterns with theoretical energy profiles representing different species. We demonstrate the application of the algorithm to viral detection in a set of clinical samples and discuss its relevance to other metagenomic applications.

  15. A survey of visual preprocessing and shape representation techniques

    NASA Technical Reports Server (NTRS)

    Olshausen, Bruno A.

    1988-01-01

    Many recent theories and methods proposed for visual preprocessing and shape representation are summarized. The survey brings together research from the fields of biology, psychology, computer science, electrical engineering, and most recently, neural networks. It was motivated by the need to preprocess images for a sparse distributed memory (SDM), but the techniques presented may also prove useful for applying other associative memories to visual pattern recognition. The material of this survey is divided into three sections: an overview of biological visual processing; methods of preprocessing (extracting parts of shape, texture, motion, and depth); and shape representation and recognition (form invariance, primitives and structural descriptions, and theories of attention).

  16. Identification of Pou5f1, Sox2, and Nanog downstream target genes with statistical confidence by applying a novel algorithm to time course microarray and genome-wide chromatin immunoprecipitation data

    PubMed Central

    Sharov, Alexei A; Masui, Shinji; Sharova, Lioudmila V; Piao, Yulan; Aiba, Kazuhiro; Matoba, Ryo; Xin, Li; Niwa, Hitoshi; Ko, Minoru SH

    2008-01-01

    Background Target genes of the transcription factor (TF) Pou5f1 (Oct3/4 or Oct4), which is essential for pluripotency maintenance and self-renewal of embryonic stem (ES) cells, have previously been identified based on their response to Pou5f1 manipulation and on the occurrence of Chromatin-immunoprecipitation (ChIP)-binding sites in promoters. However, many responding genes with binding sites may not be direct targets, because the response may be mediated by other genes and a ChIP-binding site may not be functional in terms of transcription regulation. Results To reduce the number of false positives, we propose to separate responding genes into groups according to direction, magnitude, and time of response, and to apply the false discovery rate (FDR) criterion to each group individually. Applying this novel algorithm with stringent statistical criteria (FDR < 0.2) to a compendium of published and new microarray data (3, 6, 12, and 24 hr after Pou5f1 suppression) and published ChIP data, we identified 420 tentative target genes (TTGs) for Pou5f1. The majority of TTGs (372) were down-regulated after Pou5f1 suppression, indicating that Pou5f1 functions as an activator of gene expression when it binds to promoters. Interestingly, many activated genes are potent suppressors of transcription, which include polycomb genes, zinc finger TFs, chromatin remodeling factors, and suppressors of signaling. Similar analysis showed that Sox2 and Nanog also function mostly as transcription activators in cooperation with Pou5f1. Conclusion We have identified the most reliable sets of direct target genes for key pluripotency genes – Pou5f1, Sox2, and Nanog, and found that they predominantly function as activators of downstream gene expression. Thus, most genes related to cell differentiation are suppressed indirectly. PMID:18522731
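
    The group-wise FDR idea is straightforward to sketch: partition the genes by response category and run the Benjamini-Hochberg procedure within each partition. The grouping labels and the statsmodels call are illustrative; the paper's actual grouping uses direction, magnitude, and time of response.

      import numpy as np
      from statsmodels.stats.multitest import multipletests

      def groupwise_fdr(pvals, groups, fdr=0.2):
          """pvals, groups: equal-length 1-D arrays; returns boolean mask."""
          pvals = np.asarray(pvals)
          groups = np.asarray(groups)
          sig = np.zeros(pvals.size, dtype=bool)
          for g in np.unique(groups):
              idx = np.where(groups == g)[0]
              # Benjamini-Hochberg applied to each response group separately.
              sig[idx] = multipletests(pvals[idx], alpha=fdr,
                                       method="fdr_bh")[0]
          return sig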

  17. Multievidence microarray mining.

    PubMed

    Seifert, Martin; Scherf, Matthias; Epple, Anton; Werner, Thomas

    2005-10-01

    Microarray mining is a challenging task because of the superposition of several processes in the data. We believe that the combination of microarray data-based analyses (statistical significance analysis of gene expression) with array-independent analyses (literature-mining and promoter analysis) enables some of the problems of traditional array analysis to be overcome. As a proof-of-principle, we revisited publicly available microarray data derived from an experiment with platelet-derived growth factor (PDGF)-stimulated fibroblasts. Our strategy revealed results beyond the detection of the major metabolic pathway known to be linked to the PDGF response: we were able to identify the crosstalking regulatory networks underlying the metabolic pathway without using a priori knowledge about the experiment.

  18. DNA microarray technology. Introduction.

    PubMed

    Pollack, Jonathan R

    2009-01-01

    DNA microarray technology has revolutionized biological research by enabling genome-scale explorations. This chapter provides an overview of DNA microarray technology and its application to characterizing the physical genome, with a focus on cancer genomes. Specific areas discussed include investigations of DNA copy number alteration (and loss of heterozygosity), DNA methylation, DNA-protein (i.e., chromatin and transcription factor) interactions, DNA replication, and the integration of diverse genome-scale data types. Also provided is a perspective on recent advances and future directions in characterizing the physical genome.

  19. Solid Earth ARISTOTELES mission data preprocessing simulation of gravity gradiometer

    NASA Astrophysics Data System (ADS)

    Avanzi, G.; Stolfa, R.; Versini, B.

    Data preprocessing for the ARISTOTELES mission, which measures the Earth gravity gradient in a near polar orbit, was studied. The mission measures the gravity field at sea level through indirect measurements performed in orbit, so the evaluation steps consist in processing data from GRADIO accelerometer measurements. Due to the physical phenomena involved in the data collection experiment, it is possible to isolate, at an initial stage, a preprocessing of the gradiometer data that is based only on GRADIO measurements and does not require detailed knowledge of the attitude and attitude rate sensor outputs. This preprocessing produces intermediate quantities used in later stages of the reduction. Software was designed and run to evaluate, for this level of data reduction, the achievable accuracy as a function of the knowledge of instrument and satellite status parameters. The architecture of this element of preprocessing is described.

  20. Flexibility and utility of pre-processing methods in converting STXM setups for ptychography - Final Paper

    SciTech Connect

    Fromm, Catherine

    2015-08-20

    Ptychography is an advanced diffraction based imaging technique that can achieve resolution of 5 nm and below. It is done by scanning a sample through a beam of focused x-rays using discrete yet overlapping scan steps. Scattering data are collected on a CCD camera, and the phase of the scattered light is reconstructed with sophisticated iterative algorithms. Because the experimental setup is similar, ptychography setups can be created by retrofitting existing STXM beamlines with new hardware. Another challenge comes in the reconstruction of the collected scattering images: scattering data must be adjusted and packaged with experimental parameters to calibrate the reconstruction software. The necessary pre-processing of data prior to reconstruction is unique to each beamline setup, and even to the optical alignments used on that particular day. Pre-processing software must be developed to be flexible and efficient in order to allow experimenters appropriate control and freedom in the analysis of their hard-won data. This paper describes the implementation of pre-processing software which successfully connects data collection steps to reconstruction steps, letting the user accomplish accurate and reliable ptychography.

  1. [Study of near infrared spectral preprocessing and wavelength selection methods for endometrial cancer tissue].

    PubMed

    Zhao, Li-Ting; Xiang, Yu-Hong; Dai, Yin-Mei; Zhang, Zhuo-Yong

    2010-04-01

    Near infrared spectroscopy was applied to tissue slices of endometrial tissues to collect spectra. A total of 154 spectra were obtained from 154 samples. The numbers of normal, hyperplasia, and malignant samples were 36, 60, and 58, respectively. Original near infrared spectra are composed of many variables and contain interference, including instrument errors and physical effects such as particle size and light scatter. In order to reduce these influences, the original spectra should be processed with different spectral preprocessing methods to compress variables and extract useful information. The methods of spectral preprocessing and wavelength selection therefore play an important role in near infrared spectroscopy. In the present paper the raw spectra were processed using various preprocessing methods, including first derivative, multiplicative scatter correction, the Savitzky-Golay first derivative algorithm, standard normal variate, smoothing, and moving-window median. The standard deviation was used to select the optimal spectral region of 4 000-6 000 cm(-1). Then principal component analysis was used for classification. Principal component analysis results showed that the three types of samples could be discriminated completely, with an accuracy approaching 100%. This study demonstrated that near infrared spectroscopy combined with chemometric methods could be a fast, efficient, and novel means to diagnose cancer. The proposed methods could become a promising and significant diagnostic technique for early stage cancer.
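
    For reference, the Savitzky-Golay first-derivative step named above can be written directly with SciPy; the window length and polynomial order here are illustrative choices, not the settings used in the study.

      from scipy.signal import savgol_filter

      def sg_first_derivative(spectra, window=11, polyorder=2):
          """spectra: (n_samples, n_points) array; derivative along each row."""
          return savgol_filter(spectra, window_length=window,
                               polyorder=polyorder, deriv=1, axis=1)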

  2. Characterizing the continuously acquired cardiovascular time series during hemodialysis, using median hybrid filter preprocessing noise reduction

    PubMed Central

    Wilson, Scott; Bowyer, Andrea; Harrap, Stephen B

    2015-01-01

    The clinical characterization of cardiovascular dynamics during hemodialysis (HD) has important pathophysiological implications from diagnostic, cardiovascular risk assessment, and treatment efficacy perspectives. Currently the diagnosis of significant intradialytic systolic blood pressure (SBP) changes among HD patients is imprecise and opportunistic, reliant upon the presence of hypotensive symptoms in conjunction with coincident but isolated noninvasive brachial cuff blood pressure (NIBP) readings. Considering hemodynamic variables as a time series makes a continuous recording approach more desirable than intermittent measures; however, in the clinical environment, the data signal is susceptible to corruption due to both impulsive and Gaussian-type noise. Signal preprocessing is an attractive solution to this problem. Prospectively collected continuous noninvasive SBP data over the short-break intradialytic period in ten patients were preprocessed using a novel median hybrid filter (MHF) algorithm and compared with 50 time-coincident pairs of intradialytic NIBP measures from routine HD practice. The median hybrid preprocessing technique for continuously acquired cardiovascular data yielded a dynamic regression without significant noise and artifact, suitable for high-level profiling of time-dependent SBP behavior. Signal accuracy is highly comparable with standard NIBP measurement, with the added clinical benefit of dynamic real-time hemodynamic information. PMID:25678827
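
    A basic median hybrid filter takes, at each sample, the median of a forward-looking average, the sample itself, and a backward-looking average, which removes impulsive spikes while still tracking trends. The sketch below shows this principle only; the published MHF algorithm is more elaborate.

      import numpy as np

      def median_hybrid_filter(x, k=3):
          """x: 1-D signal (e.g., beat-to-beat SBP); k: sub-filter length."""
          x = np.asarray(x, dtype=float)
          y = x.copy()
          for i in range(k, x.size - k):
              fwd = x[i - k:i].mean()          # mean of k preceding samples
              bwd = x[i + 1:i + 1 + k].mean()  # mean of k following samples
              y[i] = np.median([fwd, x[i], bwd])
          return y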

  3. Comparing Binaural Pre-processing Strategies II

    PubMed Central

    Hu, Hongmei; Krawczyk-Becker, Martin; Marquardt, Daniel; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Bomke, Katrin; Plotz, Karsten; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

    2015-01-01

    Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary, complex, and included a significant amount of reverberation. Other aspects, such as the perfectly frontal target position, were idealized laboratory settings, allowing the algorithms to perform better than in corresponding real-world conditions. Eight bilaterally implanted CI users, wearing devices from three manufacturers, participated in the study. In all noise conditions, a substantial improvement in SRT50 compared to the unprocessed signal was observed for most of the algorithms tested, with the largest improvements generally provided by binaural minimum variance distortionless response (MVDR) beamforming algorithms. The largest overall improvement in speech intelligibility was achieved by an adaptive binaural MVDR in a spatially separated, single competing talker noise scenario. A no-pre-processing condition and adaptive differential microphones without a binaural link served as the two baseline conditions. SRT50 improvements provided by the binaural MVDR beamformers surpassed the performance of the adaptive differential microphones in most cases. Speech intelligibility improvements predicted by instrumental measures were shown to account for some but not all aspects of the perceptually obtained SRT50 improvements measured in bilaterally implanted CI users. PMID:26721921

  4. Protein Microarray Technology

    PubMed Central

    Hall, David A.; Ptacek, Jason

    2007-01-01

    Protein chips have emerged as a promising approach for a wide variety of applications, including the identification of protein-protein interactions, protein-phospholipid interactions, small molecule targets, and substrates of protein kinases. They can also be used for clinical diagnostics and monitoring disease states. This article reviews current methods in the generation and applications of protein microarrays. PMID:17126887

  5. Microarrays for Undergraduate Classes

    ERIC Educational Resources Information Center

    Hancock, Dale; Nguyen, Lisa L.; Denyer, Gareth S.; Johnston, Jill M.

    2006-01-01

    A microarray experiment is presented that, in six laboratory sessions, takes undergraduate students from the tissue sample right through to data analysis. The model chosen, the murine erythroleukemia cell line, can be easily cultured in sufficient quantities for class use. Large changes in gene expression can be induced in these cells by…

  6. Enhancing interdisciplinary mathematics and biology education: a microarray data analysis course bridging these disciplines.

    PubMed

    Tra, Yolande V; Evans, Irene M

    2010-01-01

    BIO2010 put forth the goal of improving the mathematical educational background of biology students. The analysis and interpretation of microarray high-dimensional data can be very challenging and is best done by a statistician and a biologist working and teaching in a collaborative manner. We set up such a collaboration and designed a course on microarray data analysis. We started using Genome Consortium for Active Teaching (GCAT) materials and Microarray Genome and Clustering Tool software and added R statistical software along with Bioconductor packages. In response to student feedback, one microarray data set was fully analyzed in class, starting from preprocessing to gene discovery to pathway analysis using the latter software. A class project was to conduct a similar analysis where students analyzed their own data or data from a published journal paper. This exercise showed the impact that filtering, preprocessing, and different normalization methods had on gene inclusion in the final data set. We conclude that this course achieved its goals to equip students with skills to analyze data from a microarray experiment. We offer our insight about collaborative teaching as well as how other faculty might design and implement a similar interdisciplinary course.

  7. Enhancing Interdisciplinary Mathematics and Biology Education: A Microarray Data Analysis Course Bridging These Disciplines

    PubMed Central

    Evans, Irene M.

    2010-01-01

    BIO2010 put forth the goal of improving the mathematical educational background of biology students. The analysis and interpretation of microarray high-dimensional data can be very challenging and is best done by a statistician and a biologist working and teaching in a collaborative manner. We set up such a collaboration and designed a course on microarray data analysis. We started using Genome Consortium for Active Teaching (GCAT) materials and Microarray Genome and Clustering Tool software and added R statistical software along with Bioconductor packages. In response to student feedback, one microarray data set was fully analyzed in class, starting from preprocessing to gene discovery to pathway analysis using the latter software. A class project was to conduct a similar analysis where students analyzed their own data or data from a published journal paper. This exercise showed the impact that filtering, preprocessing, and different normalization methods had on gene inclusion in the final data set. We conclude that this course achieved its goals to equip students with skills to analyze data from a microarray experiment. We offer our insight about collaborative teaching as well as how other faculty might design and implement a similar interdisciplinary course. PMID:20810954

  8. Microarray data analysis for differential expression: a tutorial.

    PubMed

    Suárez, Erick; Burguete, Ana; Mclachlan, Geoffrey J

    2009-06-01

    DNA microarray is a technology that simultaneously evaluates quantitative measurements for the expression of thousands of genes. DNA microarrays have been used to assess gene expression between groups of cells of different organs or different populations. In order to understand the role and function of the genes, one needs the complete information about their mRNA transcripts and proteins. Unfortunately, exploring protein functions is very difficult, due to their complicated, unique 3-dimensional structures. To overcome this difficulty, one may concentrate on the mRNA molecules produced by gene expression. In this paper, we describe some of the methods for preprocessing gene expression data and for pairwise comparison from genomic experiments. Previous studies assessing the efficiency of different methods for pairwise comparisons have found little agreement in the lists of significant genes. Finally, we describe procedures to control false discovery rates, a sample size approach for these experiments, and available software for microarray data analysis. This paper is written for those professionals who are new to microarray data analysis for differential expression and want an overview of the specific steps or the different approaches for this sort of analysis.

  9. Comparing Binaural Pre-processing Strategies III: Speech Intelligibility of Normal-Hearing and Hearing-Impaired Listeners.

    PubMed

    Völker, Christoph; Warzybok, Anna; Ernst, Stephan M A

    2015-12-30

    A comprehensive evaluation of eight signal pre-processing strategies, including directional microphones, coherence filters, single-channel noise reduction, binaural beamformers, and their combinations, was undertaken with normal-hearing (NH) and hearing-impaired (HI) listeners. Speech reception thresholds (SRTs) were measured in three noise scenarios (multitalker babble, cafeteria noise, and single competing talker). Predictions of three common instrumental measures were compared with the general perceptual benefit caused by the algorithms. The individual SRTs measured without pre-processing and the individual benefits were objectively estimated using the binaural speech intelligibility model. Ten listeners with NH and 12 HI listeners participated. The participants varied in age and pure-tone threshold levels. Although HI listeners required a better signal-to-noise ratio to obtain 50% intelligibility than listeners with NH, no differences in SRT benefit from the different algorithms were found between the two groups. With the exception of single-channel noise reduction, all algorithms showed an improvement in SRT of between 2.1 dB (in cafeteria noise) and 4.8 dB (in the single competing talker condition). Model predictions with the binaural speech intelligibility model explained 83% of the measured variance of the individual SRTs in the no pre-processing condition. Regarding the benefit from the algorithms, the instrumental measures were not able to predict the perceptual data in all tested noise conditions. The comparable benefit observed for both groups suggests a possible application of noise reduction schemes for listeners with different hearing status. Although the model can predict the individual SRTs without pre-processing, further development is necessary to predict the benefits obtained from the algorithms at an individual level.

  10. Design of radial basis function neural network classifier realized with the aid of data preprocessing techniques: design and analysis

    NASA Astrophysics Data System (ADS)

    Oh, Sung-Kwun; Kim, Wook-Dong; Pedrycz, Witold

    2016-05-01

    In this paper, we introduce a new architecture of optimized Radial Basis Function neural network classifier developed with the aid of fuzzy clustering and data preprocessing techniques and discuss its comprehensive design methodology. In the preprocessing part, the Linear Discriminant Analysis (LDA) or Principal Component Analysis (PCA) algorithm forms a front end of the network. The transformed data produced here are used as the inputs of the network. In the premise part, the Fuzzy C-Means (FCM) algorithm determines the receptive field associated with the condition part of the rules. The connection weights of the classifier are of functional nature and come as polynomial functions forming the consequent part. The Particle Swarm Optimization algorithm optimizes a number of essential parameters needed to improve the accuracy of the classifier. Those optimized parameters include the type of data preprocessing, the dimensionality of the feature vectors produced by the LDA (or PCA), the number of clusters (rules), the fuzzification coefficient used in the FCM algorithm and the orders of the polynomials of networks. The performance of the proposed classifier is reported for several benchmarking data-sets and is compared with the performance of other classifiers reported in the previous studies.
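
    A much-simplified sketch of this architecture, with KMeans standing in for Fuzzy C-Means, a plain linear read-out in place of the polynomial consequent weights, and the PSO parameter search omitted:

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LogisticRegression

      def fit_rbf_classifier(X, y, n_components=2, n_rules=5, gamma=1.0):
          # Preprocessing front end (PCA here; LDA is the alternative).
          pca = PCA(n_components=n_components).fit(X)
          Z = pca.transform(X)
          # Receptive fields placed by clustering (stand-in for FCM).
          km = KMeans(n_clusters=n_rules, n_init=10, random_state=0).fit(Z)
          d2 = ((Z[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(-1)
          Phi = np.exp(-gamma * d2)            # Gaussian RBF activations
          clf = LogisticRegression(max_iter=1000).fit(Phi, y)
          return pca, km, clf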

  11. Effects of preprocessing method on TVOC emission of car mat

    NASA Astrophysics Data System (ADS)

    Wang, Min; Jia, Li

    2013-02-01

    The effects of the preprocessing method on total volatile organic compound (TVOC) emission of car mats are studied in this paper. An appropriate TVOC emission period for car mats is suggested. The emission factors for total volatile organic compounds from three kinds of new car mats are discussed. The car mats were preprocessed by washing, baking and ventilation. When car mats were preprocessed by washing, the TVOC emission of all samples tested was lower than that obtained with the other preprocessing methods. The TVOC emission remains stable for a minimum of 4 days. The TVOC emitted from some samples may exceed 2500 μg/kg, but the TVOC emitted from washed polyamide (PA) and wool mats is less than 2500 μg/kg. The emission factors of total volatile organic compounds (TVOC) are experimentally investigated for the different preprocessing methods. The air temperature in the environmental chamber and the water temperature used for washing are important factors influencing the emission of car mats.

  12. EARLINET Single Calculus Chain - technical - Part 1: Pre-processing of raw lidar data

    NASA Astrophysics Data System (ADS)

    D'Amico, Giuseppe; Amodeo, Aldo; Mattis, Ina; Freudenthaler, Volker; Pappalardo, Gelsomina

    2016-02-01

    In this paper we describe an automatic tool for the pre-processing of aerosol lidar data called ELPP (EARLINET Lidar Pre-Processor). It is one of two calculus modules of the EARLINET Single Calculus Chain (SCC), the automatic tool for the analysis of EARLINET data. ELPP is an open source module that executes instrumental corrections and data handling of the raw lidar signals, making the lidar data ready to be processed by the optical retrieval algorithms. According to the specific lidar configuration, ELPP automatically performs dead-time correction, atmospheric and electronic background subtraction, gluing of lidar signals, and trigger-delay correction. Moreover, the signal-to-noise ratio of the pre-processed signals can be improved by means of configurable time integration of the raw signals and/or spatial smoothing. ELPP delivers the statistical uncertainties of the final products by means of error propagation or Monte Carlo simulations. During the development of ELPP, particular attention has been paid to make the tool flexible enough to handle all lidar configurations currently used within the EARLINET community. Moreover, it has been designed in a modular way to allow an easy extension to lidar configurations not yet implemented. The primary goal of ELPP is to enable the application of quality-assured procedures in the lidar data analysis starting from the raw lidar data. This provides the added value of full traceability of each delivered lidar product. Several tests have been performed to check the proper functioning of ELPP. The whole SCC has been tested with the same synthetic data sets, which were used for the EARLINET algorithm inter-comparison exercise. ELPP has been successfully employed for the automatic near-real-time pre-processing of the raw lidar data measured during several EARLINET inter-comparison campaigns as well as during intense field campaigns.
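
    Of the instrumental corrections listed, dead-time correction has a compact standard form for photon-counting channels; the non-paralyzable model below is a common choice and is shown as an illustration, not as ELPP's exact implementation.

      import numpy as np

      def deadtime_correct(rate, tau):
          """rate: measured count rate [counts/s]; tau: dead time [s]."""
          rate = np.asarray(rate, dtype=float)
          if np.any(rate * tau >= 1.0):
              raise ValueError("count rate too high for this dead-time model")
          # Non-paralyzable correction: N_true = N_meas / (1 - N_meas * tau)
          return rate / (1.0 - rate * tau)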

  13. EARLINET Single Calculus Chain - technical - Part 1: Pre-processing of raw lidar data

    NASA Astrophysics Data System (ADS)

    D'Amico, G.; Amodeo, A.; Mattis, I.; Freudenthaler, V.; Pappalardo, G.

    2015-10-01

    In this paper we describe an automatic tool for the pre-processing of lidar data called ELPP (EARLINET Lidar Pre-Processor). It is one of two calculus modules of the EARLINET Single Calculus Chain (SCC), the automatic tool for the analysis of EARLINET data. The ELPP is an open source module that executes instrumental corrections and data handling of the raw lidar signals, making the lidar data ready to be processed by the optical retrieval algorithms. According to the specific lidar configuration, the ELPP automatically performs dead-time correction, atmospheric and electronic background subtraction, gluing of lidar signals, and trigger-delay correction. Moreover, the signal-to-noise ratio of the pre-processed signals can be improved by means of configurable time integration of the raw signals and/or spatial smoothing. The ELPP delivers the statistical uncertainties of the final products by means of error propagation or Monte Carlo simulations. During the development of the ELPP module, particular attention has been paid to make the tool flexible enough to handle all lidar configurations currently used within the EARLINET community. Moreover, it has been designed in a modular way to allow an easy extension to lidar configurations not yet implemented. The primary goal of the ELPP module is to enable the application of quality-assured procedures in the lidar data analysis starting from the raw lidar data. This provides the added value of full traceability of each delivered lidar product. Several tests have been performed to check the proper functioning of the ELPP module. The whole SCC has been tested with the same synthetic data sets, which were used for the EARLINET algorithm inter-comparison exercise. The ELPP module has been successfully employed for the automatic near-real-time pre-processing of the raw lidar data measured during several EARLINET inter-comparison campaigns as well as during intense field campaigns.

  14. Analyzing Microarray Data.

    PubMed

    Hung, Jui-Hung; Weng, Zhiping

    2017-03-01

    Because there is no widely used software for analyzing RNA-seq data that has a graphical user interface, this protocol provides an example of analyzing microarray data using Babelomics. This analysis entails performing quantile normalization and then detecting differentially expressed genes associated with the transgenesis of a human oncogene c-Myc in mice. Finally, hierarchical clustering is performed on the differentially expressed genes using the Cluster program, and the results are visualized using TreeView.
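
    Quantile normalization, the step performed in this protocol, forces every array to share the same empirical distribution, namely the mean of the sorted columns. A minimal sketch (ties handled naively):

      import numpy as np

      def quantile_normalize(X):
          """X: (n_genes, n_arrays) expression matrix."""
          order = np.argsort(X, axis=0)             # per-array ranks
          ref = np.sort(X, axis=0).mean(axis=1)     # reference distribution
          Xq = np.empty_like(X, dtype=float)
          for j in range(X.shape[1]):
              # Give the j-th array the reference values in its own rank order.
              Xq[order[:, j], j] = ref
          return Xq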

  15. Membrane-based microarrays

    NASA Astrophysics Data System (ADS)

    Dawson, Elliott P.; Hudson, James; Steward, John; Donnell, Philip A.; Chan, Wing W.; Taylor, Richard F.

    1999-11-01

    Microarrays represent a new approach to the rapid detection and identification of analytes. Studies to date have shown that the immobilization of receptor molecules (such as DNA, oligonucleotides, antibodies, enzymes and binding proteins) onto silicon and polymeric substrates can result in arrays able to detect hundreds of analytes in a single step. The formation of the receptor/analyte complex can, itself, lead to detection, or the complex can be interrogated through the use of fluorescent, chemiluminescent or radioactive probes and ligands.

  16. Molecular diagnosis and prognosis with DNA microarrays.

    PubMed

    Wiltgen, Marco; Tilz, Gernot P

    2011-05-01

    Microarray analysis makes it possible to determine thousands of gene expression values simultaneously. Changes in gene expression in response to disease can be detected, allowing a better understanding and differentiation of diseases at a molecular level. By comparing different kinds of tissue, for example healthy tissue and cancer tissue, microarray analysis indicates whether gene activity is induced, repressed, or unchanged. Fundamental patterns in gene expression are extracted by several clustering and machine learning algorithms. Certain kinds of cancer can be divided into subtypes, with different clinical outcomes, by their specific gene expression patterns. This enables better diagnosis and tailoring of individual patient treatments.

  17. EEG preprocessing for synchronization estimation and epilepsy lateralization.

    PubMed

    Vélez-Pérez, H; Romo-Vázquez, R; Ranta, R; Louis-Dorr, V; Maillard, L

    2011-01-01

    The global framework of this paper is the synchronization analysis in EEG recordings. Two main objectives are pursued: the evaluation of the synchronization estimation for lateralization purposes in epileptic EEGs and the evaluation of the effect of the preprocessing (artifact and noise cancelling by blind source separation, wavelet denoising and classification) on the synchronization analysis. We propose a new global synchronization index, based on the classical cross power spectrum, estimated for each cerebral hemisphere. After preprocessing, the proposed index is able to correctly lateralize the epileptic zone in over 90% of the cases.
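
    One way to realize a cross-power-spectrum-based hemispheric index is to average the magnitude-squared coherence over all channel pairs within a hemisphere and over a band of interest; the sketch below illustrates this style of index and differs in detail from the published one.

      import numpy as np
      from itertools import combinations
      from scipy.signal import coherence

      def hemisphere_sync(channels, fs=256.0, band=(1.0, 30.0)):
          """channels: list of 1-D arrays, the EEG channels of one hemisphere."""
          vals = []
          for a, b in combinations(channels, 2):
              f, Cxy = coherence(a, b, fs=fs, nperseg=512)
              sel = (f >= band[0]) & (f <= band[1])
              vals.append(Cxy[sel].mean())     # band-averaged coherence
          return float(np.mean(vals))          # hemisphere-wide index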

  18. Empirical evaluation of oligonucleotide probe selection for DNA microarrays.

    PubMed

    Mulle, Jennifer G; Patel, Viren C; Warren, Stephen T; Hegde, Madhuri R; Cutler, David J; Zwick, Michael E

    2010-03-29

    DNA-based microarrays are increasingly central to biomedical research. Selecting oligonucleotide sequences that will behave consistently across experiments is essential to the design, production and performance of DNA microarrays. Here our aim was to improve on probe design parameters by empirically and systematically evaluating probe performance in a multivariate context. We used experimental data from 19 array CGH hybridizations to assess the probe performance of 385,474 probes tiled in the Duchenne muscular dystrophy (DMD) region of the X chromosome. Our results demonstrate that probe melting temperature, single nucleotide polymorphisms (SNPs), and homocytosine motifs all have a strong effect on probe behavior. These findings, when incorporated into future microarray probe selection algorithms, may improve microarray performance for a wide variety of applications.
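
    In practice, screening candidates on one of these properties, melting temperature, is a one-liner with Biopython's nearest-neighbor model; the acceptance window below is an illustrative assumption, not the probe-selection rule derived in the paper.

      from Bio.SeqUtils import MeltingTemp as mt

      def tm_ok(probe_seq, lo=68.0, hi=78.0):
          """probe_seq: oligo sequence string; True if Tm falls in-window."""
          return lo <= mt.Tm_NN(probe_seq) <= hi

      print(tm_ok("ACGTGGCCATTGCAGTGGAACCTTGGAACC"))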

  19. Surface chemistries for antibody microarrays

    SciTech Connect

    Seurynck-Servoss, Shannon L.; Baird, Cheryl L.; Rodland, Karin D.; Zangar, Richard C.

    2007-05-01

    Enzyme-linked immunosorbent assay (ELISA) microarrays promise to be a powerful tool for the detection of disease biomarkers. The original technology for printing ELISA microarray chips and capturing antibodies on slides was derived from the DNA microarray field. However, due to the need to maintain antibody structure and function when immobilized, surface chemistries used for DNA microarrays are not always appropriate for ELISA microarrays. In order to identify better surface chemistries for antibody capture, a number of commercial companies and academic research groups have developed new slide types that could improve antibody function in microarray applications. In this review we compare and contrast the commercially available slide chemistries, as well as highlight some promising recent advances in the field.

  20. OPSN: The IMS COMSYS 1 and 2 Data Preprocessing System.

    ERIC Educational Resources Information Center

    Yu, John

    The Instructional Management System (IMS) developed by the Southwest Regional Laboratory (SWRL) processes student and teacher-generated data through the use of an optical scanner that produces a magnetic tape (Scan Tape) for input to IMS. A series of computer routines, OPSN, preprocesses the Scan Tape and prepares the data for transmission to the…

  1. A Glance at DNA Microarray Technology and Applications

    PubMed Central

    Saei, Amir Ata; Omidi, Yadollah

    2011-01-01

    Introduction Because of the huge impact of “OMICS” technologies in the life sciences, many researchers aim to implement such high throughput approaches to address cellular and/or molecular functions at the genomics, proteomics, or metabolomics level in response to any influential intervention. However, in many cases, the use of such technologies encounters difficulties in extracting knowledge from large bodies of data using the related software, and there is little guidance on data mining for novices. The main goal of this article is to provide a brief review of the different steps of microarray data handling and mining for novices, and finally to introduce different PC and/or web-based software packages that can be used in preprocessing and/or data mining of microarray data. Methods To pursue this aim, recently published papers and microarray software packages were reviewed. Results It was found that defining the true place of genes in cell networks is the main phase in our understanding of the programming and functioning of living cells. This can be obtained with global/selected gene expression profiling. Conclusion Studying the regulation patterns of genes in groups, using clustering and classification methods, helps us understand different pathways in the cell, their functions, regulation, and the way one component in the system affects another. These networks can act as starting points for data mining and hypothesis generation, helping us reverse engineer. PMID:23678411

  2. Microarrays in cancer research.

    PubMed

    Grant, Geraldine M; Fortney, Amanda; Gorreta, Francesco; Estep, Michael; Del Giacco, Luca; Van Meter, Amy; Christensen, Alan; Appalla, Lakshmi; Naouar, Chahla; Jamison, Curtis; Al-Timimi, Ali; Donovan, Jean; Cooper, James; Garrett, Carleton; Chandhoke, Vikas

    2004-01-01

    Microarray technology has presented the scientific community with a compelling approach that allows for simultaneous evaluation of all cellular processes at once. Cancer, being one of the most challenging diseases due to its polygenic nature, presents itself as a perfect candidate for evaluation by this approach. Several recent articles have provided significant insight into the strengths and limitations of microarrays. Nevertheless, there are strong indications that this approach will provide new molecular markers that could be used in the diagnosis and prognosis of cancers. To achieve these goals it is essential that there is a seamless integration of clinical and molecular biological data that allows us to elucidate genes and pathways involved in various cancers. To this effect we are currently evaluating gene expression profiles in human brain, ovarian, breast, hematopoietic, lung, colorectal, head and neck and biliary tract cancers. To address these issues we have a joint team of scientists, doctors and computer scientists from two Virginia universities and a major healthcare provider. The study has been divided into several focus groups that include: Tissue Bank, Clinical & Pathology Laboratory Data, Chip Fabrication, QA/QC, Tissue Devitalization, Database Design and Data Analysis, using multiple microarray platforms. Currently over 300 consenting patients have been enrolled in the study, with the largest number being breast cancer patients. Clinical data on each patient are being compiled into a secure and interactive relational database, and integration of these data elements will be accomplished by a common programming interface. This clinical database contains several key parameters on each patient, including demographic (risk factors, nutrition, co-morbidity, familial history), histopathology (non genetic predictors), tumor, treatment and follow-up information. Gene expression data derived from the tissue samples will be linked to this database, which

  3. The Genopolis Microarray Database

    PubMed Central

    Splendiani, Andrea; Brandizi, Marco; Even, Gael; Beretta, Ottavio; Pavelka, Norman; Pelizzola, Mattia; Mayhaus, Manuel; Foti, Maria; Mauri, Giancarlo; Ricciardi-Castagnoli, Paola

    2007-01-01

    Background Gene expression databases are key resources for microarray data management and analysis, and the importance of proper annotation of their content is well understood. Public repositories exist, as do microarray database systems that can be implemented by single laboratories. However, there is not yet a tool that can easily support a collaborative environment where different users with different rights of access to data can interact to define a common, highly coherent content. The scope of the Genopolis database is to provide a resource that allows different groups performing microarray experiments related to a common subject to create a common coherent knowledge base and to analyse it. The Genopolis database has been implemented as a dedicated system for the scientific community studying dendritic cell and macrophage functions and host-parasite interactions. Results The Genopolis Database system allows the community to build an object-based MIAME-compliant annotation of their experiments and to store images, raw and processed data from the Affymetrix GeneChip® platform. It supports dynamic definition of controlled vocabularies and provides automated and supervised steps to control the coherence of data and annotations. It allows precise control of the visibility of the database content to different subgroups in the community and facilitates exports of its content to public repositories. It provides an interactive user interface for data analysis: this allows users to visualize data matrices based on functional lists and sample characterization, and to navigate to other data matrices defined by similarity of expression values as well as functional characterizations of the genes involved. A collaborative environment is also provided for the definition and sharing of functional annotation by users. Conclusion The Genopolis Database supports a community in building a common coherent knowledge base and analysing it. This fills a gap between a local

  4. DNA Microarray-Based Diagnostics.

    PubMed

    Marzancola, Mahsa Gharibi; Sedighi, Abootaleb; Li, Paul C H

    2016-01-01

    The DNA microarray technology is currently a useful biomedical tool which has been developed for a variety of diagnostic applications. However, the development pathway has not been smooth and the technology has faced some challenges. The reliability of microarray data and the clinical utility of the results were criticized in the early days. These criticisms, added to severe competition from other techniques such as next-generation sequencing (NGS), impacted the growth of microarray-based tests in the molecular diagnostic market. Thanks to advances in the underlying technologies as well as the tremendous effort offered by the research community and commercial vendors, these challenges have mostly been addressed. Nowadays, the microarray platform has achieved sufficient standardization and method validation as well as efficient probe printing, liquid handling and signal visualization. Integration of the various steps of the microarray assay into a harmonized and miniaturized handheld lab-on-a-chip (LOC) device has been a goal for the microarray community. In this respect, notable progress has been achieved in coupling the DNA microarray with the liquid manipulation microsystem as well as the supporting subsystem that will generate the stand-alone LOC device. In this chapter, we discuss the major challenges that microarray technology has faced in its almost two decades of development and describe the solutions to overcome them. In addition, we review the advancements of the technology, especially the progress toward developing LOC devices for DNA diagnostic applications.

  5. Living-cell microarrays.

    PubMed

    Yarmush, Martin L; King, Kevin R

    2009-01-01

    Living cells are remarkably complex. To unravel this complexity, living-cell assays have been developed that allow delivery of experimental stimuli and measurement of the resulting cellular responses. High-throughput adaptations of these assays, known as living-cell microarrays, which are based on microtiter plates, high-density spotting, microfabrication, and microfluidics technologies, are being developed for two general applications: (a) to screen large-scale chemical and genomic libraries and (b) to systematically investigate the local cellular microenvironment. These emerging experimental platforms offer exciting opportunities to rapidly identify genetic determinants of disease, to discover modulators of cellular function, and to probe the complex and dynamic relationships between cells and their local environment.

  6. Microarray oligonucleotide probe designer (MOPeD): A web service.

    PubMed

    Patel, Viren C; Mondal, Kajari; Shetty, Amol Carl; Horner, Vanessa L; Bedoyan, Jirair K; Martin, Donna; Caspary, Tamara; Cutler, David J; Zwick, Michael E

    2010-11-01

    Methods of genomic selection that combine high-density oligonucleotide microarrays with next-generation DNA sequencing allow investigators to characterize genomic variation in selected portions of complex eukaryotic genomes. Yet choosing which specific oligonucleotides to use can pose a major technical challenge. To address this issue, we have developed a software package called MOPeD (Microarray Oligonucleotide Probe Designer), which automates the process of designing genomic selection microarrays. This web-based software allows individual investigators to design custom genomic selection microarrays optimized for synthesis with Roche NimbleGen's maskless photolithography. Design parameters include uniqueness of the probe sequences, melting temperature, hairpin formation, and the presence of single nucleotide polymorphisms. We generated probe databases for the human, mouse, and rhesus macaque genomes and conducted experimental validation of MOPeD-designed microarrays in human samples by sequencing the human X chromosome exome, where relevant sequence metrics indicated superior performance relative to a microarray designed by the Roche NimbleGen proprietary algorithm. We also performed validation in the mouse to identify known mutations contained within a 487-kb region from mouse chromosome 16, the mouse chromosome 16 exome (1.7 Mb), and the mouse chromosome 12 exome (3.3 Mb). Our results suggest that the open source MOPeD software package and website (http://moped.genetics.emory.edu/) will be a valuable resource for investigators in their sequence-based studies of complex eukaryotic genomes.
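
    MOPeD's actual thermodynamic model is not described in the abstract; as a hedged illustration of one such design filter, the sketch below screens candidate probes with the simple Wallace rule for melting temperature. The candidate sequences and the design window are assumptions, purely for illustration.

      def wallace_tm(probe: str) -> int:
          # Wallace rule: Tm = 2(A+T) + 4(G+C), valid only for short oligos.
          probe = probe.upper()
          at = probe.count("A") + probe.count("T")
          gc = probe.count("G") + probe.count("C")
          return 2 * at + 4 * gc

      candidates = ["ACGTGCTAGCTAGGCTAACG", "ATATATATATATATATATAT"]
      # Keep probes whose Tm falls inside an assumed design window (55-65 C).
      print([p for p in candidates if 55 <= wallace_tm(p) <= 65])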

  7. The Longhorn Array Database (LAD): An Open-Source, MIAME compliant implementation of the Stanford Microarray Database (SMD)

    PubMed Central

    Killion, Patrick J; Sherlock, Gavin; Iyer, Vishwanath R

    2003-01-01

    Background The power of microarray analysis can be realized only if data is systematically archived and linked to biological annotations as well as analysis algorithms. Description The Longhorn Array Database (LAD) is a MIAME compliant microarray database that operates on PostgreSQL and Linux. It is a fully open source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available at Conclusions Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data. PMID:12930545

  8. Linguistic Preprocessing and Tagging for Problem Report Trend Analysis

    NASA Technical Reports Server (NTRS)

    Beil, Robert J.; Malin, Jane T.

    2012-01-01

    Mr. Robert Beil, Systems Engineer at Kennedy Space Center (KSC), requested the NASA Engineering and Safety Center (NESC) develop a prototype tool suite that combines complementary software technology used at Johnson Space Center (JSC) and KSC for problem report preprocessing and semantic tag extraction, to improve input to data mining and trend analysis. This document contains the outcome of the assessment and the Findings, Observations and NESC Recommendations.

  9. A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps.

    PubMed

    Tugizimana, Fidele; Steenkamp, Paul A; Piater, Lizelle A; Dubery, Ian A

    2016-11-03

    Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the "exhaustive" extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomic dataset, this study explored the influence of parameter choices in the data pre-processing step, of scaling and data transformation on the statistical models generated, and of the subsequent feature selection. Data acquired in positive ionization mode from an LC-MS-based untargeted metabolomic study (sorghum plants responding dynamically to infection by a fungal pathogen) were used. Raw data were pre-processed with MarkerLynx(TM) software (Waters Corporation, Manchester, UK). Here, two parameters were varied: the intensity threshold (50-100 counts) and the mass tolerance (0.005-0.01 Da). After the pre-processing, the datasets were imported into SIMCA (Umetrics, Umea, Sweden) for further data cleaning and statistical modeling. In addition, different scaling (unit variance, Pareto, etc.) and data transformation (log and power) methods were explored. The results showed that the pre-processing parameters (or algorithms) influence the output dataset with regard to the number of defined features. Furthermore, the study demonstrates that the pre-treatment of data prior to statistical modeling affects the subspace approximation outcome: e.g., the amount of variation in X-data that the model can explain and predict. The pre-processing and pre-treatment steps subsequently influence the number of statistically significant extracted/selected features (variables). Thus, as informed by the results, to maximize the value of untargeted metabolomic data, understanding
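
    Two of the pre-treatment options explored above, log transformation and Pareto scaling, are easy to sketch. This is a minimal illustration, assuming a synthetic intensity matrix; it does not reproduce the study's SIMCA workflow.

      import numpy as np

      def log_transform(X, offset=1.0):
          # The offset avoids taking the log of zero-intensity features.
          return np.log10(X + offset)

      def pareto_scale(X):
          # Mean-center each feature, then divide by the square root of its
          # standard deviation; constant features would need extra guarding.
          mean = X.mean(axis=0)
          std = X.std(axis=0, ddof=1)
          return (X - mean) / np.sqrt(std)

      rng = np.random.default_rng(1)
      X = rng.lognormal(mean=2.0, sigma=1.0, size=(20, 500))  # 20 samples x 500 features
      X_pretreated = pareto_scale(log_transform(X))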

  10. A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps

    PubMed Central

    Tugizimana, Fidele; Steenkamp, Paul A.; Piater, Lizelle A.; Dubery, Ian A.

    2016-01-01

    Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the “exhaustive” extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomic dataset, this study explored the influence of parameter choices in the data pre-processing step, of scaling and data transformation on the statistical models generated, and of the subsequent feature selection. Data acquired in positive ionization mode from an LC-MS-based untargeted metabolomic study (sorghum plants responding dynamically to infection by a fungal pathogen) were used. Raw data were pre-processed with MarkerLynxTM software (Waters Corporation, Manchester, UK). Here, two parameters were varied: the intensity threshold (50–100 counts) and the mass tolerance (0.005–0.01 Da). After the pre-processing, the datasets were imported into SIMCA (Umetrics, Umea, Sweden) for further data cleaning and statistical modeling. In addition, different scaling (unit variance, Pareto, etc.) and data transformation (log and power) methods were explored. The results showed that the pre-processing parameters (or algorithms) influence the output dataset with regard to the number of defined features. Furthermore, the study demonstrates that the pre-treatment of data prior to statistical modeling affects the subspace approximation outcome: e.g., the amount of variation in X-data that the model can explain and predict. The pre-processing and pre-treatment steps subsequently influence the number of statistically significant extracted/selected features (variables). Thus, as informed by the results, to maximize the value of untargeted metabolomic data

  11. Application of filtering techniques in preprocessing magnetic data

    NASA Astrophysics Data System (ADS)

    Liu, Haijun; Yi, Yongping; Yang, Hongxia; Hu, Guochuang; Liu, Guoming

    2010-08-01

    High-precision magnetic exploration is a popular geophysical technique because of its simplicity and effectiveness. Interpretation in high-precision magnetic exploration is always difficult because of noise and other disturbance factors, so an effective preprocessing method is needed to remove the effects of interference before further processing. The common way to do this is filtering, and many filtering methods exist. In this paper we describe in detail three popular filtering techniques: the regularized filtering technique, the sliding-average filtering technique, and the compensation smoothing filtering technique. We then designed the workflow of a filtering program based on these techniques and implemented it in Delphi. To evaluate it, we applied it to preprocess magnetic data from a site in China. Comparing the initial contour map with the filtered contour map clearly shows the effectiveness of our program: the filtered contour map is very smooth, and the high-frequency components of the data have been removed. After filtering, we separated useful signals from noise, minor anomalies from major anomalies, and local anomalies from regional anomalies, making it easy to focus on the useful information. Our program can be used to preprocess magnetic data, and the results demonstrate its effectiveness.
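
    Of the three techniques reviewed, the sliding-average filter is the simplest to sketch. Below is a minimal illustrative version; the window length and toy signal are assumptions, and the paper's Delphi implementation is not reproduced.

      import numpy as np

      def sliding_average(signal, window=15):
          # A moving-average kernel acts as a simple low-pass filter;
          # mode="same" keeps the output the same length as the input.
          kernel = np.ones(window) / window
          return np.convolve(signal, kernel, mode="same")

      x = np.linspace(0.0, 10.0, 500)
      # Toy profile: one smooth anomaly plus high-frequency noise.
      noise = 0.1 * np.random.default_rng(2).normal(size=x.size)
      profile = np.exp(-((x - 5.0) ** 2)) + noise
      smoothed = sliding_average(profile)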

  12. Review of feed forward neural network classification preprocessing techniques

    NASA Astrophysics Data System (ADS)

    Asadi, Roya; Kareem, Sameem Abdul

    2014-06-01

    A key feature of artificial-intelligence Feed Forward Neural Network (FFNN) classification models is that they learn the input data through their weights. Data preprocessing and pre-training are contributing factors in developing efficient techniques that achieve low training time and high classification accuracy. In this study, we investigate and review powerful preprocessing functions for FFNN models. Currently, weights are initialized at random, which is a main source of problems; even the latest techniques, such as multilayer auto-encoder networks, do not solve them. Weight Linear Analysis (WLA) combines data preprocessing and pre-training to generate real weights from normalized input values. Using WLA, the FFNN model increases classification accuracy and reduces training to a single epoch, without any training cycles, computation of the gradient of the mean-square-error function, or weight updates. The results of comparison and evaluation show that WLA is a powerful technique in the FFNN classification area.

  13. Improving Drift Correction by Double Projection Preprocessing in Gas Sensor Arrays

    NASA Astrophysics Data System (ADS)

    Padilla, M.; Perera, A.; Montoliu, I.; Chaudry, A.; Persaud, K.; Marco, S.

    2009-05-01

    It is well known that chemical gas sensors are strongly affected by drift. Drift consists of changes in sensor responses over time, which render the initial statistical models for gas or odor recognition useless after a period of about weeks. Instruments based on gas sensor arrays therefore need periodic calibrations, which are expensive and laborious. Many different statistical methods have been proposed to extend the time between recalibrations. In this work, a simple preprocessing technique based on a double projection is proposed as a prior step to a subsequent drift-correction algorithm (in this particular case, Direct Orthogonal Signal Correction). This method greatly improves the temporal stability of the data relative to using the drift-correction method alone. The performance of the technique is evaluated on a dataset of measurements of three analytes acquired with a polymer sensor array over ten months.

  14. Development, Characterization and Experimental Validation of a Cultivated Sunflower (Helianthus annuus L.) Gene Expression Oligonucleotide Microarray

    PubMed Central

    Fernandez, Paula; Soria, Marcelo; Blesa, David; DiRienzo, Julio; Moschen, Sebastian; Rivarola, Maximo; Clavijo, Bernardo Jose; Gonzalez, Sergio; Peluffo, Lucila; Príncipi, Dario; Dosio, Guillermo; Aguirrezabal, Luis; García-García, Francisco; Conesa, Ana; Hopp, Esteban; Dopazo, Joaquín; Heinz, Ruth Amelia; Paniego, Norma

    2012-01-01

    Oligonucleotide-based microarrays with accurate gene coverage represent a key strategy for transcriptional studies in orphan species such as sunflower, H. annuus L., which lacks full genome sequences. The goal of this study was the development and functional annotation of a comprehensive sunflower unigene collection and the design and validation of a custom sunflower oligonucleotide-based microarray. A large scale EST (>130,000 ESTs) curation, assembly and sequence annotation was performed using Blast2GO (www.blast2go.de). The EST assembly comprises 41,013 putative transcripts (12,924 contigs and 28,089 singletons). The resulting Sunflower Unigen Resource (SUR version 1.0) was used to design an oligonucleotide-based Agilent microarray for cultivated sunflower. This microarray includes a total of 42,326 features: 1,417 Agilent controls, 74 control probes for sunflower replicated 10 times (740 controls) and 40,169 different non-control probes. Microarray performance was validated using a model experiment examining the induction of senescence by water deficit. Pre-processing and differential expression analysis of Agilent microarrays were performed using the Bioconductor limma package. The analyses based on p-values calculated by eBayes (p<0.01) allowed the detection of 558 differentially expressed genes between water stress and control conditions; from these, ten genes were further validated by qPCR. Over-represented ontologies were identified using FatiScan in the Babelomics suite. This work generated a curated and reliable sunflower unigene collection, and a custom, validated sunflower oligonucleotide-based microarray using Agilent technology. Both the curated unigene collection and the validated oligonucleotide microarray provide key resources for sunflower genome analysis, transcriptional studies, and molecular breeding for crop improvement. PMID:23110046

  15. Development, characterization and experimental validation of a cultivated sunflower (Helianthus annuus L.) gene expression oligonucleotide microarray.

    PubMed

    Fernandez, Paula; Soria, Marcelo; Blesa, David; DiRienzo, Julio; Moschen, Sebastian; Rivarola, Maximo; Clavijo, Bernardo Jose; Gonzalez, Sergio; Peluffo, Lucila; Príncipi, Dario; Dosio, Guillermo; Aguirrezabal, Luis; García-García, Francisco; Conesa, Ana; Hopp, Esteban; Dopazo, Joaquín; Heinz, Ruth Amelia; Paniego, Norma

    2012-01-01

    Oligonucleotide-based microarrays with accurate gene coverage represent a key strategy for transcriptional studies in orphan species such as sunflower, H. annuus L., which lacks full genome sequences. The goal of this study was the development and functional annotation of a comprehensive sunflower unigene collection and the design and validation of a custom sunflower oligonucleotide-based microarray. A large scale EST (>130,000 ESTs) curation, assembly and sequence annotation was performed using Blast2GO (www.blast2go.de). The EST assembly comprises 41,013 putative transcripts (12,924 contigs and 28,089 singletons). The resulting Sunflower Unigen Resource (SUR version 1.0) was used to design an oligonucleotide-based Agilent microarray for cultivated sunflower. This microarray includes a total of 42,326 features: 1,417 Agilent controls, 74 control probes for sunflower replicated 10 times (740 controls) and 40,169 different non-control probes. Microarray performance was validated using a model experiment examining the induction of senescence by water deficit. Pre-processing and differential expression analysis of Agilent microarrays were performed using the Bioconductor limma package. The analyses based on p-values calculated by eBayes (p<0.01) allowed the detection of 558 differentially expressed genes between water stress and control conditions; from these, ten genes were further validated by qPCR. Over-represented ontologies were identified using FatiScan in the Babelomics suite. This work generated a curated and reliable sunflower unigene collection, and a custom, validated sunflower oligonucleotide-based microarray using Agilent technology. Both the curated unigene collection and the validated oligonucleotide microarray provide key resources for sunflower genome analysis, transcriptional studies, and molecular breeding for crop improvement.
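
    limma's moderated eBayes statistics (used in the study above) are not reimplemented here; as a hedged stand-in for the differential-expression step, the sketch below applies ordinary t-tests with Benjamini-Hochberg control at the same p<0.01 threshold to synthetic data sized like the 40,169-probe array.

      import numpy as np
      from scipy.stats import ttest_ind
      from statsmodels.stats.multitest import multipletests

      rng = np.random.default_rng(3)
      stress = rng.normal(size=(40169, 3))    # probes x water-stress replicates (synthetic)
      control = rng.normal(size=(40169, 3))   # probes x control replicates (synthetic)

      t, p = ttest_ind(stress, control, axis=1)
      rejected, p_adj, _, _ = multipletests(p, alpha=0.01, method="fdr_bh")
      print("Probes flagged as differentially expressed:", rejected.sum())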

  16. Microarray simulator as educational tool.

    PubMed

    Ruusuvuori, Pekka; Nykter, Matti; Mäkiraatikka, Eeva; Lehmussola, Antti; Korpelainen, Tomi; Erkkilä, Timo; Yli-Harja, Olli

    2007-01-01

    Like many real-world technologies, microarray measurements are impractical for large-scale teaching purposes due to their laborious preparation process and expense. Fortunately, many phases of the array preparation process can be efficiently demonstrated by using a software simulator tool. Here we propose the use of a microarray simulator as an aiding tool in the teaching of computational biology. Three case studies on educational use of the simulator are presented, demonstrating the effect of gene knock-out, synthetic time series, and the effect of noise sources. We conclude that the simulator, used for teaching the principles of microarray measurement technology, proved to be a useful tool in education.

  17. An accelerated procedure for recursive feature ranking on microarray data.

    PubMed

    Furlanello, C; Serafini, M; Merler, S; Jurman, G

    2003-01-01

    We describe a new wrapper algorithm for fast feature ranking in classification problems. The Entropy-based Recursive Feature Elimination (E-RFE) method eliminates chunks of uninteresting features according to the entropy of the weight distribution of an SVM classifier. With specific regard to DNA microarray datasets, the method is designed to support computationally intensive model selection in classification problems in which the number of features is much larger than the number of samples. We test E-RFE on synthetic and real datasets, comparing it with other SVM-based methods. The speed-up obtained with E-RFE supports predictive modeling on high-dimensional microarray data.
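
    The entropy-based chunk sizing that distinguishes E-RFE is not reimplemented here; as a hedged baseline, the sketch below runs standard SVM-RFE with a fixed elimination fraction on a synthetic problem with far more features than samples (all sizes assumed).

      from sklearn.datasets import make_classification
      from sklearn.feature_selection import RFE
      from sklearn.svm import LinearSVC

      # Microarray-like setting: far more features than samples.
      X, y = make_classification(n_samples=60, n_features=2000,
                                 n_informative=20, random_state=0)

      # Drop 10% of the remaining features per iteration until 50 are left.
      selector = RFE(LinearSVC(C=1.0, dual=False, max_iter=5000),
                     n_features_to_select=50, step=0.1)
      selector.fit(X, y)
      print("Selected feature indices:", selector.get_support(indices=True)[:10])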

  18. Chemistry of Natural Glycan Microarray

    PubMed Central

    Song, Xuezheng; Heimburg-Molinaro, Jamie; Cummings, Richard D.; Smith, David F.

    2014-01-01

    Glycan microarrays have become indispensable tools for studying protein-glycan interactions. Along with chemo-enzymatic synthesis, glycans isolated from natural sources have played important roles in array development and will continue to be a major source of glycans. N- and O-glycans from glycoproteins, and glycans from glycosphingolipids, can be released from the corresponding glycoconjugates with relatively mature methods, although the isolation of large numbers and quantities of glycans is still very challenging. Glycosylphosphatidylinositol (GPI) anchors and glycosaminoglycans (GAGs) are less well represented on current glycan microarrays. Glycan microarray development has been greatly facilitated by bifunctional fluorescent linkers, which can be applied in a “Shotgun Glycomics” approach to incorporate isolated natural glycans. Glycan presentation on microarrays may affect glycan binding by glycan-binding proteins (GBPs), often through multivalent recognition by the GBP. PMID:24487062

  19. The Effect of LC-MS Data Preprocessing Methods on the Selection of Plasma Biomarkers in Fed vs. Fasted Rats

    PubMed Central

    Gürdeniz, Gözde; Kristensen, Mette; Skov, Thomas; Dragsted, Lars O.

    2012-01-01

    The metabolic composition of plasma is affected by the time passed since the last meal and by individual variation in metabolite clearance rates. Rat plasma in fed and fasted states was analyzed with liquid chromatography quadrupole time-of-flight mass spectrometry (LC-QTOF) for an untargeted investigation of these metabolite patterns. The dataset was used to investigate the effect of data preprocessing on biomarker selection using three software packages, MarkerLynxTM, MZmine, and XCMS, along with a customized preprocessing method that performs binning of m/z channels followed by summation through retention time. Direct comparison of selected features representing the fed or fasted state showed large differences between the software packages. The custom data preprocessing yielded many false-positive markers compared with the dedicated packages, while MarkerLynxTM provided better coverage of markers. However, marker selection was more reliable with the gap-filling (or peak-finding) algorithms present in MZmine and XCMS. Further identification of the putative markers revealed that many of the differences between the markers selected were due to variations in features representing adducts or daughter ions of the same metabolites, or of compounds from the same chemical subclasses, e.g., lyso-phosphatidylcholines (LPCs) and lyso-phosphatidylethanolamines (LPEs). We conclude that despite considerable differences in the performance of the preprocessing tools, we could extract the same biological information with any of them. Carnitine, branched-chain amino acids, LPCs and LPEs were identified by all methods as markers of the fed state, whereas acetylcarnitine was abundant during fasting in rats. PMID:24957369
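
    The customized method is described only as binning of m/z channels followed by summation through retention time; the sketch below is a hedged reconstruction of that idea, with the bin width, mass range, and toy peak lists all assumed.

      import numpy as np

      def bin_and_sum(mz, intensity, mz_min=100.0, mz_max=1000.0, bin_width=0.1):
          # Assign each peak to a fixed-width m/z channel and sum intensities;
          # the summation collapses the retention-time dimension entirely.
          edges = np.arange(mz_min, mz_max + bin_width, bin_width)
          idx = np.digitize(mz, edges) - 1
          vector = np.zeros(len(edges) - 1)
          valid = (idx >= 0) & (idx < len(vector))
          np.add.at(vector, idx[valid], intensity[valid])
          return vector

      rng = np.random.default_rng(4)
      mz = rng.uniform(100.0, 1000.0, size=5000)    # peak m/z values for one sample
      intensity = rng.exponential(1e4, size=5000)   # corresponding peak intensities
      sample_vector = bin_and_sum(mz, intensity)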

  20. A new approach to pre-processing digital image for wavelet-based watermark

    NASA Astrophysics Data System (ADS)

    Agreste, Santa; Andaloro, Guido

    2008-11-01

    The growth of the Internet has increased digital piracy of multimedia objects such as software, images, video, audio and text. It is therefore strategically important to devise and develop stable, low-computational-cost methods and numerical algorithms to address this problem. We describe a digital watermarking algorithm for color image protection and authenticity that is robust, non-blind, and wavelet-based. The use of the Discrete Wavelet Transform is motivated by its good time-frequency features and good match with Human Visual System directives. These two combined elements are important for building an invisible and robust watermark. Moreover, our algorithm can work with any image, thanks to a pre-processing step that includes resizing techniques adapting the original image's size for the wavelet transform. The watermark signal is calculated in correlation with the image features and statistical properties. In the detection step we apply a re-synchronization between the original and watermarked images according to the Neyman-Pearson statistical criterion. Experiments on a large set of different images show the method to be resistant against geometric, filtering, and StirMark attacks with a low rate of false alarm.
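
    The paper's watermark generation and Neyman-Pearson detector are not reproduced here; the sketch below only illustrates the general wavelet-domain embedding idea, using a generic spread-spectrum watermark added to detail coefficients, with the Haar wavelet and strength alpha as assumptions.

      import numpy as np
      import pywt

      rng = np.random.default_rng(5)
      image = rng.uniform(0.0, 255.0, size=(256, 256))  # stand-in for one color channel

      # One-level 2D DWT; the mark is spread over the horizontal detail coefficients.
      cA, (cH, cV, cD) = pywt.dwt2(image, "haar")

      alpha = 0.05                                  # embedding strength (assumed)
      watermark = rng.standard_normal(cH.shape)     # pseudo-random watermark signal
      cH_marked = cH + alpha * watermark * np.abs(cH)

      marked = pywt.idwt2((cA, (cH_marked, cV, cD)), "haar")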

  1. Systematic interpretation of microarray data using experiment annotations

    PubMed Central

    Fellenberg, Kurt; Busold, Christian H; Witt, Olaf; Bauer, Andrea; Beckmann, Boris; Hauser, Nicole C; Frohme, Marcus; Winter, Stefan; Dippon, Jürgen; Hoheisel, Jörg D

    2006-01-01

    Background Up to now, microarray data have mostly been assessed in context with only one or a few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data when available in a statistically accessible format. Results We provide means to preprocess these additional data and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well suited for mapping such extracted traits. It visualizes associations both among and between the traits, the thereby annotated experiments, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single-channel) and two-channel data, stemming from model organisms such as yeast and Drosophila up to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design. Conclusion Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details. PMID:17181856
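
    A minimal sketch of correspondence analysis, assuming the classical SVD-of-standardized-residuals formulation on a toy nonnegative gene-by-experiment table; it does not reproduce the study's implementation.

      import numpy as np

      rng = np.random.default_rng(6)
      N = rng.poisson(20, size=(30, 8)).astype(float)   # genes x annotated experiments

      P = N / N.sum()
      r = P.sum(axis=1, keepdims=True)                  # row masses
      c = P.sum(axis=0, keepdims=True)                  # column masses
      S = (P - r @ c) / np.sqrt(r @ c)                  # standardized residuals

      U, sv, Vt = np.linalg.svd(S, full_matrices=False)
      row_coords = (U * sv) / np.sqrt(r)                # principal coordinates of genes
      col_coords = (Vt.T * sv) / np.sqrt(c.T)           # principal coordinates of experiments
      print(row_coords[:3, :2])                         # positions on the first two axes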

  2. Preprocessing and Analysis of LC-MS-Based Proteomic Data

    PubMed Central

    Tsai, Tsung-Heng; Wang, Minkun; Ressom, Habtom W.

    2016-01-01

    Liquid chromatography coupled with mass spectrometry (LC-MS) has been widely used for profiling protein expression levels. This chapter is focused on LC-MS data preprocessing, which is a crucial step in the analysis of LC-MS based proteomics. We provide a high-level overview, highlight associated challenges, and present a step-by-step example for analysis of data from LC-MS based untargeted proteomic study. Furthermore, key procedures and relevant issues with the subsequent analysis by multiple reaction monitoring (MRM) are discussed. PMID:26519169

  3. WebArray: an online platform for microarray data analysis

    PubMed Central

    Xia, Xiaoqin; McClelland, Michael; Wang, Yipeng

    2005-01-01

    Background Many cutting-edge microarray analysis tools and algorithms, including the commonly used limma and affy packages in Bioconductor, require sophisticated knowledge of mathematics, statistics and computer skills for implementation. Commercially available software can provide a user-friendly interface at considerable cost. To facilitate the use of these tools for microarray data analysis on an open platform, we developed an online microarray data analysis platform, WebArray, for bench biologists to use in exploring data from single/dual color microarray experiments. Results The currently implemented functions are based on the limma and affy packages from Bioconductor, the spacings LOESS histogram (SPLOSH) method, a PCA-assisted normalization method and a genome mapping method. WebArray incorporates these packages and provides a user-friendly interface for accessing a wide range of key functions of limma and others, such as spot quality weighting, background correction, graphical plotting, normalization, linear modeling, empirical Bayes statistical analysis, false discovery rate (FDR) estimation, and chromosomal mapping for genome comparison. Conclusion WebArray offers a convenient platform for bench biologists to access several cutting-edge microarray data analysis tools. The website is freely available at . It runs on a Linux server with Apache and MySQL. PMID:16371165
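
    The abstract lists normalization among WebArray's functions without naming a specific algorithm; as one hedged example, the sketch below implements plain quantile normalization, a common microarray choice (ties are handled naively).

      import numpy as np

      def quantile_normalize(X):
          # Force every array (column) to share the same intensity distribution:
          # rank each column, then replace each rank with the mean of the sorted
          # values at that rank across all columns.
          ranks = np.argsort(np.argsort(X, axis=0), axis=0)
          mean_of_sorted = np.sort(X, axis=0).mean(axis=1)
          return mean_of_sorted[ranks]

      rng = np.random.default_rng(7)
      X = rng.lognormal(size=(1000, 6))   # 1000 probes x 6 arrays (synthetic)
      X_norm = quantile_normalize(X)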

  4. Classification of large microarray datasets using fast random forest construction.

    PubMed

    Manilich, Elena A; Özsoyoğlu, Z Meral; Trubachev, Valeriy; Radivoyevitch, Tomas

    2011-04-01

    Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance or impact of each factor on a predicted class label. These characteristics make the algorithm ideal for microarray data. It was shown to build models with high accuracy when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining over large datasets, as they require that the entire dataset remains permanently in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. The implementation's excellent computational performance makes the algorithm useful for interactive data analyses and data mining.
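
    The authors' memory-optimized implementation is not shown here; as a hedged baseline, the sketch applies the standard random-forest recipe to a synthetic microarray-like problem, reporting the out-of-bag accuracy and the variable importances the abstract highlights.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier

      # Microarray-like setting: many noisy features, few samples.
      X, y = make_classification(n_samples=100, n_features=5000,
                                 n_informative=30, random_state=0)

      forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                      oob_score=True, random_state=0, n_jobs=-1)
      forest.fit(X, y)

      print("Out-of-bag accuracy:", round(forest.oob_score_, 3))
      top = np.argsort(forest.feature_importances_)[::-1][:10]
      print("Top features by importance:", top)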

  5. Inferring genetic networks from microarray data.

    SciTech Connect

    May, Elebeoba Eni; Davidson, George S.; Martin, Shawn Bryan; Werner-Washburne, Margaret C.; Faulon, Jean-Loup Michel

    2004-06-01

    In theory, it should be possible to infer realistic genetic networks from time series microarray data. In practice, however, network discovery has proved problematic. The three major challenges are: (1) inferring the network; (2) estimating the stability of the inferred network; and (3) making the network visually accessible to the user. Here we describe a method, tested on publicly available time series microarray data, which addresses these concerns. The inference of genetic networks from genome-wide experimental data is an important biological problem which has received much attention. Approaches to this problem have typically included application of clustering algorithms [6]; the use of Boolean networks [12, 1, 10]; the use of Bayesian networks [8, 11]; and the use of continuous models [21, 14, 19]. Overviews of the problem and general approaches to network inference can be found in [4, 3]. Our approach to network inference is similar to earlier methods in that we use both clustering and Boolean network inference. However, we have attempted to extend the process to better serve the end-user, the biologist. In particular, we have incorporated a system to assess the reliability of our network, and we have developed tools which allow interactive visualization of the proposed network.

  6. Microarray Technologies in Fungal Diagnostics.

    PubMed

    Rupp, Steffen

    2017-01-01

    Microarray technologies have been a major research tool in the last decades. In addition, they have been introduced into several fields of diagnostics, including the diagnostics of infectious diseases. Microarrays are highly parallelized assay systems that were initially developed for multiparametric nucleic acid detection. From there they rapidly developed into a tool for the detection of all kinds of biological compounds (DNA, RNA, proteins, cells, nucleic acids, carbohydrates, etc.) or their modifications (methylation, phosphorylation, etc.). The combination of closed-tube systems and lab-on-chip devices with microarrays further enabled a higher degree of automation with a reduced contamination risk. Microarray-based diagnostic applications currently complement and may in the future replace classical methods in clinical microbiology like blood cultures, resistance determination, microscopic and metabolic analyses as well as biochemical or immunohistochemical assays. In addition, novel diagnostic markers appear, like noncoding RNAs and miRNAs, providing additional room for novel nucleic acid based biomarkers. Here I focus on microarray technologies in diagnostics and as research tools, with an emphasis on nucleic acid-based arrays.

  7. Comparing Bacterial DNA Microarray Fingerprints

    SciTech Connect

    Willse, Alan R.; Chandler, Darrell P.; White, Amanda M.; Protic, Miroslava; Daly, Don S.; Wunschel, Sharon C.

    2005-08-15

    Detecting subtle genetic differences between microorganisms is an important problem in molecular epidemiology and microbial forensics. In a typical investigation, gel electrophoresis is used to compare randomly amplified DNA fragments between microbial strains, where the patterns of DNA fragment sizes are proxies for a microbe's genotype. The limited genomic sample captured on a gel is often insufficient to discriminate nearly identical strains. This paper examines the application of microarray technology to DNA fingerprinting as a high-resolution alternative to gel-based methods. The so-called universal microarray, which uses short oligonucleotide probes that do not target specific genes or species, is intended to be applicable to all microorganisms because it does not require prior knowledge of genomic sequence. In principle, closely related strains can be distinguished if the number of probes on the microarray is sufficiently large, i.e., if the genome is sufficiently sampled. In practice, we confront noisy data, imperfectly matched hybridizations, and a high-dimensional inference problem. We describe the statistical problems of microarray fingerprinting, outline similarities with and differences from more conventional microarray applications, and illustrate the statistical fingerprinting problem for 10 closely related strains from three Bacillus species, and 3 strains from non-Bacillus species.

  8. [The net analyte preprocessing combined with radial basis partial least squares regression applied in noninvasive measurement of blood glucose].

    PubMed

    Li, Qing-Bo; Huang, Zheng-Wei

    2014-02-01

    In order to improve the prediction accuracy of quantitative analysis models for near-infrared spectroscopy of blood glucose, this paper combines the net analyte preprocessing (NAP) algorithm with radial basis function partial least squares (RBFPLS) regression to build a nonlinear modeling method suitable for human glucose measurement, named NAP-RBFPLS. First, NAP is used to pre-process the near-infrared spectra of blood glucose in order to extract, from the original spectra, only the information that relates to the glucose signal. This effectively weakens the chance correlations between glucose changes and interference factors caused by the absorption of water, albumin, hemoglobin, fat and other blood components, changes in body temperature, drift of the measuring instruments, and changes in the measurement environment and conditions. A nonlinear quantitative analysis model is then built from the NAP-processed spectra to capture the nonlinear relationship between glucose concentrations and near-infrared spectra caused by strong scattering in body tissue. The new method is compared with three other quantitative analysis models built on partial least squares (PLS), net analyte preprocessing partial least squares (NAP-PLS), and RBFPLS, respectively. The experimental results show that the nonlinear calibration model combining the NAP algorithm and RBFPLS regression greatly improves the prediction accuracy on the prediction sets, demonstrating its practical value for research on non-invasive detection of human glucose concentrations.
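
    The NAP and radial-basis stages are not reconstructed here; as a hedged sketch of the plain PLS baseline the paper compares against, the code below fits PLS regression to synthetic spectra (the glucose range, channel count, and noise level are all assumptions).

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(8)
      # Synthetic stand-in: 80 spectra x 200 wavelength channels with a
      # linear glucose signature plus noise.
      signature = rng.normal(size=200)
      glucose = rng.uniform(4.0, 10.0, size=80)     # assumed mmol/L range
      spectra = np.outer(glucose, signature) + rng.normal(scale=0.5, size=(80, 200))

      pls = PLSRegression(n_components=5)
      scores = cross_val_score(pls, spectra, glucose, cv=5, scoring="r2")
      print("Cross-validated R^2:", scores.mean().round(3))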

  9. Automatic pre-processing for an object-oriented distributed hydrological model using GRASS-GIS

    NASA Astrophysics Data System (ADS)

    Sanzana, P.; Jankowfsky, S.; Branger, F.; Braud, I.; Vargas, X.; Hitschfeld, N.

    2012-04-01

    Landscapes are very heterogeneous, which impacts the hydrological processes occurring in catchments, especially in the modeling of peri-urban catchments. Hydrological Response Units (HRUs), resulting from the intersection of different maps such as land use, soil types, geology and flow networks, allow these elements to be represented explicitly, preserving the natural and artificial contours of the different layers. These HRUs are used as the model mesh in some distributed object-oriented hydrological models, allowing the application of a topologically oriented approach. The connectivity between polygons and polylines provides a detailed representation of the water balance and overland flow in these distributed hydrological models, based on irregular hydro-landscape units. When computing fluxes between these HRUs, geometrical parameters such as the distance between an HRU's centroid and the river network, or the length of the perimeter, can affect the realism of the calculated overland, sub-surface and groundwater fluxes. It is therefore necessary to process the original model mesh to avoid these numerical problems. We present an automatic pre-processing procedure implemented in the open-source GRASS-GIS software, using several Python scripts and existing algorithms such as the Triangle software. First, scripts were developed to improve the topology of the various elements, such as snapping the river network to the closest contours. When data are derived from remote sensing, such as vegetation areas, their perimeters contain many right angles, which were smoothed. Second, the algorithms specifically address badly shaped elements of the model mesh, such as polygons with narrow shapes, markedly irregular contours and/or a centroid lying outside the polygon. To identify these elements we used shape descriptors. The convexity index was considered the best descriptor to identify them with a threshold
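
    The abstract does not give its exact formula for the convexity index; the sketch below assumes the common definition, polygon area divided by convex-hull area, and an arbitrary threshold, purely to illustrate flagging badly shaped HRUs.

      from shapely.geometry import Polygon

      def convexity_index(polygon):
          # 1.0 for convex shapes; drops toward 0 for elongated or ragged ones.
          return polygon.area / polygon.convex_hull.area

      l_shape = Polygon([(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)])
      square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

      THRESHOLD = 0.8  # assumed cut-off below which a unit is reprocessed
      for name, poly in [("L-shape", l_shape), ("square", square)]:
          ci = convexity_index(poly)
          print(name, round(ci, 2), "flagged" if ci < THRESHOLD else "ok")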

  10. Design and implementation of a preprocessing system for a sodium lidar

    NASA Technical Reports Server (NTRS)

    Voelz, D. G.; Sechrist, C. F., Jr.

    1983-01-01

    A preprocessing system, designed and constructed for use with the University of Illinois sodium lidar system, was developed to increase the altitude resolution and range of the lidar system and also to decrease the processing burden of the main lidar computer. The preprocessing system hardware and the software required to implement the system are described. Some preliminary results of an airborne sodium lidar experiment conducted with the preprocessing system installed in the sodium lidar are presented.

  11. Image microarrays (IMA): Digital pathology's missing tool

    PubMed Central

    Hipp, Jason; Cheng, Jerome; Pantanowitz, Liron; Hewitt, Stephen; Yagi, Yukako; Monaco, James; Madabhushi, Anant; Rodriguez-canales, Jaime; Hanson, Jeffrey; Roy-Chowdhuri, Sinchita; Filie, Armando C.; Feldman, Michael D.; Tomaszewski, John E.; Shih, Natalie NC.; Brodsky, Victor; Giaccone, Giuseppe; Emmert-Buck, Michael R.; Balis, Ulysses J.

    2011-01-01

    Introduction: The increasing availability of whole slide imaging (WSI) data sets (digital slides) from glass slides offers new opportunities for the development of computer-aided diagnostic (CAD) algorithms. With the all-digital pathology workflow that these data sets will enable in the near future, literally millions of digital slides will be generated and stored. Consequently, the field in general, and pathologists specifically, will need tools to help extract actionable information from this new and vast collective repository. Methods: To address this limitation, we designed and implemented a tool (dCORE) to enable the systematic capture of image tiles with constrained size and resolution that contain desired histopathologic features. Results: In this communication, we describe a user-friendly tool that will enable pathologists to mine digital slide archives to create image microarrays (IMAs). IMAs are to digital slides as tissue microarrays (TMAs) are to cell blocks. Thus, a single digital slide could be transformed into an array of hundreds to thousands of high quality digital images, each containing key diagnostic morphologies and appropriate controls. Current manual digital image cut-and-paste methods that allow for the creation of a grid of images (such as an IMA) of matching resolutions are tedious. Conclusion: The ability to create IMAs representing hundreds to thousands of vetted morphologic features has numerous applications in education, proficiency testing, consensus case review, and research. Lastly, in a manner analogous to the way conventional TMA technology has significantly accelerated in situ studies of tissue specimens, the use of IMAs has similar potential to significantly accelerate CAD algorithm development. PMID:22200030

  12. Pre-processing technologies to prepare solid waste for composting

    SciTech Connect

    Gould, M.

    1996-09-01

    The organic constituents of municipal solid waste can be converted into compost for use as a safe, beneficial soil amendment, conserving landfill space. The solid waste must first be processed to remove contaminants and prepare the organics for composting. This paper describes five different preprocessing systems, covering a broad range of technical approaches. Three are described briefly; two, from projects managed by the author, are presented as more detailed case histories: (1) a pilot study at a refuse-derived fuel (RDF) plant in Hartford, Connecticut and (2) a solid waste composting facility in East Hampton, New York. Materials flow diagrams and mass balances are presented for each process, showing that 100 tons of solid waste will yield 32 to 44 tons of compost, requiring disposal of 3 to 10 tons of metal, grit, and glass and 16 to 40 tons of light residue that can be landfilled or used as RDF.

  13. Pre-processing of ultraviolet resonance Raman spectra.

    PubMed

    Simpson, John V; Oshokoya, Olayinka; Wagner, Nicole; Liu, Jing; JiJi, Renee D

    2011-03-21

    The application of UV excitation sources coupled with resonance Raman has the potential to offer information unavailable from the current inventory of commonly used structural techniques, including X-ray, NMR and IR analysis. However, for ultraviolet resonance Raman (UVRR) spectroscopy to become a mainstream method for determining protein secondary structure content and monitoring protein dynamics, the application of multivariate data analysis methodologies must be made routine. Typically, the application of higher-order data analysis methods requires robust pre-processing methods in order to standardize the data arrays. The application of such methods can be problematic in UVRR datasets due to spectral shifts arising from day-to-day fluctuations in the instrument response. Additionally, the non-linear increase in spectral resolution in wavenumbers (increasing spectral data points for the same spectral region) that results from increasing excitation wavelengths can make the alignment of multi-excitation datasets problematic. Last, the lack of a uniform and standardized methodology for subtracting the water band has also been a systematic issue for multivariate data analysis, as the water band overlaps the amide I mode. Here we present a two-pronged preprocessing approach: correlation optimized warping (COW) to alleviate spectrum-to-spectrum and day-to-day alignment errors, coupled with a method whereby the relative intensity of the water band is determined through a least-squares fit of the signal intensity between 1750 and 1900 cm(-1), making complex multi-excitation datasets more homogeneous and usable with multivariate analysis methods.
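
    A minimal sketch of the water-band step, assuming a pure-water reference spectrum is available: the scale factor is the least-squares fit over the 1750-1900 cm^-1 window, after which the scaled reference is subtracted (all spectra here are synthetic).

      import numpy as np

      def subtract_water(spectrum, water, wavenumbers, lo=1750.0, hi=1900.0):
          # Only water contributes in the fit window, so the least-squares
          # scale factor is k = <s, w> / <w, w> restricted to that window.
          window = (wavenumbers >= lo) & (wavenumbers <= hi)
          k = np.dot(spectrum[window], water[window]) / np.dot(water[window], water[window])
          return spectrum - k * water

      wn = np.linspace(800.0, 2000.0, 1200)
      water = np.exp(-(((wn - 1640.0) / 120.0) ** 2))          # toy water band
      sample = 0.7 * np.exp(-(((wn - 1655.0) / 30.0) ** 2)) + 0.9 * water
      corrected = subtract_water(sample, water, wn)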

  14. Preprocessing of Satellite Data for Urban Object Extraction

    NASA Astrophysics Data System (ADS)

    Krauß, T.

    2015-03-01

    Very high resolution (VHR) DSMs (digital surface models) derived from stereo or multi-stereo images from current VHR satellites like WorldView-2 or Pléiades can be produced down to the ground sampling distance (GSD) of the sensors, in the range of 50 cm to 1 m. From such DSMs, the digital terrain model (DTM) representing the ground and also a so-called nDEM (normalized digital elevation model) describing the height of objects above the ground can be derived. In parallel, these sensors deliver multispectral imagery which can be used for a spectral classification of the imagery. Fusion of the multispectral classification and the nDEM allows a simple classification and detection of urban objects. In further processing steps these detected urban objects can be modeled and exported in a suitable description language like CityGML. In this work we present the pre-processing steps up to the classification and detection of the urban objects; the modeling is not part of this work. The pre-processing steps described here briefly cover the coregistration of the input images and the generation of the DSM. The improvement of the DSM, the extraction of the DTM and nDEM, the multispectral classification, and the object detection and extraction are explained in more detail. The methods described are applied to two test regions from two satellites: first, the center of Munich acquired by WorldView-2, and second, the center of Melbourne acquired by Pléiades. From both acquisitions, a stereo pair from the panchromatic bands is used for creation of the DSM, and the pan-sharpened multispectral images are used for spectral classification. Finally, the quality of the detected urban objects is discussed.

  15. Study on Construction of a Medical X-Ray Direct Digital Radiography System and Hybrid Preprocessing Methods

    PubMed Central

    Ren, Yong; Wu, Sheng; Wang, Mijian; Cen, Zhongjie

    2014-01-01

    We construct a medical X-ray direct digital radiography (DDR) system based on a CCD (charge-coupled device) camera. For the original images captured from X-ray exposure, the computer first executes image flat-field correction and image gamma correction, and then carries out image contrast enhancement. A hybrid image contrast enhancement algorithm, based on the sharp frequency localization-contourlet transform (SFL-CT) and contrast limited adaptive histogram equalization (CLAHE), is proposed and verified on clinical DDR images. Experimental results show that, for medical X-ray DDR images, the proposed comprehensive preprocessing algorithm not only greatly enhances contrast and detail information, but also improves the resolution capability of the DDR system. PMID:25013452
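
    The contourlet (SFL-CT) stage of the hybrid algorithm is not reproduced here; as a hedged sketch of the flat-field correction and CLAHE steps, the code below uses OpenCV with synthetic reference frames.

      import numpy as np
      import cv2

      def flat_field_correct(raw, flat, dark):
          # Classic flat-field correction: (raw - dark) / (flat - dark),
          # rescaled to 8-bit for the subsequent histogram equalization.
          corrected = (raw - dark) / np.clip(flat - dark, 1e-6, None)
          return cv2.normalize(corrected, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

      rng = np.random.default_rng(9)
      raw = rng.uniform(50.0, 200.0, size=(512, 512)).astype(np.float32)
      flat = np.full_like(raw, 180.0)   # uniform-exposure reference frame (synthetic)
      dark = np.full_like(raw, 10.0)    # dark-current reference frame (synthetic)

      corrected = flat_field_correct(raw, flat, dark)
      clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
      enhanced = clahe.apply(corrected)  # contrast-limited adaptive histogram equalization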

  16. Microfluidic microarray systems and methods thereof

    SciTech Connect

    West, Jay A. A.; Hukari, Kyle W.; Hux, Gary A.

    2009-04-28

    Disclosed are systems that include a manifold in fluid communication with a microfluidic chip having a microarray, an illuminator, and a detector in optical communication with the microarray. Methods for using these systems for biological detection are also disclosed.

  17. Technical Advances of the Recombinant Antibody Microarray Technology Platform for Clinical Immunoproteomics

    PubMed Central

    Delfani, Payam; Dexlin Mellby, Linda; Nordström, Malin; Holmér, Andreas; Ohlsson, Mattias; Borrebaeck, Carl A. K.; Wingren, Christer

    2016-01-01

    In the quest to decipher disease-associated biomarkers, high-performing tools for multiplexed protein expression profiling of crude clinical samples will be crucial. Affinity proteomics, mainly represented by antibody-based microarrays, has in recent years been established as a proteomic tool providing unique opportunities for parallelized protein expression profiling. But despite the progress, several main technical features and assay procedures remain to be (fully) resolved. Among these issues, the handling of protein microarray data, i.e. the biostatistics, is one of the key features to solve. In this study, we have therefore further optimized, validated, and standardized our in-house designed recombinant antibody microarray technology platform. To this end, we addressed the main remaining technical issues (e.g. antibody quality, array production, sample labelling, and selected assay conditions) and, most importantly, key biostatistics subjects (e.g. array data pre-processing and biomarker panel condensation). This represents one of the first antibody array studies in which these key biostatistics subjects have been studied in detail. Here, we thus present the next generation of the recombinant antibody microarray technology platform designed for clinical immunoproteomics. PMID:27414037

  18. The Microarray Revolution: Perspectives from Educators

    ERIC Educational Resources Information Center

    Brewster, Jay L.; Beason, K. Beth; Eckdahl, Todd T.; Evans, Irene M.

    2004-01-01

    In recent years, microarray analysis has become a key experimental tool, enabling the analysis of genome-wide patterns of gene expression. This review approaches the microarray revolution with a focus upon four topics: 1) the early development of this technology and its application to cancer diagnostics; 2) a primer of microarray research,…

  19. Independent component analysis of Alzheimer's DNA microarray gene expression data

    PubMed Central

    Kong, Wei; Mou, Xiaoyang; Liu, Qingzhong; Chen, Zhongxue; Vanderburg, Charles R; Rogers, Jack T; Huang, Xudong

    2009-01-01

    Background Gene microarray technology is an effective tool to investigate the simultaneous activity of multiple cellular pathways from hundreds to thousands of genes. However, because the colossal amounts of data generated by DNA microarray technology are usually complex, noisy, high-dimensional, and often hindered by low statistical power, they are difficult to exploit. To overcome these problems, two kinds of unsupervised analysis methods for microarray data, principal component analysis (PCA) and independent component analysis (ICA), have been developed to accomplish the task. PCA projects the data into a new space spanned by principal components that are mutually orthonormal. The constraints of mutual orthogonality and second-order statistics within PCA algorithms, however, may not apply to the biological systems studied. Extracting and characterizing the most informative features of the biological signals requires higher-order statistics. Results ICA is an unsupervised algorithm that can extract higher-order statistical structures from data and has been applied to DNA microarray gene expression data analysis. We applied the FastICA method to DNA microarray gene expression data from Alzheimer's disease (AD) hippocampal tissue samples and performed consequent gene clustering. Experimental results showed that the ICA method can improve the clustering of AD samples and identify significant genes. More than 50 significant genes with high expression levels in severe AD were extracted, representing immunity-related proteins, metal-related proteins, membrane proteins, lipoproteins, neuropeptides, cytoskeleton proteins, cellular binding proteins, and ribosomal proteins. Within the aforementioned categories, our method also found 37 significant genes with low expression levels. Moreover, it is worth noting that some oncogenes and phosphorylation-related proteins are expressed at low levels. In comparison to the PCA and support
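
    A minimal sketch of the FastICA step, assuming scikit-learn's implementation and a random stand-in for the expression matrix; the study's preprocessing and component count are not reproduced.

      import numpy as np
      from sklearn.decomposition import FastICA

      rng = np.random.default_rng(10)
      # Toy expression matrix: 500 genes x 40 samples (e.g., AD vs. control tissue).
      X = rng.normal(size=(500, 40))

      ica = FastICA(n_components=10, random_state=0)
      S = ica.fit_transform(X)   # 500 x 10 independent-component loadings per gene

      # Genes with extreme loadings on a component form a candidate cluster.
      component = 0
      top_genes = np.argsort(np.abs(S[:, component]))[::-1][:20]
      print("Candidate gene indices:", top_genes)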

  20. Fourier Lucas-Kanade algorithm.

    PubMed

    Lucey, Simon; Navarathna, Rajitha; Ashraf, Ahmed Bilal; Sridharan, Sridha

    2013-06-01

    In this paper, we propose a framework for both gradient descent image and object alignment in the Fourier domain. Our method centers upon the classical Lucas & Kanade (LK) algorithm where we represent the source and template/model in the complex 2D Fourier domain rather than in the spatial 2D domain. We refer to our approach as the Fourier LK (FLK) algorithm. The FLK formulation is advantageous when one preprocesses the source image and template/model with a bank of filters (e.g., oriented edges, Gabor, etc.) as 1) it can handle substantial illumination variations, 2) the inefficient preprocessing filter bank step can be subsumed within the FLK algorithm as a sparse diagonal weighting matrix, 3) unlike traditional LK, the computational cost is invariant to the number of filters and as a result is far more efficient, and 4) this approach can be extended to the Inverse Compositional (IC) form of the LK algorithm where nearly all steps (including Fourier transform and filter bank preprocessing) can be precomputed, leading to an extremely efficient and robust approach to gradient descent image matching. Further, these computational savings translate to nonrigid object alignment tasks that are considered extensions of the LK algorithm, such as those found in Active Appearance Models (AAMs).
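
    The full FLK formulation, with filter banks and inverse-compositional updates, is beyond a short sketch; the code below instead shows phase correlation, a much simpler Fourier-domain alignment relative that shares the FFT machinery but estimates only a pure translation, and is not the authors' algorithm.

      import numpy as np

      def phase_correlation(image, template):
          # Normalized cross-power spectrum; its inverse FFT peaks at the shift.
          F1 = np.fft.fft2(image)
          F2 = np.fft.fft2(template)
          cross_power = F1 * np.conj(F2)
          cross_power /= np.abs(cross_power) + 1e-12
          correlation = np.fft.ifft2(cross_power).real
          return np.unravel_index(np.argmax(correlation), correlation.shape)

      rng = np.random.default_rng(11)
      template = rng.normal(size=(64, 64))
      shifted = np.roll(np.roll(template, 5, axis=0), 9, axis=1)
      print(phase_correlation(shifted, template))  # expect (5, 9)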

  1. The Current Status of DNA Microarrays

    NASA Astrophysics Data System (ADS)

    Shi, Leming; Perkins, Roger G.; Tong, Weida

    DNA microarray technology that allows simultaneous assay of thousands of genes in a single experiment has steadily advanced to become a mainstream method used in research, and has reached a stage that envisions its use in medical applications and personalized medicine. Many different strategies have been developed for manufacturing DNA microarrays. In this chapter, we discuss the manufacturing characteristics of seven microarray platforms that were used in a recently completed large study by the MicroArray Quality Control (MAQC) consortium, which evaluated the concordance of results across these platforms. The platforms can be grouped into three categories: (1) in situ synthesis of oligonucleotide probes on microarrays (Affymetrix GeneChip® arrays based on photolithography synthesis and Agilent's arrays based on inkjet synthesis); (2) spotting of presynthesized oligonucleotide probes on microarrays (GE Healthcare's CodeLink system, Applied Biosystems' Genome Survey Microarrays, and the custom microarrays printed with Operon's oligonucleotide set); and (3) deposition of presynthesized oligonucleotide probes on bead-based microarrays (Illumina's BeadChip microarrays). We conclude this chapter with our views on the challenges and opportunities toward acceptance of DNA microarray data in clinical and regulatory settings.

  2. Microarray analysis in pulmonary hypertension

    PubMed Central

    Hoffmann, Julia; Wilhelm, Jochen; Olschewski, Andrea

    2016-01-01

    Microarrays are a powerful and effective tool that allows the detection of genome-wide gene expression differences between controls and disease conditions. They have been broadly applied to investigate the pathobiology of diverse forms of pulmonary hypertension, namely group 1, including patients with idiopathic pulmonary arterial hypertension, and group 3, including pulmonary hypertension associated with chronic lung diseases such as chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis. To date, numerous human microarray studies have been conducted to analyse global (lung homogenate samples), compartment-specific (laser capture microdissection), cell type-specific (isolated primary cells) and circulating cell (peripheral blood) expression profiles. Combined, they provide important information on the development, progression and end stage of the disease. In the future, systems biology approaches, the expression of noncoding RNAs that regulate coding RNAs, and direct comparisons between animal models and human disease might be of importance. PMID:27076594

  3. DNA microarray technology in dermatology.

    PubMed

    Kunz, Manfred

    2008-03-01

    In recent years, DNA microarray technology has been used for the analysis of gene expression patterns in a variety of skin diseases, including malignant melanoma, psoriasis, lupus erythematosus, and systemic sclerosis. Many of the studies described herein confirmed earlier results on individual genes or functional groups of genes. However, a plethora of new candidate genes, gene patterns, and regulatory pathways have been identified. Major progress was made with the identification of a prognostic gene pattern in malignant melanoma, an immune signaling cluster in psoriasis, and a so-called interferon signature in systemic lupus erythematosus. In the future, interference with genes or regulatory pathways using different RNA interference technologies or targeted therapy may not only underscore the functional significance of microarray data but may also open interesting therapeutic perspectives. Large-scale gene expression analyses may also help to design more individualized treatment approaches for cutaneous diseases.

  4. Microarray analysis in pulmonary hypertension.

    PubMed

    Hoffmann, Julia; Wilhelm, Jochen; Olschewski, Andrea; Kwapiszewska, Grazyna

    2016-07-01

    Microarrays are a powerful and effective tool that allows the detection of genome-wide gene expression differences between controls and disease conditions. They have been broadly applied to investigate the pathobiology of diverse forms of pulmonary hypertension, namely group 1, including patients with idiopathic pulmonary arterial hypertension, and group 3, including pulmonary hypertension associated with chronic lung diseases such as chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis. To date, numerous human microarray studies have been conducted to analyse global (lung homogenate samples), compartment-specific (laser capture microdissection), cell type-specific (isolated primary cells) and circulating cell (peripheral blood) expression profiles. Combined, they provide important information on the development, progression and end stage of the disease. In the future, systems biology approaches, the expression of noncoding RNAs that regulate coding RNAs, and direct comparisons between animal models and human disease might be of importance.

  5. Microarrays, antiobesity and the liver

    PubMed Central

    Castro-Chávez, Fernando

    2013-01-01

    In this review, microarray technology, and oligonucleotide arrays in particular, is exemplified with a practical example taken from perilipin−/− mice using the dChip software, which is available free for non-commercial purposes. It was found that the liver of perilipin−/− mice was healthy and normal, even under a high-fat diet, when compared with the results published for scd1−/− mice, which under high-fat diets had a darker liver suggestive of hepatic steatosis. Scd1 is required for the biosynthesis of monounsaturated fatty acids and plays a key role in the hepatic synthesis of triglycerides and of very-low-density lipoproteins. Both models of obesity resistance share many similar phenotypic antiobesity features; however, the perilipin−/− mice had a significant downregulation of the stearoyl-CoA desaturases scd1 and scd2 in white adipose tissue but a normal level of both genes in the liver, even under a high-fat diet. Here, different microarray methodologies are discussed, along with some of the most recent discoveries and perspectives regarding the use of microarrays, with an emphasis on obesity gene expression, and a personal remark on my findings of increased expression of hemoglobin transcripts and other heme-related (hemo-like) genes, and of leukocyte-like (leuko-like) genes, inside the white adipose tissue of the perilipin−/− mice. In conclusion, microarrays have much to offer in comparative studies such as those on antiobesity, and they are also methodologies well suited to astounding new molecular discoveries. PMID:15657555

  6. Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

    PubMed Central

    Wong, Raymond

    2013-01-01

    Voice biometrics relies on a physiological characteristic, the voice, that differs for each individual person. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), and emotional state, as well as in identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features for training a classification model, based on piecewise transformation treating an audio waveform as a time series. Using SFX we can faithfully remodel the statistical characteristics of the time series; together with spectral analysis, a substantial number of features is extracted in combination. An ensemble is utilized to select only the influential features to be used in classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, such as wavelets and LPC-to-CC. PMID:24288684
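
    A minimal sketch of the piecewise idea, assuming simple per-window summary statistics plus coarse band energies; the window count and the chosen statistics are illustrative, not the paper's exact SFX definition.

```python
# Piecewise statistical feature extraction from a waveform: split the signal
# into fixed windows, compute summary statistics per window, then append a
# few coarse spectral (band-energy) features.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
x = rng.normal(size=16000)                     # stand-in 1-second waveform

def sfx_features(signal, n_windows=10):
    feats = []
    for seg in np.array_split(signal, n_windows):
        feats += [seg.mean(), seg.std(), skew(seg), kurtosis(seg)]
    # Coarse spectral features: energy in a few frequency bands.
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    feats += [band.sum() for band in np.array_split(spectrum, 4)]
    return np.asarray(feats)

print(sfx_features(x).shape)                   # (44,) = 10*4 time + 4 spectral
```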

  7. Impact of spatial complexity preprocessing on hyperspectral data unmixing

    NASA Astrophysics Data System (ADS)

    Robila, Stefan A.; Pirate, Kimberly; Hall, Terrance

    2013-05-01

    Most successful hyperspectral image processing techniques have their origins in multidimensional signal processing, with a special emphasis on optimization based on objective functions. Many of these techniques (ICA, PCA, NMF, OSP, etc.) operate on collections of one-dimensional data and do not take into consideration any spatial characteristics (such as the shape of objects in the scene). Recently, in an effort to improve processing results, several approaches that characterize spatial complexity (based on neighborhood information) were introduced. Our goal is to investigate how spatial complexity based approaches can be employed as preprocessing techniques for other previously established methods. First, for each spatial complexity based technique we designed a step that generates a hyperspectral cube scaled according to the spatial information. Next, we feed the new cubes to a group of processing techniques such as ICA and PCA, and compare the results of processing the scaled data with the results on the original, full data. We built upon these initial results by employing additional spatial complexity approaches, and we also introduced new hybrid approaches that embed the spatial complexity step into the main processing stage.
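
    A sketch of the overall preprocessing shape, assuming local variance over a small neighborhood as a stand-in complexity measure; the paper's actual measures and scaling rule may differ.

```python
# Spatial-complexity preprocessing for a hyperspectral cube: build a per-pixel
# local-variance map, scale each pixel's spectrum by it, then run PCA on the
# scaled cube (pixels as rows). Cube size and window size are arbitrary.
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
rows, cols, bands = 64, 64, 30
cube = rng.normal(size=(rows, cols, bands))

# Local variance of the mean-band image: E[x^2] - E[x]^2 over a 5x5 window.
img = cube.mean(axis=2)
local_var = uniform_filter(img ** 2, 5) - uniform_filter(img, 5) ** 2
weights = 1.0 + local_var / (local_var.max() + 1e-12)   # emphasize complex regions

scaled = cube * weights[:, :, None]

pcs = PCA(n_components=5).fit_transform(scaled.reshape(-1, bands))
print(pcs.shape)                               # (4096, 5)
```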

  8. Macular Preprocessing of Linear Acceleratory Stimuli: Implications for the Clinic

    NASA Technical Reports Server (NTRS)

    Ross, M. D.; Hargens, Alan R. (Technical Monitor)

    1996-01-01

    Three-dimensional reconstructions of innervation patterns in rat maculae were carried out using serial section images sent to a Silicon Graphics workstation from a transmission electron microscope. Contours were extracted from mosaicked sections, then registered and visualized using Biocomputation Center software. The purposes were to determine the innervation patterns of type II cells and the areas encompassed by vestibular afferent receptive fields. Terminals on type II cells typically are elongated and compartmentalized into parts varying in vesicular content; reciprocal and serial synapses are common. The terminals originate as processes of nearby calyces or from nerve fibers passing to calyces outside the immediate vicinity. Thus, the receptive fields of the afferents overlap in unique ways. Multiple processes are frequent; from 4 to 6 afferents supply 12-16 terminals on a type II cell. Processes commonly communicate with two type II cells. The morphology indicates that extensive preprocessing of linear acceleratory stimuli occurs peripherally, as is true also of the visual and olfactory systems. Clinically, this means that the loss of individual nerve fibers may not be noticed behaviorally, due to redundancy (receptive field overlap). However, peripheral processing implies the presence of neuroactive agents whose loss can acutely or chronically alter normal peripheral function and cause balance disorders.

  9. Breast image pre-processing for mammographic tissue segmentation.

    PubMed

    He, Wenda; Hogg, Peter; Juette, Arne; Denton, Erika R E; Zwiggelaar, Reyer

    2015-12-01

    During mammographic image acquisition, a compression paddle is used to even out the breast thickness in order to obtain optimal image quality. Clinical observation has indicated that some mammograms may exhibit abrupt intensity changes and low visibility of tissue structures in the breast peripheral areas. Such appearance discrepancies can affect image interpretation and may not be desirable for computer-aided mammography, leading to incorrect diagnosis and/or detection, which can have a negative impact on the sensitivity and specificity of screening mammography. This paper describes a novel mammographic image pre-processing method to improve image quality for analysis. An image selection process is incorporated to better target problematic images. The processed images show improved mammographic appearance not only in the breast periphery but across the whole mammogram. Mammographic segmentation and risk/density classification were performed to facilitate a quantitative and qualitative evaluation. When using the processed images, the results indicated more anatomically correct segmentation in tissue-specific areas, and subsequently better classification accuracies were achieved. Visual assessments were conducted in a clinical environment to determine the quality of the processed images and the resultant segmentation. The developed method has shown promising results. It is expected to be useful in early breast cancer detection, risk-stratified screening, and aiding radiologists in the process of decision making prior to surgery and/or treatment.

  10. Software for Preprocessing Data from Rocket-Engine Tests

    NASA Technical Reports Server (NTRS)

    Cheng, Chiu-Fu

    2004-01-01

    Three computer programs have been written to preprocess digitized outputs of sensors during rocket-engine tests at Stennis Space Center (SSC). The programs apply exclusively to the SSC E test-stand complex and utilize the SSC file format. The programs are the following: Engineering Units Generator (EUGEN) converts sensor-output-measurement data to engineering units. The inputs to EUGEN are raw binary test-data files, which include the voltage data, a list identifying the data channels, and time codes. EUGEN effects conversion by use of a file that contains calibration coefficients for each channel. QUICKLOOK enables immediate viewing of a few selected channels of data, in contradistinction to viewing only after post-test processing (which can take 30 minutes to several hours depending on the number of channels and other test parameters) of data from all channels. QUICKLOOK converts the selected data into a form in which they can be plotted in engineering units by use of Winplot (a free graphing program written by Rick Paris). EUPLOT provides a quick means for looking at data files generated by EUGEN without the necessity of relying on the PV-WAVE based plotting software.
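
    The EUGEN conversion step reduces to applying per-channel calibration coefficients to raw voltages. The sketch below assumes a simple linear (offset, gain) calibration per channel; SSC's actual calibration file format is not shown here and the values are invented.

```python
# Convert raw sensor voltages to engineering units with per-channel
# calibration coefficients, EUGEN-style: value = offset + gain * voltage.
import numpy as np

raw = np.array([[0.10, 0.12, 0.11],            # channel 0 voltages over time
                [2.00, 2.05, 1.98]])           # channel 1

# One (offset, gain) pair per channel, as read from a calibration file.
cal = np.array([[0.0, 500.0],                  # e.g. volts -> psi
                [-1.0, 100.0]])                # e.g. volts -> degrees F

engineering = cal[:, [0]] + cal[:, [1]] * raw  # broadcast over time samples
print(engineering)
```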

  11. Software for Preprocessing Data From Rocket-Engine Tests

    NASA Technical Reports Server (NTRS)

    Cheng, Chiu-Fu

    2002-01-01

    Three computer programs have been written to preprocess digitized outputs of sensors during rocket-engine tests at Stennis Space Center (SSC). The programs apply exclusively to the SSC "E" test-stand complex and utilize the SSC file format. The programs are the following: 1) Engineering Units Generator (EUGEN) converts sensor-output-measurement data to engineering units. The inputs to EUGEN are raw binary test-data files, which include the voltage data, a list identifying the data channels, and time codes. EUGEN effects conversion by use of a file that contains calibration coefficients for each channel; 2) QUICKLOOK enables immediate viewing of a few selected channels of data, in contradistinction to viewing only after post-test processing (which can take 30 minutes to several hours depending on the number of channels and other test parameters) of data from all channels. QUICKLOOK converts the selected data into a form in which they can be plotted in engineering units by use of Winplot (a free graphing program written by Rick Paris); and 3) EUPLOT provides a quick means for looking at data files generated by EUGEN without the necessity of relying on the PV-WAVE based plotting software.

  12. Localization of spatially distributed brain sources after a tensor-based preprocessing of interictal epileptic EEG data.

    PubMed

    Albera, L; Becker, H; Karfoul, A; Gribonval, R; Kachenoura, A; Bensaid, S; Senhadji, L; Hernandez, A; Merlet, I

    2015-01-01

    This paper addresses the localization of spatially distributed sources from interictal epileptic electroencephalographic data after a tensor-based preprocessing. Justifying the Canonical Polyadic (CP) model of the space-time-frequency and space-time-wave-vector tensors is not an easy task when two or more extended sources have to be localized. On the other hand, the occurrence of several amplitude-modulated spikes originating from the same epileptic region can be used to build a space-time-spike tensor from the EEG data. While the CP model of this tensor appears more justified, the exact computation of its loading matrices can be limited by the presence of highly correlated sources and/or a strong background noise. An efficient extended-source localization scheme to follow the tensor-based preprocessing must then be devised. Different strategies are thus investigated and compared on realistic simulated data: the "disk algorithm" using a precomputed dictionary of circular patches, a standardized Tikhonov regularization and a fused LASSO scheme.
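
    A sketch of the tensor-based preprocessing, assuming the tensorly package and its current parafac/cp_to_tensor API: fit a CP model to a synthetic space-time-spike tensor and recover the spatial loading matrix that a subsequent distributed-source localization step would consume. Sizes and rank are illustrative.

```python
# CP (PARAFAC) decomposition of a synthetic space-time-spike EEG tensor.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
n_channels, n_times, n_spikes, rank = 32, 100, 20, 3

# Build a rank-3 tensor plus noise: sum_r a_r (outer) b_r (outer) c_r.
A = rng.normal(size=(n_channels, rank))
B = rng.normal(size=(n_times, rank))
C = rng.normal(size=(n_spikes, rank))
T = tl.cp_to_tensor((np.ones(rank), [A, B, C])) + 0.1 * rng.normal(
    size=(n_channels, n_times, n_spikes))

weights, factors = parafac(tl.tensor(T), rank=rank)
print(factors[0].shape)                        # (32, 3): spatial mixing estimates
```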

  13. Recommendations for the use of microarrays in prenatal diagnosis.

    PubMed

    Suela, Javier; López-Expósito, Isabel; Querejeta, María Eugenia; Martorell, Rosa; Cuatrecasas, Esther; Armengol, Lluis; Antolín, Eugenia; Domínguez Garrido, Elena; Trujillo-Tiebas, María José; Rosell, Jordi; García Planells, Javier; Cigudosa, Juan Cruz

    2017-04-07

    Microarray technology, recently implemented in international prenatal diagnosis systems, has become one of the main techniques in this field in terms of detection rate and objectivity of the results. This guideline attempts to provide background information on this technology, including technical and diagnostic aspects to be considered. Specifically, this guideline defines: the different prenatal sample types to be used, as well as their characteristics (chorionic villi samples, amniotic fluid, fetal cord blood or miscarriage tissue material); variant reporting policies (including variants of uncertain significance) to be considered in informed consents and prenatal microarray reports; microarray limitations inherent to the technique and which must be taken into account when recommending microarray testing for diagnosis; a detailed clinical algorithm recommending the use of microarray testing and its introduction into routine clinical practice within the context of other genetic tests, including pregnancies in families with a genetic history or specific syndrome suspicion, first trimester increased nuchal translucency or second trimester heart malformation and ultrasound findings not related to a known or specific syndrome. This guideline has been coordinated by the Spanish Association for Prenatal Diagnosis (AEDP, «Asociación Española de Diagnóstico Prenatal»), the Spanish Human Genetics Association (AEGH, «Asociación Española de Genética Humana») and the Spanish Society of Clinical Genetics and Dysmorphology (SEGCyD, «Sociedad Española de Genética Clínica y Dismorfología»).

  14. Understanding the effects of pre-processing on extracted signal features from gait accelerometry signals.

    PubMed

    Millecamps, Alexandre; Lowry, Kristin A; Brach, Jennifer S; Perera, Subashan; Redfern, Mark S; Sejdić, Ervin

    2015-07-01

    Gait accelerometry is an important approach for gait assessment. Previous contributions have adopted various pre-processing approaches for gait accelerometry signals, but none have thoroughly investigated the effects of such pre-processing operations on the obtained results. Therefore, this paper investigated the influence of pre-processing operations on signal features extracted from gait accelerometry signals. These signals were collected from 35 participants aged over 65 years: 14 of them were healthy controls (HC), 10 had Parkinson's disease (PD) and 11 had peripheral neuropathy (PN). The participants walked on a treadmill at their preferred speed. Signal features in the time, frequency and time-frequency domains were computed for both raw and pre-processed signals. The pre-processing stage consisted of applying tilt correction and denoising operations to the acquired signals. We first examined the effects of these operations separately, followed by an investigation of their joint effects. Several important observations were made based on the obtained results. First, the denoising operation alone had almost no effect in comparison to the trends observed in the raw data. Second, the tilt correction affected the reported results to a certain degree, which could lead to a better discrimination between groups. Third, the combination of the two pre-processing operations yielded similar trends as the tilt correction alone. These results indicate that while gait accelerometry is a valuable approach for gait assessment, one has to adopt any pre-processing steps carefully as they alter the observed findings.
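
    A sketch of the two operations studied, under plain assumptions: tilt correction as a rotation aligning the mean acceleration (the gravity estimate) with the vertical axis, and denoising as a low-pass Butterworth filter. The cutoff and the rotation construction are illustrative, not necessarily the paper's.

```python
# Tilt correction (Rodrigues rotation of the gravity estimate onto z) and
# low-pass denoising for a triaxial gait accelerometry signal.
import numpy as np
from scipy.signal import butter, filtfilt

def tilt_correct(acc):
    """acc: (n, 3) accelerometry; returns rotated signal with gravity on z."""
    g = acc.mean(axis=0)
    g = g / np.linalg.norm(g)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(g, z)
    s, c = np.linalg.norm(v), np.dot(g, z)
    if s < 1e-12:                              # already aligned
        return acc.copy()
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    R = np.eye(3) + K + K @ K * ((1 - c) / s ** 2)   # Rodrigues formula
    return acc @ R.T

def denoise(acc, fs=100.0, cutoff=20.0):
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, acc, axis=0)

rng = np.random.default_rng(0)
acc = rng.normal(size=(1000, 3)) + np.array([0.3, 0.1, 9.7])   # tilted gravity
print(tilt_correct(denoise(acc)).mean(axis=0).round(2))        # approx. [0, 0, 9.71]
```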

  15. Comparison of preprocessing methods and storage times for touch DNA samples

    PubMed Central

    Dong, Hui; Wang, Jing; Zhang, Tao; Ge, Jian-ye; Dong, Ying-qiang; Sun, Qi-fan; Liu, Chao; Li, Cai-xia

    2017-01-01

    Aim To select appropriate preprocessing methods for different substrates by comparing the effects of four different preprocessing methods on touch DNA samples, and to determine the effect of various storage times on the results of touch DNA sample analysis. Method Hand touch DNA samples were used to investigate the detection and inspection results of DNA on different substrates. Four preprocessing methods, including the direct cutting method, the stubbing procedure, the double swab technique, and the vacuum cleaner method, were used in this study. DNA was extracted from mock samples with the four different preprocessing methods. The best preprocessing protocol determined from the study was further used to compare performance after various storage times. DNA extracted from all samples was quantified and amplified using standard procedures. Results The amounts of DNA and the numbers of alleles detected on the porous substrates were greater than those on the non-porous substrates. The performance of the four preprocessing methods varied with different substrates. The direct cutting method displayed advantages for porous substrates, and the vacuum cleaner method was advantageous for non-porous substrates. No significant degradation trend was observed as storage times increased. Conclusion Different substrates require the use of different preprocessing methods in order to obtain the highest DNA amount and allele number from touch DNA samples. This study provides a theoretical basis for explorations of touch DNA samples and may be used as a reference when dealing with touch DNA samples in casework. PMID:28252870

  16. Object localization based on smoothing preprocessing and cascade classifier

    NASA Astrophysics Data System (ADS)

    Zhang, Xingfu; Liu, Lei; Zhao, Feng

    2017-01-01

    An improved algorithm for image localization is proposed in this paper. First, the image is smoothed and partial noise is removed. Then a cascade classifier is used to train a template. Finally, the template is used to detect the related images. The advantages of the algorithm are that it is robust to noise and insensitive to changes in image proportion (scale), while also offering fast computation. In this paper, a real picture of a truck underside is chosen as the experimental object. Images of normal components and of faulty components are both included in the image sample. Experimental results show that the accuracy rate on the images is more than 90 percent when the grade is more than 40. We can therefore conclude that the algorithm proposed in this paper can be applied to practical image localization projects.
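
    A sketch of the described pipeline, assuming OpenCV: Gaussian smoothing followed by detection with a trained cascade classifier. The image and cascade file names are hypothetical, and training the cascade itself happens offline with OpenCV's cascade-training tools.

```python
# Smooth-then-detect: suppress noise with a Gaussian blur, then localize
# components with a pre-trained cascade classifier.
import cv2

img = cv2.imread("truck_underside.png")                  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

smoothed = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.5)    # smoothing preprocess

cascade = cv2.CascadeClassifier("component_cascade.xml")  # hypothetical model
boxes = cascade.detectMultiScale(smoothed, scaleFactor=1.1, minNeighbors=4)

for (x, y, w, h) in boxes:                               # localized components
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("localized.png", img)
```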

  17. Gene Expression Browser: large-scale and cross-experiment microarray data integration, management, search & visualization

    PubMed Central

    2010-01-01

    Background In the last decade, a large amount of microarray gene expression data has been accumulated in public repositories. Integrating and analyzing high-throughput gene expression data have become key activities for exploring gene functions, gene networks and biological pathways. Effectively utilizing these invaluable microarray data remains challenging due to a lack of powerful tools to integrate large-scale gene-expression information across diverse experiments and to search and visualize a large number of gene-expression data points. Results Gene Expression Browser is a microarray data integration, management and processing system with web-based search and visualization functions. An innovative method has been developed to define a treatment over a control for every microarray experiment to standardize and make microarray data from different experiments homogeneous. In the browser, data are pre-processed offline and the resulting data points are visualized online with a 2-layer dynamic web display. Users can view all treatments over control that affect the expression of a selected gene via Gene View, and view all genes that change in a selected treatment over control via Treatment over Control View. Users can also check the changes of expression profiles of a set of either treatments over control or genes via Slide View. In addition, the relationships between genes and treatments over control are computed according to gene expression ratio and are shown as co-responsive genes and co-regulated treatments over control. Conclusion Gene Expression Browser is composed of a set of software tools, including a data extraction tool, a microarray data-management system, a data-annotation tool, a microarray data-processing pipeline, and a data search & visualization tool. The browser is deployed as a free public web service (http://www.ExpressionBrowser.com) that integrates 301 ATH1 gene microarray experiments from public data repositories (viz. the Gene

  18. Reconfiguration-based implementation of SVM classifier on FPGA for Classifying Microarray data.

    PubMed

    Hussain, Hanaa M; Benkrid, Khaled; Seker, Huseyin

    2013-01-01

    Classifying Microarray data, which are of a high-dimensional nature, requires high computational power. The Support Vector Machine-based classifier (SVM) is among the most common and successful classifiers used in the analysis of Microarray data, but it also requires high computational power due to its complex mathematical architecture. Implementing SVM on hardware exploits the parallelism available within the algorithm kernels to accelerate the classification of Microarray data. In this work, a flexible, dynamically and partially reconfigurable implementation of the SVM classifier on a Field Programmable Gate Array (FPGA) is presented. The SVM architecture achieved up to 85× speed-up over an equivalent general purpose processor (GPP) implementation, showing the capability of FPGAs to enhance the performance of SVM-based analysis of Microarray data as well as future bioinformatics applications.

  19. FPGA based system for automatic cDNA microarray image processing.

    PubMed

    Belean, Bogdan; Borda, Monica; Le Gal, Bertrand; Terebes, Romulus

    2012-07-01

    Automation is an open subject in DNA microarray image processing, aiming at reliable gene expression estimation. The paper presents a novel shock-filter-based approach for automatic microarray grid alignment. The proposed method brings significantly reduced computational complexity compared to state-of-the-art approaches, while similar results in terms of accuracy are achieved. Based on this approach, we also propose an FPGA-based system for microarray image analysis that eliminates the shortcomings of existing software platforms: user intervention, increased computational time and cost. Our system includes application-specific architectures which involve algorithm parallelization, aiming at fast and automated cDNA microarray image processing. The proposed automated image processing chain is implemented both on a general purpose processor and using the developed hardware architectures as co-processors in an FPGA-based system. The comparative results included in the last section show that an important gain in terms of computational time is obtained using hardware-based implementations.

  1. Automated Pre-processing for NMR Assignments with Reduced Tedium

    SciTech Connect

    Pawley, Norma; Gans, Jason

    2004-05-11

    An important rate-limiting step in the resonance assignment process is the accurate identification of resonance peaks in NMR spectra. NMR spectra are noisy. Hence, automatic peak-picking programs must navigate between the Scylla of reliable but incomplete picking and the Charybdis of noisy but complete picking. Each of these extremes complicates the assignment process: incomplete peak-picking results in the loss of essential connectivities, while noisy picking conceals the true connectivities under a combinatorial explosion of false positives. Intermediate processing can simplify the assignment process by preferentially removing false peaks from noisy peak lists. This is accomplished by requiring consensus between multiple NMR experiments, exploiting a priori information about NMR spectra, and drawing on empirical statistical distributions of chemical shifts extracted from the BioMagResBank. Experienced NMR practitioners currently apply many of these techniques "by hand", which is tedious and may appear arbitrary to the novice. To increase efficiency, we have created a systematic and automated approach to this process, known as APART. Automated pre-processing has three main advantages: reduced tedium, standardization, and pedagogy. In the hands of experienced spectroscopists, the main advantage is reduced tedium (a rapid increase in the ratio of true peaks to false peaks with minimal effort). When a project is passed from hand to hand, the main advantage is standardization. APART automatically documents the peak filtering process by archiving its original recommendations, the accompanying justifications, and whether a user accepted or overrode a given filtering recommendation. In the hands of a novice, this tool can reduce the stumbling block of learning to differentiate between real peaks and noise by providing real-time examples of how such decisions are made.
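
    One of the filtering ideas, consensus between experiments, is easy to illustrate: keep only peaks that have a partner in a second experiment's peak list within a chemical-shift tolerance. The peak values and tolerance below are invented for illustration.

```python
# Consensus filtering of NMR peak lists: a peak survives only if a matching
# peak exists in a second experiment within a chemical-shift tolerance.
import numpy as np

def consensus_filter(peaks_a, peaks_b, tol=0.05):
    """Keep peaks in peaks_a (n, 2) with a match in peaks_b within tol (ppm)."""
    peaks_a, peaks_b = np.asarray(peaks_a), np.asarray(peaks_b)
    keep = []
    for p in peaks_a:
        if np.any(np.all(np.abs(peaks_b - p) <= tol, axis=1)):
            keep.append(p)
    return np.array(keep)

a = [[8.21, 120.3], [7.95, 118.1], [9.02, 125.7]]   # (1H, 15N) shifts, list A
b = [[8.23, 120.31], [9.00, 125.68]]                # list B (no match for 7.95)
print(consensus_filter(a, b))                       # the two consensus peaks
```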

  2. Surface characterization of carbohydrate microarrays.

    PubMed

    Scurr, David J; Horlacher, Tim; Oberli, Matthias A; Werz, Daniel B; Kroeck, Lenz; Bufali, Simone; Seeberger, Peter H; Shard, Alexander G; Alexander, Morgan R

    2010-11-16

    Carbohydrate microarrays are essential tools to determine the biological function of glycans. Here, we analyze a glycan array by time-of-flight secondary ion mass spectrometry (ToF-SIMS) to gain a better understanding of the physicochemical properties of the individual spots and to improve carbohydrate microarray quality. The carbohydrate microarray is prepared by piezo printing of thiol-terminated sugars onto a maleimide functionalized glass slide. The hyperspectral ToF-SIMS imaging data are analyzed by multivariate curve resolution (MCR) to discern secondary ions from regions of the array containing saccharide, linker, salts from the printing buffer, and the background linker chemistry. Analysis of secondary ions from the linker common to all of the sugar molecules employed reveals a relatively uniform distribution of the sugars within the spots formed from solutions with saccharide concentration of 0.4 mM and less, whereas a doughnut shape is often formed at higher-concentration solutions. A detailed analysis of individual spots reveals that in the larger spots the phosphate buffered saline (PBS) salts are heterogeneously distributed, apparently resulting in saccharide concentrated at the rim of the spots. A model of spot formation from the evaporating sessile drop is proposed to explain these observations. Saccharide spot diameters increase with saccharide concentration due to a reduction in surface tension of the saccharide solution compared to PBS. The multivariate analytical partial least squares (PLS) technique identifies ions from the sugars that in the complex ToF-SIMS spectra correlate with the binding of galectin proteins.

  3. Pre-processing of data coming from a laser-EMAT system for non-destructive testing of steel slabs.

    PubMed

    Sgarbi, Mirko; Colla, Valentina; Cateni, Sivia; Higson, Stuart

    2012-01-01

    Non-destructive test systems are increasingly applied in the industrial context for their strong potential to improve and standardize quality control. Especially in the intermediate manufacturing stages, early detection of defects on semi-finished products allows them to be directed towards later production processes according to their quality, with consequent considerable savings in time, energy, materials and work. However, the raw data coming from non-destructive test systems are not always immediately suitable for sophisticated defect detection algorithms, due to noise and disturbances which are unavoidable, especially in harsh operating conditions such as those typical of the steelmaking cycle. The paper describes some pre-processing operations which are required in order to exploit the data coming from a non-destructive test system. Such a system is based on the joint exploitation of laser and Electro-Magnetic Acoustic Transducer technologies and is applied to the detection of surface and sub-surface cracks in cold and hot steel slabs.

  4. THE ABRF MARG MICROARRAY SURVEY 2005: TAKING THE PULSE ON THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years microarray technology has evolved into a critical component of any discovery based program. Since 1999, the Association of Biomolecular Resource Facilities (ABRF) Microarray Research Group (MARG) has conducted biennial surveys designed to generate a pr...

  5. Living Cell Microarrays: An Overview of Concepts

    PubMed Central

    Jonczyk, Rebecca; Kurth, Tracy; Lavrentieva, Antonina; Walter, Johanna-Gabriela; Scheper, Thomas; Stahl, Frank

    2016-01-01

    Living cell microarrays are a highly efficient cellular screening system. Due to the low number of cells required per spot, cell microarrays enable the use of primary and stem cells and provide resolution close to the single-cell level. Apart from a variety of conventional static designs, microfluidic microarray systems have also been established. An alternative format is a microarray consisting of three-dimensional cell constructs ranging from cell spheroids to cells encapsulated in hydrogel. These systems provide an in vivo-like microenvironment and are preferably used for the investigation of cellular physiology, cytotoxicity, and drug screening. Thus, many different high-tech microarray platforms are currently available. Disadvantages of many systems include their high cost, the requirement of specialized equipment for their manufacture, and the poor comparability of results between different platforms. In this article, we provide an overview of static, microfluidic, and 3D cell microarrays. In addition, we describe a simple method for the printing of living cell microarrays on modified microscope glass slides using standard DNA microarray equipment available in most laboratories. Applications in research and diagnostics are discussed, e.g., the selective and sensitive detection of biomarkers. Finally, we highlight current limitations and the future prospects of living cell microarrays. PMID:27600077

  6. Transcriptome Analysis of Zebrafish Embryogenesis Using Microarrays

    PubMed Central

    Mathavan, Sinnakaruppan; Lee, Serene G. P; Mak, Alicia; Miller, Lance D; Murthy, Karuturi Radha Krishna; Govindarajan, Kunde R; Tong, Yan; Wu, Yi Lian; Lam, Siew Hong; Yang, Henry; Ruan, Yijun; Korzh, Vladimir; Gong, Zhiyuan; Liu, Edison T; Lufkin, Thomas

    2005-01-01

    Zebrafish (Danio rerio) is a well-recognized model for the study of vertebrate developmental genetics, yet at the same time little is known about the transcriptional events that underlie zebrafish embryogenesis. Here we have employed microarray analysis to study the temporal activity of developmentally regulated genes during zebrafish embryogenesis. Transcriptome analysis at 12 different embryonic time points covering five different developmental stages (maternal, blastula, gastrula, segmentation, and pharyngula) revealed a highly dynamic transcriptional profile. Hierarchical clustering, stage-specific clustering, and algorithms to detect onset and peak of gene expression revealed clearly demarcated transcript clusters with maximum gene activity at distinct developmental stages as well as co-regulated expression of gene groups involved in dedicated functions such as organogenesis. Our study also revealed a previously unidentified cohort of genes that are transcribed prior to the mid-blastula transition, a time point earlier than when the zygotic genome was traditionally thought to become active. Here we provide, for the first time to our knowledge, a comprehensive list of developmentally regulated zebrafish genes and their expression profiles during embryogenesis, including novel information on the temporal expression of several thousand previously uncharacterized genes. The expression data generated from this study are accessible to all interested scientists from our institute resource database (http://giscompute.gis.a-star.edu.sg/~govind/zebrafish/data_download.html). PMID:16132083

  7. THE ABRF-MARG MICROARRAY SURVEY 2004: TAKING THE PULSE OF THE MICROARRAY FIELD

    EPA Science Inventory

    Over the past several years, the field of microarrays has grown and evolved drastically. In its continued efforts to track this evolution, the ABRF-MARG has once again conducted a survey of international microarray facilities and individual microarray users. The goal of the surve...

  8. 2008 Microarray Research Group (MARG Survey): Sensing the State of Microarray Technology

    EPA Science Inventory

    Over the past several years, the field of microarrays has grown and evolved drastically. In its continued efforts to track this evolution and transformation, the ABRF-MARG has once again conducted a survey of international microarray facilities and individual microarray users. Th...

  9. Multisensor data fusion algorithm development

    SciTech Connect

    Yocky, D.A.; Chadwick, M.D.; Goudy, S.P.; Johnson, D.K.

    1995-12-01

    This report presents a two-year LDRD research effort into multisensor data fusion. We approached the problem by addressing the available types of data, preprocessing that data, and developing fusion algorithms using that data. The report reflects these three distinct areas. First, the possible data sets for fusion are identified. Second, automated registration techniques for imagery data are analyzed. Third, two fusion techniques are presented. The first fusion algorithm is based on the two-dimensional discrete wavelet transform. Using test images, the wavelet algorithm is compared against intensity modulation and intensity-hue-saturation image fusion algorithms that are available in commercial software. The wavelet approach outperforms the other two fusion techniques by preserving spectral/spatial information more precisely. The wavelet fusion algorithm was also applied to Landsat Thematic Mapper and SPOT panchromatic imagery data. The second algorithm is based on a linear-regression technique. We analyzed the technique using the same Landsat and SPOT data.
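
    A sketch of discrete-wavelet-transform image fusion in the spirit described, assuming the PyWavelets package and a common fusion rule (average the approximation coefficients, keep the larger-magnitude detail coefficients); the wavelet, level and rule are illustrative choices, not necessarily the report's.

```python
# 2D DWT image fusion: decompose both images, average approximations, take
# max-magnitude detail coefficients per level, reconstruct the fused image.
import numpy as np
import pywt

rng = np.random.default_rng(0)
low_res_spectral = rng.normal(size=(128, 128))  # stand-in multispectral band
high_res_pan = rng.normal(size=(128, 128))      # stand-in panchromatic image

ca = pywt.wavedec2(low_res_spectral, "db2", level=2)
cb = pywt.wavedec2(high_res_pan, "db2", level=2)

fused = [(ca[0] + cb[0]) / 2.0]                 # average approximation bands
for det_a, det_b in zip(ca[1:], cb[1:]):        # per level: (cH, cV, cD)
    fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)
                       for x, y in zip(det_a, det_b)))

result = pywt.waverec2(fused, "db2")
print(result.shape)                             # (128, 128)
```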

  10. Comparison of contamination of femoral heads and pre-processed bone chips during hip revision arthroplasty.

    PubMed

    Mathijssen, N M C; Sturm, P D; Pilot, P; Bloem, R M; Buma, P; Petit, P L; Schreurs, B W

    2013-12-01

    With bone impaction grafting, cancellous bone chips made from allograft femoral heads are impacted in a bone defect, which introduces an additional source of infection. The potential benefit of the use of pre-processed bone chips was investigated by comparing the bacterial contamination of bone chips prepared intraoperatively with the bacterial contamination of pre-processed bone chips at different stages in the surgical procedure. To investigate baseline contamination of the bone grafts, specimens were collected during 88 procedures before actual use or preparation of the bone chips: in 44 procedures intraoperatively prepared chips were used (Group A) and in the other 44 procedures pre-processed bone chips were used (Group B). In 64 of these procedures (32 using locally prepared bone chips and 32 using pre-processed bone chips) specimens were also collected later in the procedure to investigate contamination after use and preparation of the bone chips. In total, 8 procedures had one or more positive specimen(s) (12.5 %). Contamination rates were not significantly different between bone chips prepared at the operating theatre and pre-processed bone chips. In conclusion, there was no difference in bacterial contamination between bone chips prepared from whole femoral heads in the operating room and pre-processed bone chips, and therefore, both types of bone allografts are comparable with respect to risk of infection.

  11. Classification of Microarray Data Using Kernel Fuzzy Inference System.

    PubMed

    Kumar, Mukesh; Kumar Rath, Santanu

    2014-01-01

    DNA microarray classification techniques have gained popularity in both research and practice. In real data analysis, such as microarray data, the dataset contains a huge number of insignificant and irrelevant features, which tend to obscure useful information. Feature selection aims to retain the features with high significance and high relevance to the classes, which determine the classification of samples into their respective classes. In this paper, the kernel fuzzy inference system (K-FIS) algorithm is applied to classify microarray data (leukemia) using the t-test as a feature selection method. Kernel functions are used to map the original data points into a higher-dimensional (possibly infinite-dimensional) feature space defined by a (usually nonlinear) function ϕ through a mathematical process called the kernel trick. This paper also presents a comparative study of classification using K-FIS along with a support vector machine (SVM) for different sets of features (genes). Performance parameters available in the literature, such as precision, recall, specificity, F-measure, ROC curve, and accuracy, are considered to analyze the efficiency of the classification model. The proposed approach shows that the K-FIS model obtains results similar to those of the SVM model, an indication that the proposed approach relies on the kernel function.
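
    The overall recipe, univariate t-test-style gene selection followed by a kernelized classifier, can be sketched as below. scikit-learn's RBF-kernel SVC stands in for K-FIS (which has no stock implementation here), and the data are synthetic stand-ins for the leukemia set.

```python
# t-test-style feature selection + kernel classifier on expression data.
# For two classes, sklearn's f_classif (ANOVA F-test) is the squared t-test.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(72, 5000))                # 72 samples x 5000 genes
y = rng.integers(0, 2, size=72)                # two leukemia subtypes

clf = make_pipeline(SelectKBest(f_classif, k=50),
                    SVC(kernel="rbf", gamma="scale"))
print(cross_val_score(clf, X, y, cv=5).mean())
```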

  12. MAGMA: analysis of two-channel microarrays made easy.

    PubMed

    Rehrauer, Hubert; Zoller, Stefan; Schlapbach, Ralph

    2007-07-01

    The web application MAGMA provides a simple and intuitive interface to identify differentially expressed genes from two-channel microarray data. While the underlying algorithms are not superior to those of similar web applications, MAGMA is particularly user-friendly and can be used without prior training. The user interface guides the novice user through the most typical microarray analysis workflow, consisting of data upload, annotation, normalization and statistical analysis. It automatically generates R-scripts that document all of MAGMA's data processing steps, thereby allowing the user to regenerate all results in a local R installation. The implementation of MAGMA follows the model-view-controller design pattern that strictly separates the R-based statistical data processing, the web representation and the application logic. This modular design makes the application flexible and easily extendible by experts in one of the fields: statistical microarray analysis, web design or software development. State-of-the-art Java Server Faces technology was used to generate the web interface and to perform user input processing. MAGMA's object-oriented modular framework makes it easily extendible and applicable to other fields and demonstrates that modern Java technology is also suitable for rather small and concise academic projects. MAGMA is freely available at www.magma-fgcz.uzh.ch.

  13. Automatic design of decision-tree algorithms with evolutionary algorithms.

    PubMed

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.
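
    A heavily simplified stand-in for the hyper-heuristic idea: evolve decision-tree design choices against cross-validated accuracy. Here the search space is just a few scikit-learn hyperparameters rather than HEAD-DT's full space of algorithmic building blocks, and the population size and mutation scheme are arbitrary.

```python
# Toy evolutionary search over decision-tree designs: keep the fittest
# designs each generation and mutate copies of them.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)

def random_design():
    return {"criterion": str(rng.choice(["gini", "entropy"])),
            "max_depth": int(rng.integers(2, 12)),
            "min_samples_leaf": int(rng.integers(1, 20))}

def fitness(design):
    return cross_val_score(DecisionTreeClassifier(random_state=0, **design),
                           X, y, cv=3).mean()

pop = [random_design() for _ in range(10)]
for gen in range(5):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:3]                          # elitist selection
    pop = parents + [dict(p, max_depth=max(2, p["max_depth"] +
                                           int(rng.integers(-2, 3))))
                     for p in parents for _ in range(3)]   # 3 mutants each

best = max(pop, key=fitness)
print(best, round(fitness(best), 3))
```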

  14. Automatic image analysis and spot classification for detection of pathogenic Escherichia coli on glass slide DNA microarrays

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A computer algorithm was created to inspect scanned images from DNA microarray slides developed to rapidly detect and genotype E. coli O157 virulent strains. The algorithm computes centroid locations for signal and background pixels in RGB space and defines a plane perpendicular to the line connect...
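
    The described classifier is simple to sketch: the plane perpendicular to the line connecting the two centroids, passing through their midpoint, separates signal from background pixels. The pixel values below are synthetic.

```python
# Spot classification by RGB centroids: classify a pixel by which side of
# the perpendicular bisecting plane between the two centroids it falls on.
import numpy as np

rng = np.random.default_rng(0)
signal_px = rng.normal(loc=[180, 40, 40], scale=10, size=(500, 3))   # red spots
background_px = rng.normal(loc=[60, 60, 60], scale=10, size=(500, 3))

c_sig = signal_px.mean(axis=0)
c_bg = background_px.mean(axis=0)
normal = c_sig - c_bg                      # plane normal: line joining centroids
midpoint = (c_sig + c_bg) / 2.0            # plane passes through the midpoint

def is_signal(pixels):
    return (np.asarray(pixels) - midpoint) @ normal > 0

print(is_signal([[175, 45, 38], [62, 58, 61]]))   # [ True False]
```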

  15. Microarrays Made Simple: "DNA Chips" Paper Activity

    ERIC Educational Resources Information Center

    Barnard, Betsy

    2006-01-01

    DNA microarray technology is revolutionizing biological science. DNA microarrays (also called DNA chips) allow simultaneous screening of many genes for changes in expression between different cells. Now researchers can obtain information about genes in days or weeks that used to take months or years. The paper activity described in this article…

  16. Tissue Microarrays in Clinical Oncology

    PubMed Central

    Voduc, David; Kenney, Challayne; Nielsen, Torsten O.

    2008-01-01

    The tissue microarray (TMA) is a recently implemented, high-throughput technology for the analysis of molecular markers in oncology. This research tool permits the rapid assessment of a biomarker in thousands of tumor samples, using commonly available laboratory assays such as immunohistochemistry and in-situ hybridization. Although introduced less than a decade ago, the TMA has proven to be invaluable in the study of tumor biology, the development of diagnostic tests, and the investigation of oncological biomarkers. This review describes the impact of TMA-based research in clinical oncology and its potential future applications. Technical aspects of TMA construction, and the advantages and disadvantages inherent to this technology, are also discussed. PMID:18314063

  17. Analysis of DNA microarray expression data.

    PubMed

    Simon, Richard

    2009-06-01

    DNA microarrays are powerful tools for studying biological mechanisms and for developing prognostic and predictive classifiers for identifying the patients who require treatment and are the best candidates for specific treatments. Because microarrays produce so much data from each specimen, they offer great opportunities for discovery and great dangers of producing misleading claims. Microarray-based studies require clear objectives for selecting cases and appropriate analysis methods. Effective analysis of microarray data, where the number of measured variables is orders of magnitude greater than the number of cases, requires specialized statistical methods which have recently been developed. Recent literature reviews indicate that serious problems of analysis exist in a substantial proportion of publications. This manuscript attempts to provide a non-technical summary of the key principles of statistical design and analysis for studies that utilize microarray expression profiling.

  18. Microarray Applications in Microbial Ecology Research.

    SciTech Connect

    Gentry, T.; Schadt, C.; Zhou, J.

    2006-04-06

    Microarray technology has the unparalleled potential to simultaneously determine the dynamics and/or activities of most, if not all, of the microbial populations in complex environments such as soils and sediments. Researchers have developed several types of arrays that characterize the microbial populations in these samples based on their phylogenetic relatedness or functional genomic content. Several recent studies have used these microarrays to investigate ecological issues; however, most have only analyzed a limited number of samples, with relatively few experiments utilizing the full high-throughput potential of microarray analysis. This is due in part to the unique analytical challenges that these samples present with regard to sensitivity, specificity, quantitation, and data analysis. This review discusses specific applications of microarrays to microbial ecology research along with some of the latest studies addressing the difficulties encountered during analysis of complex microbial communities within environmental samples. With continued development, microarray technology may ultimately achieve its potential for comprehensive, high-throughput characterization of microbial populations in near real-time.

  19. In control: systematic assessment of microarray performance.

    PubMed

    van Bakel, Harm; Holstege, Frank C P

    2004-10-01

    Expression profiling using DNA microarrays is a powerful technique that is widely used in the life sciences. How reliable are microarray-derived measurements? The assessment of performance is challenging because of the complicated nature of microarray experiments and the many different technology platforms. There is a mounting call for standards to be introduced, and this review addresses some of the issues that are involved. Two important characteristics of performance are accuracy and precision. The assessment of these factors can be either for the purpose of technology optimization or for the evaluation of individual microarray hybridizations. Microarray performance has been evaluated by at least four approaches in the past. Here, we argue that external RNA controls offer the most versatile system for determining performance and describe how such standards could be implemented. Other uses of external controls are discussed, along with the importance of probe sequence availability and the quantification of labelled material.

  20. Chaotic mixer improves microarray hybridization.

    PubMed

    McQuain, Mark K; Seale, Kevin; Peek, Joel; Fisher, Timothy S; Levy, Shawn; Stremler, Mark A; Haselton, Frederick R

    2004-02-15

    Hybridization is an important aspect of microarray experimental design which influences array signal levels and the repeatability of data within an array and across different arrays. Current methods typically require 24 h and use target inefficiently. In these studies, we compare hybridization signals obtained in conventional static hybridization, which depends on diffusional target delivery, with signals obtained in a dynamic hybridization chamber, which employs a fluid mixer based on chaotic advection theory to deliver targets across a conventional glass slide array. Microarrays were printed with a pattern of 102 identical probe spots containing a 65-mer oligonucleotide capture probe. Hybridization of a 725-bp fluorescently labeled target was used to measure average target hybridization levels, local signal-to-noise ratios, and array hybridization uniformity. Dynamic hybridization for 1 h with 1 or 10 ng of target DNA increased hybridization signal intensities approximately threefold over a 24-h static hybridization. Similarly, a 10- or 60-min dynamic hybridization of 10 ng of target DNA increased hybridization signal intensities fourfold over a 24-h static hybridization. In time course studies, static hybridization reached a maximum within 8 to 12 h using either 1 or 10 ng of target. In time course studies using the dynamic hybridization chamber, hybridization using 1 ng of target increased to a maximum at 4 h, and that using 10 ng of target did not vary over the time points tested. In comparison to static hybridization, dynamic hybridization increased the signal-to-noise ratios threefold and reduced spot-to-spot variation twofold. Therefore, we conclude that dynamic hybridization based on a chaotic mixer design improves both the speed of hybridization and the maximum level of hybridization while increasing signal-to-noise ratios and reducing spot-to-spot variation.

  1. Grouping preprocess for haplotype inference from SNP and CNV data

    NASA Astrophysics Data System (ADS)

    Shindo, Hiroyuki; Chigira, Hiroshi; Nagaoka, Tomoyo; Kamatani, Naoyuki; Inoue, Masato

    2009-12-01

    The method of statistical haplotype inference is an indispensable technique in the field of medical science. The authors previously reported Hardy-Weinberg equilibrium-based haplotype inference that could manage single nucleotide polymorphism (SNP) data. We recently extended the method to cover copy number variation (CNV) data. Haplotype inference from mixed data is important because SNPs and CNVs are occasionally in linkage disequilibrium. The idea underlying the proposed method is simple, but the algorithm needs to be quite elaborate to reduce the calculation cost. Consequently, we have focused on the details of the algorithm in this study. Although the main advantage of the method is accuracy, in that it does not use any approximation, its main disadvantage is still the calculation cost, which is sometimes intractable for large data sets with missing values.

  2. Probe Region Expression Estimation for RNA-Seq Data for Improved Microarray Comparability.

    PubMed

    Uziela, Karolis; Honkela, Antti

    2015-01-01

    Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms that make integrating it all difficult. Although RNA-sequencing (RNA-seq) is attracting the most attention, at present, the rate of new microarray studies submitted to public databases far exceeds the rate of new RNA-seq studies. There is clearly a need for methods that make it easier to combine data from different technologies. In this paper, we propose a new method for processing RNA-seq data that yields gene expression estimates that are much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method we call PREBS is based on estimating the expression from RNA-seq reads overlapping the microarray probe regions, and processing these estimates with standard microarray summarisation algorithms. Using paired microarray and RNA-seq samples from TCGA LAML data set we show that PREBS expression estimates derived from RNA-seq are more similar to microarray-based expression estimates than those from other RNA-seq processing methods. In an experiment to retrieve paired microarray samples from a database using an RNA-seq query sample, gene signatures defined based on PREBS expression estimates were found to be much more accurate than those from other methods. PREBS also allows new ways of using RNA-seq data, such as expression estimation for microarray probe sets. An implementation of the proposed method is available in the Bioconductor package "prebs."
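
    The core PREBS idea can be sketched in a few lines: count RNA-seq reads overlapping each microarray probe region, then summarize probes into a probe-set value. A plain log-mean below stands in for the microarray summarization algorithms PREBS actually reuses; the coordinates and reads are invented.

```python
# Probe-region expression estimation from read intervals: count reads that
# overlap each probe, then summarize the probe set.
import numpy as np

# Probe regions on one transcript: (start, end), half-open coordinates.
probes = [(100, 125), (340, 365), (900, 925)]
# Aligned read intervals from an RNA-seq sample.
reads = [(90, 190), (110, 210), (350, 450), (880, 980), (905, 1005)]

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

probe_counts = np.array([sum(overlaps(r, p) for r in reads) for p in probes])
probe_set_expr = np.mean(np.log2(probe_counts + 1))   # summarize like an array
print(probe_counts, round(probe_set_expr, 3))          # [2 1 2] 1.39
```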

  3. EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

    PubMed Central

    Forment, Javier; Gilabert, Francisco; Robles, Antonio; Conejero, Vicente; Nuez, Fernando; Blanca, Jose M

    2008-01-01

    Background Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation. Results We have created EST2uni, an integrated, highly-configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses standard EST analysis tools and the software has a modular design to facilitate the addition of new analytical methods and their configuration. Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. It can be run in parallel in a PC cluster in order to reduce the time necessary for the analysis. It also creates a web site linked to the database, showing collection statistics, with complex query capabilities and tools for data mining and retrieval. Conclusion The software package presented here provides an efficient and complete bioinformatics tool for the management of EST collections which is very easy to adapt to the local needs of different EST projects. The code is freely available under the GPL license and can be obtained at . This site also provides detailed instructions for

  4. PAA: an R/bioconductor package for biomarker discovery with protein microarrays

    PubMed Central

    Turewicz, Michael; Ahrens, Maike; May, Caroline; Marcus, Katrin; Eisenacher, Martin

    2016-01-01

    Summary: The R/Bioconductor package Protein Array Analyzer (PAA) facilitates a flexible analysis of protein microarrays for biomarker discovery (esp., ProtoArrays). It provides a complete data analysis workflow including preprocessing and quality control, uni- and multivariate feature selection as well as several different plots and results tables to outline and evaluate the analysis results. As a main feature, PAA’s multivariate feature selection methods are based on recursive feature elimination (e.g. SVM-recursive feature elimination, SVM-RFE) with stability ensuring strategies such as ensemble feature selection. This enables PAA to detect stable and reliable biomarker candidate panels. Availability and implementation: PAA is freely available (BSD 3-clause license) from http://www.bioconductor.org/packages/PAA/. Contact: michael.turewicz@rub.de or martin.eisenacher@rub.de PMID:26803161
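
    PAA is an R/Bioconductor package; purely as a hedged illustration of the multivariate approach it names, the sketch below runs SVM-RFE on bootstrap resamples and keeps the most consistently selected features, a simple form of ensemble (stability-oriented) feature selection. It uses scikit-learn, and all parameter values are illustrative rather than PAA defaults.

    ```python
    import numpy as np
    from sklearn.feature_selection import RFE
    from sklearn.svm import LinearSVC
    from sklearn.utils import resample

    def ensemble_svm_rfe(X, y, n_features=20, n_rounds=25, seed=0):
        """Run SVM-RFE on bootstrap resamples and keep the features that are
        selected most consistently, a simple form of the stability-ensuring
        ensemble feature selection that PAA describes."""
        rng = np.random.RandomState(seed)
        votes = np.zeros(X.shape[1])
        for _ in range(n_rounds):
            Xb, yb = resample(X, y, random_state=rng)
            rfe = RFE(LinearSVC(C=1.0, dual=False, max_iter=5000),
                      n_features_to_select=n_features, step=0.1)
            rfe.fit(Xb, yb)
            votes += rfe.support_  # one vote per selected feature
        return np.argsort(votes)[::-1][:n_features]

    # Hypothetical usage on a (samples x antigens) intensity matrix:
    # candidate_panel = ensemble_svm_rfe(intensity_matrix, group_labels)
    ```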

  5. Generation of attributes for learning algorithms

    SciTech Connect

    Hu, Yuh-Jyh; Kibler, D.

    1996-12-31

    Inductive algorithms rely strongly on their representational biases. Constructive induction can mitigate representational inadequacies. This paper introduces the notion of a relative gain measure and describes a new constructive induction algorithm (GALA) which is independent of the learning algorithm. Unlike most previous research on constructive induction, our methods are designed as a preprocessing step before standard machine learning algorithms are applied. We present results which demonstrate the effectiveness of GALA on artificial and real domains for several learners: C4.5, CN2, perceptron and backpropagation.

  6. Microarray-integrated optoelectrofluidic immunoassay system.

    PubMed

    Han, Dongsik; Park, Je-Kyun

    2016-05-01

    A microarray-based analytical platform has been utilized as a powerful tool in biological assay fields. However, an analyte depletion problem due to the slow mass transport based on molecular diffusion causes low reaction efficiency, resulting in a limitation for practical applications. This paper presents a novel method to improve the efficiency of microarray-based immunoassay via an optically induced electrokinetic phenomenon by integrating an optoelectrofluidic device with a conventional glass slide-based microarray format. A sample droplet was loaded between the microarray slide and the optoelectrofluidic device on which a photoconductive layer was deposited. Under the application of an AC voltage, optically induced AC electroosmotic flows caused by a microarray-patterned light actively enhanced the mass transport of target molecules at the multiple assay spots of the microarray simultaneously, which reduced tedious reaction time from more than 30 min to 10 min. Based on this enhancing effect, a heterogeneous immunoassay with a tiny volume of sample (5 μl) was successfully performed in the microarray-integrated optoelectrofluidic system using immunoglobulin G (IgG) and anti-IgG, resulting in improved efficiency compared to the static environment. Furthermore, the application of multiplex assays was also demonstrated by multiple protein detection.

  7. MARS: Microarray analysis, retrieval, and storage system

    PubMed Central

    Maurer, Michael; Molidor, Robert; Sturn, Alexander; Hartler, Juergen; Hackl, Hubert; Stocker, Gernot; Prokesch, Andreas; Scheideler, Marcel; Trajanoski, Zlatko

    2005-01-01

    Background Microarray analysis has become a widely used technique for the study of gene-expression patterns on a genomic scale. As more and more laboratories are adopting microarray technology, there is a need for powerful and easy to use microarray databases facilitating array fabrication, labeling, hybridization, and data analysis. The wealth of data generated by this high throughput approach renders adequate database and analysis tools crucial for the pursuit of insights into the transcriptomic behavior of cells. Results MARS (Microarray Analysis and Retrieval System) provides a comprehensive MIAME-supportive suite for storing, retrieving, and analyzing multicolor microarray data. The system comprises a laboratory information management system (LIMS), quality control management, and a sophisticated user management system. MARS is fully integrated into an analytical pipeline of microarray image analysis, normalization, gene expression clustering, and mapping of gene expression data onto biological pathways. The incorporation of ontologies and the use of MAGE-ML enable an export of studies stored in MARS to public repositories and other databases accepting these documents. Conclusion We have developed an integrated system tailored to serve the specific needs of microarray based research projects using a unique fusion of Web based and standalone applications connected to the latest J2EE application server technology. The presented system is freely available for academic and non-profit institutions. More information can be found at . PMID:15836795

  8. On the Development of Parafoveal Preprocessing: Evidence from the Incremental Boundary Paradigm

    PubMed Central

    Marx, Christina; Hutzler, Florian; Schuster, Sarah; Hawelka, Stefan

    2016-01-01

    Parafoveal preprocessing of upcoming words and the resultant preview benefit are key aspects of fluent reading. Evidence regarding the development of parafoveal preprocessing during reading acquisition, however, is scarce. The present developmental (cross-sectional) eye tracking study estimated the magnitude of parafoveal preprocessing of beginning readers with a novel variant of the classical boundary paradigm. Additionally, we assessed the association of parafoveal preprocessing with several reading-related psychometric measures. The participants were children learning to read the regular German orthography with about 1, 3, and 5 years of formal reading instruction (Grade 2, 4, and 6, respectively). We found evidence of parafoveal preprocessing in each Grade. However, an effective use of parafoveal information was related to the individual reading fluency of the participants (i.e., the reading rate expressed as words-per-minute) which substantially overlapped between the Grades. The size of the preview benefit was furthermore associated with the children’s performance in rapid naming tasks and with their performance in a pseudoword reading task. The latter task assessed the children’s efficiency in phonological decoding and our findings show that the best decoders exhibited the largest preview benefit. PMID:27148123

  9. A hybrid preprocessing method using geometry based diffusion and selective enhancement filtering for pulmonary nodule detection

    NASA Astrophysics Data System (ADS)

    Dhara, Ashis K.; Mukhopadhyay, Sudipta

    2012-03-01

    The computer-aided diagnostic (CAD) system has been developed to assist radiologists in the early detection and analysis of lung nodules. For pulmonary nodule detection, image preprocessing is required to remove the anatomical structure of the lung parenchyma and to enhance the visibility of pulmonary nodules. In this paper a hybrid preprocessing technique using geometry-based diffusion and selective enhancement filtering has been proposed. This technique provides a unified preprocessing framework for solid nodules as well as ground-glass opacity (GGO) nodules. Geometry-based diffusion is applied to smooth the images while preserving boundaries. In order to improve the sensitivity of pulmonary nodule detection, a selective enhancement filter is used to highlight blob-like structures. However, the selective enhancement filter sometimes enhances structures other than nodules, such as blood vessels and airways, resulting in a large number of false positives. In the first step, geometry-based diffusion (GBD) is applied to reduce false positives, and in the second step, selective enhancement filtering is used to further reduce false negatives. Geometry-based diffusion and selective enhancement filtering have each been used as preprocessing steps separately, but their combined effect was not investigated earlier. This hybrid preprocessing approach is suitable for accurate calculation of voxel-based features. The proposed method has been validated on one public database, the Lung Image Database Consortium (LIDC), containing 50 nodules (30 solid and 20 GGO nodules) from 30 subjects, and one private database containing 40 nodules (25 solid and 15 GGO nodules) from 30 subjects.
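
    A rough sketch of the two-stage idea, with stand-ins for the paper's specific filters: Perona-Malik diffusion approximates the edge-preserving, geometry-based smoothing step, and a scale-normalised Laplacian-of-Gaussian approximates the blob-selective enhancement filter. Parameter values are illustrative.

    ```python
    import numpy as np
    from scipy import ndimage

    def perona_malik(img, n_iter=20, kappa=30.0, dt=0.2):
        """Edge-preserving smoothing (Perona-Malik diffusion), standing in
        for the geometry-based diffusion step; borders wrap via np.roll,
        which is acceptable for a sketch."""
        u = img.astype(float)
        for _ in range(n_iter):
            grads = [np.roll(u, 1, axis=0) - u, np.roll(u, -1, axis=0) - u,
                     np.roll(u, 1, axis=1) - u, np.roll(u, -1, axis=1) - u]
            # The conduction coefficient is small across strong edges, so
            # edges are preserved while flat regions are smoothed.
            u = u + dt * sum(np.exp(-(d / kappa) ** 2) * d for d in grads)
        return u

    def blob_enhance(img, sigmas=(2.0, 4.0, 6.0)):
        """Multi-scale blob enhancement: the maximum scale-normalised
        negative Laplacian-of-Gaussian response is large for bright,
        roughly spherical structures such as nodules."""
        responses = [-(s ** 2) * ndimage.gaussian_laplace(img, s) for s in sigmas]
        return np.max(responses, axis=0)

    # Hypothetical two-stage preprocessing of a CT slice `ct` (2-D array):
    # enhanced = blob_enhance(perona_malik(ct))
    ```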

  10. Preprocessing Inconsistent Linear System for a Meaningful Least Squares Solution

    NASA Technical Reports Server (NTRS)

    Sen, Syamal K.; Shaykhian, Gholam Ali

    2011-01-01

    Mathematical models of many physical/statistical problems are systems of linear equations. Due to measurement and possible human errors/mistakes in modeling/data, as well as due to certain assumptions to reduce complexity, inconsistency (contradiction) is injected into the model, viz. the linear system. While any inconsistent system, irrespective of the degree of inconsistency, always has a least-squares solution, one needs to check whether an equation is too inconsistent or, equivalently, too contradictory. Such an equation will affect/distort the least-squares solution to such an extent that it becomes unacceptable/unfit for use in a real-world application. We propose an algorithm which (i) prunes numerically redundant linear equations from the system, as these do not add any new information to the model, (ii) detects contradictory linear equations along with their degree of contradiction (inconsistency index), (iii) removes those equations presumed to be too contradictory, and then (iv) obtains the minimum-norm least-squares solution of the acceptably inconsistent reduced linear system. The algorithm, presented in Matlab, reduces the computational and storage complexities and also improves the accuracy of the solution. It also provides the necessary warning about the existence of too much contradiction in the model. In addition, we suggest a thorough relook into the mathematical modeling to determine why unacceptable contradiction has occurred, prompting us to make necessary corrections/modifications to the models, both mathematical and, if necessary, physical.
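
    A minimal numpy sketch of the four steps listed above, not the authors' Matlab algorithm: redundancy is detected here simply as proportional augmented rows, and the inconsistency index is taken as each equation's residual relative to the median residual; both definitions are assumptions for illustration.

    ```python
    import numpy as np

    def cleaned_least_squares(A, b, dup_tol=1e-8, incons_tol=3.0):
        """Sketch of steps (i)-(iv) described above; the redundancy and
        inconsistency criteria are illustrative, not the authors' exact
        definitions."""
        Ab = np.column_stack([A, b])
        # (i) Prune numerically redundant equations: drop any augmented row
        # [a_i | b_i] that is (numerically) proportional to a row already
        # kept, since it adds no new information.
        unit = Ab / np.linalg.norm(Ab, axis=1, keepdims=True)
        keep = []
        for i in range(len(unit)):
            if all(abs(abs(unit[i] @ unit[j]) - 1.0) > dup_tol for j in keep):
                keep.append(i)
        A, b = A[keep], b[keep]
        # (ii) Inconsistency index: each equation's residual at the
        # least-squares solution, in units of the median residual.
        x = np.linalg.pinv(A) @ b
        r = np.abs(A @ x - b)
        index = r / (np.median(r) + 1e-15)
        # (iii) Drop equations deemed too contradictory, then (iv) return
        # the minimum-norm least-squares solution of the reduced system.
        ok = index < incons_tol
        return np.linalg.pinv(A[ok]) @ b[ok], index

    # Tiny example: the last equation duplicates the first (pruned as
    # redundant) and the fifth badly contradicts the rest (removed).
    A = np.array([[1.0, 0], [0, 1], [1, 1], [1, -1], [1, 0], [2, 0]])
    b = np.array([1.0, 2.0, 3.02, -0.98, 10.0, 2.0])
    x, idx = cleaned_least_squares(A, b)  # x is close to (1, 2)
    ```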

  11. DNA Microarrays in Herbal Drug Research

    PubMed Central

    Chavan, Preeti; Joshi, Kalpana; Patwardhan, Bhushan

    2006-01-01

    Natural products are gaining increased applications in drug discovery and development. Being chemically diverse they are able to modulate several targets simultaneously in a complex system. Analysis of gene expression becomes necessary for better understanding of molecular mechanisms. Conventional strategies for expression profiling are optimized for single gene analysis. DNA microarrays serve as suitable high throughput tool for simultaneous analysis of multiple genes. Major practical applicability of DNA microarrays remains in DNA mutation and polymorphism analysis. This review highlights applications of DNA microarrays in pharmacodynamics, pharmacogenomics, toxicogenomics and quality control of herbal drugs and extracts. PMID:17173108

  12. Progress in the application of DNA microarrays.

    PubMed Central

    Lobenhofer, E K; Bushel, P R; Afshari, C A; Hamadeh, H K

    2001-01-01

    Microarray technology has been applied to a variety of different fields to address fundamental research questions. The use of microarrays, or DNA chips, to study the gene expression profiles of biologic samples began in 1995. Since that time, the fundamental concepts behind the chip, the technology required for making and using these chips, and the multitude of statistical tools for analyzing the data have been extensively reviewed. For this reason, the focus of this review will be not on the technology itself but on the application of microarrays as a research tool and the future challenges of the field. PMID:11673116

  13. Gene microarray data analysis using parallel point-symmetry-based clustering.

    PubMed

    Sarkar, Anasua; Maulik, Ujjwal

    2015-01-01

    Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed, time-efficient and scalable approach for the point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data using a symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also achieves linear speedup in timing without sacrificing the quality of the clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and an existing parallel K-Means over eight artificial and benchmark microarray data sets, to demonstrate its superiority in both timing and validity. A statistical analysis is also performed to establish the significance of this message-passing-interface based point-symmetry K-Means implementation. We also analysed the biological relevance of the clustering solutions.
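
    For reference, one standard definition of the point-symmetry distance at the heart of such algorithms (due to Su and Chou; the record's exact variant may differ) can be sketched as below. The sequential version shown here omits the message-passing parallelisation that is the record's actual contribution, and all names are illustrative.

    ```python
    import numpy as np

    def point_symmetry_distance(x, centre, others):
        """Point-symmetry distance of x with respect to `centre`: small when
        the reflection of x about the centre is well matched by some other
        data point (Su and Chou's formulation)."""
        v = x - centre
        w = others - centre
        num = np.linalg.norm(w + v, axis=1)  # ||(x - c) + (x_i - c)||
        den = np.linalg.norm(w, axis=1) + np.linalg.norm(v)
        return np.min(num / (den + 1e-12))

    def assign_by_symmetry(X, centres):
        """One assignment step of symmetry-based K-Means: each point joins
        the centre about which it is most symmetrically placed."""
        labels = np.empty(len(X), dtype=int)
        for j, x in enumerate(X):
            others = np.delete(X, j, axis=0)  # exclude the point itself
            d = [point_symmetry_distance(x, c, others) for c in centres]
            labels[j] = int(np.argmin(d))
        return labels

    # Hypothetical usage on a (genes x samples) expression matrix X with
    # three random initial centres:
    # idx = np.random.choice(len(X), 3, replace=False)
    # labels = assign_by_symmetry(X, X[idx])
    ```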

  14. Mutational analysis using oligonucleotide microarrays

    PubMed Central

    Hacia, J.; Collins, F.

    1999-01-01

    The development of inexpensive high throughput methods to identify individual DNA sequence differences is important to the future growth of medical genetics. This has become increasingly apparent as epidemiologists, pathologists, and clinical geneticists focus more attention on the molecular basis of complex multifactorial diseases. Such undertakings will rely upon genetic maps based upon newly discovered, common, single nucleotide polymorphisms. Furthermore, candidate gene approaches used in identifying disease associated genes necessitate screening large sequence blocks for changes tracking with the disease state. Even after such genes are isolated, large scale mutational analyses will often be needed for risk assessment studies to define the likely medical consequences of carrying a mutated gene.
    This review concentrates on the use of oligonucleotide arrays for hybridisation-based comparative sequence analysis. Technological advances within the past decade have made it possible to apply this technology to many different aspects of medical genetics. These applications range from the detection and scoring of single nucleotide polymorphisms to mutational analysis of large genes. Although we discuss published scientific reports, unpublished work from the private sector could also significantly affect the future of this technology.

    Keywords: mutational analysis; oligonucleotide microarrays; DNA chips. PMID:10528850

  15. Integrating Microarray Data and GRNs.

    PubMed

    Koumakis, L; Potamias, G; Tsiknakis, M; Zervakis, M; Moustakis, V

    2016-01-01

    With the completion of the Human Genome Project and the emergence of high-throughput technologies, a vast amount of molecular and biological data are being produced. Two of the most important data sources come from microarray gene-expression experiments and respective databanks (e.g., Gene Expression Omnibus-GEO (http://www.ncbi.nlm.nih.gov/geo)), and from molecular pathways and Gene Regulatory Networks (GRNs) stored and curated in public (e.g., Kyoto Encyclopedia of Genes and Genomes-KEGG (http://www.genome.jp/kegg/pathway.html), Reactome (http://www.reactome.org/ReactomeGWT/entrypoint.html)) as well as in commercial repositories (e.g., Ingenuity IPA (http://www.ingenuity.com/products/ipa)). The association of these two sources aims to give new insights into disease understanding and to reveal new molecular targets in the treatment of specific phenotypes. Three major research lines and respective efforts that try to utilize and combine data from both of these sources can be identified, namely: (1) de novo reconstruction of GRNs, (2) identification of gene signatures, and (3) identification of differentially expressed GRN functional paths (i.e., sub-GRN paths that distinguish between different phenotypes). In this chapter, we give an overview of the existing methods that support the different types of gene-expression and GRN integration, with a focus on methodologies that aim to identify phenotype-discriminant GRNs or subnetworks, and we also present our methodology.

  16. Algorithms and Algorithmic Languages.

    ERIC Educational Resources Information Center

    Veselov, V. M.; Koprov, V. M.

    This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and…

  17. Preprocessed barley, rye, and triticale as a feedstock for an integrated fuel ethanol-feedlot plant

    SciTech Connect

    Sosulski, K.; Wang, Sunmin; Ingledew, W.M.

    1997-12-31

    Rye, triticale, and barley were evaluated as starch feedstocks to replace wheat for ethanol production. Preprocessing of grain by abrasion on a Satake mill reduced fiber and increased starch concentrations in the feedstock for fermentations. Higher concentrations of starch in flours from preprocessed cereal grains would increase plant throughput by 8-23%, since more starch is processed in the same weight of feedstock. Increased concentrations of starch for fermentation resulted in higher concentrations of ethanol in beer. Energy requirements to produce one L of ethanol from preprocessed grains were reduced: natural gas by 3.5-11.4% and power consumption by 5.2-15.6%. 7 refs., 7 figs., 4 tabs.

  18. Optimization of Preprocessing and Densification of Sorghum Stover at Full-scale Operation

    SciTech Connect

    Neal A. Yancey; Jaya Shankar Tumuluru; Craig C. Conner; Christopher T. Wright

    2011-08-01

    Transportation costs can be a prohibitive step in bringing biomass to a preprocessing location or biofuel refinery. One alternative to transporting biomass in baled or loose format to a preprocessing location is to utilize a mobile preprocessing system that can be relocated to various locations where biomass is stored, preprocess and densify the biomass, then ship it to the refinery as needed. The Idaho National Laboratory has a full-scale 'Process Demonstration Unit' (PDU), which includes a stage 1 grinder, hammer mill, drier, pellet mill, and cooler with the associated conveyance system components. Testing at bench and pilot scale has been conducted to determine the effects of moisture and of crop variety on preprocessing efficiency and product quality. The INL's PDU provides an opportunity to test the conclusions made at the bench and pilot scale on full industrial-scale systems. Each component of the PDU is operated from a central operating station, where data are collected to determine power consumption rates for each step in the process. The power for each electrical motor in the system is monitored from the control station to watch for problems and determine optimal conditions for system performance. The data can then be reviewed to observe how changes in biomass input parameters (moisture and crop type, for example), mechanical changes (screen size, biomass drying, pellet size, grinding speed, etc.), or other variations affect the power consumption of the system. Sorghum in four-foot round bales was tested in the system using a series of six different screen sizes: 3/16 in., 1 in., 2 in., 3 in., 4 in., and 6 in. The effects on power consumption, product quality, and production rate were measured to determine optimal conditions.

  19. Protein Microarrays: Novel Developments and Applications

    PubMed Central

    Berrade, Luis; Garcia, Angie E.

    2011-01-01

    Protein microarray technology possesses some of the greatest potential for providing direct information on protein function and potential drug targets. For example, functional protein microarrays are ideal tools suited for the mapping of biological pathways. They can be used to study most major types of interactions and enzymatic activities that take place in biochemical pathways and have been used for the analysis of simultaneous multiple biomolecular interactions involving protein-protein, protein-lipid, protein-DNA and protein-small molecule interactions. Because of this unique ability to analyze many kinds of molecular interactions en masse, the requirement of very small sample amount and the potential to be miniaturized and automated, protein microarrays are extremely well suited for protein profiling, drug discovery, drug target identification and clinical prognosis and diagnosis. The aim of this review is to summarize the most recent developments in the production, applications and analysis of protein microarrays. PMID:21116694

  20. Boosting model performance and interpretation by entangling preprocessing selection and variable selection.

    PubMed

    Gerretzen, Jan; Szymańska, Ewa; Bart, Jacob; Davies, Antony N; van Manen, Henk-Jan; van den Heuvel, Edwin R; Jansen, Jeroen J; Buydens, Lutgarde M C

    2016-09-28

    The aim of data preprocessing is to remove data artifacts (such as a baseline, scatter effects or noise) and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but which method or combination of methods should be used for the specific data being analyzed is difficult to select. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames. In that approach, the focus was solely on improving the predictive performance of the chemometric model. This is, however, only one of the two relevant criteria in modeling: interpretation of the model results can be just as important. Variable selection is often used to achieve such interpretation. Data artifacts, however, may hamper proper variable selection by masking the true relevant variables. The choice of preprocessing therefore has a huge impact on the outcome of variable selection methods and may thus hamper an objective interpretation of the final model. To enhance such objective interpretation, we here integrate variable selection into the preprocessing selection approach that is based on DoE. We show that the entanglement of preprocessing selection and variable selection not only improves the interpretation, but also the predictive performance of the model. This is achieved by analyzing several experimental data sets of which the true relevant variables are available as prior knowledge. We show that a selection of variables is provided that complies more with the true informative variables compared to individual optimization of both model aspects. Importantly, the approach presented in this work is generic. Different types of models (e.g. PCR, PLS, …) can be incorporated into it, as well as different variable selection methods and different preprocessing methods, according to the taste and experience of
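
    As a hedged sketch of the selection problem (not the authors' DoE methodology, and with the variable selection stage omitted), the snippet below scores every combination of a few candidate preprocessing steps by cross-validated PLS error; a designed experiment would evaluate only a structured fraction of this grid. All preprocessing choices and parameters are illustrative.

    ```python
    import itertools
    from scipy.signal import savgol_filter
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    # Candidate preprocessing steps; "none" switches a factor off.
    BASELINE = {"none": lambda X: X,
                "center": lambda X: X - X.mean(axis=1, keepdims=True)}
    SCATTER = {"none": lambda X: X,
               "snv": lambda X: (X - X.mean(axis=1, keepdims=True))
                                / (X.std(axis=1, keepdims=True) + 1e-12)}
    SMOOTH = {"none": lambda X: X,
              "savgol": lambda X: savgol_filter(X, 11, 2, axis=1)}

    def select_preprocessing(X, y):
        """Score every combination of the factors above by cross-validated
        PLS error (a full-factorial stand-in for the DoE-based screening)."""
        best = None
        for kb, ks, km in itertools.product(BASELINE, SCATTER, SMOOTH):
            Xp = SMOOTH[km](SCATTER[ks](BASELINE[kb](X)))
            score = cross_val_score(PLSRegression(n_components=5), Xp, y, cv=5,
                                    scoring="neg_root_mean_squared_error").mean()
            if best is None or score > best[0]:
                best = (score, (kb, ks, km))
        return best

    # Hypothetical usage on spectra (samples x wavelengths) and a response y:
    # score, combo = select_preprocessing(spectra, concentrations)
    ```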

  1. Influence of Hemp Fibers Pre-processing on Low Density Polyethylene Matrix Composites Properties

    NASA Astrophysics Data System (ADS)

    Kukle, S.; Vidzickis, R.; Zelca, Z.; Belakova, D.; Kajaks, J.

    2016-04-01

    In the present research, LLDPE matrix composites reinforced with short hemp fibres, with fibre contents in the range from 30 to 50 wt% and subjected to four different pre-processing technologies, were produced, and properties such as tensile strength and elongation at break, tensile modulus, melt flow index, microhardness and water absorption dynamics were investigated. Capillary viscosimetry was used for fluidity evaluation, and the melt flow index (MFI) was evaluated for all variants. The MFI of the fibres of two pre-processing variants was high enough to allow increasing the hemp fibre content from 30 to 50 wt% with only a moderate increase in water sorption capability.

  2. Contributions to Statistical Problems Related to Microarray Data

    ERIC Educational Resources Information Center

    Hong, Feng

    2009-01-01

    Microarray is a high-throughput technology to measure gene expression. Analysis of microarray data brings many interesting and challenging problems. This thesis consists of three studies related to microarray data. First, we propose a Bayesian model for microarray data and use Bayes Factors to identify differentially expressed genes. Second, we…

  3. PATMA: parser of archival tissue microarray.

    PubMed

    Roszkowiak, Lukasz; Lopez, Carlos

    2016-01-01

    Tissue microarrays are commonly used in modern pathology for cancer tissue evaluation, as it is a very potent technique. Tissue microarray slides are often scanned to perform computer-aided histopathological analysis of the tissue cores. For processing the image, splitting the whole virtual slide into images of individual cores is required. The only way to distinguish cores corresponding to specimens in the tissue microarray is through their arrangement. Unfortunately, distinguishing the correct order of cores is not a trivial task as they are not labelled directly on the slide. The main aim of this study was to create a procedure capable of automatically finding and extracting cores from archival images of the tissue microarrays. This software supports the work of scientists who want to perform further image processing on single cores. The proposed method is an efficient and fast procedure, working in fully automatic or semi-automatic mode. A total of 89% of punches were correctly extracted with automatic selection. With the addition of manual correction, it is possible to fully prepare the whole slide image for extraction in 2 min per tissue microarray. The proposed technique requires minimum skill and time to parse a big array of cores from a tissue microarray whole slide image into individual core images.

  4. The Impact of Photobleaching on Microarray Analysis

    PubMed Central

    von der Haar, Marcel; Preuß, John-Alexander; von der Haar, Kathrin; Lindner, Patrick; Scheper, Thomas; Stahl, Frank

    2015-01-01

    DNA-Microarrays have become a potent technology for high-throughput analysis of genetic regulation. However, the wide dynamic range of signal intensities of fluorophore-based microarrays exceeds the dynamic range of a single array scan by far, thus limiting the key benefit of microarray technology: parallelization. The implementation of multi-scan techniques represents a promising approach to overcome these limitations. These techniques are, in turn, limited by the fluorophores’ susceptibility to photobleaching when exposed to the scanner’s laser light. In this paper the photobleaching characteristics of cyanine-3 and cyanine-5 as part of solid state DNA microarrays are studied. The effects of initial fluorophore intensity as well as laser scanner dependent variables such as the photomultiplier tube’s voltage on bleaching and imaging are investigated. The resulting data is used to develop a model capable of simulating the expected degree of signal intensity reduction caused by photobleaching for each fluorophore individually, allowing for the removal of photobleaching-induced, systematic bias in multi-scan procedures. Single-scan applications also benefit as they rely on pre-scans to determine the optimal scanner settings. These findings constitute a step towards standardization of microarray experiments and analysis and may help to increase the lab-to-lab comparability of microarray experiment results. PMID:26378589
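
    A minimal sketch of multi-scan photobleaching correction under a simple first-order assumption (the cited study's actual model is per-fluorophore and accounts for initial intensity and scanner settings such as PMT voltage): fit an exponential decay across repeated scans, then rescale each scan to remove the systematic bias. Array names are hypothetical.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def fit_bleaching_rate(intensities):
        """intensities: (spots, scans) array from repeated scans of one
        array/channel. Fits I_n = I_0 * exp(-k * n) to the mean signal and
        returns the per-scan bleaching rate k."""
        mean_signal = intensities.mean(axis=0)
        scans = np.arange(len(mean_signal), dtype=float)
        popt, _ = curve_fit(lambda n, i0, k: i0 * np.exp(-k * n),
                            scans, mean_signal, p0=(mean_signal[0], 0.05))
        return popt[1]  # the decay rate k

    def debleach(intensities, k):
        """Rescale scan n by exp(k * n), removing the systematic bleaching
        bias before multiple scans are combined."""
        scans = np.arange(intensities.shape[1])
        return intensities * np.exp(k * scans)[None, :]

    # Hypothetical usage for five consecutive scans of a Cy5 channel:
    # k = fit_bleaching_rate(scan_stack)
    # corrected = debleach(scan_stack, k)
    ```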

  5. Construction of citrus gene coexpression networks from microarray data using random matrix theory.

    PubMed

    Du, Dongliang; Rawat, Nidhi; Deng, Zhanao; Gmitter, Fred G

    2015-01-01

    After the sequencing of citrus genomes, gene function annotation is becoming a new challenge. Gene coexpression analysis can be employed for function annotation using publicly available microarray data sets. In this study, 230 sweet orange (Citrus sinensis) microarrays were used to construct seven coexpression networks, including one condition-independent and six condition-dependent (Citrus canker, Huanglongbing, leaves, flavedo, albedo, and flesh) networks. In total, these networks contain 37 633 edges among 6256 nodes (genes), which accounts for 52.11% of the measurable genes on the citrus microarray. Then, these networks were partitioned into functional modules using the Markov Cluster Algorithm. Significantly enriched Gene Ontology biological process terms and KEGG pathway terms were detected for 343 and 60 modules, respectively. Finally, independent verification of these networks was performed using another expression data set of 371 genes. This study provides new targets for further functional analyses in citrus.

  6. Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning

    PubMed Central

    Chakraborty, Debasis; Maulik, Ujjwal

    2014-01-01

    Microarrays have now gone from obscurity to being almost ubiquitous in biological research. At the same time, the statistical methodology for microarray analysis has progressed from simple visual assessments of results to novel algorithms for analyzing changes in expression profiles. In a micro-RNA (miRNA) or gene-expression profiling experiment, the expression levels of thousands of genes/miRNAs are simultaneously monitored to study the effects of certain treatments, diseases, and developmental stages on their expressions. Microarray-based gene expression profiling can be used to identify genes whose expressions are changed in response to pathogens or other organisms by comparing gene expression in infected to that in uninfected cells or tissues. Recent studies have revealed that patterns of altered microarray expression profiles in cancer can serve as molecular biomarkers for tumor diagnosis, prognosis of disease-specific outcomes, and prediction of therapeutic responses. Microarray data sets containing expression profiles of a number of miRNAs or genes are used to identify biomarkers, which show dysregulation between normal and malignant tissues. However, small sample size remains a bottleneck for designing successful classification methods. On the other hand, an adequate number of microarray samples without clinical annotation can be employed as an additional source of information. In this paper, a combination of kernelized fuzzy rough set (KFRS) and semisupervised support vector machine (S3VM) is proposed for predicting cancer biomarkers from one miRNA and three gene expression data sets. Biomarkers are discovered employing three feature selection methods, including KFRS. The effectiveness of the proposed KFRS and S3VM combination on the microarray data sets is demonstrated, and the cancer biomarkers identified from miRNA data are reported. Furthermore, biological significance tests are conducted for miRNA cancer biomarkers. PMID:27170887

  7. Identifying genes relevant to specific biological conditions in time course microarray experiments.

    PubMed

    Singh, Nitesh Kumar; Repsilber, Dirk; Liebscher, Volkmar; Taher, Leila; Fuellen, Georg

    2013-01-01

    Microarrays have been useful in understanding various biological processes by allowing the simultaneous study of the expression of thousands of genes. However, the analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. Specifically, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms. Furthermore, the skewed ratio of samples to variables poses a risk of overfitting. Thus, in this context, feature selection methods become crucial to select relevant genes and, hence, improve classification accuracy. In this study, we investigated feature selection methods based on gene expression profiles and protein interactions. We found that in our setup, the addition of protein interaction information did not contribute to any significant improvement of the classification results. Furthermore, we developed a novel feature selection method that relies exclusively on observed gene expression changes in microarray experiments, which we call "relative Signal-to-Noise ratio" (rSNR). More precisely, the rSNR ranks genes based on their specificity to an experimental condition, by comparing intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within an experimental condition of interest and high variation across experimental conditions are ranked higher, and help in improving classification accuracy. We compared different feature selection methods on two time-series microarray datasets and one static microarray dataset. We found that the rSNR performed generally better than the other methods.
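
    The rSNR description above translates almost directly into code. The sketch below ranks genes by extrinsic variation (across condition means) over intrinsic variation (within the condition of interest); the exact scaling used by the authors may differ, and all names are illustrative.

    ```python
    import numpy as np

    def rsnr(X, conditions, target):
        """X: (genes, samples) expression matrix; conditions: per-sample
        labels; target: the condition of interest. Ranks genes by extrinsic
        variation (across condition means) over intrinsic variation (within
        the target condition)."""
        labels = np.asarray(conditions)
        means = np.stack([X[:, labels == c].mean(axis=1)
                          for c in np.unique(labels)], axis=1)
        extrinsic = means.std(axis=1)                   # across conditions
        intrinsic = X[:, labels == target].std(axis=1)  # within condition
        return extrinsic / (intrinsic + 1e-12)

    # Hypothetical usage: take the 50 top-ranked genes for one condition.
    # scores = rsnr(expr, sample_conditions, "treated")
    # top_genes = np.argsort(scores)[::-1][:50]
    ```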

  8. Functional analysis of differentially expressed genes associated with glaucoma from DNA microarray data.

    PubMed

    Wu, Y; Zang, W D; Jiang, W

    2014-11-11

    Microarray data of astrocytes extracted from the optic nerves of donors with and without glaucoma were analyzed to screen for differentially expressed genes (DEGs). Functional exploration with bioinformatic tools was then used to understand the roles of the identified DEGs in glaucoma. Microarray data were downloaded from the Gene Expression Omnibus (GEO) database, which contains 13 astrocyte samples, 6 from healthy subjects and 7 from patients suffering from glaucoma. Data were pre-processed, and DEGs were screened out using R software packages. Interactions between DEGs were identified, and networks were built using Search Tool for the Retrieval of Interacting Genes/Proteins (STRING). GENECODIS was utilized for the functional analysis of the DEGs, and GOTM was used for module division, for which functional annotation was conducted with the Database for Annotation, Visualization, and Integrated Discovery (DAVID). A total of 371 DEGs were identified between glaucoma-associated samples and normal samples. Three modules included in the PPID database were generated with 11, 12, and 2 significant functional annotations, including immune system processes, inflammatory responses, and synaptic vesicle endocytosis, respectively. We found that the most significantly enriched functions for each module were associated with immune function. Several genes that play interesting roles in the development of glaucoma are described; these genes may be potential biomarkers for glaucoma diagnosis or treatment.

  9. ExpressYourself: a modular platform for processing and visualizing microarray data

    PubMed Central

    Luscombe, Nicholas M.; Royce, Thomas E.; Bertone, Paul; Echols, Nathaniel; Horak, Christine E.; Chang, Joseph T.; Snyder, Michael; Gerstein, Mark

    2003-01-01

    DNA microarrays are widely used in biological research; by analyzing differential hybridization on a single microarray slide, one can detect changes in mRNA expression levels, increases in DNA copy numbers and the location of transcription factor binding sites on a genomic scale. Having performed the experiments, the major challenge is to process large, noisy datasets in order to identify the specific array elements that are significantly differentially hybridized. This normally requires aggregating different, often incompatible programs into a multi-step pipeline. Here we present ExpressYourself, a fully integrated platform for processing microarray data. In completely automated fashion, it will correct the background array signal, normalize the Cy5 and Cy3 signals, score levels of differential hybridization, combine the results of replicate experiments, filter problematic regions of the array and assess the quality of individual and replicate experiments. ExpressYourself is designed with a highly modular architecture so various types of microarray analysis algorithms can readily be incorporated as they are developed; for example, the system currently implements several normalization methods, including those that simultaneously consider signal intensity and slide location. The processed data are presented using a web-based graphical interface to facilitate comparison with the original images of the array slides. In particular, ExpressYourself is able to regenerate images of the original microarray after applying various steps of processing, which greatly facilitates identification of position-specific artifacts. The program is freely available for use at http://bioinfo.mbb.yale.edu/expressyourself. PMID:12824348

  10. Chromosomal Microarray versus Karyotyping for Prenatal Diagnosis

    PubMed Central

    Wapner, Ronald J.; Martin, Christa Lese; Levy, Brynn; Ballif, Blake C.; Eng, Christine M.; Zachary, Julia M.; Savage, Melissa; Platt, Lawrence D.; Saltzman, Daniel; Grobman, William A.; Klugman, Susan; Scholl, Thomas; Simpson, Joe Leigh; McCall, Kimberly; Aggarwal, Vimla S.; Bunke, Brian; Nahum, Odelia; Patel, Ankita; Lamb, Allen N.; Thom, Elizabeth A.; Beaudet, Arthur L.; Ledbetter, David H.; Shaffer, Lisa G.; Jackson, Laird

    2013-01-01

    Background Chromosomal microarray analysis has emerged as a primary diagnostic tool for the evaluation of developmental delay and structural malformations in children. We aimed to evaluate the accuracy, efficacy, and incremental yield of chromosomal microarray analysis as compared with karyotyping for routine prenatal diagnosis. Methods Samples from women undergoing prenatal diagnosis at 29 centers were sent to a central karyotyping laboratory. Each sample was split in two; standard karyotyping was performed on one portion and the other was sent to one of four laboratories for chromosomal microarray. Results We enrolled a total of 4406 women. Indications for prenatal diagnosis were advanced maternal age (46.6%), abnormal result on Down’s syndrome screening (18.8%), structural anomalies on ultrasonography (25.2%), and other indications (9.4%). In 4340 (98.8%) of the fetal samples, microarray analysis was successful; 87.9% of samples could be used without tissue culture. Microarray analysis of the 4282 nonmosaic samples identified all the aneuploidies and unbalanced rearrangements identified on karyotyping but did not identify balanced translocations and fetal triploidy. In samples with a normal karyotype, microarray analysis revealed clinically relevant deletions or duplications in 6.0% with a structural anomaly and in 1.7% of those whose indications were advanced maternal age or positive screening results. Conclusions In the context of prenatal diagnostic testing, chromosomal microarray analysis identified additional, clinically significant cytogenetic information as compared with karyotyping and was equally efficacious in identifying aneuploidies and unbalanced rearrangements but did not identify balanced translocations and triploidies. (Funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development and others; ClinicalTrials.gov number, NCT01279733.) PMID:23215555

  11. Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

    PubMed Central

    Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy

    2016-01-01

    This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing for marking duplicate reads on a single-node system by 156–186x compared with de facto standard tools. sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next-generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory. PMID:27861637
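
    sam2bam is a dedicated high-performance framework; purely to illustrate the basic feature it builds on (SAM-to-BAM conversion), here is a minimal single-threaded pysam version, without the parallel components, plug-ins, or hardware acceleration. File names are hypothetical.

    ```python
    import pysam

    def sam_to_bam(sam_path, bam_path):
        """Minimal SAM -> BAM conversion; sam2bam layers parallel parsing,
        plug-in filters (e.g. duplicate marking) and hardware compression
        support on top of this basic step."""
        with pysam.AlignmentFile(sam_path, "r") as sam, \
             pysam.AlignmentFile(bam_path, "wb", template=sam) as bam:
            for read in sam:
                bam.write(read)

    # sam_to_bam("sample.sam", "sample.bam")  # file names are hypothetical
    ```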

  12. The application of SVD in stored grain pest image pre-processing

    NASA Astrophysics Data System (ADS)

    Mou, Yi; Zhou, Long

    2009-07-01

    The principle of singular value decomposition is introduced. Then, procedures for the restoration and compression of pest images based on SVD are proposed. The experiments demonstrate that SVD is an effective method for stored-grain pest image pre-processing.
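
    The record is terse, so a minimal sketch of the underlying operation may help: keep only the k largest singular values of the image matrix, which compresses storage and suppresses the small singular values that mostly carry noise. The rank k and the input image are illustrative.

    ```python
    import numpy as np

    def svd_truncate(image, k):
        """Rank-k reconstruction of a greyscale image: keep the k largest
        singular values; the discarded small ones mostly carry noise.
        Storage drops from h*w values to k*(h + w + 1)."""
        U, s, Vt = np.linalg.svd(image.astype(float), full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Hypothetical usage on a pest image loaded as a 2-D array:
    # restored = svd_truncate(noisy_pest_image, k=30)
    ```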

  13. Pre-processing SAR image stream to facilitate compression for transport on bandwidth-limited-link

    DOEpatents

    Rush, Bobby G.; Riley, Robert

    2015-09-29

    Pre-processing is applied to a raw VideoSAR (or similar near-video rate) product to transform the image frame sequence into a product that resembles more closely the type of product for which conventional video codecs are designed, while sufficiently maintaining utility and visual quality of the product delivered by the codec.

  14. Parafoveal Preprocessing in Reading Revisited: Evidence from a Novel Preview Manipulation

    ERIC Educational Resources Information Center

    Gagl, Benjamin; Hawelka, Stefan; Richlan, Fabio; Schuster, Sarah; Hutzler, Florian

    2014-01-01

    The study investigated parafoveal preprocessing by the means of the classical invisible boundary paradigm and a novel manipulation of the parafoveal previews (i.e., visual degradation). Eye movements were investigated on 5-letter target words with constraining (i.e., highly informative) initial letters or similarly constraining final letters.…

  15. Algorithms for optimal dyadic decision trees

    SciTech Connect

    Hush, Don; Porter, Reid

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.
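
    The optimal dyadic tree algorithm itself is not in standard libraries, but an analogous notion of "a regularization grid covering all relevant tree sizes" exists for ordinary CART trees: scikit-learn's cost-complexity pruning path returns exactly the finite set of alpha values at which the optimal pruned subtree changes. The sketch below uses it with a bundled data set; it is an analogy under that substitution, not the paper's method.

    ```python
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # The pruning path contains every regularization value at which the
    # optimal pruned subtree changes: a finite grid covering all relevant
    # tree sizes for this (ordinary, non-dyadic) tree.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
    scores = [(a, cross_val_score(DecisionTreeClassifier(ccp_alpha=a,
                                                         random_state=0),
                                  X, y, cv=5).mean())
              for a in path.ccp_alphas]
    best_alpha, best_score = max(scores, key=lambda t: t[1])
    ```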

  16. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

    PubMed Central

    2010-01-01

    Background The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245

  17. Value of Distributed Preprocessing of Biomass Feedstocks to a Bioenergy Industry

    SciTech Connect

    Christopher T Wright

    2006-07-01

    Biomass preprocessing is one of the primary operations in the feedstock assembly system and the front-end of a biorefinery. Its purpose is to chop, grind, or otherwise format the biomass into a suitable feedstock for conversion to ethanol and other bioproducts. Many variables such as equipment cost and efficiency, and feedstock moisture content, particle size, bulk density, compressibility, and flowability affect the location and implementation of this unit operation. Previous conceptual designs show this operation to be located at the front-end of the biorefinery. However, data are presented that show distributed preprocessing at the field-side or in a fixed preprocessing facility can provide significant cost benefits by producing a higher value feedstock with improved handling, transporting, and merchandising potential. In addition, data supporting the preferential deconstruction of feedstock materials due to their bio-composite structure identify the potential for significant improvements in equipment efficiencies and compositional quality upgrades. These data are collected from full-scale low and high capacity hammermill grinders with various screen sizes. Multiple feedstock varieties with a range of moisture values were used in the preprocessing tests. The comparative values of the different grinding configurations, feedstock varieties, and moisture levels are assessed through post-grinding analysis of the different particle fractions separated with a medium-scale forage particle separator and a Rototap separator. The results show that distributed preprocessing produces a material that has bulk flowable properties and fractionation benefits that can improve the ease of transporting, handling and conveying the material to the biorefinery and improve the biochemical and thermochemical conversion processes.

  1. Preprocessing strategy influences graph-based exploration of altered functional networks in major depression.

    PubMed

    Borchardt, Viola; Lord, Anton Richard; Li, Meng; van der Meer, Johan; Heinze, Hans-Jochen; Bogerts, Bernhard; Breakspear, Michael; Walter, Martin

    2016-04-01

    Resting-state fMRI studies have gained widespread use in exploratory studies of neuropsychiatric disorders. Graph metrics derived from whole brain functional connectivity studies have been used to reveal disease-related variations in many neuropsychiatric disorders including major depression (MDD). These techniques show promise in developing diagnostics for these often difficult to identify disorders. However, the analysis of resting-state datasets is increasingly beset by a myriad of approaches and methods, each with underlying assumptions. Choosing the most appropriate preprocessing parameters a priori is difficult. Nevertheless, the specific methodological choice influences graph-theoretical network topologies as well as regional metrics. The aim of this study was to systematically compare different preprocessing strategies by evaluating their influence on group differences between healthy participants (HC) and depressive patients. We thus investigated the effects of common preprocessing variants, including global mean-signal regression (GMR), temporal filtering, detrending, and network sparsity, on group differences between brain networks of HC and MDD patients measured by global and nodal graph-theoretical metrics. Group differences in global metrics were absent for the majority of tested preprocessing variants, whereas differences in local graph metrics were sparse, variable, and highly dependent on the combination of preprocessing variant and sparsity threshold. Sparsity thresholds between 16 and 22% were shown to have the greatest potential to reveal differences between HC and MDD patients in global and local network metrics. Our study offers an overview of the consequences of methodological decisions and of which neurobiological characteristics of MDD they implicate, adding further caution to this rapidly growing field.
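
    As a sketch of the sparsity-thresholding step the study varies, the snippet below binarizes a functional connectivity matrix so that a fixed fraction of the strongest possible edges survives, then computes a global graph metric with networkx. The matrix name and metric choice are illustrative, not the study's exact pipeline.

    ```python
    import numpy as np
    import networkx as nx

    def threshold_at_sparsity(fc, sparsity):
        """Binarize a (regions x regions) connectivity matrix, keeping the
        strongest connections so that `sparsity` (0-1) of all possible
        edges survive."""
        iu = np.triu_indices(fc.shape[0], k=1)
        weights = fc[iu]
        n_keep = int(round(sparsity * len(weights)))
        cutoff = np.sort(weights)[-n_keep]  # weakest weight still kept
        adj = np.zeros_like(fc, dtype=bool)
        adj[iu] = weights >= cutoff
        return nx.from_numpy_array((adj | adj.T).astype(int))

    # Hypothetical usage, scanning the sparsity range (16-22%) the study
    # found most sensitive:
    # for s in np.arange(0.16, 0.23, 0.02):
    #     g = threshold_at_sparsity(fc, s)
    #     print(s, nx.global_efficiency(g))
    ```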

  2. Validation of affinity reagents using antigen microarrays.

    PubMed

    Sjöberg, Ronald; Sundberg, Mårten; Gundberg, Anna; Sivertsson, Asa; Schwenk, Jochen M; Uhlén, Mathias; Nilsson, Peter

    2012-06-15

    There is a need for standardised validation of affinity reagents to determine their binding selectivity and specificity. This is of particular importance for systematic efforts that aim to cover the human proteome with different types of binding reagents. One such international program is the SH2-consortium, which was formed to generate a complete set of renewable affinity reagents to the SH2-domain containing human proteins. Here, we describe a microarray strategy to validate various affinity reagents, such as recombinant single-chain antibodies, mouse monoclonal antibodies and antigen-purified polyclonal antibodies, using a highly multiplexed approach. An SH2-specific antigen microarray was designed and generated, containing more than 6000 spots displayed by 14 identical subarrays, each with 406 antigens, 105 of which represented SH2-domain containing proteins. Approximately 400 affinity reagents of various types were analysed on these antigen microarrays carrying antigens of different types. The microarrays revealed not only very detailed specificity profiles for all the binders, but also showed that overlapping target sequences of spotted antigens were detected by off-target interactions. The presented study illustrates the feasibility of using antigen microarrays for integrative, high-throughput validation of various types of binders and antigens.

  3. Terrain matching image pre-process and its format transform in autonomous underwater navigation

    NASA Astrophysics Data System (ADS)

    Cao, Xuejun; Zhang, Feizhou; Yang, Dongkai; Yang, Bogang

    2007-06-01

    Matching precision directly influences the final precision of an integrated navigation system. Image-matching-aided navigation spatially registers two underwater scenery images of the same scene, acquired by two different sensors, in order to determine the relative displacement between them. In this way, the vehicle's location can be obtained from a fiducial image with a known geographic reference, and the precise position fix from image matching is fed to the INS to eliminate its accumulated location error, greatly enhancing the navigation precision of the vehicle. Digital image data analysis and processing for image matching are therefore important in underwater passive navigation. For underwater geographic data analysis, we focus on the acquisition, processing, analysis, representation and measurement of database information. These items form one of the key elements of underwater terrain matching and help characterize the seabed terrain of the navigation area, so that the most favourable seabed districts and a dependable navigation algorithm can be selected, improving the precision and reliability of the terrain-aided navigation system. This paper describes the pre-processing and format transformation of digital images for underwater image matching. The terrain of the navigation area requires further study to provide reliable terrain-characteristic data for navigation. Through sea-route selection, danger-district prediction and navigation-algorithm analysis, terrain-aided navigation (TAN) can achieve higher location precision and probability, thereby providing technological support for image matching in underwater passive navigation.
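
    A common way to estimate the relative displacement between a sensed image and a fiducial reference, in the spirit of the matching step described above, is phase correlation; the following sketch (toy data, not the paper's algorithm) recovers a known shift:

```python
import numpy as np

def estimate_shift(reference, sensed):
    """Estimate the (row, col) displacement of `sensed` relative to
    `reference` via phase correlation."""
    F1 = np.fft.fft2(reference)
    F2 = np.fft.fft2(sensed)
    cross_power = F1 * np.conj(F2)
    cross_power /= np.abs(cross_power) + 1e-12   # normalize -> phase correlation
    corr = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Shifts beyond half the image size wrap around
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

rng = np.random.default_rng(1)
ref = rng.random((128, 128))                       # stand-in for a fiducial seabed image
sensed = np.roll(ref, shift=(5, -9), axis=(0, 1))  # simulated sensor image, shifted
print(estimate_shift(ref, sensed))                 # -> (5, -9)
```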

  4. Posttranslational Modification Assays on Functional Protein Microarrays.

    PubMed

    Neiswinger, Johnathan; Uzoma, Ijeoma; Cox, Eric; Rho, HeeSool; Jeong, Jun Seop; Zhu, Heng

    2016-10-03

    Protein microarray technology provides a straightforward yet powerful strategy for identifying substrates of posttranslational modifications (PTMs) and studying the specificity of the enzymes that catalyze these reactions. Protein microarray assays can be designed for individual enzymes or a mixture to establish connections between enzymes and substrates. Assays for four well-known PTMs-phosphorylation, acetylation, ubiquitylation, and SUMOylation-have been developed and are described here for use on functional protein microarrays. Phosphorylation and acetylation require a single enzyme and are easily adapted for use on an array. The ubiquitylation and SUMOylation cascades are very similar, and the combination of the E1, E2, and E3 enzymes plus ubiquitin or SUMO protein and ATP is sufficient for in vitro modification of many substrates.

  5. Joint Adaptive Pre-processing Resilience and Post-processing Concealment Schemes for 3D Video Transmission

    NASA Astrophysics Data System (ADS)

    El-Shafai, Walid

    2015-03-01

    3D video transmission over erroneous networks is still a considerable issue due to restricted resources and the presence of severe channel errors. Efficiently compressing 3D video at a low transmission rate, while maintaining a high quality of the received 3D video, is very challenging. Because real-time constraints and limited resources make it impractical to re-transmit all corrupted macro-blocks (MBs), the lost MBs must be recovered at the decoder side using effective post-processing schemes, such as error concealment (EC). In this paper, we propose an adaptive multi-mode EC (AMMEC) algorithm at the decoder, based on utilizing a pre-processing flexible macro-block ordering error resilience (FMO-ER) technique at the encoder, to efficiently conceal the erroneous MBs of intra- and inter-coded frames of 3D video. Experimental simulation results show that the proposed FMO-ER/AMMEC schemes can significantly improve the objective and subjective 3D video quality.
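
    The AMMEC algorithm itself is not reproduced here, but the simplest post-processing EC mode it builds on, zero-motion temporal concealment, can be sketched as follows (macro-block size and frame data are illustrative):

```python
import numpy as np

MB = 16  # macro-block size in pixels, as in H.264-style coding

def conceal_temporal(current, previous, lost_mbs):
    """Replace each lost macro-block with the co-located block
    from the previous frame (zero-motion temporal concealment)."""
    repaired = current.copy()
    for row, col in lost_mbs:
        r, c = row * MB, col * MB
        repaired[r:r + MB, c:c + MB] = previous[r:r + MB, c:c + MB]
    return repaired

rng = np.random.default_rng(2)
prev_frame = rng.integers(0, 256, (64, 64), dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[16:32, 32:48] = 0                  # simulate an erroneous (lost) MB
fixed = conceal_temporal(curr_frame, prev_frame, lost_mbs=[(1, 2)])
print(np.array_equal(fixed, prev_frame))      # True: the lost MB was recovered
```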

  6. Advanced Recording and Preprocessing of Physiological Signals. [data processing equipment for flow measurement of blood flow by ultrasonics

    NASA Technical Reports Server (NTRS)

    Bentley, P. B.

    1975-01-01

    The measurement of the volume flow-rate of blood in an artery or vein requires both an estimate of the flow velocity and its spatial distribution and the corresponding cross-sectional area. Transcutaneous measurements of these parameters can be performed using ultrasonic techniques that are analogous to the measurement of moving objects by use of a radar. Modern digital data recording and preprocessing methods were applied to the measurement of blood-flow velocity by means of the CW Doppler ultrasonic technique. Only the average flow velocity was measured and no distribution or size information was obtained. Evaluations of current flowmeter design and performance, ultrasonic transducer fabrication methods, and other related items are given. The main thrust was the development of effective data-handling and processing methods by application of modern digital techniques. The evaluation resulted in useful improvements in both the flowmeter instrumentation and the ultrasonic transducers. Effective digital processing algorithms that provided enhanced blood-flow measurement accuracy and sensitivity were developed. Block diagrams illustrative of the equipment setup are included.
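
    The CW Doppler principle underlying this work reduces to the relation f_d = 2 f0 v cos(theta) / c, so the average flow velocity can be recovered from the measured Doppler shift; a minimal sketch with illustrative numbers (not taken from the report):

```python
import math

SPEED_OF_SOUND = 1540.0  # m/s, a typical value for soft tissue

def flow_velocity(f_doppler, f_carrier, angle_deg):
    """Average flow velocity from the CW Doppler relation
    f_d = 2 * f0 * v * cos(theta) / c  =>  v = f_d * c / (2 * f0 * cos(theta))."""
    return (f_doppler * SPEED_OF_SOUND
            / (2.0 * f_carrier * math.cos(math.radians(angle_deg))))

# A 5 MHz carrier at a 60-degree beam-to-vessel angle, observing a 1.3 kHz shift:
print(f"{flow_velocity(1.3e3, 5e6, 60.0):.3f} m/s")   # -> 0.400 m/s
```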

  7. Hybridization and Selective Release of DNA Microarrays

    SciTech Connect

    Beer, N R; Baker, B; Piggott, T; Maberry, S; Hara, C M; DeOtte, J; Benett, W; Mukerjee, E; Dzenitis, J; Wheeler, E K

    2011-11-29

    DNA microarrays contain sequence-specific probes arrayed in distinct spots numbering from 10,000 to over 1,000,000, depending on the platform. This tremendous degree of multiplexing gives microarrays great potential for environmental background sampling, broad-spectrum clinical monitoring, and continuous biological threat detection. In practice, their use in these applications is not common due to limited information content, long processing times, and high cost. This work focused on characterizing the phenomena of microarray hybridization and selective release so that these limitations can be addressed, which would revolutionize the ways microarrays can be used for LLNL's Global Security missions. The goals of this project were two-fold: automated, faster hybridizations and selective release of hybridized features. The first study area involves hybridization kinetics and mass-transfer effects. The standard hybridization protocol uses an overnight incubation to achieve the best possible signal for any sample type, as well as for convenience in manual processing. There is potential to significantly shorten this time based on better understanding and control of the rate-limiting processes and knowledge of the progress of the hybridization. In the hybridization work, a custom microarray flow cell was used to manipulate the chemical and thermal environment of the array and to autonomously image the changes over time during hybridization. The second study area is selective release. Microarrays easily generate hybridization patterns and signatures, but there is still an unmet need for methodologies enabling rapid and selective analysis of these patterns and signatures. Detailed analysis of individual spots by subsequent sequencing could potentially yield significant information for rapidly mutating and emerging (or deliberately engineered) pathogens. In the selective release work, optical energy deposition with coherent light quickly provides the thermal energy to

  8. Overview of DNA microarrays: types, applications, and their future.

    PubMed

    Bumgarner, Roger

    2013-01-01

    This unit provides an overview of DNA microarrays. Microarrays are a technology in which thousands of nucleic acids are bound to a surface and are used to measure the relative concentration of nucleic acid sequences in a mixture via hybridization and subsequent detection of the hybridization events. This overview first discusses the history of microarrays and the antecedent technologies that led to their development. This is followed by discussion of the methods of manufacture of microarrays and the most common biological applications. The unit ends with a brief description of the limitations of microarrays and discusses how microarrays are being rapidly replaced by DNA sequencing technologies.

  9. The use of microarrays in microbial ecology

    SciTech Connect

    Andersen, G.L.; He, Z.; DeSantis, T.Z.; Brodie, E.L.; Zhou, J.

    2009-09-15

    Microarrays have proven to be a useful and high-throughput method to provide targeted DNA sequence information for up to many thousands of specific genetic regions in a single test. A microarray consists of multiple DNA oligonucleotide probes that, under high stringency conditions, hybridize only to specific complementary nucleic acid sequences (targets). A fluorescent signal indicates the presence and, in many cases, the abundance of genetic regions of interest. In this chapter we will look at how microarrays are used in microbial ecology, especially with the recent increase in microbial community DNA sequence data. Of particular interest to microbial ecologists, phylogenetic microarrays are used for the analysis of phylotypes in a community and functional gene arrays are used for the analysis of functional genes, and, by inference, phylotypes in environmental samples. A phylogenetic microarray that has been developed by the Andersen laboratory, the PhyloChip, will be discussed as an example of a microarray that targets the known diversity within the 16S rRNA gene to determine microbial community composition. Using multiple, confirmatory probes to increase the confidence of detection and a mismatch probe for every perfect match probe to minimize the effect of cross-hybridization by non-target regions, the PhyloChip is able to simultaneously identify any of thousands of taxa present in an environmental sample. The PhyloChip is shown to reveal greater diversity within a community than rRNA gene sequencing due to the placement of the entire gene product on the microarray compared with the analysis of up to thousands of individual molecules by traditional sequencing methods. A functional gene array that has been developed by the Zhou laboratory, the GeoChip, will be discussed as an example of a microarray that dynamically identifies functional activities of multiple members within a community. The recent version of GeoChip contains more than 24,000 50mer

  10. An image-data-compression algorithm

    NASA Technical Reports Server (NTRS)

    Hilbert, E. E.; Rice, R. F.

    1981-01-01

    The Cluster Compression Algorithm (CCA) preprocesses Landsat image data immediately after acquisition by the satellite sensor (receiver). Data are reduced by extracting pertinent image features and compressing the result into a concise format for transmission to the ground station. This yields a narrower transmission bandwidth, increased data-communication efficiency, and reduced computer time in reconstructing and analyzing the image. A similar technique could be applied to other types of recorded data to cut the costs of transmitting, storing, distributing, and interpreting complex information.
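
    The CCA itself is only summarized above; as an illustration of the general cluster-compression idea, the following sketch (hypothetical data, scikit-learn's KMeans as a stand-in for the feature clustering) replaces pixel vectors with a small codebook of centroids and reports the resulting compression ratio:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Toy stand-in for multispectral image data: 10,000 pixels x 4 bands
pixels = rng.random((10_000, 4)).astype(np.float32)

k = 16                                         # codebook size (illustrative)
km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(pixels)

labels = km.labels_.astype(np.uint8)           # 1 byte per pixel
codebook = km.cluster_centers_.astype(np.float32)  # k x 4 floats

original_bytes = pixels.nbytes
compressed_bytes = labels.nbytes + codebook.nbytes
print(f"compression ratio ~ {original_bytes / compressed_bytes:.1f}:1")

reconstructed = codebook[labels]               # lossy reconstruction for analysis
```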

  11. Unsupervised pattern recognition: an introduction to the whys and wherefores of clustering microarray data.

    PubMed

    Boutros, Paul C; Okey, Allan B

    2005-12-01

    Clustering has become an integral part of microarray data analysis and interpretation. The algorithmic basis of clustering -- the application of unsupervised machine-learning techniques to identify the patterns inherent in a data set -- is well established. This review discusses the biological motivations for and applications of these techniques to integrating gene expression data with other biological information, such as functional annotation, promoter data and proteomic data.

  12. A Combinational Clustering Based Method for cDNA Microarray Image Segmentation.

    PubMed

    Shao, Guifang; Li, Tiejun; Zuo, Wangda; Wu, Shunxiang; Liu, Tundong

    2015-01-01

    Microarray technology plays an important role in drawing useful biological conclusions by analyzing thousands of gene expressions simultaneously. Image analysis in particular is a key step in microarray analysis, and its accuracy strongly depends on segmentation. Pioneering work on clustering-based segmentation has shown that the k-means and moving k-means clustering algorithms are two commonly used methods in microarray image processing. However, they often produce unsatisfactory results because real microarray images contain noise, artifacts, and spots that vary in size, shape and contrast. To improve the segmentation accuracy, in this article we present a combinational clustering-based segmentation approach that may be more reliable and able to segment spots automatically. First, the new method starts with a very simple but effective contrast enhancement operation to improve the image quality. Then, an automatic gridding based on the maximum between-class variance is applied to separate the spots into independent areas. Next, within each spot region, moving k-means clustering is first conducted to separate the spot from the background, and the k-means algorithm is then combined with it for those spots whose entire boundary is not recovered. Finally, a refinement step replaces false segmentations and handles inseparable or missing spots. In addition, quantitative comparisons between the improved method and four other segmentation algorithms--edge detection, thresholding, k-means clustering and moving k-means clustering--are carried out on cDNA microarray images from six different data sets. Experiments on six different data sets, 1) Stanford Microarray Database (SMD), 2) Gene Expression Omnibus (GEO), 3) Baylor College of Medicine (BCM), 4) Swiss Institute of Bioinformatics (SIB), 5) Joe DeRisi's individual tiff files (DeRisi), and 6) University of California, San Francisco (UCSF), indicate that the improved approach is
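
    The "maximum between-class variance" criterion used for gridding is Otsu's method; a self-contained sketch (toy intensities, not the authors' implementation) that locates such a threshold:

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the intensity threshold maximizing between-class variance."""
    hist, edges = np.histogram(image, bins=bins)
    p = hist.astype(float) / hist.sum()          # probability mass per bin
    omega = np.cumsum(p)                         # P(class 0) per candidate threshold
    mids = 0.5 * (edges[:-1] + edges[1:])
    mu = np.cumsum(p * mids)                     # cumulative first moment
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)             # ignore degenerate end bins
    return mids[np.argmax(sigma_b)]

rng = np.random.default_rng(4)
background = rng.normal(30, 5, 900)              # dim background pixels
spots = rng.normal(120, 15, 100)                 # bright spot pixels
image = np.concatenate([background, spots])
t = otsu_threshold(image)
print(f"threshold={t:.1f}, spot pixels found={np.sum(image > t)}")
```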

  14. Preprocessing, classification modeling and feature selection using flow injection electrospray mass spectrometry metabolite fingerprint data.

    PubMed

    Enot, David P; Lin, Wanchang; Beckmann, Manfred; Parker, David; Overy, David P; Draper, John

    2008-01-01

    Metabolome analysis by flow injection electrospray mass spectrometry (FIE-MS) fingerprinting generates measurements relating to large numbers of m/z signals. Such data sets often exhibit high variance with a paucity of replicates, thus providing a challenge for data mining. We describe data preprocessing and modeling methods that have proved reliable in projects involving samples from a range of organisms. The protocols interact with software resources specifically for metabolomics provided in a Web-accessible data analysis package FIEmspro (http://users.aber.ac.uk/jhd) written in the R environment and requiring a moderate knowledge of R command-line usage. Specific emphasis is placed on describing the outcome of modeling experiments using FIE-MS data that require further preprocessing to improve quality. The salient features of both poor and robust (i.e., highly generalizable) multivariate models are outlined together with advice on validating classifiers and avoiding false discovery when seeking explanatory variables.

  15. Radar signal pre-processing to suppress surface bounce and multipath

    SciTech Connect

    Paglieroni, David W; Mast, Jeffrey E; Beer, N. Reginald

    2013-12-31

    A method and system for detecting the presence of subsurface objects within a medium is provided. In some embodiments, the imaging and detection system operates in a multistatic mode to collect radar return signals generated by an array of transceiver antenna pairs that is positioned across the surface and that travels down the surface. The imaging and detection system pre-processes that return signal to suppress certain undesirable effects. The imaging and detection system then generates synthetic aperture radar images from real aperture radar images generated from the pre-processed return signal. The imaging and detection system then post-processes the synthetic aperture radar images to improve detection of subsurface objects. The imaging and detection system identifies peaks in the energy levels of the post-processed image frame, which indicates the presence of a subsurface object.

  16. KONFIG and REKONFIG: Two interactive preprocessors to the Navy/NASA Engine Program (NNEP)

    NASA Technical Reports Server (NTRS)

    Fishbach, L. H.

    1981-01-01

    The NNEP is a computer program that is currently being used to simulate the thermodynamic cycle performance of almost all types of turbine engines by many government, industry, and university personnel. The NNEP uses arrays of input data to set up the engine simulation and component-matching method, as well as to describe the characteristics of the components. A preprocessing program (KONFIG) is described with which the user, at a terminal on a time-shared computer, can interactively prepare the required arrays of data. It is intended to make it easier for the occasional or new user to operate NNEP. Another preprocessing program (REKONFIG), with which the user can modify the component specifications of a previously configured NNEP dataset, is also described. It is intended to aid in preparing data for parametric studies and/or studies of similar engines such as mixed-flow turbofans, turboshafts, etc.

  17. The Role of GRAIL Orbit Determination in Preprocessing of Gravity Science Measurements

    NASA Technical Reports Server (NTRS)

    Kruizinga, Gerhard; Asmar, Sami; Fahnestock, Eugene; Harvey, Nate; Kahan, Daniel; Konopliv, Alex; Oudrhiri, Kamal; Paik, Meegyeong; Park, Ryan; Strekalov, Dmitry; Watkins, Michael; Yuan, Dah-Ning

    2013-01-01

    The Gravity Recovery And Interior Laboratory (GRAIL) mission has constructed a lunar gravity field with unprecedented uniform accuracy on the farside and nearside of the Moon. GRAIL lunar gravity field determination begins with preprocessing of the gravity science measurements by applying corrections for time-tag error, general relativity, measurement noise and biases. Gravity field determination requires the generation of spacecraft ephemerides of an accuracy not attainable with the pre-GRAIL lunar gravity fields. Therefore, a bootstrapping strategy was developed, iterating between science data preprocessing and lunar gravity field estimation in order to construct sufficiently accurate orbit ephemerides. This paper describes the GRAIL measurements, their dependence on the spacecraft ephemerides, and the role of orbit determination in the bootstrapping strategy. Simulation results are presented that validate the bootstrapping strategy, followed by bootstrapping results for flight data, which have led to the latest GRAIL lunar gravity fields.

  18. DISC-BASED IMMUNOASSAY MICROARRAYS. (R825433)

    EPA Science Inventory

    Microarray technology as applied to areas that include genomics, diagnostics, environmental, and drug discovery, is an interesting research topic for which different chip-based devices have been developed. As an alternative, we have explored the principle of compact disc-based...

  19. Raman-based microarray readout: a review.

    PubMed

    Haisch, Christoph

    2016-07-01

    For a quarter of a century, microarrays have been part of the routine analytical toolbox. Label-based fluorescence detection is still the commonest optical readout strategy. Since the 1990s, a continuously increasing number of label-based as well as label-free experiments on Raman-based microarray readout concepts have been reported. This review summarizes the possible concepts and methods and their advantages and challenges. A common label-based strategy is based on the binding of selective receptors as well as Raman reporter molecules to plasmonic nanoparticles in a sandwich immunoassay, which results in surface-enhanced Raman scattering signals of the reporter molecule. Alternatively, capture of the analytes can be performed by receptors on a microarray surface. Addition of plasmonic nanoparticles again leads to a surface-enhanced Raman scattering signal, not of a label but directly of the analyte. This approach is mostly proposed for bacteria and cell detection. However, although many promising readout strategies have been discussed in numerous publications, rarely have any of them made the step from proof of concept to a practical application, let alone routine use. Graphical Abstract: Possible realization of a SERS (surface-enhanced Raman scattering) system for microarray readout.

  20. Annotating nonspecific SAGE tags with microarray data.

    PubMed

    Ge, Xijin; Jung, Yong-Chul; Wu, Qingfa; Kibbe, Warren A; Wang, San Ming

    2006-01-01

    SAGE (serial analysis of gene expression) detects transcripts by extracting short tags from the transcripts. Because of their limited length, many SAGE tags are shared by transcripts from different genes. Relying on sequence information in the general gene expression database has limited power to solve this problem due to the highly heterogeneous nature of the deposited sequences. Considering that the complexity of gene expression at the single-tissue level should be much lower than that in the general expression database, we reasoned that by restricting gene expression to the tissue level, the accuracy of gene annotation for nonspecific SAGE tags should be significantly improved. To test the idea, we developed a tissue-specific SAGE annotation database based on microarray data. This database contains microarray expression information, represented as UniGene clusters, for 73 normal human tissues and 18 cancer tissues and cell lines. A nonspecific SAGE tag is first matched to the database by the tissue type used in both the SAGE and microarray analyses; then the multiple UniGene clusters assigned to the nonspecific SAGE tag are searched in the database under the matched tissue type. The UniGene cluster present solely, or at higher expression levels, in the database is annotated as representing the specific gene for the nonspecific SAGE tag. The accuracy of gene annotation by this database was largely confirmed by experimental data. Our study shows that microarray data provide a useful source for annotating nonspecific SAGE tags.
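
    The annotation rule described above can be illustrated with a small sketch; all tag sequences, cluster IDs, and expression values below are hypothetical:

```python
# Hypothetical tissue-restricted expression table:
# tissue -> {UniGene cluster: microarray expression level}
expression_by_tissue = {
    "liver":  {"Hs.0001": 820.0, "Hs.0002": 15.0},
    "kidney": {"Hs.0002": 640.0, "Hs.0003": 90.0},
}

# Nonspecific SAGE tag -> all UniGene clusters sharing that tag sequence
tag_to_clusters = {"GTGAAACCCC": ["Hs.0001", "Hs.0002", "Hs.0003"]}

def annotate_tag(tag, tissue):
    """Assign a nonspecific SAGE tag to the candidate cluster with the
    highest expression in the matched tissue (None if none is expressed)."""
    profile = expression_by_tissue.get(tissue, {})
    candidates = [(profile[c], c) for c in tag_to_clusters.get(tag, []) if c in profile]
    return max(candidates)[1] if candidates else None

print(annotate_tag("GTGAAACCCC", "liver"))   # -> Hs.0001
print(annotate_tag("GTGAAACCCC", "kidney"))  # -> Hs.0002
```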

  1. Analytical Protein Microarrays: Advancements Towards Clinical Applications

    PubMed Central

    Sauer, Ursula

    2017-01-01

    Protein microarrays represent a powerful technology with the potential to serve as tools for the detection of a broad range of analytes in numerous applications such as diagnostics, drug development, food safety, and environmental monitoring. Key features of analytical protein microarrays include high throughput and relatively low costs due to minimal reagent consumption, multiplexing, fast kinetics and hence measurements, and the possibility of functional integration. So far, mainly fundamental studies in molecular and cell biology have been conducted using protein microarrays, while the potential for clinical, notably point-of-care, applications is not yet fully utilized. The question arises as to what features have to be implemented and what improvements have to be made in order to fully exploit the technology. In the past we have identified various obstacles that have to be overcome in order to promote protein microarray technology in the diagnostic field. Issues that need significant improvement to make the technology more attractive for the diagnostic market include insufficient sensitivity and reproducibility, inadequate analysis times, lack of high-quality antibodies and validated reagents, lack of automation and portable instruments, and the cost of instruments necessary for chip production and read-out. The scope of the paper at hand is to review approaches to solving these problems. PMID:28146048

  3. Microarrays (DNA Chips) for the Classroom Laboratory

    ERIC Educational Resources Information Center

    Barnard, Betsy; Sussman, Michael; BonDurant, Sandra Splinter; Nienhuis, James; Krysan, Patrick

    2006-01-01

    We have developed and optimized the necessary laboratory materials to make DNA microarray technology accessible to all high school students at a fraction of both cost and data size. The primary component is a DNA chip/array that students "print" by hand and then analyze using research tools that have been adapted for classroom use. The…

  4. Diagnostic Oligonucleotide Microarray Fingerprinting of Bacillus Isolates

    SciTech Connect

    Chandler, Darrell P.; Alferov, Oleg; Chernov, Boris; Daly, Don S.; Golova, Julia; Perov, Alexander N.; Protic, Miroslava; Robison, Richard; Shipma, Matthew; White, Amanda M.; Willse, Alan R.

    2006-01-01

    A diagnostic, genome-independent microbial fingerprinting method using DNA oligonucleotide microarrays was used for high-resolution differentiation between closely related Bacillus strains, including two strains of Bacillus anthracis that are monomorphic (indistinguishable) via amplified fragment length polymorphism fingerprinting techniques. Replicated hybridizations on 391-probe nonamer arrays were used to construct a prototype fingerprint library for quantitative comparisons. Descriptive analysis of the fingerprints, including phylogenetic reconstruction, is consistent with previous taxonomic organization of the genus. Newly developed statistical analysis methods were used to quantitatively compare and objectively confirm apparent differences in microarray fingerprints with the statistical rigor required for microbial forensics and clinical diagnostics. These data suggest that a relatively simple fingerprinting microarray and statistical analysis method can differentiate between species in the Bacillus cereus complex, and between strains of B. anthracis. A synthetic DNA standard was used to understand underlying microarray and process-level variability, leading to specific recommendations for the development of a standard operating procedure and/or continued technology enhancements for microbial forensics and diagnostics.

  5. MICROARRAY DATA ANALYSIS USING MULTIPLE STATISTICAL MODELS

    EPA Science Inventory

    Microarray Data Analysis Using Multiple Statistical Models

    Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2,Andrew I. Brooks3,4, ChiYi Chu3,Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...

  6. Shrinkage covariance matrix approach for microarray data

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Aripin, Rasimah

    2013-04-01

    Microarray technology was developed for the purpose of monitoring the expression levels of thousands of genes. A microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples, due to various constraints including the high cost of producing microarray chips. As a result, the widely used standard covariance estimator is not appropriate for such data, and multivariate methods that rely on it break down. One such technique is Hotelling's T² statistic, a multivariate test statistic for comparing means between two groups. It requires that the number of observations (n) exceed the number of genes (p), but in microarray studies it is common that n < p, which leads to a rank-deficient, unreliable estimate of the covariance matrix. In this study, Hotelling's T² statistic with a shrinkage approach is proposed to estimate the covariance matrix for testing differential gene expression. The performance of this approach is then compared with that of other commonly used multivariate tests, using a widely analysed diabetes data set as an illustration. The results across the methods are consistent, implying that this approach provides an alternative to existing techniques.
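
    A minimal sketch of the idea, assuming a Ledoit-Wolf shrinkage estimator (the paper's specific shrinkage scheme may differ) for a two-sample Hotelling's T² in the n < p regime:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def hotelling_t2_shrinkage(X1, X2):
    """Two-sample Hotelling's T^2 with a shrunk pooled covariance,
    usable when the gene count p exceeds the sample count n."""
    n1, n2 = X1.shape[0], X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    # Pool the two groups after centering each, then shrink the covariance
    pooled = np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])
    S = LedoitWolf().fit(pooled).covariance_     # well-conditioned even for n < p
    return (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(S, diff)

rng = np.random.default_rng(5)
p = 200                                          # genes (variables)
healthy = rng.standard_normal((15, p))           # 15 samples, n << p
disease = rng.standard_normal((12, p))
disease[:, :10] += 1.5                           # 10 genes truly shifted
print(f"T^2 = {hotelling_t2_shrinkage(healthy, disease):.1f}")
```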

  7. Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

    PubMed

    Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar

    2016-04-01

    Microarray-based gene expression profiling has emerged as an efficient technique for the classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfy both the veracity and velocity properties of big data, as they keep changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. They often contain a large number of expression values, but only a fraction of these corresponds to significantly expressed genes. The precise identification of the genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process, namely feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest-neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data.
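
    The map/reduce decomposition of KNN can be sketched without Hadoop: each map step returns the k best candidates from its data partition, and the reduce step merges them and votes. The sketch below illustrates this pattern only; it is not the authors' mrKNN implementation:

```python
import numpy as np
from collections import Counter

def map_partition(train_X, train_y, query, k):
    """Map step: return the k best (distance, label) pairs in one partition."""
    d = np.linalg.norm(train_X - query, axis=1)
    idx = np.argsort(d)[:k]
    return list(zip(d[idx], train_y[idx]))

def reduce_votes(candidate_lists, k):
    """Reduce step: merge per-partition candidates, keep global top-k, vote."""
    merged = sorted(c for lst in candidate_lists for c in lst)[:k]
    return Counter(label for _, label in merged).most_common(1)[0][0]

rng = np.random.default_rng(6)
X = rng.standard_normal((3000, 50))              # toy "microarray" matrix
y = (X[:, 0] > 0).astype(int)                    # synthetic class labels
query, truth = X[0] + 0.01, y[0]

parts = np.array_split(np.arange(3000), 4)       # 4 partitions, as mappers would see
candidates = [map_partition(X[p], y[p], query, k=5) for p in parts]
print(reduce_votes(candidates, k=5), "(true:", truth, ")")
```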

  8. A Method of Microarray Data Storage Using Array Data Type

    PubMed Central

    Tsoi, Lam C.; Zheng, W. Jim

    2009-01-01

    A well-designed microarray database can provide valuable information on gene expression levels. However, designing an efficient microarray database with minimum space usage is not an easy task, since designers need to integrate the microarray data with information on genes, probe annotation, and the descriptions of each microarray experiment. Developing better methods to store microarray data can greatly improve the efficiency and usefulness of such data. A new schema is proposed to store microarray data using the array data type in an object-relational database management system, PostgreSQL. The implemented database stores all the microarray data from the same chip in PostgreSQL's variable-length array data structure. The implementation of our schema can help to increase data retrieval and space efficiency. PMID:17392028
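
    A minimal sketch of the schema idea using psycopg2 (the table layout, column names, and connection string are hypothetical; psycopg2 adapts Python lists to PostgreSQL arrays):

```python
import psycopg2  # assumes a reachable PostgreSQL server and valid credentials

conn = psycopg2.connect("dbname=microarray user=postgres")  # hypothetical DSN
cur = conn.cursor()

# One row per chip; all expression values for the chip live in one array column
cur.execute("""
    CREATE TABLE IF NOT EXISTS chip_expression (
        chip_id     text PRIMARY KEY,
        experiment  text,
        intensities double precision[]   -- variable-length array, one value per probe
    )
""")

# psycopg2 adapts a Python list to a PostgreSQL array automatically
cur.execute(
    "INSERT INTO chip_expression VALUES (%s, %s, %s)",
    ("chip-001", "toy-experiment", [812.4, 15.2, 430.9]),
)

# Arrays can be sliced server-side, e.g. the first two probe values
cur.execute("SELECT intensities[1:2] FROM chip_expression WHERE chip_id = %s",
            ("chip-001",))
print(cur.fetchone()[0])

conn.commit()
cur.close()
conn.close()
```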

  9. PRACTICAL STRATEGIES FOR PROCESSING AND ANALYZING SPOTTED OLIGONUCLEOTIDE MICROARRAY DATA

    EPA Science Inventory

    Thoughtful data analysis is as important as experimental design, biological sample quality, and appropriate experimental procedures for making microarrays a useful supplement to traditional toxicology. In the present study, spotted oligonucleotide microarrays were used to profile...

  10. Hyperspectral imaging in medicine: image pre-processing problems and solutions in Matlab.

    PubMed

    Koprowski, Robert

    2015-11-01

    The paper presents problems and solutions related to hyperspectral image pre-processing. New methods of preliminary image analysis are proposed. The paper shows problems occurring in Matlab when trying to analyse this type of image. Moreover, new methods are discussed which provide source code in Matlab that can be used in practice without any licensing restrictions. A proposed application and sample results of hyperspectral image analysis are also presented.

  11. Data preprocessing for a vehicle-based localization system used in road traffic applications

    NASA Astrophysics Data System (ADS)

    Patelczyk, Timo; Löffler, Andreas; Biebl, Erwin

    2016-09-01

    This paper presents a fixed-point implementation, using a field-programmable gate array (FPGA), of the preprocessing required for multipath joint angle and delay estimation (JADE) in road traffic applications, and thereby lays the foundation for many model-based parameter estimation methods. Here, a simulation of a vehicle-based localization system for protecting vulnerable road users, who are equipped with appropriate transponders, is considered. For such safety-critical applications, the robustness and real-time capability of the localization are particularly important. An additional motivation for using a fixed-point implementation for the data preprocessing is the limited computing power of a vehicle's head unit. This study aims to process the raw data provided by the localization system considered here. The data preprocessing applied includes a wideband calibration of the physical localization system, separation of relevant information from the received sampled signal, and preparation of the incoming data via further processing. Furthermore, a channel matrix estimation was implemented to complete the data preprocessing; the channel matrix contains information on channel parameters, e.g., the positions of the objects to be located. In the presented vehicle-based localization application, we assume an urban environment in which multipath propagation occurs. Since most localization methods are based on uncorrelated signals, this fact must be addressed; hence, a decorrelation of the incoming data stream is required before further localization processing. This decorrelation was accomplished by considering several snapshots in different time slots. As a final aspect of the use of fixed-point arithmetic, quantization errors are considered. In addition, the resources and runtime of the presented implementation are discussed; these factors are strongly linked to a practical implementation.
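
    To illustrate the quantization-error aspect mentioned above, a minimal sketch comparing a floating-point signal with its fixed-point representation (the Q3.12 word format is an illustrative choice, not the paper's):

```python
import numpy as np

def to_fixed_point(x, frac_bits=12, word_bits=16):
    """Quantize to signed fixed-point with `frac_bits` fractional bits,
    saturating at the representable range (Q3.12 for the defaults)."""
    scale = 2 ** frac_bits
    lo, hi = -2 ** (word_bits - 1), 2 ** (word_bits - 1) - 1
    q = np.clip(np.round(x * scale), lo, hi)
    return q / scale

rng = np.random.default_rng(11)
signal = rng.standard_normal(10_000) * 0.5        # stand-in for sampled raw data
err = signal - to_fixed_point(signal)
print(f"max |error| = {np.abs(err).max():.2e}  (LSB = {2**-12:.2e})")
```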

  12. Optimizing fMRI Preprocessing Pipelines for Block-Design Tasks as a Function of Age.

    PubMed

    Churchill, Nathan W; Raamana, Pradeep; Spring, Robyn; Strother, Stephen C

    2017-02-12

    Functional Magnetic Resonance Imaging (fMRI) is a powerful neuroimaging tool, which is often hampered by significant noise confounds. There is evidence that our ability to detect activations in task fMRI is highly dependent on the preprocessing steps used to control noise and artifact. However, the vast majority of studies examining preprocessing pipelines in fMRI have focused on young adults. Given the widespread use of fMRI for characterizing the neurobiology of aging, it is critical to examine how the impact of preprocessing choices varies as a function of age. In this study, we employ the NPAIRS cross-validation framework, which optimizes pipelines based on metrics of prediction accuracy (P) and spatial reproducibility (R), to compare the effects of pipeline optimization between young (21-33 years) and older (61-82 years) cohorts, for three different block-design contrasts. Motion is shown to be a greater issue in the older cohort, and we introduce new statistical approaches to control for potential biases due to head motion during pipeline optimization. In comparison, data-driven methods of physiological noise correction show comparable benefits for both young and old cohorts. Using our optimization framework, we demonstrate that the optimal pipelines tend to be highly similar across age cohorts. In addition, there is a comparable, significant benefit of pipeline optimization across age cohorts, for (P, R) metrics and independent validation measures of activation overlap (both between-subject, within-session and within-subject, between-session). The choice of task contrast consistently shows a greater impact than the age cohort, for (P, R) metrics and activation overlap. Finally, adaptive pipeline optimization per task run shows improved sensitivity to age-related changes in brain activity, particularly for weaker, more complex cognitive contrasts. The current study provides the first detailed examination of preprocessing pipelines across age cohorts

  13. Examining microarray slide quality for the EPA using SNL's hyperspectral microarray scanner.

    SciTech Connect

    Rohde, Rachel M.; Timlin, Jerilyn Ann

    2005-11-01

    This report summarizes research performed at Sandia National Laboratories (SNL) in collaboration with the Environmental Protection Agency (EPA) to assess microarray quality on arrays from two platforms of interest to the EPA. Custom microarrays from two novel, commercially produced array platforms were imaged with SNL's unique hyperspectral imaging technology and multivariate data analysis was performed to investigate sources of emission on the arrays. No extraneous sources of emission were evident in any of the array areas scanned. This led to the conclusions that either of these array platforms could produce high quality, reliable microarray data for the EPA toxicology programs. Hyperspectral imaging results are presented and recommendations for microarray analyses using these platforms are detailed within the report.

  14. Reducing Uncertainties of Hydrologic Model Predictions Using a New Ensemble Pre-Processing Approach

    NASA Astrophysics Data System (ADS)

    Khajehei, S.; Moradkhani, H.

    2015-12-01

    Ensemble Streamflow Prediction (ESP) was developed to characterize the uncertainty in hydrologic predictions. However, ESP outputs are still prone to bias due to uncertainty in the forcing data, initial conditions, and model structure. Among these, uncertainty in the forcing data has a major impact on the reliability of hydrologic simulations and forecasts. Major steps, such as Ensemble Pre-Processing (EPP), have been taken toward generating less uncertain precipitation forecasts. EPP is a statistical procedure based on the bivariate joint distribution between observations and forecasts that generates an ensemble climatological forecast from a single-value forecast. The purpose of this study is to evaluate the performance of pre-processed ensemble precipitation forecasts in generating ensemble streamflow predictions. Copula functions, used in EPP, model the multivariate joint distribution between univariate variables with any level of dependency. Accordingly, ESP is generated by employing both the raw ensemble precipitation forecast and the pre-processed ensemble precipitation. The ensemble precipitation forecast is taken from the Climate Forecast System (CFS) generated by the National Weather Service's (NWS) National Centers for Environmental Prediction (NCEP) models. The study is conducted using the Precipitation-Runoff Modeling System (PRMS) over two basins in the Pacific Northwest, USA, for the period 1979 to 2013. Results reveal that applying this new EPP leads to a reduction of uncertainty and an overall improvement in the ESP.
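
    The bivariate-distribution idea behind EPP can be sketched with a Gaussian copula: transform historical forecast/observation pairs to normal scores, then sample an ensemble from the conditional distribution given a new single-value forecast. The sketch below is a simplification (synthetic data, no treatment of precipitation intermittency), not the study's implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Historical archive of paired single-value forecasts and observations (toy data)
n = 2000
obs = rng.gamma(shape=2.0, scale=3.0, size=n)
fcst = obs * rng.lognormal(0.0, 0.3, size=n)     # biased, noisy forecasts

def normal_scores(x):
    """Gaussian-copula transform via empirical ranks."""
    ranks = stats.rankdata(x) / (len(x) + 1.0)
    return stats.norm.ppf(ranks)

z_obs, z_fcst = normal_scores(obs), normal_scores(fcst)
rho = np.corrcoef(z_obs, z_fcst)[0, 1]           # copula correlation

def preprocessed_ensemble(new_fcst, n_members=50):
    """Conditional sampling: z_obs | z_fcst ~ N(rho*z_fcst, 1 - rho^2),
    mapped back through the observed climatology's quantiles."""
    z_f = stats.norm.ppf((np.searchsorted(np.sort(fcst), new_fcst) + 0.5) / (n + 1))
    z = rng.normal(rho * z_f, np.sqrt(1.0 - rho ** 2), size=n_members)
    return np.quantile(obs, stats.norm.cdf(z))   # inverse empirical CDF of obs

print(np.round(preprocessed_ensemble(12.0)[:5], 2))
```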

  15. Learning-based image preprocessing for robust computer-aided detection

    NASA Astrophysics Data System (ADS)

    Raghupathi, Laks; Devarakota, Pandu R.; Wolf, Matthias

    2013-03-01

    Recent studies have shown that low-dose computed tomography (LDCT) can be an effective screening tool to reduce lung cancer mortality. Computer-aided detection (CAD) would be a beneficial second reader for radiologists in such cases. Studies demonstrate that while iterative reconstruction (IR) improves LDCT diagnostic quality, it significantly degrades CAD performance (increased false positives) when CAD is applied directly. For improving CAD performance, solutions such as retraining with newer data or applying a standard preprocessing technique may not suffice, given the wide variety of CT scanners and non-uniform acquisition protocols. Here, we present a learning-based framework that can adaptively transform a wide variety of input data to boost the performance of an existing CAD system. This enhances not only its robustness but also its applicability in clinical workflows. Our solution consists of automatically applying a suitable pre-processing filter to the given image based on its characteristics. This requires ground truth (GT) on the choice of filter that results in improved CAD performance; accordingly, we propose an efficient consolidation process with a novel metric. Using key anatomical landmarks, we then derive consistent feature descriptors for the classification scheme, which uses a priority mechanism to automatically choose an optimal preprocessing filter. We demonstrate CAD prototype performance improvement using hospital-scale datasets acquired from North America, Europe and Asia. Though we demonstrated our results for a lung nodule CAD, this scheme is straightforward to extend to other post-processing tools dedicated to other organs and modalities.

  16. Foveal processing difficulty does not affect parafoveal preprocessing in young readers

    PubMed Central

    Marx, Christina; Hawelka, Stefan; Schuster, Sarah; Hutzler, Florian

    2017-01-01

    Recent evidence suggested that parafoveal preprocessing develops early during reading acquisition, that is, young readers profit from valid parafoveal information and exhibit a resultant preview benefit. For young readers, however, it is unknown whether the processing demands of the currently fixated word modulate the extent to which the upcoming word is parafoveally preprocessed – as it has been postulated (for adult readers) by the foveal load hypothesis. The present study used the novel incremental boundary technique to assess whether 4th and 6th Graders exhibit an effect of foveal load. Furthermore, we attempted to distinguish the foveal load effect from the spillover effect. These effects are hard to differentiate with respect to the expected pattern of results, but are conceptually different. The foveal load effect is supposed to reflect modulations of the extent of parafoveal preprocessing, whereas the spillover effect reflects the ongoing processing of the previous word whilst the reader’s fixation is already on the next word. The findings revealed that the young readers did not exhibit an effect of foveal load, but a substantial spillover effect. The implications for previous studies with adult readers and for models of eye movement control in reading are discussed. PMID:28139718

  17. Reporting of Resting-State Functional Magnetic Resonance Imaging Preprocessing Methodologies.

    PubMed

    Waheed, Syed Hamza; Mirbagheri, Saeedeh; Agarwal, Shruti; Kamali, Arash; Yahyavi-Firouz-Abadi, Noushin; Chaudhry, Ammar; DiGianvittorio, Michael; Gujar, Sachin K; Pillai, Jay J; Sair, Haris I

    2016-11-01

    There has been a rapid increase in resting-state functional magnetic resonance imaging (rs-fMRI) literature in the past few years. We aim to highlight the variability in the current reporting practices of rs-fMRI acquisition and preprocessing parameters. The PubMed database was searched for the selection of appropriate articles in the rs-fMRI literature and the most recent 100 articles were selected based on our criteria. These articles were evaluated based on a checklist for reporting of certain preprocessing steps. All of the studies reported the temporal resolution for the scan and the software used for the analysis. Less than half of the studies reported physiologic monitoring, despiking, global signal regression, framewise displacement, and volume censoring. A majority of the studies mentioned the scanning duration, eye status, and smoothing kernel. Overall, we demonstrate the wide variability in reporting of preprocessing methods in rs-fMRI studies. Although there might be potential variability in reporting across studies due to individual requirements for a study, we suggest the need for standardizing reporting guidelines to ensure reproducibility.

  18. Identifying Fishes through DNA Barcodes and Microarrays

    PubMed Central

    Kochzius, Marc; Seidel, Christian; Antoniou, Aglaia; Botla, Sandeep Kumar; Campo, Daniel; Cariani, Alessia; Vazquez, Eva Garcia; Hauschild, Janet; Hervet, Caroline; Hjörleifsdottir, Sigridur; Hreggvidsson, Gudmundur; Kappel, Kristina; Landi, Monica; Magoulas, Antonios; Marteinsson, Viggo; Nölte, Manfred; Planes, Serge; Tinti, Fausto; Turan, Cemal; Venugopal, Moleyur N.; Weber, Hannes; Blohm, Dietmar

    2010-01-01

    Background International fish trade reached an import value of 62.8 billion Euro in 2006, of which 44.6% is covered by the European Union. Species identification is a key problem throughout the life cycle of fishes: from eggs and larvae to adults in fisheries research and control, as well as processed fish products in consumer protection. Methodology/Principal Findings This study aims to evaluate the applicability of the three mitochondrial genes 16S rRNA (16S), cytochrome b (cyt b), and cytochrome oxidase subunit I (COI) for the identification of 50 European marine fish species by combining techniques of “DNA barcoding” and microarrays. In a DNA barcoding approach, Neighbour-Joining (NJ) phylogenetic trees of 369 16S, 212 cyt b, and 447 COI sequences indicated that cyt b and COI are suitable for unambiguous identification, whereas 16S failed to discriminate closely related flatfish and gurnard species. In the course of probe design for DNA microarray development, each of the markers yielded a high number of potentially species-specific probes in silico, although many of them were rejected based on microarray hybridisation experiments. None of the markers provided probes to discriminate the sibling flatfish and gurnard species. However, since 16S probes were less negatively influenced by the “position of label” effect and showed the lowest rejection rate and the highest mean signal intensity, 16S is more suitable for DNA microarray probe design than cyt b and COI. The large portion of COI probes rejected after hybridisation experiments (>90%) renders this DNA barcoding marker rather unsuitable for this high-throughput technology. Conclusions/Significance Based on these data, a DNA microarray containing 64 functional oligonucleotide probes for the identification of 30 out of the 50 fish species investigated was developed. It represents the next step towards an automated and easy-to-handle method to identify fish, ichthyoplankton, and fish products.

  19. Design of a covalently bonded glycosphingolipid microarray.

    PubMed

    Arigi, Emma; Blixt, Ola; Buschard, Karsten; Clausen, Henrik; Levery, Steven B

    2012-01-01

    Glycosphingolipids (GSLs) are well-known ubiquitous constituents of all eukaryotic cell membranes, yet their normal biological functions are not fully understood. As with other glycoconjugates and saccharides, solid-phase display on microarrays potentially provides an effective platform for in vitro study of their functional interactions. However, with few exceptions, the most widely used microarray platforms display only the glycan moiety of GSLs, which not only ignores potential modulating effects of the lipid aglycone, but inherently limits the scope of application, excluding, for example, the major classes of plant and fungal GSLs. In this work, a prototype "universal" GSL-based covalent microarray has been designed, and a preliminary evaluation of its potential utility in assaying protein-GSL binding interactions was carried out. An essential step in development involved the enzymatic release of the fatty acyl moiety of the ceramide aglycone of selected mammalian GSLs with sphingolipid N-deacylase (SCDase). Derivatization of the free amino group of a typical lyso-GSL, lyso-G(M1), with a prototype linker assembled from succinimidyl-[(N-maleimidopropionamido)-diethyleneglycol] ester and 2-mercaptoethylamine was also tested. Underivatized or linker-derivatized lyso-GSLs were then immobilized on N-hydroxysuccinimide- or epoxide-activated glass microarray slides and probed with carbohydrate-binding proteins of known or partially known specificities (i.e., cholera toxin B-chain; peanut agglutinin; a monoclonal antibody to sulfatide, Sulph 1; and a polyclonal antiserum reactive to asialo-G(M2)). Preliminary evaluation of the method indicated successful immobilization of the GSLs and selective binding of the test probes. The potential utility of this methodology for designing covalent microarrays that incorporate GSLs for serodiagnosis is discussed.

  20. Density based pruning for identification of differentially expressed genes from microarray data

    PubMed Central

    2010-01-01

    Motivation Identification of differentially expressed genes from microarray datasets is one of the most important analyses in microarray data mining. Popular algorithms such as the statistical t-test rank genes based on a single statistic. The false positive rate of these methods can be improved by considering other features of differentially expressed genes. Results We propose a pattern recognition strategy for identifying differentially expressed genes. Genes are mapped to a two-dimensional feature space composed of the average difference in gene expression and the average expression level. A density-based pruning algorithm (DB pruning) is developed to screen out potential differentially expressed genes, which are usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from the Gene Expression Omnibus (GEO) database with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as the t-test, rank product, and fold change. Conclusions Density-based pruning of non-differentially expressed genes is an effective method for enhancing statistical-testing-based algorithms for identifying differentially expressed genes. It improves the t-test, rank product, and fold change by 11% to 50% in the numbers of identified true differentially expressed genes. The source code of DB pruning is freely available on our website http://mleg.cse.sc.edu/degprune PMID:21047384
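
    A minimal sketch of the feature space and pruning idea (synthetic data; the neighbourhood radius and density cutoff are illustrative, not the paper's calibrated values):

```python
import numpy as np

rng = np.random.default_rng(8)
genes, samples = 2000, 10
control = rng.normal(8.0, 1.0, (genes, samples))
treated = control + rng.normal(0.0, 0.3, (genes, samples))
treated[:40] += 2.0                               # 40 genes truly up-regulated

# Two-dimensional feature space from the paper: (avg expression, avg difference)
avg_expr = np.hstack([control, treated]).mean(axis=1)
avg_diff = treated.mean(axis=1) - control.mean(axis=1)
F = np.column_stack([avg_expr, avg_diff])
F = (F - F.mean(axis=0)) / F.std(axis=0)          # scale the two axes comparably

# Density = number of neighbours within a radius; prune the dense bulk
d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
density = (d2 < 0.15 ** 2).sum(axis=1) - 1        # exclude self
candidates = np.where(density <= 5)[0]            # sparse-region genes survive
print(f"{candidates.size} candidate genes kept; "
      f"{np.sum(candidates < 40)} of the 40 true positives among them")
```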

  1. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information

    PubMed Central

    Mortazavi, Atiyeh; Moattar, Mohammad Hossein

    2016-01-01

    High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to the high dimension of microarray data sets, the features are reduced using one of two filter-based feature selection methods, namely mutual information and the Fisher ratio. In the second phase, the Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. Qualitative Mutual Information makes the selected features more stable, and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied, which uses a scoring function to weight each feature. The performance of the proposed method is compared with that of other popular feature selection algorithms, such as the Fisher ratio and minimum-redundancy maximum-relevance, as well as with previous work on cooperative-game-based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to the other approaches. PMID:27127506
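
    The Shapley index can be approximated by Monte Carlo sampling over feature orderings; in the sketch below, cross-validated accuracy stands in for the paper's QMI-based value function (an assumption made for brevity):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=120, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

def value(subset):
    """Coalition value: CV accuracy of a classifier on the feature subset."""
    if not subset:
        return 0.5                                # chance level for two classes
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, sorted(subset)], y, cv=3).mean()

def shapley(n_features, n_perms=60, seed=0):
    """Monte Carlo Shapley: average marginal contribution of each feature
    over random orderings (permutations) of the feature set."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_features)
    for _ in range(n_perms):
        order = rng.permutation(n_features)
        coalition, v_prev = set(), value(set())
        for f in order:
            coalition.add(f)
            v_new = value(coalition)
            phi[f] += v_new - v_prev
            v_prev = v_new
    return phi / n_perms

scores = shapley(8)
print("feature ranking:", np.argsort(scores)[::-1])
```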

  2. Measuring information flow in cellular networks by the systems biology method through microarray data.

    PubMed

    Chen, Bor-Sen; Li, Cheng-Wei

    2015-01-01

    In general, it is very difficult to measure the information flow in a cellular network directly. In this study, based on an information flow model and microarray data, we measured the information flow in cellular networks indirectly by using a systems biology method. First, we used a recursive least-squares parameter estimation algorithm to identify the system parameters of coupled signal transduction pathways and the cellular gene regulatory network (GRN). Then, based on the identified parameters and systems theory, we estimated the signal transductivities of the coupled signal transduction pathways from the extracellular signals to each downstream protein, and the information transductivities of the GRN between transcription factors in response to environmental events. According to the proposed method, the information flow, which is characterized by signal transductivity in coupled signaling pathways and information transductivity in the GRN, can be estimated from microarray temporal data or microarray sample data. It can also be estimated from other high-throughput data such as next-generation sequencing or proteomic data. Finally, the information flows of the signal transduction pathways and the GRN in leukemia cancer cells and non-leukemia normal cells were measured to analyze the systematic dysfunction in this cancer from microarray sample data. The results show that the signal transductivities of signal transduction pathways change substantially from normal cells to leukemia cancer cells.
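
    The recursive least-squares step named above is standard; a generic sketch for a linear-in-parameters model y_t = phi_t' * theta + noise (the three-parameter toy system is hypothetical, not the paper's pathway model):

```python
import numpy as np

def rls(phi, y, lam=0.99, delta=1e3):
    """Recursive least-squares with forgetting factor `lam`.
    phi: (T, p) regressor rows; y: (T,) outputs. Returns the theta estimate."""
    T, p = phi.shape
    theta = np.zeros(p)
    P = delta * np.eye(p)                         # large initial covariance
    for t in range(T):
        x = phi[t]
        k = P @ x / (lam + x @ P @ x)             # gain vector
        theta = theta + k * (y[t] - x @ theta)    # update with prediction error
        P = (P - np.outer(k, x @ P)) / lam        # covariance update
    return theta

rng = np.random.default_rng(9)
true_theta = np.array([0.8, -0.4, 1.5])           # hypothetical pathway gains
Phi = rng.standard_normal((500, 3))               # e.g., upstream signal levels
y = Phi @ true_theta + 0.05 * rng.standard_normal(500)
print(np.round(rls(Phi, y), 3))                   # ~ [0.8, -0.4, 1.5]
```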

  3. Measuring information flow in cellular networks by the systems biology method through microarray data

    PubMed Central

    Chen, Bor-Sen; Li, Cheng-Wei

    2015-01-01

    In general, it is very difficult to measure the information flow in a cellular network directly. In this study, based on an information flow model and microarray data, we measured the information flow in cellular networks indirectly by using a systems biology method. First, we used a recursive least-squares parameter estimation algorithm to identify the system parameters of coupling signal transduction pathways and the cellular gene regulatory network (GRN). Then, based on the identified parameters and systems theory, we estimated the signal transductivities of the coupling signal transduction pathways from the extracellular signals to each downstream protein and the information transductivities of the GRN between transcription factors in response to environmental events. According to the proposed method, the information flow, which is characterized by signal transductivity in coupling signaling pathways and information transductivity in the GRN, can be estimated by microarray temporal data or microarray sample data. It can also be estimated by other high-throughput data such as next-generation sequencing or proteomic data. Finally, the information flows of the signal transduction pathways and the GRN in leukemia cancer cells and non-leukemia normal cells were also measured to analyze the systematic dysfunction in this cancer from microarray sample data. The results show that the signal transductivities of signal transduction pathways change substantially from normal cells to leukemia cancer cells. PMID:26082788

  4. Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

    PubMed

    Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

    2013-01-01

    DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus merely on the binary-class problem. In this paper, we deal with the multiclass imbalanced classification problem, as encountered in cancer DNA microarrays, by using ensemble learning. We utilize a one-against-all coding strategy to transform the multiclass problem into multiple binary ones, applying to each a feature subspace method, an evolving version of random subspace that generates multiple diverse training subsets. Next, we introduce one of two different correction techniques, namely decision threshold adjustment or random undersampling, into each training subset to mitigate the damage of class imbalance. Specifically, support vector machines are used as base classifiers, and a novel voting rule called counter voting is presented for making the final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that, unlike many traditional classification approaches, our methods are insensitive to class imbalance.
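
    A compressed sketch of the decomposition-plus-undersampling recipe is given below. Only the random-undersampling branch is shown; the positive-score voting used for the final decision is a plausible stand-in for the paper's counter-voting rule, whose exact definition is not reproduced here, and the feature-subspace step is omitted.

        import numpy as np
        from sklearn.svm import SVC

        def fit_ova_undersampled(X, y, seed=0):
            rng = np.random.default_rng(seed)
            models = {}
            for c in np.unique(y):
                pos = np.flatnonzero(y == c)
                neg_pool = np.flatnonzero(y != c)
                neg = rng.choice(neg_pool, size=min(len(pos), len(neg_pool)),
                                 replace=False)
                idx = np.concatenate([pos, neg])          # balanced binary subset
                models[c] = SVC(kernel="linear").fit(X[idx],
                                                     (y[idx] == c).astype(int))
            return models

        def predict_ova(models, X):
            classes = np.array(list(models))
            scores = np.column_stack([m.decision_function(X)
                                      for m in models.values()])
            return classes[scores.argmax(axis=1)]   # class with strongest vote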

  5. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information.

    PubMed

    Mortazavi, Atiyeh; Moattar, Mohammad Hossein

    2016-01-01

    High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, because of the high dimension of microarray data sets, the features are reduced using one of two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, the Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. QMI makes the selected features more stable, and this stability helps to deal with data imbalance and scarcity. In the third phase, a forward selection scheme is applied, which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous work on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.

  6. A visual analytics approach for understanding biclustering results from microarray data

    PubMed Central

    Santamaría, Rodrigo; Therón, Roberto; Quintales, Luis

    2008-01-01

    Background Microarray analysis is an important area of bioinformatics. In the last few years, biclustering has become one of the most popular methods for classifying data from microarrays. Although biclustering can be used in any kind of classification problem, nowadays it is mostly used for microarray data classification. A large number of biclustering algorithms have been developed over the years; however, little effort has been devoted to the representation of the results. Results We present an interactive framework that helps to infer differences or similarities between biclustering results, to unravel trends and to highlight robust groupings of genes and conditions. These linked representations of biclusters can complement biological analysis and reduce the time spent by specialists on interpreting the results. Within the framework, besides other standard representations, a visualization technique is presented which is based on a force-directed graph where biclusters are represented as flexible overlapped groups of genes and conditions. This microarray analysis framework (BicOverlapper) is available online. Conclusion The main visualization technique, tested with different biclustering results on a real dataset, allows researchers to extract interesting features of the biclustering results, especially the highlighting of overlapping zones that usually represent robust groups of genes and/or conditions. The visual analytics methodology will permit biology experts to study biclustering results without inspecting an overwhelming number of biclusters individually. PMID:18505552

  7. Microarray missing data imputation based on a set theoretic framework and biological knowledge

    PubMed Central

    Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong

    2006-01-01

    Gene expressions measured using microarrays usually suffer from the missing value problem. However, in many data analysis methods a complete data matrix is required. Although existing missing value imputation algorithms have shown good performance in dealing with missing values, they also have their limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimate when the data is dominated by global structure. In addition, these algorithms do not take any biological constraint into account in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge into a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets that take into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments, a common phenomenon in cyclic systems, on which we construct a series of sets for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods. PMID:16549873
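
    The POCS iteration itself is compact. The sketch below alternates two projections: consistency with observed entries, and a global-structure constraint. The rank truncation used for the latter is a common surrogate that is, strictly speaking, not a convex set, and the paper's actual sets (local gene correlation, array correlation, synchronization loss) are richer than this.

        import numpy as np

        def pocs_impute(M, observed, rank=5, n_iter=100):
            """M: expression matrix with arbitrary values at missing positions;
            observed: boolean mask of known entries."""
            X = np.where(observed, M, M[observed].mean())   # initial fill
            for _ in range(n_iter):
                # projection 1: impose global structure via rank-r truncation
                U, s, Vt = np.linalg.svd(X, full_matrices=False)
                X = (U[:, :rank] * s[:rank]) @ Vt[:rank]
                # projection 2: snap known entries back to measured values
                X = np.where(observed, M, X)
            return X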

  8. Integrated analysis of microarray data and gene function information.

    PubMed

    Cui, Yan; Zhou, Mi; Wong, Wing Hung

    2004-01-01

    Microarray data should be interpreted in the context of existing biological knowledge. Here we present an integrated analysis of microarray data and gene function classification data using homogeneity analysis. Homogeneity analysis is a graphical multivariate statistical method for analyzing categorical data; it converts categorical data into a graphical display. By simultaneously quantifying the microarray-derived gene groups and the gene function categories, it captures the complex relations between biological information derived from microarray data and existing knowledge about gene function. Thus, homogeneity analysis provides a mathematical framework for integrating the analysis of microarray data with existing biological knowledge.

  9. Viral diagnosis in Indian livestock using customized microarray chips

    PubMed Central

    Yadav, Brijesh S; Pokhriyal, Mayank; Ratta, Barkha; Kumar, Ajay; Saxena, Meeta; Sharma, Bhaskar

    2015-01-01

    Viral diagnosis in Indian livestock using customized microarray chips has gained momentum in recent years, and it is now possible to design customized microarray chips for the viruses infecting livestock in India. Customized microarray chips identified Bovine herpes virus-1 (BHV-1), Canine Adeno Virus-1 (CAV-1), and Canine Parvo Virus-2 (CPV-2) in clinical samples. Microarray-identified specific probes were further confirmed using RT-PCR in all clinical and known samples. The application of microarray chips during viral disease outbreaks in Indian livestock is therefore possible where conventional methods are unsuitable, although it should be noted that customized application requires a detailed cost-efficiency calculation. PMID:26912948

  10. Viral diagnosis in Indian livestock using customized microarray chips.

    PubMed

    Yadav, Brijesh S; Pokhriyal, Mayank; Ratta, Barkha; Kumar, Ajay; Saxena, Meeta; Sharma, Bhaskar

    2015-01-01

    Viral diagnosis in Indian livestock using customized microarray chips has gained momentum in recent years, and it is now possible to design customized microarray chips for the viruses infecting livestock in India. Customized microarray chips identified Bovine herpes virus-1 (BHV-1), Canine Adeno Virus-1 (CAV-1), and Canine Parvo Virus-2 (CPV-2) in clinical samples. Microarray-identified specific probes were further confirmed using RT-PCR in all clinical and known samples. The application of microarray chips during viral disease outbreaks in Indian livestock is therefore possible where conventional methods are unsuitable, although it should be noted that customized application requires a detailed cost-efficiency calculation.

  11. Respiratory Tularemia: Francisella Tularensis and Microarray Probe Designing

    PubMed Central

    Ranjbar, Reza; Behzadi, Payam; Mammina, Caterina

    2016-01-01

    Background: Francisella tularensis (F. tularensis) is the etiological microorganism of tularemia, which occurs in different forms; respiratory tularemia is the most severe form, with a high rate of mortality if not treated. Traditional microbiological tools and the Polymerase Chain Reaction (PCR) cannot provide the required rapid, reliable, accurate, sensitive and specific diagnosis, but DNA microarray technology can, provided that appropriate microarray probes are designed. Objective: The main goal of this original article was to design suitable long oligo microarray probes for the detection and identification of F. tularensis. Method: The complete genomes of F. tularensis subsp. tularensis FSC198, F. tularensis subsp. holarctica LVS, F. tularensis subsp. mediasiatica, F. tularensis subsp. novicida (F. novicida U112), and F. philomiragia subsp. philomiragia ATCC 25017 were studied via the NCBI BLAST tool and the GView and PanSeq servers, and the microarray probes were then produced and processed via the AlleleID 7.7 software and the Oligoanalyzer tool, respectively. Results: In this in silico investigation, a number of long oligo microarray probes were designed for detecting and identifying F. tularensis. Among these, 15 probes were recognized as the best candidates for microarray chip design. Conclusion: Calibrated microarray probes reduce the biases of DNA microarray technology, an advanced, rapid, accurate and cost-effective molecular diagnostic tool with high specificity and sensitivity. Professional microarray probe design provides considerable facility and flexibility in the preparation of a microarray diagnostic chip. PMID:28077973

  12. Detecting outlier samples in microarray data.

    PubMed

    Shieh, Albert D; Hung, Yeung Sam

    2009-01-01

    In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to explore underlying experimental or biological problems and remove erroneous data. We propose a fully automatic outlier detection method based on principal component analysis (PCA) and robust estimation of Mahalanobis distances. We demonstrate that our method identifies biologically significant outliers with high accuracy and that outlier removal improves the prediction accuracy of classifiers. Our method is closely related to existing robust PCA methods, so we compare it to a prominent robust PCA method.
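
    The recipe in this abstract maps onto a few library calls. A minimal sketch under stated assumptions, with an illustrative number of components and a chi-square cutoff in place of whatever threshold the authors actually use:

        import numpy as np
        from scipy.stats import chi2
        from sklearn.decomposition import PCA
        from sklearn.covariance import MinCovDet

        def detect_outlier_samples(X, n_components=3, alpha=0.01):
            """X: (samples x genes) expression matrix; returns boolean flags."""
            Z = PCA(n_components=n_components).fit_transform(X)
            d2 = MinCovDet(random_state=0).fit(Z).mahalanobis(Z)  # squared dists
            return d2 > chi2.ppf(1 - alpha, df=n_components)      # robust cutoff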

  13. Plasmonically amplified fluorescence bioassay with microarray format

    NASA Astrophysics Data System (ADS)

    Gogalic, S.; Hageneder, S.; Ctortecka, C.; Bauch, M.; Khan, I.; Preininger, Claudia; Sauer, U.; Dostalek, J.

    2015-05-01

    Plasmonic amplification of the fluorescence signal in bioassays with a microarray detection format is reported. A crossed relief diffraction grating was designed to couple an excitation laser beam to surface plasmons at a wavelength overlapping with the absorption and emission bands of the fluorophore Dy647, which was used as a label. The surface of the periodically corrugated sensor chip was coated with a surface plasmon-supporting gold layer and a thin SU8 polymer film carrying epoxy groups. These groups were employed for the covalent immobilization of capture antibodies at arrays of spots. The plasmonic amplification of the fluorescence signal on the developed microarray chip was tested using an interleukin 8 sandwich immunoassay. The readout was performed ex situ, after drying the chip, using a commercial scanner with a high numerical aperture collecting lens. The results reveal an enhancement of the fluorescence signal by a factor of 5 compared to a regular glass chip.

  14. PMD: A Resource for Archiving and Analyzing Protein Microarray data

    PubMed Central

    Xu, Zhaowei; Huang, Likun; Zhang, Hainan; Li, Yang; Guo, Shujuan; Wang, Nan; Wang, Shi-hua; Chen, Ziqing; Wang, Jingfang; Tao, Sheng-ce

    2016-01-01

    The protein microarray is a powerful technology for both basic research and clinical study. However, because there is no database specifically tailored to protein microarrays, the majority of valuable original protein microarray data is still not publicly accessible. To address this issue, we constructed the Protein Microarray Database (PMD), which is specifically designed for archiving and analyzing protein microarray data. In PMD, users can easily browse and search the entire database by experiment name, protein microarray type, and sample information. Additionally, PMD integrates several data analysis tools and provides an automated data analysis pipeline for users. With just one click, users can obtain a comprehensive analysis report for their protein microarray data. The report includes preliminary data analysis, such as data normalization and candidate identification, and an in-depth bioinformatics analysis of the candidates, which includes functional annotation, pathway analysis, and protein-protein interaction network analysis. PMD is now freely available at www.proteinmicroarray.cn. PMID:26813635

  15. PMD: A Resource for Archiving and Analyzing Protein Microarray data.

    PubMed

    Xu, Zhaowei; Huang, Likun; Zhang, Hainan; Li, Yang; Guo, Shujuan; Wang, Nan; Wang, Shi-Hua; Chen, Ziqing; Wang, Jingfang; Tao, Sheng-Ce

    2016-01-27

    The protein microarray is a powerful technology for both basic research and clinical study. However, because there is no database specifically tailored to protein microarrays, the majority of valuable original protein microarray data is still not publicly accessible. To address this issue, we constructed the Protein Microarray Database (PMD), which is specifically designed for archiving and analyzing protein microarray data. In PMD, users can easily browse and search the entire database by experiment name, protein microarray type, and sample information. Additionally, PMD integrates several data analysis tools and provides an automated data analysis pipeline for users. With just one click, users can obtain a comprehensive analysis report for their protein microarray data. The report includes preliminary data analysis, such as data normalization and candidate identification, and an in-depth bioinformatics analysis of the candidates, which includes functional annotation, pathway analysis, and protein-protein interaction network analysis. PMD is now freely available at www.proteinmicroarray.cn.

  16. Ultrahigh density microarrays of solid samples.

    PubMed

    LeBaron, Matthew J; Crismon, Heidi R; Utama, Fransiscus E; Neilson, Lynn M; Sultan, Ahmed S; Johnson, Kevin J; Andersson, Eva C; Rui, Hallgeir

    2005-07-01

    We present a sectioning and bonding technology for making ultrahigh density microarrays of solid samples, termed cutting edge matrix assembly (CEMA). Maximized array density is achieved by a scaffold-free, self-supporting construction with rectangular array features that are incrementally scalable. This platform technology facilitates arrays of >10,000 tissue features on a standard glass slide, inclusion of unique sample identifiers for improved manual or automated tracking, and oriented arraying of stratified or polarized samples.

  17. Metadata management and semantics in microarray repositories.

    PubMed

    Kocabaş, F; Can, T; Baykal, N

    2011-12-01

    The number of microarray and other high-throughput experiments in primary repositories keeps increasing, as do the size and complexity of the results, in response to biomedical investigations. Initiatives have been started on the standardization of content, object models, exchange formats and ontologies. However, there are backlogs and an inability to exchange data between microarray repositories, which indicate a great need for a standard format and for data management. We have introduced a metadata framework that includes a metadata card and semantic nets that make experimental results visible, understandable and usable. These are encoded in syntax encoding schemes and represented in RDF (Resource Description Framework); they can be integrated with other metadata cards and semantic nets, and can be exchanged, shared and queried. We demonstrated the performance and potential benefits through a case study on a selected microarray repository. We concluded that the backlogs can be reduced and that the exchange of information and the asking of knowledge discovery questions become possible with the use of this metadata framework.

  18. Methods for fabricating microarrays of motile bacteria.

    PubMed

    Rozhok, Sergey; Shen, Clifton K-F; Littler, Pey-Lih H; Fan, Zhifang; Liu, Chang; Mirkin, Chad A; Holz, Richard C

    2005-04-01

    Motile bacterial cell microarrays were fabricated by attaching Escherichia coli K-12 cells onto predesigned 16-mercaptohexadecanoic acid patterned microarrays, which were covalently functionalized with E. coli antibodies or poly-L-lysine. By utilizing 11-mercaptoundecyl-penta(ethylene glycol) or 11-mercapto-1-undecanol as passivating molecules, nonspecific binding of E. coli was significantly reduced. Microcontact printing and dip-pen nanolithography were used to prepare microarrays for bacterial adhesion, which was studied by optical fluorescence and atomic force microscopy. These data indicate that single motile E. coli cells can be attached to predesigned line or dot features and that binding can occur via the cell body or the flagella. Adherent bacteria are viable (they remain alive and motile after adhesion to patterned surface features) for more than four hours. Individual motile bacterial cells can be placed onto predesigned surface features that are 1.3 µm in diameter or larger. The importance of controlling the adhesion of a single bacterial cell to a surface is discussed with regard to biomotor design.

  19. A New Distribution Family for Microarray Data.

    PubMed

    Kelmansky, Diana Mabel; Ricci, Lila

    2017-02-10

    The traditional approach with microarray data has been to apply transformations that approximately normalize them, with the drawback of losing the original scale. The alternative standpoint taken here is to search for models that fit the data, characterized by the presence of negative values, while preserving their scale; one advantage of this strategy is that it facilitates a direct interpretation of the results. A new family of distributions named gpower-normal, indexed by p ∈ R, is introduced, and it is proven that these variables become normal or truncated normal when a suitable gpower transformation is applied. Expressions are given for moments and quantiles in terms of the truncated normal density. This new family can be used to model asymmetric data that include non-positive values, as required for microarray analysis. Moreover, it has been proven that the gpower-normal family is a special case of pseudo-dispersion models, inheriting all the good properties of these models, such as asymptotic normality for small variances. A combined maximum likelihood method is proposed to estimate the model parameters, and it is applied to microarray and contamination data. R code is available from the authors upon request.

  20. High-Throughput Enzyme Kinetics Using Microarrays

    SciTech Connect

    Guoxin Lu; Edward S. Yeung

    2007-11-01

    We report a microanalytical method to study enzyme kinetics. The technique involves immobilizing horseradish peroxidase on a poly-L-lysine (PLL)-coated glass slide in a microarray format, followed by applying substrate solution onto the enzyme microarray. Enzyme molecules are immobilized on the PLL-coated glass slide through electrostatic interactions, and no further modification of the enzyme or glass slide is needed. In situ detection of the products generated on the enzyme spots is made possible by monitoring the light intensity of each spot using a scientific-grade charge-coupled device (CCD). Reactions of substrate solutions of various types and concentrations can be carried out sequentially on one enzyme microarray. To account for the loss of enzyme from washing in between runs, a standard substrate solution is used for calibration. Substantially reduced amounts of substrate solution are consumed for each reaction on each enzyme spot. The Michaelis constant K_m obtained by using this method is comparable to the result for homogeneous solutions. Absorbance detection allows universal monitoring, and no chemical modification of the substrate is needed. High-throughput studies of native enzyme kinetics for multiple enzymes are therefore possible in a simple, rapid, and low-cost manner.
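
    For readers unfamiliar with the quantity being measured: the Michaelis constant K_m is obtained by fitting the Michaelis-Menten equation v = Vmax*[S]/(K_m + [S]) to initial-rate data. A generic curve-fitting sketch with invented numbers, not the paper's measurements:

        import numpy as np
        from scipy.optimize import curve_fit

        def michaelis_menten(S, Vmax, Km):
            return Vmax * S / (Km + S)

        S = np.array([0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0])       # substrate, mM
        v = np.array([0.22, 0.38, 0.65, 0.88, 1.10, 1.25, 1.36])  # initial rates
        (Vmax, Km), _ = curve_fit(michaelis_menten, S, v, p0=(1.5, 0.5))
        print(f"Vmax = {Vmax:.2f}, Km = {Km:.2f} mM")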

  1. Copasetic analysis: a framework for the blind analysis of microarray imagery.

    PubMed

    Fraser, K; O'Neill, P; Wang, Z; Liu, X

    2004-06-01

    From its conception, bioinformatics has been a multidisciplinary field which blends domain expert knowledge with new and existing processing techniques, all of which are focused on a common goal. Typically, these techniques have focused on the direct analysis of raw microarray image data. Unfortunately, this fails to utilise the image's full potential, and in practice it results in the lab technician having to guide the analysis algorithms. This paper presents a dynamic framework that aims to automate the process of microarray image analysis using a variety of techniques. An overview of the entire framework process is presented, and its robustness is challenged throughout with a selection of real examples containing varying degrees of noise. The results show the potential of the proposed framework in its ability to determine slide layout accurately and perform analysis without prior structural knowledge. The algorithm achieves approximately a 1 to 3 dB improvement in peak signal-to-noise ratio compared to conventional processing techniques like those implemented in GenePix when used by a trained operator. As far as the authors are aware, this is the first time such a comprehensive framework concept has been directly applied to the area of microarray image analysis.

  2. Design of a combinatorial DNA microarray for protein-DNA interaction studies

    PubMed Central

    Mintseris, Julian; Eisen, Michael B

    2006-01-01

    Background Discovery of precise specificity of transcription factors is an important step on the way to understanding the complex mechanisms of gene regulation in eukaryotes. Recently, double-stranded protein-binding microarrays were developed as a potentially scalable approach to tackle transcription factor binding site identification. Results Here we present an algorithmic approach to experimental design of a microarray that allows for testing full specificity of a transcription factor binding to all possible DNA binding sites of a given length, with optimally efficient use of the array. This design is universal, works for any factor that binds a sequence motif and is not species-specific. Furthermore, simulation results show that data produced with the designed arrays is easier to analyze and would result in more precise identification of binding sites. Conclusion In this study, we present a design of a double stranded DNA microarray for protein-DNA interaction studies and show that our algorithm allows optimally efficient use of the arrays for this purpose. We believe such a design will prove useful for transcription factor binding site identification and other biological problems. PMID:17018151
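
    One classical way to test all possible binding sites of a given length with optimally efficient use of an array is a de Bruijn sequence, in which every k-mer occurs exactly once. The sketch below generates one and is offered purely as an illustration of the design problem, without claiming it is the authors' construction.

        def de_bruijn(alphabet, k):
            """Cyclic sequence containing every k-mer over alphabet exactly once."""
            n = len(alphabet)
            a = [0] * (n * k)
            seq = []

            def db(t, p):
                if t > k:
                    if k % p == 0:
                        seq.extend(a[1:p + 1])
                else:
                    a[t] = a[t - p]
                    db(t + 1, p)
                    for j in range(a[t - p] + 1, n):
                        a[t] = j
                        db(t + 1, t)

            db(1, 1)
            return "".join(alphabet[i] for i in seq)

        s = de_bruijn("ACGT", 8)
        print(len(s))   # 4**8 = 65536 bases cover all 8-mers cyclically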

  3. Design of a combinatorial DNA microarray for protein-DNA interaction studies

    SciTech Connect

    Mintseris, Julian; Eisen, Michael B.

    2006-07-07

    Background: Discovery of precise specificity of transcription factors is an important step on the way to understanding the complex mechanisms of gene regulation in eukaryotes. Recently, double-stranded protein-binding microarrays were developed as a potentially scalable approach to tackle transcription factor binding site identification. Results: Here we present an algorithmic approach to experimental design of a microarray that allows for testing full specificity of a transcription factor binding to all possible DNA binding sites of a given length, with optimally efficient use of the array. This design is universal, works for any factor that binds a sequence motif and is not species-specific. Furthermore, simulation results show that data produced with the designed arrays is easier to analyze and would result in more precise identification of binding sites. Conclusion: In this study, we present a design of a double stranded DNA microarray for protein-DNA interaction studies and show that our algorithm allows optimally efficient use of the arrays for this purpose. We believe such a design will prove useful for transcription factor binding site identification and other biological problems.

  4. Acquisition, preprocessing, and reconstruction of ultralow dose volumetric CT scout for organ-based CT scan planning

    SciTech Connect

    Yin, Zhye De Man, Bruno; Yao, Yangyang; Wu, Mingye; Montillo, Albert; Edic, Peter M.; Kalra, Mannudeep

    2015-05-15

    Purpose: Traditionally, 2D radiographic preparatory scan images (scout scans) are used to plan diagnostic CT scans. However, a 3D CT volume with a full 3D organ segmentation map could provide superior information for customized scan planning and other purposes. A practical challenge is to design the volumetric scout acquisition and processing steps to provide good image quality (at least good enough to enable 3D organ segmentation) while delivering a radiation dose similar to that of the conventional 2D scout. Methods: The authors explored various acquisition methods, scan parameters, postprocessing methods, and reconstruction methods through simulation and cadaver data studies to achieve an ultralow dose 3D scout while simultaneously reducing the noise and maintaining the edge strength around the target organ. Results: In a simulation study, the 3D scout with the proposed acquisition, preprocessing, and reconstruction strategy provided a similar level of organ segmentation capability as a traditional 240 mAs diagnostic scan, based on noise and normalized edge strength metrics. At the same time, the proposed approach delivers only 1.25% of the dose of a traditional scan. In a cadaver study, the authors’ pictorial-structures based organ localization algorithm successfully located the major abdominal-thoracic organs from the ultralow dose 3D scout obtained with the proposed strategy. Conclusions: The authors demonstrated that images with a similar degree of segmentation capability (interpretability) as conventional dose CT scans can be achieved with an ultralow dose 3D scout acquisition and suitable postprocessing. Furthermore, the authors applied these techniques to real cadaver CT scans with a CTDI dose level of less than 0.1 mGy and successfully generated a 3D organ localization map.

  5. Wavelet-based detection of transcriptional activity on a novel Staphylococcus aureus tiling microarray

    PubMed Central

    2012-01-01

    Background High-density oligonucleotide microarrays are an appropriate technology for genomic analysis, and are particularly useful in the generation of transcriptional maps, ChIP-on-chip studies and re-sequencing of the genome. Transcriptome analysis of tiling microarray data facilitates the discovery of novel transcripts and the assessment of differential expression in diverse experimental conditions. Although new technologies such as next-generation sequencing have appeared, microarrays might still be useful for the study of small genomes or for the analysis of genomic regions with custom microarrays, due to their lower price and good accuracy in expression quantification. Results Here, we propose a novel wavelet-based method, named ZCL (zero-crossing lines), for the combined denoising and segmentation of tiling signals. The denoising is performed with the classical SUREshrink method, and the detection of transcriptionally active regions is based on the computation of the Continuous Wavelet Transform (CWT). In particular, the detection of the transitions is implemented as the thresholding of the zero-crossing lines. The algorithm has been applied to the public Saccharomyces cerevisiae dataset and compared with two well-known algorithms: the pseudo-median sliding window (PMSW) and the structural change model (SCM). As a proof of principle, we applied the ZCL algorithm to the analysis of the custom tiling microarray hybridization results of a S. aureus mutant deficient in the sigma B transcription factor. The challenge was to identify those transcripts whose expression decreases in the absence of sigma B. Conclusions The proposed method achieves the best performance in terms of positive predictive value (PPV), while its sensitivity is similar to the other algorithms used for the comparison. The computation time needed to process the transcriptional signals is low compared with model-based methods and in the same range as those based on the use of
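
    The zero-crossing idea can be sketched with plain numpy: convolve the tiling signal with derivative-of-Gaussian kernels (a continuous wavelet transform with a gaus1-like wavelet) and record where the coefficients change sign. The SUREshrink denoising and the thresholding of crossing lines in the full ZCL method are omitted here.

        import numpy as np

        def zero_crossing_lines(signal, scales=(4, 8, 16)):
            t = np.arange(-50, 51)
            crossings = {}
            for s in scales:
                dog = -t * np.exp(-t**2 / (2.0 * s**2))   # derivative of Gaussian
                coeff = np.convolve(signal, dog, mode="same")
                sign = np.signbit(coeff).astype(np.int8)
                crossings[s] = np.flatnonzero(np.diff(sign))  # sign changes
            return crossings  # candidate transcribed/silent segment boundaries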

  6. A fast meteor detection algorithm

    NASA Astrophysics Data System (ADS)

    Gural, P.

    2016-01-01

    A low latency meteor detection algorithm for use with fast steering mirrors was previously developed to track and telescopically follow meteors in real time (Gural, 2007). It has been rewritten as a generic clustering and tracking software module for meteor detection that meets the demanding throughput requirements of a Raspberry Pi while also maintaining a high probability of detection. The software interface is generalized to work with various forms of front-end video pre-processing and provides a rich product set of parameterized line detection metrics. The discussion includes the Maximum Temporal Pixel (MTP) compression technique as a fast thresholding option for feeding the detection module, the detection algorithm trade-offs for maximum processing throughput, details of the clustering and tracking methodology, processing products, performance metrics, and a general interface description.
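
    The Maximum Temporal Pixel compression mentioned above is, in essence, a per-pixel maximum over the frame stack, usually kept together with the frame index at which the maximum occurred; a minimal numpy sketch:

        import numpy as np

        def mtp_compress(frames):
            """frames: (time, height, width) video stack."""
            mtp = frames.max(axis=0)       # brightest value each pixel reached
            when = frames.argmax(axis=0)   # frame index of that maximum
            return mtp, when               # one image pair summarizing the clip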

  7. DNA Microarray for Detection of Gastrointestinal Viruses

    PubMed Central

    Martínez, Miguel A.; Soto-del Río, María de los Dolores; Gutiérrez, Rosa María; Chiu, Charles Y.; Greninger, Alexander L.; Contreras, Juan Francisco; López, Susana; Arias, Carlos F.

    2014-01-01

    Gastroenteritis is a clinical illness of humans and other animals that is characterized by vomiting and diarrhea and caused by a variety of pathogens, including viruses. An increasing number of viral species have been associated with gastroenteritis or have been found in stool samples as new molecular tools have been developed. In this work, a DNA microarray capable in theory of parallel detection of more than 100 viral species was developed and tested. Initial validation was done with 10 different virus species, and an additional 5 species were validated using clinical samples. Detection limits of 1 × 10^3 virus particles of Human adenovirus C (HAdV), Human astrovirus (HAstV), and group A Rotavirus (RV-A) were established. Furthermore, when exogenous RNA was added, the limit for RV-A detection decreased by one log. In a small group of clinical samples from children with gastroenteritis (n = 76), the microarray detected at least one viral species in 92% of the samples. Single infection was identified in 63 samples (83%), and coinfection with more than one virus was identified in 7 samples (9%). The most abundant virus species were RV-A (58%), followed by Anellovirus (15.8%), HAstV (6.6%), HAdV (5.3%), Norwalk virus (6.6%), Human enterovirus (HEV) (9.2%), Human parechovirus (1.3%), Sapporo virus (1.3%), and Human bocavirus (1.3%). To further test the specificity and sensitivity of the microarray, the results were verified by reverse transcription-PCR (RT-PCR) detection of 5 gastrointestinal viruses. The RT-PCR assay detected a virus in 59 samples (78%). The microarray showed good performance for detection of RV-A, HAstV, and calicivirus, while the sensitivity for HAdV and HEV was low. Furthermore, some discrepancies in detection of mixed infections were observed and were addressed by reverse transcription-quantitative PCR (RT-qPCR) of the viruses involved. It was observed that differences in the amount of genetic material favored the detection of the most abundant

  8. DNA microarray for detection of gastrointestinal viruses.

    PubMed

    Martínez, Miguel A; Soto-Del Río, María de Los Dolores; Gutiérrez, Rosa María; Chiu, Charles Y; Greninger, Alexander L; Contreras, Juan Francisco; López, Susana; Arias, Carlos F; Isa, Pavel

    2015-01-01

    Gastroenteritis is a clinical illness of humans and other animals that is characterized by vomiting and diarrhea and caused by a variety of pathogens, including viruses. An increasing number of viral species have been associated with gastroenteritis or have been found in stool samples as new molecular tools have been developed. In this work, a DNA microarray capable in theory of parallel detection of more than 100 viral species was developed and tested. Initial validation was done with 10 different virus species, and an additional 5 species were validated using clinical samples. Detection limits of 1 × 10^3 virus particles of Human adenovirus C (HAdV), Human astrovirus (HAstV), and group A Rotavirus (RV-A) were established. Furthermore, when exogenous RNA was added, the limit for RV-A detection decreased by one log. In a small group of clinical samples from children with gastroenteritis (n = 76), the microarray detected at least one viral species in 92% of the samples. Single infection was identified in 63 samples (83%), and coinfection with more than one virus was identified in 7 samples (9%). The most abundant virus species were RV-A (58%), followed by Anellovirus (15.8%), HAstV (6.6%), HAdV (5.3%), Norwalk virus (6.6%), Human enterovirus (HEV) (9.2%), Human parechovirus (1.3%), Sapporo virus (1.3%), and Human bocavirus (1.3%). To further test the specificity and sensitivity of the microarray, the results were verified by reverse transcription-PCR (RT-PCR) detection of 5 gastrointestinal viruses. The RT-PCR assay detected a virus in 59 samples (78%). The microarray showed good performance for detection of RV-A, HAstV, and calicivirus, while the sensitivity for HAdV and HEV was low. Furthermore, some discrepancies in detection of mixed infections were observed and were addressed by reverse transcription-quantitative PCR (RT-qPCR) of the viruses involved. It was observed that differences in the amount of genetic material favored the detection of the most abundant

  9. Augmenting Microarray Data with Literature-Based Knowledge to Enhance Gene Regulatory Network Inference

    PubMed Central

    Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C.

    2014-01-01

    Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature: 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data, and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to

  10. Gene ARMADA: an integrated multi-analysis platform for microarray data implemented in MATLAB

    PubMed Central

    Chatziioannou, Aristotelis; Moulos, Panagiotis; Kolisis, Fragiskos N

    2009-01-01

    Background The microarray data analysis realm is ever growing through the development of various tools, open source and commercial. However, there is an absence of predefined rational algorithmic analysis workflows or batch standardized processing to incorporate all steps, from raw data import up to the derivation of significantly differentially expressed gene lists. This absence obfuscates the analytical procedure and obstructs the massive comparative processing of genomic microarray datasets. Moreover, the solutions provided depend heavily on the programming skills of the user, whereas GUI-embedded solutions do not provide direct support for various raw image analysis formats or a versatile and flexible combination of signal processing methods. Results We describe here Gene ARMADA (Automated Robust MicroArray Data Analysis), a MATLAB implemented platform with a Graphical User Interface. This suite integrates all steps of microarray data analysis, including automated data import, noise correction and filtering, normalization, statistical selection of differentially expressed genes, clustering, classification and annotation. In its current version, Gene ARMADA fully supports two-colour cDNA and Affymetrix oligonucleotide arrays, plus custom arrays for which experimental details are given in tabular form (Excel spreadsheet, comma separated values, or tab-delimited text formats). It also supports the analysis of already processed results through its versatile import editor. Besides being fully automated, Gene ARMADA incorporates numerous functionalities of the Statistics and Bioinformatics Toolboxes of MATLAB. In addition, it provides numerous visualization and exploration tools, plus customizable export data formats for seamless integration with other analysis tools or MATLAB for further processing. Gene ARMADA requires MATLAB 7.4 (R2007a) or higher and is also distributed as a stand-alone application with the MATLAB Component Runtime.

  11. Reservoir computing with a slowly modulated mask signal for preprocessing using a mutually coupled optoelectronic system

    NASA Astrophysics Data System (ADS)

    Tezuka, Miwa; Kanno, Kazutaka; Bunsen, Masatoshi

    2016-08-01

    Reservoir computing is a machine-learning paradigm based on information processing in the human brain. We numerically demonstrate reservoir computing with a slowly modulated mask signal for preprocessing, using a mutually coupled optoelectronic system. The performance of our system is quantitatively evaluated on a chaotic time series prediction task. Our system can produce performance comparable to reservoir computing with a single feedback system and a fast modulated mask signal. We show that it is possible to slow down the modulation speed of the mask signal by using the mutually coupled system in reservoir computing.
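
    For orientation, the masking step common to time-delay reservoir computing is sketched below: each input sample is held for one delay interval and multiplied by a piecewise-constant random mask, one segment per virtual node. The slow modulation of the mask and the mutually coupled optoelectronic dynamics that are this paper's actual contribution are not reproduced.

        import numpy as np

        def mask_input(u, n_virtual=50, seed=0):
            """u: 1-D input sequence; returns the masked drive signal."""
            u = np.asarray(u, dtype=float)
            rng = np.random.default_rng(seed)
            mask = rng.choice([-1.0, 1.0], size=n_virtual)   # binary mask
            return (u[:, None] * mask[None, :]).ravel()      # one chunk per sample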

  12. Comparative Evaluation of Preprocessing Freeware on Chromatography/Mass Spectrometry Data for Signature Discovery

    SciTech Connect

    Coble, Jamie B.; Fraga, Carlos G.

    2014-07-07

    Preprocessing software is crucial for the discovery of chemical signatures in metabolomics, chemical forensics, and other signature-focused disciplines that involve analyzing large data sets from chemical instruments. Here, four freely available and published preprocessing tools, known as metAlign, MZmine, SpectConnect, and XCMS, were evaluated for impurity profiling using nominal mass GC/MS data and accurate mass LC/MS data. Both data sets were previously collected from the analysis of replicate samples from multiple stocks of a nerve-agent precursor. Each of the four tools had its parameters set for the untargeted detection of chromatographic peaks from impurities present in the stocks. The peak table generated by each preprocessing tool was analyzed to determine the number of impurity components detected in all replicate samples per stock. A cumulative set of impurity components was then generated using all available peak tables and used as a reference to calculate the percentage of component detections for each tool, in which 100% indicated the detection of every component. For the nominal mass GC/MS data, metAlign performed best, followed by MZmine, SpectConnect, and XCMS, with detection percentages of 83, 60, 47, and 42%, respectively. For the accurate mass LC/MS data, the order was metAlign, XCMS, and MZmine, with detection percentages of 80, 45, and 35%, respectively. SpectConnect did not function for the accurate mass LC/MS data. Larger detection percentages were obtained by combining the top performer with at least one of the other tools, such as 96% by combining metAlign with MZmine for the GC/MS data and 93% by combining metAlign with XCMS for the LC/MS data. In terms of quantitative performance, the reported peak intensities had average absolute biases of 41, 4.4, 1.3 and 1.3% for SpectConnect, metAlign, XCMS, and MZmine, respectively, for the GC/MS data. For the LC/MS data, the average absolute biases were 22, 4.5, and 3.1% for metAlign, MZmine, and XCMS.
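
    The detection-percentage metric is simple set arithmetic; a toy version with invented component identifiers, including the tool combination the study reports:

        tools = {
            "metAlign":     {"c1", "c2", "c3", "c4", "c5"},
            "MZmine":       {"c1", "c2", "c6"},
            "SpectConnect": {"c2", "c3"},
            "XCMS":         {"c1", "c5"},
        }
        reference = set().union(*tools.values())   # cumulative component set
        for name, found in tools.items():
            print(f"{name}: {100 * len(found) / len(reference):.0f}%")
        combined = tools["metAlign"] | tools["MZmine"]
        print(f"metAlign+MZmine: {100 * len(combined) / len(reference):.0f}%")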

  13. Background adjustment of cDNA microarray images by Maximum Entropy distributions.

    PubMed

    Argyropoulos, Christos; Daskalakis, Antonis; Nikiforidis, George C; Sakellaropoulos, George C

    2010-08-01

    Many empirical studies have demonstrated the exquisite sensitivity of both traditional and novel statistical and machine intelligence algorithms to the method of background adjustment used to analyze microarray datasets. In this paper we develop a statistical framework that approaches background adjustment as a classic stochastic inverse problem, whose noise characteristics are given in terms of Maximum Entropy distributions. We derive analytic closed form approximations to the combined problem of estimating the magnitude of the background in microarray images and adjusting for its presence. The proposed method reduces standardized measures of log expression variability across replicates in situations of known differential and non-differential gene expression without increasing the bias. Additionally, it results in computationally efficient procedures for estimation and learning based on sufficient statistics and can filter out spot measures with intensities that are numerically close to the background level resulting in a noise reduction of about 7%.

  14. An overview of innovations and industrial solutions in Protein Microarray Technology.

    PubMed

    Gupta, Shabarni; Manubhai, K P; Kulkarni, Vishwesh; Srivastava, Sanjeeva

    2016-04-01

    The complexity of protein array technology is reflected in the fact that instrumentation and data analysis are subject to change depending on the biological question, the technical compatibility of the instruments, and the software used in each experiment. Industry has played a pivotal role in establishing standards for future deliberations in sustaining these technologies, in the form of protein array chips, arrayers, scanning devices, and data analysis software. This has enhanced the outreach of protein microarray technology to researchers across the globe and has encouraged a surge in the adoption of "nonclassical" approaches such as DNA-based protein arrays, micro-contact printing, label-free protein detection, and algorithms for data analysis. This review provides a unique overview of the industrial solutions available for protein microarray based studies. It aims at assessing the developments in various commercial platforms, providing a holistic overview of the available modalities, options, and compatibility, and summarizing the journey of this powerful high-throughput technology.

  15. Computerized system for recognition of autism on the basis of gene expression microarray data.

    PubMed

    Latkowski, Tomasz; Osowski, Stanislaw

    2015-01-01

    The aim of this paper is to provide a means to recognize a case of autism using gene expression microarrays. The crucial task is to discover the most important genes, those strictly associated with autism. The paper presents an application of different gene selection methods to select the most representative input attributes for an ensemble of classifiers, the set of classifiers being responsible for distinguishing autism data from the reference class. The simultaneous application of a few gene selection methods enables analysis of the ill-conditioned gene expression matrix from different points of view. The results of selection, combined with a genetic algorithm and an SVM classifier, have shown increased accuracy of autism recognition. Early recognition of autism is extremely important for the treatment of children and increases the probability of their recovery and return to normal social communication. The results of this research can find practical application in the early recognition of autism on the basis of gene expression microarray analysis.

  16. Detecting variants with Metabolic Design, a new software tool to design probes for explorative functional DNA microarray development

    PubMed Central

    2010-01-01

    Background Microorganisms display vast diversity, and each one has its own set of genes, cell components and metabolic reactions. To assess their huge unexploited metabolic potential in different ecosystems, we need high throughput tools, such as functional microarrays, that allow the simultaneous analysis of thousands of genes. However, most classical functional microarrays use specific probes that monitor only known sequences, and so fail to cover the full microbial gene diversity present in complex environments. We have thus developed an algorithm, implemented in the user-friendly program Metabolic Design, to design efficient explorative probes. Results First we have validated our approach by studying eight enzymes involved in the degradation of polycyclic aromatic hydrocarbons from the model strain Sphingomonas paucimobilis sp. EPA505 using a designed microarray of 8,048 probes. As expected, microarray assays identified the targeted set of genes induced during biodegradation kinetics experiments with various pollutants. We have then confirmed the identity of these new genes by sequencing, and corroborated the quantitative discrimination of our microarray by quantitative real-time PCR. Finally, we have assessed metabolic capacities of microbial communities in soil contaminated with aromatic hydrocarbons. Results show that our probe design (sensitivity and explorative quality) can be used to study a complex environment efficiently. Conclusions We successfully use our microarray to detect gene expression encoding enzymes involved in polycyclic aromatic hydrocarbon degradation for the model strain. In addition, DNA microarray experiments performed on soil polluted by organic pollutants without prior sequence assumptions demonstrate high specificity and sensitivity for gene detection. Metabolic Design is thus a powerful, efficient tool that can be used to design explorative probes and monitor metabolic pathways in complex environments, and it may also be used to

  17. A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data.

    PubMed

    Aziz, Rabia; Verma, C K; Srivastava, Namita

    2016-06-01

    Feature (gene) selection and classification of microarray data are two of the most interesting machine learning challenges. In the present work, two existing feature selection/extraction algorithms, namely independent component analysis (ICA) and fuzzy backward feature elimination (FBFE), are used in a new selection/extraction combination. The main objective of this paper is to select the independent components of the DNA microarray data using FBFE to improve the performance of support vector machine (SVM) and Naïve Bayes (NB) classifiers, while keeping the computational expense affordable. To show the validity of the proposed method, it is applied to reduce the number of genes for five DNA microarray datasets, namely colon cancer, acute leukemia, prostate cancer, lung cancer II, and high-grade glioma; these datasets are then classified using the SVM and NB classifiers. Experimental results on these five microarray datasets demonstrate that the genes selected by the proposed approach effectively improve the performance of the SVM and NB classifiers in terms of classification accuracy. We compare our proposed method with principal component analysis (PCA) as a standard extraction algorithm and find that the proposed method obtains better classification accuracy, using the SVM and NB classifiers, with a smaller number of selected genes than PCA. For each dataset, the curve of average error rate against the number of genes indicates how many genes are required for the highest accuracy with the proposed method for both classifiers, and ROC curves show the best subset of genes for both classifiers on the different datasets with the proposed method.
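
    The extraction half of this pipeline reduces to a couple of scikit-learn calls; the sketch below omits the fuzzy backward feature elimination that prunes the components, so the component count is a plain assumption:

        import numpy as np
        from sklearn.decomposition import FastICA
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        def ica_svm_accuracy(X, y, n_components=10):
            """X: (samples x genes) matrix; y: class labels."""
            S = FastICA(n_components=n_components, random_state=0).fit_transform(X)
            return cross_val_score(SVC(kernel="linear"), S, y, cv=5).mean()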

  18. ARACNe-based inference, using curated microarray data, of Arabidopsis thaliana root transcriptional regulatory networks

    PubMed Central

    2014-01-01

    Background Uncovering the complex transcriptional regulatory networks (TRNs) that underlie plant and animal development remains a challenge. However, a vast amount of data from public microarray experiments is available, which can be subjected to inference algorithms in order to recover reliable TRN architectures. Results In this study we present a simple bioinformatics methodology that uses public, carefully curated microarray data and the mutual information algorithm ARACNe in order to obtain a database of transcriptional interactions. We used data from Arabidopsis thaliana root samples to show that the transcriptional regulatory networks derived from this database successfully recover previously identified root transcriptional modules and to propose new transcription factors for the SHORT ROOT/SCARECROW and PLETHORA pathways. We further show that these networks are a powerful tool to integrate and analyze high-throughput expression data, as exemplified by our analysis of a SHORT ROOT induction time-course microarray dataset, and are a reliable source for the prediction of novel root gene functions. In particular, we used our database to predict novel genes involved in root secondary cell-wall synthesis and identified the MADS-box TF XAL1/AGL12 as an unexpected participant in this process. Conclusions This study demonstrates that network inference using carefully curated microarray data yields reliable TRN architectures. In contrast to previous efforts to obtain root TRNs, which have focused on particular functional modules or tissues, our root transcriptional interactions provide an overview of the transcriptional pathways present in Arabidopsis thaliana roots and will likely yield a plethora of novel hypotheses to be tested experimentally. PMID:24739361
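
    ARACNe's two core moves, pairwise mutual information followed by data-processing-inequality pruning, fit in a short didactic sketch; there is no bootstrapping, kernel MI estimation, or significance filtering here, so this is a toy, not the production tool:

        import itertools
        import numpy as np
        from sklearn.metrics import mutual_info_score

        def aracne_sketch(expr, bins=8, eps=0.0):
            """expr: (genes x samples) matrix; returns a pruned MI adjacency."""
            binned = [np.digitize(g, np.histogram_bin_edges(g, bins)) for g in expr]
            n = len(binned)
            mi = np.zeros((n, n))
            for i, j in itertools.combinations(range(n), 2):
                mi[i, j] = mi[j, i] = mutual_info_score(binned[i], binned[j])
            adj = mi.copy()
            for i, j, k in itertools.combinations(range(n), 3):
                edges = [(i, j), (j, k), (i, k)]
                ranked = sorted((mi[a, b], (a, b)) for a, b in edges)
                if ranked[0][0] < ranked[1][0] - eps:  # strictly weakest edge
                    a, b = ranked[0][1]
                    adj[a, b] = adj[b, a] = 0.0        # drop as likely indirect
            return adj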

  19. Osprey: a comprehensive tool employing novel methods for the design of oligonucleotides for DNA sequencing and microarrays

    PubMed Central

    Gordon, Paul M. K.; Sensen, Christoph W.

    2004-01-01

    We have developed a software package called Osprey for the calculation of optimal oligonucleotides for DNA sequencing and the creation of microarrays based on either PCR products or directly spotted oligomers. It incorporates a novel use of position-specific scoring matrices for the sensitive and specific identification of secondary binding sites anywhere in the target sequence; on accelerated hardware, this is faster and more efficient than the traditional pairwise alignments used in most oligo-design software. Osprey consists of a module for target site selection based on user input, novel utilities for dealing with problematic sequences such as repeats, and a common code base for the identification of optimal oligonucleotides from the target list. Overall, these improvements provide a program that, without major increases in run time, reflects current DNA thermodynamics models, improves specificity and reduces the user's data preprocessing and parameterization requirements. Using a TimeLogic™ hardware accelerator, we report up to a 50-fold reduction in search time versus a linear search strategy. Target sites may be derived from computer analysis of DNA sequence assemblies in the case of sequencing efforts, or from genome or EST analysis in the case of microarray development, in both prokaryotes and eukaryotes. PMID:15456895
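
    The position-specific scoring matrix (PSSM) screen at the core of Osprey's secondary-site detection can be illustrated in a few lines: slide a log-odds matrix along the target and flag windows above a threshold. The matrix, threshold and base encoding below are toy assumptions, not Osprey's internals.

        import numpy as np

        BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

        def pssm_scan(seq, pssm, threshold):
            """pssm: (4, k) log-odds matrix; returns start positions of hits."""
            k = pssm.shape[1]
            idx = np.array([BASE_INDEX[b] for b in seq.upper()])
            cols = np.arange(k)
            scores = np.array([pssm[idx[i:i + k], cols].sum()
                               for i in range(len(idx) - k + 1)])
            return np.flatnonzero(scores >= threshold)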

  20. Robust symmetrical number system preprocessing for minimizing encoding errors in photonic analog-to-digital converters

    NASA Astrophysics Data System (ADS)

    Arvizo, Mylene R.; Calusdian, James; Hollinger, Kenneth B.; Pace, Phillip E.

    2011-08-01

    A photonic analog-to-digital converter (ADC) preprocessing architecture based on the robust symmetrical number system (RSNS) is presented. The RSNS preprocessing architecture is a modular scheme in which a modulus number of comparators are used at the output of each Mach-Zehnder modulator channel. The number of comparators with a logic 1 in each channel represents the integer values within each RSNS modulus sequence. When considered together, the integers within each sequence change one at a time at the next code position, resulting in an integer Gray code property. The RSNS ADC has the feature that the maximum nonlinearity is less than a least significant bit (LSB). Although the observed dynamic range (greatest length of combined sequences that contain no ambiguities) of the RSNS ADC is less than that of the optimum symmetrical number system ADC, the integer Gray code properties make it attractive for error control. A prototype is presented to demonstrate the feasibility of the concept and to show the important RSNS property that the largest nonlinearity is always less than an LSB. Also discussed are practical considerations related to multi-gigahertz implementations.

  1. CMS Preprocessing Subsystem user's guide. Software version 1.2

    SciTech Connect

    Didier, B.T.; Gash, J.D.; Greitzer, F.L.; Havre, S.L.; Ramsdell, J.V.; Turney, C.R.

    1993-10-01

    The Common Mapping Standard (CMS) Data Production System (CDPS) produces and distributes CMS data in compliance with the Common Mapping Standard Interface Control Document, Revision 2.2. Historically, tactical mission planning systems have been the primary clients of CMS data. CDPS is composed of two subsystems, the CMS Preprocessing Subsystem (CPS) and the CMS Distribution Subsystem (CDS). This guide describes the operation of CPS, which is responsible for the management of source data and the production of CMS data from source data. The CPS system was developed for use on a workstation running Ultrix 4.2, the X Window System Version X11R4, and Motif Version 1.1. The subsystem is organized into four major functional groups: CPS Executive; Manage Source Data; Manage CMS Data Preprocessing; and CPS System Utilities. CPS supports the production of CMS data from the following source chart, image, and elevation data products: Global Navigation Chart; Jet Navigation Chart; Operational Navigation Chart; Tactical Pilotage Chart; Joint Operations Graphics-Air; Topographic Line Map; ARC Digital Raster Imagery; Digital Terrain Elevation Data (Level 1); and Low Flying Chart.

  2. Preprocessing of A-scan GPR data based on energy features

    NASA Astrophysics Data System (ADS)

    Dogan, Mesut; Turhan-Sayan, Gonul

    2016-05-01

    There is an increasing demand for noninvasive real-time detection and classification of buried objects in various civil and military applications. The problem of detection and annihilation of landmines is particularly important due to strong safety concerns. The requirement for a fast real-time decision process is as important as the requirements for high detection rates and low false alarm rates. In this paper, we introduce and demonstrate a computationally simple, time-efficient, energy-based preprocessing approach that can be used in ground penetrating radar (GPR) applications to eliminate reflections from the air-ground boundary and to locate buried objects, simultaneously, in one easy step. The instantaneous power signals, the total energy values and the cumulative energy curves are extracted from the A-scan GPR data. The cumulative energy curves, in particular, are shown to be useful for detecting the presence and location of buried objects in a fast and simple way while preserving the spectral content of the original A-scan data for further steps of physics-based target classification. The proposed method is demonstrated using GPR data collected at outdoor test lanes at the facilities of IPA Defense, Ankara. Cylindrically shaped plastic containers were buried in fine-medium sand to simulate buried landmines. These plastic containers were half-filled with ammonium nitrate including metal pins. The results of this pilot study are highly promising and motivate further research on the use of energy-based preprocessing features in the landmine detection problem.
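
    A minimal sketch of the energy features on one synthetic A-scan: instantaneous power, total energy, and the normalized cumulative energy curve. The pulse positions and noise level are illustrative; in the curve, the ground bounce shows up as the first steep rise and a buried target as a later one:

    ```python
    import numpy as np

    def cumulative_energy(ascan):
        power = ascan.astype(float) ** 2      # instantaneous power signal
        energy = power.sum()                  # total energy of the trace
        cum = np.cumsum(power) / energy       # normalized cumulative energy curve
        return power, energy, cum

    t = np.linspace(0, 20e-9, 1024)           # 20 ns time window (illustrative)
    ascan = (np.exp(-((t - 4e-9) / 0.4e-9) ** 2)            # air-ground reflection
             + 0.3 * np.exp(-((t - 12e-9) / 0.4e-9) ** 2)   # buried-object echo
             + 0.02 * np.random.randn(t.size))
    power, energy, cum = cumulative_energy(ascan)
    late = t > 8e-9                           # look past the ground bounce
    target_idx = np.flatnonzero(late)[np.argmax(np.gradient(cum)[late])]
    print("target echo near t =", t[target_idx])
    ```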

  3. MODIStsp: An R package for automatic preprocessing of MODIS Land Products time series

    NASA Astrophysics Data System (ADS)

    Busetto, L.; Ranghetti, L.

    2016-12-01

    MODIStsp is a new R package that automates the creation of raster time series derived from MODIS Land Products. It performs several preprocessing steps (e.g. download, mosaicing, reprojection and resize) on MODIS products over a selected time period and area. All processing parameters can be set through a user-friendly GUI, which lets users select which specific layers of the original MODIS HDF files are to be processed and which Quality Indicators are to be extracted from the aggregated MODIS Quality Assurance layers. Moreover, the tool allows on-the-fly computation of time series of Spectral Indexes (either standard or custom-specified by the user through the GUI) from surface reflectance bands. Outputs are saved as single-band rasters corresponding to each available acquisition date and output layer. Virtual files can also be created, allowing easy access to the entire time series as a single file from common image processing/GIS software or R scripts. Non-interactive execution within an R script and stand-alone execution outside an R environment, exploiting a previously created Options File, are also possible; the latter allows scheduling MODIStsp runs to automatically update a time series when a new image is available. The proposed software constitutes a very useful tool for the remote sensing community, since it performs all the main preprocessing steps required for the creation of MODIS time series within a common framework, without requiring any particular programming skills of its users.

  4. A Technical Review on Biomass Processing: Densification, Preprocessing, Modeling and Optimization

    SciTech Connect

    Jaya Shankar Tumuluru; Christopher T. Wright

    2010-06-01

    It is now a well-acclaimed fact that burning fossil fuels and deforestation are major contributors to climate change. Biomass from plants can serve as an alternative renewable and carbon-neutral raw material for the production of bioenergy. Low densities of 40–60 kg/m³ for lignocellulosic and 200–400 kg/m³ for woody biomass limit their application for energy purposes. Prior to use in energy applications these materials need to be densified. Densified biomass can have bulk densities over 10 times that of the raw material, helping to significantly reduce the technical limitations associated with storage, loading and transportation. Pelleting, briquetting and extrusion processing are commonly used methods for densification. The aim of the present research is to develop a comprehensive review of biomass processing that includes densification, preprocessing, modeling and optimization. The specific objectives include carrying out a technical review of (a) mechanisms of particle bonding during densification; (b) methods of densification including extrusion, briquetting, pelleting, and agglomeration; (c) effects of process and feedstock variables and biomass biochemical composition on densification; (d) effects of preprocessing such as grinding, preheating, steam explosion, and torrefaction on biomass quality and binding characteristics; (e) models for understanding the compression characteristics; and (f) procedures for response surface modeling and optimization.

  5. Pre-Processing and Cross-Correlation Techniques for Time-Distance Helioseismology

    NASA Astrophysics Data System (ADS)

    Wang, N.; de Ridder, S.; Zhao, J.

    2014-12-01

    In chaotic wave fields excited by a random distribution of noise sources, cross-correlating the recordings made at two stations yields the interstation wave-field response. After early successes in helioseismology, laboratory studies and earth seismology, this technique found broad application in global and regional seismology. That development came with an increasing understanding of pre-processing and cross-correlation workflows that yield an optimal signal-to-noise ratio (SNR). Helioseismologists have not yet studied different spectral-whitening and cross-correlation workflows and rely heavily on stacking to increase the SNR. The recordings vary considerably between sunspots and regular portions of the Sun. Within the sunspot, the periodic effects of the observation satellite orbit are difficult to remove. We remove a running alpha-mean from the data and apply a soft clip to deal with data glitches. The recordings contain energy from both flows and waves; a frequency-domain filter selects the wave energy. The data are then input to several pre-processing and cross-correlation techniques common in earth seismology. We anticipate that spectral whitening will flatten the energy spectrum of the cross-correlations. We also expect the cross-correlations to converge faster to their expected value when the data are processed over overlapping windows. The results of this study are expected to aid in decreasing the stacking while maintaining good SNR.
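
    A sketch of one of the workflows under comparison, assuming two synthetic Doppler traces: frequency-domain cross-correlation with optional spectral whitening, which keeps the phase and discards the amplitude so that strong modes do not dominate the correlation:

    ```python
    import numpy as np

    def cross_correlate(u, v, whiten=True, eps=1e-8):
        U, V = np.fft.rfft(u), np.fft.rfft(v)
        if whiten:
            U = U / (np.abs(U) + eps)   # spectral whitening: phase only
            V = V / (np.abs(V) + eps)
        return np.fft.irfft(U * np.conj(V), n=len(u))

    rng = np.random.default_rng(0)
    s = rng.standard_normal(4096)                    # shared wave field
    u = s + 0.1 * rng.standard_normal(4096)
    v = np.roll(s, 37) + 0.1 * rng.standard_normal(4096)  # delayed arrival
    cc = cross_correlate(u, v)
    lag = int(np.argmax(cc))
    lag = lag - len(u) if lag > len(u) // 2 else lag
    print(lag)   # -37: v lags u by 37 samples
    ```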

  6. Star sensor image acquisition and preprocessing hardware system based on CMOS image sensor and FPGA

    NASA Astrophysics Data System (ADS)

    Hao, Xuetao; Jiang, Jie; Zhang, Guangjun

    2003-09-01

    A star sensor is an avionics instrument used to provide the absolute 3-axis attitude of a spacecraft utilizing star observations. It consists of an electronic camera and associated processing electronics. As an outcome of the advancing state of the art, new-generation star sensors feature higher speed, lower cost, lower power dissipation, and smaller size than first-generation star sensors. This paper describes a front-end star sensor image acquisition and pre-processing hardware system based on CMOS image-sensor and FPGA technology. In practice, star images are produced by a simple simulator on a PC, acquired by the CMOS image sensor, pre-processed by the FPGA, saved in SRAM, read out over the EPP protocol, and validated by image-processing software on the PC. The hardware part of the system acquires images through the CMOS image sensor under FPGA control, processes the image data in an FPGA circuit module, and saves the images to SRAM for testing. It provides the basic image data for star recognition and spacecraft attitude determination. As an important reference for developing a star sensor prototype, the system validates the performance advantages of the new generation of star sensors.

  7. Preprocessing of Hinode/SOT Vector Magnetograms for Nonlinear Force-Free Coronal Magnetic Field Modeling

    NASA Astrophysics Data System (ADS)

    Wiegelmann, T.; Thalmann, J. K.; Schrijver, C. J.; De Rosa, M. L.; Metcalf, T. R.

    2008-09-01

    The solar magnetic field is key to understanding the physical processes in the solar atmosphere. Nonlinear force-free codes have been shown to be useful in extrapolating the coronal field from underlying vector boundary data (for an overview see Schrijver et al. (2006)). However, we can only measure the magnetic field vector routinely with high accuracy in the photosphere with, e.g., Hinode/SOT, and unfortunately these data do not fulfill the force-free consistency condition as defined by Aly (1989). We must therefore apply some transformations to these data before nonlinear force-free extrapolation codes can be legitimately applied. To this end, we have developed a minimization procedure that uses the measured photospheric field vectors as input to approximate a more chromosphere-like field (the method was dubbed preprocessing; see Wiegelmann et al. (2006) for details). The procedure includes force-free consistency integrals and spatial smoothing. The method has been intensively tested with model active regions (see Metcalf et al. 2008) and has previously been applied to several ground-based vector magnetograms. Here we apply the preprocessing program to photospheric magnetic field measurements from the Hinode/SOT instrument.
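
    A rough sketch of the consistency checks involved, written from one common convention for the Aly (1989) integrals (net flux, force, and torque over the magnetogram, reduced to dimensionless epsilon parameters); the exact signs and normalizations used in the published preprocessing may differ:

    ```python
    import numpy as np

    def consistency_parameters(Bx, By, Bz, x, y):
        dA = (x[1] - x[0]) * (y[1] - y[0])
        X, Y = np.meshgrid(x, y)
        p = (Bx**2 + By**2 + Bz**2).sum() * dA            # normalization integral
        eps_flux = abs(Bz.sum()) / np.abs(Bz).sum()        # net flux balance
        f = [(Bx * Bz).sum(), (By * Bz).sum(),
             ((Bz**2 - Bx**2 - By**2) / 2).sum()]          # net force components
        t = [(X * (Bz**2 - Bx**2 - By**2) / 2).sum(),
             (Y * (Bz**2 - Bx**2 - By**2) / 2).sum(),
             (Y * Bx * Bz - X * By * Bz).sum()]            # net torque components
        eps_force = sum(abs(v) for v in f) * dA / p
        eps_torque = sum(abs(v) for v in t) * dA / p
        return eps_flux, eps_force, eps_torque             # preprocessing drives these toward 0

    x = y = np.linspace(-1.0, 1.0, 128)
    Bz = np.random.randn(128, 128)
    Bx, By = 0.1 * np.random.randn(128, 128), 0.1 * np.random.randn(128, 128)
    print(consistency_parameters(Bx, By, Bz, x, y))
    ```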

  8. Applying Enhancement Filters in the Pre-processing of Images of Lymphoma

    NASA Astrophysics Data System (ADS)

    Henrique Silva, Sérgio; Zanchetta do Nascimento, Marcelo; Alves Neves, Leandro; Ramos Batista, Valério

    2015-01-01

    Lymphoma is a type of cancer that affects the immune system, and is classified as Hodgkin or non-Hodgkin. It is one of the ten most common cancers worldwide, accounting for three to four percent of all malignant neoplasms diagnosed. Our work presents a study of some filters devoted to enhancing images of lymphoma at the pre-processing step. Here the enhancement is useful for removing noise from the digital images. We have analysed noise caused by different sources, such as room vibration, scraps and defocusing, in the following classes of lymphoma: follicular, mantle cell and B-cell chronic lymphocytic leukemia. The Gaussian, Median and Mean-Shift filters were applied in different colour models (RGB, Lab and HSV). Afterwards, we performed a quantitative analysis of the images by means of the Structural Similarity Index, in order to evaluate the similarity between the images. In all cases we obtained a certainty of at least 75%, which rises to 99% if one considers only HSV. We therefore conclude that HSV is an important choice of colour model for pre-processing histological images of lymphoma, because it yields the best enhancement.
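
    A sketch of the comparison for the HSV case, assuming an RGB histology tile: apply Gaussian and median filters per HSV channel and score each result against the original with SSIM (the Mean-Shift filter is omitted here for brevity; the random tile is a stand-in for real lymphoma images):

    ```python
    import numpy as np
    from skimage import color, filters, img_as_float
    from skimage.metrics import structural_similarity
    from scipy.ndimage import median_filter

    def filter_and_score(rgb):
        hsv = color.rgb2hsv(img_as_float(rgb))
        results = {}
        for name, f in {"gaussian": lambda c: filters.gaussian(c, sigma=1),
                        "median":   lambda c: median_filter(c, size=3)}.items():
            out = np.stack([f(hsv[..., k]) for k in range(3)], axis=-1)
            results[name] = structural_similarity(hsv, out, channel_axis=-1,
                                                  data_range=1.0)
        return results  # SSIM near 1 = noise removed with little structural change

    rgb = (np.random.rand(128, 128, 3) * 255).astype(np.uint8)
    print(filter_and_score(rgb))
    ```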

  9. Contour Error Map Algorithm

    NASA Technical Reports Server (NTRS)

    Merceret, Francis; Lane, John; Immer, Christopher; Case, Jonathan; Manobianco, John

    2005-01-01

    The contour error map (CEM) algorithm and the software that implements the algorithm are means of quantifying correlations between sets of time-varying data that are binarized and registered on spatial grids. The present version of the software is intended for use in evaluating numerical weather forecasts against observational sea-breeze data. In cases in which observational data come from off-grid stations, it is necessary to preprocess the observational data to transform them into gridded data. First, the wind direction is gridded and binarized so that D(i,j;n) is the input to CEM based on forecast data and d(i,j;n) is the input to CEM based on gridded observational data. Here, i and j are spatial indices representing 1.25-km intervals along the west-to-east and south-to-north directions, respectively; and n is a time index representing 5-minute intervals. A binary value of D or d = 0 corresponds to an offshore wind, whereas a value of D or d = 1 corresponds to an onshore wind. CEM includes two notable subalgorithms: One identifies and verifies sea-breeze boundaries; the other, which can be invoked optionally, performs an image-erosion function for the purpose of attempting to eliminate river-breeze contributions in the wind fields.
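
    A sketch of the gridding/binarization step that off-grid observations require before CEM, assuming hypothetical station coordinates and an illustrative onshore wind sector; nearest-neighbour assignment avoids averaging angles:

    ```python
    import numpy as np
    from scipy.interpolate import griddata

    def gridded_binary_wind(st_xy, st_dir_deg, grid_x, grid_y, onshore=(0.0, 180.0)):
        gx, gy = np.meshgrid(grid_x, grid_y)
        # Nearest-neighbour resampling keeps each cell a valid observed direction.
        d = griddata(st_xy, st_dir_deg, (gx, gy), method="nearest")
        # Binarize: 1 = onshore, 0 = offshore; this is d(i,j) at one time index n.
        return ((d >= onshore[0]) & (d < onshore[1])).astype(int)

    st_xy = np.random.rand(25, 2) * 100.0      # 25 stations in a 100x100 km domain
    st_dir = np.random.rand(25) * 360.0        # observed wind directions (degrees)
    grid = np.arange(0.0, 100.0, 1.25)         # 1.25-km spacing, as in CEM
    d = gridded_binary_wind(st_xy, st_dir, grid, grid)
    ```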

  10. Data Pre-Processing Method to Remove Interference of Gas Bubbles and Cell Clusters During Anaerobic and Aerobic Yeast Fermentations in a Stirred Tank Bioreactor

    NASA Astrophysics Data System (ADS)

    Princz, S.; Wenzel, U.; Miller, R.; Hessling, M.

    2014-11-01

    One aerobic and four anaerobic batch fermentations of the yeast Saccharomyces cerevisiae were conducted in a stirred bioreactor and monitored inline by NIR spectroscopy with a transflectance dip probe. From the acquired NIR spectra, chemometric partial least squares regression (PLSR) models for predicting biomass, glucose and ethanol were constructed. The spectra were measured directly in the fermentation broth and successfully inspected for adulteration using our novel data pre-processing method. These adulterations manifested as strong fluctuations in the shape and offset of the absorption spectra. They resulted from cells, cell clusters, or gas bubbles intercepting the optical path of the dip probe. In the proposed data pre-processing method, adulterated signals are removed by passing the time-scanned, non-averaged spectra through two filter algorithms with a 5% quantile cutoff; the filtered spectra containing meaningful data are then averaged. A second step checks whether the whole time scan is analyzable; if true, the average is calculated and used to prepare the PLSR models. This new method distinctly improved the prediction results. To dissociate possible correlations between analyte concentrations, such as glucose and ethanol, the feeding analytes were alternately supplied at different concentrations (spiking) at the end of the four anaerobic fermentations. This procedure yielded low-error (anaerobic) PLSR models, with prediction errors of 0.31 g/l for biomass, 3.41 g/l for glucose, and 2.17 g/l for ethanol. The maximum concentrations were 14 g/l biomass, 167 g/l glucose, and 80 g/l ethanol. Data from the aerobic fermentation, carried out under high agitation and high aeration, were incorporated to realize combined PLSR models, which, to our knowledge, have not been previously reported.
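
    A sketch of the two-filter idea, assuming a matrix of repeated, non-averaged spectra from one time scan; the offset and shape statistics below are plausible stand-ins for the authors' exact criteria, with the same 5% quantile cutoff:

    ```python
    import numpy as np

    def filter_and_average(scan, cutoff=0.95):
        # scan: (n_spectra, n_wavelengths) non-averaged spectra from one time scan
        offset = np.abs(scan.mean(axis=1) - np.median(scan.mean(axis=1)))  # baseline shift
        median_spec = np.median(scan, axis=0)
        shape = np.linalg.norm(scan - median_spec, axis=1)                 # shape distortion
        keep = ((offset <= np.quantile(offset, cutoff))
                & (shape <= np.quantile(shape, cutoff)))
        if keep.sum() < 0.5 * len(scan):
            return None          # whole scan unusable (e.g., persistent gas bubbles)
        return scan[keep].mean(axis=0)   # averaged spectrum passed to the PLSR models

    scan = np.random.rand(50, 256) + np.linspace(0, 1, 256)   # 50 spectra per scan
    avg = filter_and_average(scan)
    ```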

  11. Microarray Technology Applied to Human Allergic Disease

    PubMed Central

    Hamilton, Robert G.

    2017-01-01

    IgE antibodies serve as the gatekeeper for the release of mediators from sensitized (IgE positive) mast cells and basophils following a relevant allergen exposure, which can lead to an immediate-type hypersensitivity (allergic) reaction. Purified recombinant and native allergens were combined in the 1990s with state-of-the-art chip technology to establish the first microarray-based IgE antibody assay. Triplicate spots to over 100 allergenic molecules are immobilized on an amine-activated glass slide to form a single-panel multi-allergosorbent assay. Human antibodies, typically of the IgE and IgG isotypes, specific for one or many allergens bind to their respective allergen(s) on the chip. Following removal of unbound serum proteins, bound IgE antibody is detected with a fluorophore-labeled anti-human IgE reagent. The fluorescent profile from the completed slide provides a sensitization profile of an allergic patient which can identify IgE antibodies that bind to structurally similar (cross-reactive) allergen families versus molecules that are unique to a single allergen specificity. Despite its ability to rapidly analyze many IgE antibody specificities in a single simple assay format, the chip-based microarray remains less analytically sensitive and quantitative than its singleplex assay counterparts (ImmunoCAP, Immulite). Microgram per mL quantities of allergen-specific IgG antibody can also compete with nanogram per mL quantities of specific IgE for the limited allergen binding sites on the chip. Microarray assays, while not used in clinical immunology laboratories for routine patient IgE antibody testing, will remain an excellent research tool for defining sensitization profiles of populations in epidemiological studies. PMID:28134842

  12. Development of a microarray for identification of pathogenic Clostridium species

    PubMed Central

    Janvilisri, Tavan; Scaria, Joy; Gleed, Robin; Fubini, Susan; Bonkosky, Michelle M.; Gröhn, Yrjö T.; Chang, Yung-Fu

    2009-01-01

    In recent years, Clostridium species have rapidly reemerged as human and animal pathogens. The detection and identification of pathogenic Clostridium species is therefore critical for clinical diagnosis and antimicrobial therapy. Traditional diagnostic techniques for clostridia are laborious and time-consuming and may adversely affect the therapeutic outcome. In this study, we developed an oligonucleotide diagnostic microarray for pathogenic Clostridium species. The microarray specificity was tested against 65 Clostridium isolates. The applicability of this microarray in a clinical setting was assessed with the use of mock stool samples. The microarray was successful in discriminating at least four species, with a limit of detection as low as 10⁴ CFU/ml. In addition, the patterns of virulence and antibiotic resistance genes of the tested strains were determined through the microarrays. This approach demonstrates the high-throughput detection and identification of Clostridium species and provides advantages over traditional methods. Microarray-based techniques are promising applications for clinical diagnosis and epidemiological investigations. PMID:19879710

  13. Application of DNA microarray technology to gerontological studies.

    PubMed

    Masuda, Kiyoshi; Kuwano, Yuki; Nishida, Kensei; Rokutan, Kazuhito

    2013-01-01

    Gene expression patterns change dramatically in aging and age-related events. The DNA microarray is now recognized as a useful device in molecular biology and is widely used to identify the molecular mechanisms of aging and the biological effects of drugs used therapeutically in age-related diseases. Recently, numerous technological advances have led to the evolution of DNA microarrays and microarray-based techniques, revealing genomic modifications and all transcriptional activity. Here, we present the step-by-step methods currently used in our lab for handling oligonucleotide microarrays and miRNA microarrays. Moreover, we introduce protocols for ribonucleoprotein (RNP) immunoprecipitation followed by microarray analysis (RIP-chip), which reveals the target mRNAs of age-related RNA-binding proteins.

  14. Evaluating the reliability of different preprocessing steps to estimate graph theoretical measures in resting state fMRI data.

    PubMed

    Aurich, Nathassia K; Alves Filho, José O; Marques da Silva, Ana M; Franco, Alexandre R

    2015-01-01

    With resting-state functional MRI (rs-fMRI) there are a variety of post-processing methods that can be used to quantify the human brain connectome. However, there is also a choice of which preprocessing steps will be used prior to calculating the functional connectivity of the brain. In this manuscript, we tested seven different preprocessing schemes and assessed the reliability between, and reproducibility within, the various strategies by means of graph theoretical measures. The different preprocessing schemes were tested on a publicly available dataset that includes rs-fMRI data of healthy controls. The brain was parcellated into 190 nodes and four graph theoretical (GT) measures were calculated: global efficiency (GEFF), characteristic path length (CPL), average clustering coefficient (ACC), and average local efficiency (ALE). Our findings indicate that results can differ significantly depending on which preprocessing steps are selected. We also found dependence between motion and GT measurements in most preprocessing strategies. We conclude that censoring outliers within the functional time series as a preprocessing step increases the reliability of GT measurements and reduces their dependence on head motion.
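
    A sketch of how the four GT measures can be computed from preprocessed time series, assuming a 190-node parcellation and an illustrative edge density for thresholding the connectivity matrix:

    ```python
    import numpy as np
    import networkx as nx

    def graph_measures(ts, density=0.15):
        # ts: (timepoints, nodes) preprocessed rs-fMRI time series
        corr = np.corrcoef(ts.T)
        np.fill_diagonal(corr, 0.0)
        thr = np.quantile(np.abs(corr), 1.0 - density)   # keep the strongest edges
        G = nx.from_numpy_array((np.abs(corr) >= thr).astype(int))
        # CPL is computed on the largest connected component to stay defined.
        giant = G.subgraph(max(nx.connected_components(G), key=len))
        return {"GEFF": nx.global_efficiency(G),
                "CPL": nx.average_shortest_path_length(giant),
                "ACC": nx.average_clustering(G),
                "ALE": nx.local_efficiency(G)}

    ts = np.random.randn(200, 190)   # 200 volumes, 190-node parcellation
    print(graph_measures(ts))
    ```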

  15. Tissue microarrays for early target evaluation.

    PubMed

    Simon, Ronald; Mirlacher, Martina; Sauter, Guido

    2004-09-01

    Early assessment of the probable biological importance of drug targets, the potential market size of a successful new drug, and possible treatment side effects are critical for risk management in drug development. A comprehensive molecular epidemiology analysis involving thousands of well-characterized human tissues will thus provide vital information for strategic decision-making. Tissue microarray (TMA) technology is ideally suited for such projects. The simultaneous analysis of thousands of tissues enables highly standardized, fast and affordable translational research studies of unprecedented scale.

  16. Protein Microarrays--Without a Trace

    SciTech Connect

    Camarero, J A

    2007-04-05

    Many experimental approaches in biology and biophysics, as well as applications in diagnosis and drug discovery, require proteins to be immobilized on solid supports. Protein microarrays, for example, provide a high-throughput format to study biomolecular interactions. The technique employed for protein immobilization is a key to the success of these applications. Recent biochemical developments are allowing, for the first time, the selective and traceless immobilization of proteins generated by cell-free systems without the need for purification and/or reconcentration prior to the immobilization step.

  17. ProMAT: protein microarray analysis tool

    SciTech Connect

    White, Amanda M.; Daly, Don S.; Varnum, Susan M.; Anderson, Kevin K.; Bollinger, Nikki; Zangar, Richard C.

    2006-04-04

    Summary: ProMAT is a software tool for statistically analyzing data from ELISA microarray experiments. The software estimates standard curves, sample protein concentrations and their uncertainties for multiple assays. ProMAT generates a set of comprehensive figures for assessing results and diagnosing process quality. The tool is available for Windows or Mac, and is distributed as open-source Java and R code. Availability: ProMAT is available at http://www.pnl.gov/statistics/ProMAT. ProMAT requires Java version 1.5.0 and R version 1.9.1 (or more recent versions) which are distributed with the tool.
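
    ProMAT itself is Java/R, but the core of ELISA-microarray standard-curve estimation can be sketched in a few lines: fit a four-parameter logistic (4PL) to calibration spots and invert it to read off sample concentrations. All numbers below are hypothetical:

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def four_pl(x, a, d, c, b):
        # a: background asymptote, d: saturation, c: inflection, b: slope
        return d + (a - d) / (1.0 + (x / c) ** b)

    conc = np.array([0.1, 0.5, 2.0, 10.0, 50.0, 250.0])    # standards (pg/ml)
    signal = four_pl(conc, 200.0, 60000.0, 20.0, 1.2) * (1 + 0.03 * np.random.randn(6))
    popt, pcov = curve_fit(four_pl, conc, signal,
                           p0=[signal.min(), signal.max(), 10.0, 1.0], maxfev=10000)

    def invert(y, a, d, c, b):
        # Solve the 4PL for concentration given a measured signal.
        return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

    print(invert(four_pl(5.0, *popt), *popt))              # recovers ~5.0
    ```

    The covariance matrix pcov returned by curve_fit is one route to the concentration uncertainties that ProMAT reports.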

  18. Refractive index change detection based on porous silicon microarray

    NASA Astrophysics Data System (ADS)

    Chen, Weirong; Jia, Zhenhong; Li, Peng; Lv, Guodong; Lv, Xiaoyi

    2016-05-01

    By combining photolithography with the electrochemical anodization method, a microarray device of porous silicon (PS) photonic crystal was fabricated on a crystalline silicon substrate. The optical properties of the microarray were analyzed with the transfer matrix method. The relationship between the refractive index and the reflectivity of each array element at 633 nm was also studied, and changes in the array surface reflectivity were observed through digital imaging. By means of the reflectivity measurement method, reflectivity changes below 10⁻³ can be observed with the PS microarray. The results of this study can be applied to the detection of biosensor arrays.
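
    A sketch of the transfer-matrix calculation at normal incidence, with layer indices and thicknesses that are illustrative (a quarter-wave high/low-porosity stack), not the fabricated device:

    ```python
    import numpy as np

    def reflectivity(n_layers, d_layers, wavelength, n_in=1.0, n_sub=3.88):
        M = np.eye(2, dtype=complex)
        for n, d in zip(n_layers, d_layers):
            delta = 2.0 * np.pi * n * d / wavelength       # phase thickness
            M = M @ np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                              [1j * n * np.sin(delta), np.cos(delta)]])
        B, C = M @ np.array([1.0, n_sub])                  # stack + substrate
        r = (n_in * B - C) / (n_in * B + C)
        return abs(r) ** 2

    lam = 633.0                                            # nm, He-Ne line
    n_stack = [2.2, 1.5] * 6                               # high/low-porosity PS
    d_stack = [lam / (4 * n) for n in n_stack]             # quarter-wave layers
    print(reflectivity(n_stack, d_stack, lam))             # near 1 in the stop band
    # A small index change in one porosity (e.g. analyte infiltration) shifts this value.
    ```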

  19. Chemiluminescence microarrays in analytical chemistry: a critical review.

    PubMed

    Seidel, Michael; Niessner, Reinhard

    2014-09-01

    Multi-analyte immunoassays on microarrays and on multiplex DNA microarrays have been described for quantitative analysis of small organic molecules (e.g., antibiotics, drugs of abuse, small molecule toxins), proteins (e.g., antibodies or protein toxins), and microorganisms, viruses, and eukaryotic cells. In analytical chemistry, multi-analyte detection by use of analytical microarrays has become an innovative research topic because of the possibility of generating several sets of quantitative data for different analyte classes in a short time. Chemiluminescence (CL) microarrays are powerful tools for rapid multiplex analysis of complex matrices. A wide range of applications for CL microarrays is described in the literature dealing with analytical microarrays. The motivation for this review is to summarize the current state of CL-based analytical microarrays. Combining analysis of different compound classes on CL microarrays reduces analysis time, cost of reagents, and use of laboratory space. Applications are discussed, with examples from food safety, water safety, environmental monitoring, diagnostics, forensics, toxicology, and biosecurity. The potential and limitations of research on multiplex analysis by use of CL microarrays are discussed in this review.

  20. Studying cellular processes and detecting disease with protein microarrays

    SciTech Connect

    Zangar, Richard C.; Varnum, Susan M.; Bollinger, Nikki

    2005-10-31

    Protein microarrays are a rapidly developing analytic tool with diverse applications in biomedical research. These applications include profiling of disease markers or autoimmune responses, understanding molecular pathways, protein modifications and protein activities. One factor that is driving this expanding usage is the wide variety of experimental formats that protein microarrays can take. In this review, we provide a short, conceptual overview of the different approaches for protein microarray. We then examine some of the most significant applications of these microarrays to date, with an emphasis on how global protein analyses can be used to facilitate biomedical research.

  1. Re-Annotator: Annotation Pipeline for Microarray Probe Sequences.

    PubMed

    Arloth, Janine; Bader, Daniel M; Röh, Simone; Altmann, Andre

    2015-01-01

    Microarray technologies are established approaches for high throughput gene expression, methylation and genotyping analysis. An accurate mapping of the array probes is essential to generate reliable biological findings. However, manufacturers of the microarray platforms typically provide incomplete and outdated annotation tables, which often rely on older genome and transcriptome versions that differ substantially from up-to-date sequence databases. Here, we present the Re-Annotator, a re-annotation pipeline for microarray probe sequences. It is primarily designed for gene expression microarrays but can also be adapted to other types of microarrays. The Re-Annotator uses a custom-built mRNA reference database to identify the positions of gene expression array probe sequences. We applied Re-Annotator to the Illumina Human-HT12 v4 microarray platform and found that about one quarter (25%) of the probes differed from the manufacturer's annotation. In further computational experiments on experimental gene expression data, we compared Re-Annotator to another probe re-annotation tool, ReMOAT, and found that Re-Annotator provided an improved re-annotation of microarray probes. A thorough re-annotation of probe information is crucial to any microarray analysis. The Re-Annotator pipeline is freely available at http://sourceforge.net/projects/reannotator along with re-annotated files for Illumina microarrays HumanHT-12 v3/v4 and MouseRef-8 v2.

  2. Application of DNA microarrays in occupational health research.

    PubMed

    Koizumi, Shinji

    2004-01-01

    The profiling of gene expression patterns with DNA microarrays has recently come into wide use, not only in basic molecular biological studies but also in applied fields. In clinical application, for example, this technique is expected to be quite useful in making a correct diagnosis. In the pharmacological area, microarray analysis can be applied to drug discovery and individualized drug treatment. Although not yet as widespread as these examples, DNA microarrays could also be a powerful tool in studies relevant to occupational health. This review outlines gene expression profiling with DNA microarrays and its prospects in occupational health research.

  3. Zeptosens' protein microarrays: a novel high performance microarray platform for low abundance protein analysis.

    PubMed

    Pawlak, Michael; Schick, Eginhard; Bopp, Martin A; Schneider, Michael J; Oroszlan, Peter; Ehrat, Markus

    2002-04-01

    Protein microarrays are considered an enabling technology, which will significantly expand the scope of current protein expression and protein interaction analysis. Current technologies, such as two-dimensional gel electrophoresis (2-DE) in combination with mass spectrometry, allowing the identification of biologically relevant proteins, have a high resolving power, but also considerable limitations. As was demonstrated by Gygi et al. (Proc. Nat. Acad. Sci. USA 2000,97, 9390-9395), most spots in 2-DE, observed from whole cell extracts, are from high abundance proteins, whereas low abundance proteins, such as signaling molecules or kinases, are only poorly represented. Protein microarrays are expected to significantly expedite the discovery of new markers and targets of pharmaceutical interest, and to have the potential for high-throughput applications. Key factors to reach this goal are: high read-out sensitivity for quantification also of low abundance proteins, functional analysis of proteins, short assay analysis times, ease of handling and the ability to integrate a variety of different targets and new assays. Zeptosens has developed a revolutionary new bioanalytical system based on the proprietary planar waveguide technology which allows us to perform multiplexed, quantitative biomolecular interaction analysis with highest sensitivity in a microarray format upon utilizing the specific advantages of the evanescent field fluorescence detection. The analytical system, comprising an ultrasensitive fluorescence reader and microarray chips with integrated microfluidics, enables the user to generate a multitude of high fidelity data in applications such as protein expression profiling or investigating protein-protein interactions. In this paper, the important factors for developing high performance protein microarray systems, especially for targeting low abundant messengers of relevant biological information, will be discussed and the performance of the system will

  4. Improving night sky star image processing algorithm for star sensors.

    PubMed

    Arbabmir, Mohammad Vali; Mohammadi, Seyyed Mohammad; Salahshour, Sadegh; Somayehee, Farshad

    2014-04-01

    In this paper, the night sky star image processing algorithm, consisting of image preprocessing, star pattern recognition, and centroiding steps, is improved. It is shown that the proposed noise reduction approach can preserve more of the necessary information than other frequently used approaches. It is also shown that the proposed thresholding method, unlike commonly used techniques, can properly perform image binarization, especially in images with uneven illumination. Moreover, the higher performance rate and the lower average centroiding estimation error of near 0.045 for 400 simulated images, compared to other algorithms, show the high capability of the proposed night sky star image processing algorithm.
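
    A sketch of the binarization and centroiding steps on a synthetic frame: a block-wise local threshold (a simple stand-in for the paper's thresholding method, which copes with uneven illumination) followed by intensity-weighted centroids of the connected regions:

    ```python
    import numpy as np
    from scipy import ndimage

    def star_centroids(img, block=32, k=5.0):
        img = img.astype(float)
        # Local threshold: block-wise mean + k * block-wise standard deviation.
        mean = ndimage.uniform_filter(img, block)
        var = np.maximum(ndimage.uniform_filter(img ** 2, block) - mean ** 2, 0.0)
        binary = img > (mean + k * np.sqrt(var))
        labels, n = ndimage.label(binary)
        # Sub-pixel star positions as intensity-weighted centers of mass.
        return ndimage.center_of_mass(img, labels, range(1, n + 1))

    img = np.random.poisson(10, (256, 256)).astype(float)   # sky background
    img[100:103, 50:53] += 300.0                            # one bright star spot
    print(star_centroids(img))
    ```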

  5. Manufacturing DNA microarrays from unpurified PCR products

    PubMed Central

    Diehl, Frank; Beckmann, Boris; Kellner, Nadine; Hauser, Nicole C.; Diehl, Susanne; Hoheisel, Jörg D.

    2002-01-01

    For the production of DNA microarrays from PCR products, purification of the DNA fragments prior to spotting is a major expense in cost and time. Also, a considerable amount of material is lost during this process and contamination might occur. Here, a protocol is presented that permits the manufacture of microarrays from unpurified PCR products on aminated surfaces such as glass slides coated with the widely used poly(l-lysine) or aminosilane. The presence of primer molecules in the PCR sample does not increase the non-specific signal upon hybridisation. Overall, signal intensity on arrays made of unpurified PCR products is 94% of the intensity obtained with the respective purified molecules. This slight loss in signal, however, is offset by a reduced variation in the amount of DNA present at the individual spot positions across an array, apart from the considerable savings in time and cost. In addition, a larger number of arrays can be made from one batch of amplification products. PMID:12177307

  6. Solution processed organic microarray with inverted structure

    NASA Astrophysics Data System (ADS)

    Toglia, Patrick; Lewis, Jason; Lafalce, Evan; Jiang, Xiaomei

    2011-03-01

    We have fabricated an inverted organic microarray using a novel solution-based technique. The array consists of 60 small (1 square mm) solar cells on a one inch by one inch glass substrate. The device utilizes photoactive materials such as a blend of poly(3-hexylthiophene) (P3HT) and [6,6]-phenyl-C61-butyric acid methyl ester (PCBM). The nanomorphology of the active layer was manipulated through the choice of solvents and annealing conditions. Detailed analysis of the device physics, including current-voltage characteristics, external quantum efficiency and carrier recombination, will be presented, complemented by AFM images and grazing-angle XRD of the active layer under different processing conditions. The procedure described here has the full potential for use in future fabrication of microarrays with single cells as small as 0.01 square mm, for application in DC power supplies for electrostatic microelectromechanical systems (MEMS) devices. This work was supported by New Energy Technology Inc. and the Florida High Tech Corridor Matching Fund (FHT 09-18).

  7. Clickable Polymeric Coating for Glycan Microarrays.

    PubMed

    Zilio, Caterina; Sola, Laura; Cretich, Marina; Bernardi, Anna; Chiari, Marcella

    2017-01-01

    The interaction of carbohydrates with a variety of biological targets, including antibodies, proteins, viruses, and cells are of utmost importance in many aspects of biology. Glycan microarrays are increasingly used to determine the binding specificity of glycan-binding proteins. In this study, a novel microarray support is reported for the fabrication of glycan arrays that combines the higher sensitivity of a layered Si-SiO2 surface with a novel polymeric coating easily modifiable by subsequent click reaction. The alkyne-containing copolymer, adsorbed from an aqueous solution, produces a coating by a single step procedure and serves as a soft, tridimensional support for the oriented immobilization of carbohydrates via azide/alkyne Cu (I) catalyzed "click" reaction. The advantages of a functional 3D polymer coating making use of a click chemistry immobilization are combined with the high fluorescence sensitivity and superior signal-to-noise ratio of a Si-SiO2 substrate. The proposed approach enables the attachment of complex sugars on a silicon oxide surface by a method that does not require skilled personnel and chemistry laboratories.

  8. Discovering Pair-wise Synergies in Microarray Data

    PubMed Central

    Chen, Yuan; Cao, Dan; Gao, Jun; Yuan, Zheming

    2016-01-01

    Informative gene selection can have important implications for the improvement of cancer diagnosis and the identification of new drug targets. Individual-gene-ranking methods ignore interactions between genes, and popular pair-wise gene evaluation methods, e.g. TSP and TSG, cannot discover pair-wise interactions. Several efforts to discover pair-wise synergy have been made based on the information approach, such as EMBP and FeatKNN. However, the methods employed to estimate mutual information, e.g. binarization, histogram-based and KNN estimators, depend on known data or domain characteristics. Recently, Reshef et al. proposed a novel maximal information coefficient (MIC) measure to capture a wide range of associations between two variables, which has the property of generality. An extension from MIC(X; Y) to MIC(X1; X2; Y) is therefore desired. We developed an approximation algorithm for estimating MIC(X1; X2; Y), where Y is a discrete variable. MIC(X1; X2; Y) is employed to detect pair-wise synergy in simulation and cancer microarray data. The results indicate that MIC(X1; X2; Y) also has the property of generality. It can discover synergic genes that are undetectable by reference feature selection methods such as MIC(X; Y) and TSG. Synergic genes can distinguish different phenotypes. Finally, the biological relevance of these synergic genes is validated with GO annotation and the OUgene database. PMID:27470995
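
    To make the synergy idea concrete: one standard information-theoretic definition is synergy(X1,X2;Y) = I(X1,X2;Y) − I(X1;Y) − I(X2;Y). The sketch below estimates it with simple histogram MI on quantile bins; the paper's MIC-based estimator is different, and the XOR-like phenotype is a constructed example where neither gene is informative alone but the pair is:

    ```python
    import numpy as np
    from sklearn.metrics import mutual_info_score

    def bin_gene(x, n_bins=4):
        return np.digitize(x, np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1]))

    def synergy(x1, x2, y, n_bins=4):
        b1, b2 = bin_gene(x1, n_bins), bin_gene(x2, n_bins)
        joint = b1 * n_bins + b2                 # joint discretization of the pair
        return (mutual_info_score(joint, y)
                - mutual_info_score(b1, y) - mutual_info_score(b2, y))

    rng = np.random.default_rng(1)
    x1, x2 = rng.standard_normal(500), rng.standard_normal(500)
    y = ((x1 > 0) ^ (x2 > 0)).astype(int)        # XOR-like phenotype
    print(synergy(x1, x2, y))                    # clearly positive
    ```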

  9. Image quantification of high-throughput tissue microarray

    NASA Astrophysics Data System (ADS)

    Wu, Jiahua; Dong, Junyu; Zhou, Huiyu

    2006-03-01

    Tissue microarray (TMA) technology allows rapid visualization of molecular targets in thousands of tissue specimens at a time and provides valuable information on expression of proteins within tissues at a cellular and sub-cellular level. TMA technology overcomes the bottleneck of traditional tissue analysis and allows it to catch up with the rapid advances in lead discovery. Studies using TMA on immunohistochemistry (IHC) can produce a large amount of images for interpretation within a very short time. Manual interpretation does not allow accurate quantitative analysis of staining to be undertaken. Automatic image capture and analysis has been shown to be superior to manual interpretation. The aim of this work is to develop a truly high-throughput and fully automated image capture and analysis system. We develop a robust colour segmentation algorithm using hue-saturation-intensity (HSI) colour space to provide quantification of signal intensity and partitioning of staining on high-throughput TMA. Initial segmentation results and quantification data have been achieved on 16,000 TMA colour images over 23 different tissue types.
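
    A sketch of HSI-style colour segmentation on an RGB core image (HSV used here as the colour space): call a pixel stained when it is saturated and not near-white, then report the stained fraction and mean intensity. The thresholds are illustrative, not the paper's tuned values:

    ```python
    import numpy as np
    from skimage import color, img_as_float

    def quantify_staining(rgb, s_min=0.25, v_max=0.85):
        hsv = color.rgb2hsv(img_as_float(rgb))
        s, v = hsv[..., 1], hsv[..., 2]
        stained = (s > s_min) & (v < v_max)      # saturated, non-background pixels
        return {"stained_fraction": float(stained.mean()),
                "mean_intensity": float((1.0 - v[stained]).mean()) if stained.any() else 0.0}

    tile = (np.random.rand(512, 512, 3) * 255).astype(np.uint8)  # stand-in TMA core
    print(quantify_staining(tile))
    ```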

  10. Improved covariance matrix estimators for weighted analysis of microarray data.

    PubMed

    Astrand, Magnus; Mostad, Petter; Rudemo, Mats

    2007-12-01

    Empirical Bayes models have been shown to be powerful tools for identifying differentially expressed genes from gene expression microarray data. An example is the WAME model, where a global covariance matrix accounts for array-to-array correlations as well as differing variances between arrays. However, the existing method for estimating the covariance matrix is very computationally intensive and the estimator is biased when data contains many regulated genes. In this paper, two new methods for estimating the covariance matrix are proposed. The first method is a direct application of the EM algorithm for fitting the multivariate t-distribution of the WAME model. In the second method, a prior distribution for the log fold-change is added to the WAME model, and a discrete approximation is used for this prior. Both methods are evaluated using simulated and real data. The first method shows equal performance compared to the existing method in terms of bias and variability, but is superior in terms of computer time. For large data sets (>15 arrays), the second method also shows superior computer run time. Moreover, for simulated data with regulated genes the second method greatly reduces the bias. With the proposed methods it is possible to apply the WAME model to large data sets with reasonable computer run times. The second method shows a small bias for simulated data, but appears to have a larger bias for real data with many regulated genes.
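
    A sketch of the first proposed method, assuming fixed degrees of freedom nu (estimating nu too lengthens the M-step): EM for a multivariate t-distribution, where each array is one dimension and the rows of X are genes. Outlying genes receive small weights, which is what makes the scatter estimate robust:

    ```python
    import numpy as np

    def em_multivariate_t(X, nu=4.0, n_iter=50):
        n, p = X.shape
        mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
        for _ in range(n_iter):
            diff = X - mu
            # E-step: expected precision weight per gene.
            delta = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)
            w = (nu + p) / (nu + delta)
            # M-step: weighted location and scatter updates.
            mu = (w[:, None] * X).sum(axis=0) / w.sum()
            diff = X - mu
            Sigma = (w[:, None] * diff).T @ diff / n
        return mu, Sigma   # Sigma captures array-to-array correlations and variances

    X = np.random.standard_t(4.0, size=(5000, 6))   # 5000 genes, 6 arrays
    mu, Sigma = em_multivariate_t(X)
    ```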

  11. Fine-scaled human genetic structure revealed by SNP microarrays.

    PubMed

    Xing, Jinchuan; Watkins, W Scott; Witherspoon, David J; Zhang, Yuhua; Guthery, Stephen L; Thara, Rangaswamy; Mowry, Bryan J; Bulayeva, Kazima; Weiss, Robert B; Jorde, Lynn B

    2009-05-01

    We report an analysis of more than 240,000 loci genotyped using the Affymetrix SNP microarray in 554 individuals from 27 worldwide populations in Africa, Asia, and Europe. To provide a more extensive and complete sampling of human genetic variation, we have included caste and tribal samples from two states in South India, Daghestanis from eastern Europe, and the Iban from Malaysia. Consistent with observations made by Charles Darwin, our results highlight shared variation among human populations and demonstrate that much genetic variation is geographically continuous. At the same time, principal components analyses reveal discernible genetic differentiation among almost all identified populations in our sample, and in most cases, individuals can be clearly assigned to defined populations on the basis of SNP genotypes. All individuals are accurately classified into continental groups using a model-based clustering algorithm, but between closely related populations, genetic and self-classifications conflict for some individuals. The 250K data permitted high-level resolution of genetic variation among Indian caste and tribal populations and between highland and lowland Daghestani populations. In particular, upper-caste individuals from Tamil Nadu and Andhra Pradesh form one defined group, lower-caste individuals from these two states form another, and the tribal Irula samples form a third. Our results emphasize the correlation of genetic and geographic distances and highlight other elements, including social factors that have contributed to population structure.

  12. Cellular neural networks, the Navier-Stokes equation, and microarray image reconstruction.

    PubMed

    Zineddin, Bachar; Wang, Zidong; Liu, Xiaohui

    2011-11-01

    Although the last decade has witnessed a great deal of improvements achieved for the microarray technology, many major developments in all the main stages of this technology, including image processing, are still needed. Some hardware implementations of microarray image processing have been proposed in the literature and proved to be promising alternatives to the currently available software systems. However, the main drawback of those approaches is that they do not quantify the gene spot realistically, i.e., without making assumptions about the image surface. Our aim in this paper is to present a new image-reconstruction algorithm using the cellular neural network that solves the Navier-Stokes equation. This algorithm offers a robust method for estimating the background signal within the gene-spot region. The MATCNN toolbox for Matlab is used to test the proposed method. Quantitative comparisons are carried out, i.e., in terms of objective criteria, between our approach and some other available methods. It is shown that the proposed algorithm gives highly accurate and realistic measurements in a fully automated manner within a remarkably efficient time.
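
    A deliberately simple stand-in for the paper's CNN/Navier-Stokes reconstruction, to show what "estimating the background within the spot region" means: diffusion (harmonic) inpainting, which repeatedly replaces spot pixels with the mean of their neighbours until the background extends smoothly under the spot:

    ```python
    import numpy as np

    def inpaint_background(img, spot_mask, n_iter=500):
        out = img.astype(float).copy()
        for _ in range(n_iter):
            nb = (np.roll(out, 1, 0) + np.roll(out, -1, 0)
                  + np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 4.0
            out[spot_mask] = nb[spot_mask]   # relax only inside the spot region
        return out

    img = 100.0 + 5.0 * np.random.randn(64, 64)        # local background
    yy, xx = np.mgrid[:64, :64]
    spot = (yy - 32) ** 2 + (xx - 32) ** 2 < 8 ** 2    # gene-spot footprint
    img[spot] += 400.0                                 # spot signal
    background = inpaint_background(img, spot)
    signal = (img - background)[spot].sum()            # background-corrected intensity
    ```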

  13. An Enhanced TIMESAT Algorithm for Estimating Vegetation Phenology Metrics from MODIS Data

    NASA Technical Reports Server (NTRS)

    Tan, Bin; Morisette, Jeffrey T.; Wolfe, Robert E.; Gao, Feng; Ederer, Gregory A.; Nightingale, Joanne; Pedelty, Jeffrey A.

    2012-01-01

    An enhanced TIMESAT algorithm was developed for retrieving vegetation phenology metrics from 250 m and 500 m spatial resolution Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation indexes (VI) over North America. MODIS VI data were pre-processed using snow-cover and land surface temperature data, and temporally smoothed with the enhanced TIMESAT algorithm. An objective third derivative test was applied to define key phenology dates and retrieve a set of phenology metrics. This algorithm has been applied to two MODIS VIs: Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI). In this paper, we describe the algorithm and use EVI as an example to compare three sets of TIMESAT algorithm/MODIS VI combinations: a) original TIMESAT algorithm with original MODIS VI, b) original TIMESAT algorithm with pre-processed MODIS VI, and c) enhanced TIMESAT and pre-processed MODIS VI. All retrievals were compared with ground phenology observations, some made available through the National Phenology Network. Our results show that for MODIS data in middle to high latitude regions, snow and land surface temperature information is critical in retrieving phenology metrics from satellite observations. The results also show that the enhanced TIMESAT algorithm can better accommodate growing season start and end dates that vary significantly from year to year. The TIMESAT algorithm improvements contribute to more spatial coverage and more accurate retrievals of the phenology metrics. Among three sets of TIMESAT/MODIS VI combinations, the start of the growing season metric predicted by the enhanced TIMESAT algorithm using pre-processed MODIS VIs has the best associations with ground observed vegetation greenup dates.
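
    A sketch of the two retrieval ingredients on a synthetic EVI series: temporal smoothing and a third-derivative test for the start of season. This is a schematic reading of the test (peak of the third derivative during greenup), not the exact enhanced-TIMESAT implementation:

    ```python
    import numpy as np
    from scipy.signal import savgol_filter

    doy = np.arange(1, 366, 8)                       # 8-day EVI composites
    evi = (0.2 + 0.4 / (1 + np.exp(-(doy - 130) / 12))
               - 0.4 / (1 + np.exp(-(doy - 280) / 15))
               + 0.02 * np.random.randn(doy.size))   # one growing season + noise

    smooth = savgol_filter(evi, window_length=9, polyorder=3)           # smoothed VI
    d3 = savgol_filter(evi, window_length=9, polyorder=3, deriv=3)      # 3rd derivative

    first_half = doy < 183                            # restrict to the greenup period
    sos = doy[first_half][np.argmax(d3[first_half])]
    print("start of season ~ day", sos)
    ```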

  14. An enhanced TIMESAT algorithm for estimating vegetation phenology metrics from MODIS data

    USGS Publications Warehouse

    Tan, B.; Morisette, J.T.; Wolfe, R.E.; Gao, F.; Ederer, G.A.; Nightingale, J.; Pedelty, J.A.

    2011-01-01

    An enhanced TIMESAT algorithm was developed for retrieving vegetation phenology metrics from 250 m and 500 m spatial resolution Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation indexes (VI) over North America. MODIS VI data were pre-processed using snow-cover and land surface temperature data, and temporally smoothed with the enhanced TIMESAT algorithm. An objective third derivative test was applied to define key phenology dates and retrieve a set of phenology metrics. This algorithm has been applied to two MODIS VIs: Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI). In this paper, we describe the algorithm and use EVI as an example to compare three sets of TIMESAT algorithm/MODIS VI combinations: a) original TIMESAT algorithm with original MODIS VI, b) original TIMESAT algorithm with pre-processed MODIS VI, and c) enhanced TIMESAT and pre-processed MODIS VI. All retrievals were compared with ground phenology observations, some made available through the National Phenology Network. Our results show that for MODIS data in middle to high latitude regions, snow and land surface temperature information is critical in retrieving phenology metrics from satellite observations. The results also show that the enhanced TIMESAT algorithm can better accommodate growing season start and end dates that vary significantly from year to year. The TIMESAT algorithm improvements contribute to more spatial coverage and more accurate retrievals of the phenology metrics. Among three sets of TIMESAT/MODIS VI combinations, the start of the growing season metric predicted by the enhanced TIMESAT algorithm using pre-processed MODIS VIs has the best associations with ground observed vegetation greenup dates. © 2010 IEEE.

  15. Evaluation of the robustness of the preprocessing technique improving reversible compressibility of CT images: Tested on various CT examinations

    SciTech Connect

    Jeon, Chang Ho; Kim, Bohyoung; Gu, Bon Seung; Lee, Jong Min; Kim, Kil Joong; Lee, Kyoung Ho; Kim, Tae Ki

    2013-10-15

    Purpose: To modify the previously proposed preprocessing technique improving the compressibility of computed tomography (CT) images so that it covers the diversity of three-dimensional configurations of different body parts, and to evaluate the robustness of the technique in terms of segmentation correctness and increase in reversible compression ratio (CR) for various CT examinations. Methods: This study had institutional review board approval with waiver of informed patient consent. A preprocessing technique was previously proposed to improve the compressibility of CT images by replacing pixel values outside the body region with a constant value, thereby maximizing data redundancy. Since the technique was developed with only chest CT images in mind, the authors modified the segmentation method to cover the diversity of three-dimensional configurations of different body parts. The modified version was evaluated as follows. In 368 randomly selected CT examinations (352 787 images), each image was preprocessed using the modified preprocessing technique. Radiologists visually confirmed whether the segmented region covers the body region or not. The images with and without the preprocessing were reversibly compressed using Joint Photographic Experts Group (JPEG), JPEG2000 two-dimensional (2D), and JPEG2000 three-dimensional (3D) compression. The percentage increase in CR per examination (CR_I) was measured. Results: The rate of correct segmentation was 100.0% (95% CI: 99.9%, 100.0%) for all the examinations. The medians of CR_I were 26.1% (95% CI: 24.9%, 27.1%), 40.2% (38.5%, 41.1%), and 34.5% (32.7%, 36.2%) in JPEG, JPEG2000 2D, and JPEG2000 3D, respectively. Conclusions: In various CT examinations, the modified preprocessing technique can increase the CR by 25% or more without concern about degradation of diagnostic information.
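
    A sketch of the preprocessing idea: segment the body, replace everything outside with a constant, and measure the gain in lossless compressibility. zlib stands in for JPEG2000 here, and the -500 HU air/tissue cut is a typical threshold, not the paper's segmentation method:

    ```python
    import numpy as np
    import zlib
    from scipy import ndimage

    def preprocess_ct(hu):
        body = hu > -500                               # air vs. tissue threshold (HU)
        body = ndimage.binary_fill_holes(body)
        labels, _ = ndimage.label(body)
        sizes = np.bincount(labels.ravel()); sizes[0] = 0
        body = labels == sizes.argmax()                # keep the largest component
        out = hu.copy()
        out[~body] = -1000                             # constant outside the body
        return out

    hu = (np.random.randn(512, 512) * 30 - 1000).astype(np.int16)
    hu[128:384, 128:384] += 1000                       # crude "body" region
    ratio = (len(zlib.compress(hu.tobytes()))
             / len(zlib.compress(preprocess_ct(hu).tobytes())))
    print("compressed-size ratio (original/preprocessed):", round(ratio, 2))
    ```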

  16. GEPAS, a web-based tool for microarray data analysis and interpretation

    PubMed Central

    Tárraga, Joaquín; Medina, Ignacio; Carbonell, José; Huerta-Cepas, Jaime; Minguez, Pablo; Alloza, Eva; Al-Shahrour, Fátima; Vegas-Azcárate, Susana; Goetz, Stefan; Escobar, Pablo; Garcia-Garcia, Francisco; Conesa, Ana; Montaner, David; Dopazo, Joaquín

    2008-01-01

    Gene Expression Profile Analysis Suite (GEPAS) is one of the most complete and extensively used web-based packages for microarray data analysis. During its more than 5 years of activity it has continuously been updated to keep pace with the state-of-the-art in the changing microarray data analysis arena. GEPAS offers diverse analysis options that include well established as well as novel algorithms for normalization, gene selection, class prediction, clustering and functional profiling of the experiment. New options for time-course (or dose-response) experiments, microarray-based class prediction, new clustering methods and new tests for differential expression have been included. The new pipeliner module allows automating the execution of sequential analysis steps by means of a simple but powerful graphic interface. An extensive re-engineering of GEPAS has been carried out, which includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. GEPAS is nowadays the most quoted web tool in its field; it is extensively used by researchers from many countries, and its records indicate an average usage rate of 500 experiments per day. GEPAS is available at http://www.gepas.org. PMID:18508806

  17. Gene selection and classification for cancer microarray data based on machine learning and similarity measures

    PubMed Central

    2011-01-01

    Background Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. Results To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF. PMID:22369383
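
    A simplified sketch of Recursive Feature Addition, assuming an expression matrix X and labels y: greedily add the gene that most improves cross-validated accuracy, breaking near-ties in favour of the gene least correlated with those already chosen (a plain correlation standing in for the paper's statistical similarity measures):

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    def recursive_feature_addition(X, y, n_genes=10, tol=1e-3):
        selected, remaining = [], list(range(X.shape[1]))
        while len(selected) < n_genes:
            scores = {j: cross_val_score(GaussianNB(), X[:, selected + [j]], y,
                                         cv=5).mean() for j in remaining}
            best = max(scores.values())
            ties = [j for j, s in scores.items() if best - s <= tol]
            if selected:  # similarity measure: prefer the least redundant tied gene
                ties.sort(key=lambda j: max(abs(np.corrcoef(X[:, j], X[:, k])[0, 1])
                                            for k in selected))
            pick = ties[0]
            selected.append(pick); remaining.remove(pick)
        return selected

    X = np.random.randn(80, 500); y = np.random.randint(0, 2, 80)
    print(recursive_feature_addition(X, y, n_genes=5))
    ```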

  18. Unsupervised assessment of microarray data quality using a Gaussian mixture model

    PubMed Central

    Howard, Brian E; Sick, Beate; Heber, Steffen

    2009-01-01

    Background Quality assessment of microarray data is an important and often challenging aspect of gene expression analysis. This task frequently involves the examination of a variety of summary statistics and diagnostic plots. The interpretation of these diagnostics is often subjective, and generally requires careful expert scrutiny. Results We show how an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and the naïve Bayes model can be used to automate microarray quality assessment. The method is flexible and can be easily adapted to accommodate alternate quality statistics and platforms. We evaluate our approach using Affymetrix 3' gene expression and exon arrays and compare the performance of this method to a similar supervised approach. Conclusion This research illustrates the efficacy of an unsupervised classification approach for the purpose of automated microarray data quality assessment. Since our approach requires only unannotated training data, it is easy to customize and to keep up-to-date as technology evolves. In contrast to other "black box" classification systems, this method also allows for intuitive explanations. PMID:19545436
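
    A sketch of unsupervised quality calling in this spirit: summarize each array by a few quality statistics and fit a two-component Gaussian mixture, flagging the minority component as suspect. The statistics below are simple stand-ins for the summary statistics and diagnostics the paper uses:

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def flag_arrays(expr):
        # expr: probes x arrays matrix of log intensities
        stats = np.column_stack([expr.mean(axis=0),                 # brightness
                                 expr.std(axis=0),                  # spread
                                 np.percentile(expr, 99, axis=0)])  # saturation proxy
        gmm = GaussianMixture(n_components=2, random_state=0).fit(stats)
        labels = gmm.predict(stats)
        bad = np.argmin(np.bincount(labels))    # smaller cluster = suspect arrays
        return np.where(labels == bad)[0]

    expr = np.random.randn(10000, 40) + 8.0
    expr[:, 3] += 2.5                           # one deviant array
    print(flag_arrays(expr))
    ```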

  19. CMS Preprocessing Subsystem user`s guide: Software Version 2.0

    SciTech Connect

    Didier, B.T.; Gash, J.D.; Greitzer, F.L.; Havre, S.L.; Ramsdell, J.V.; Turney, C.R.

    1993-12-01

    The Common Mapping Standard (CMS) Data Production System (CDPS) produces and distributes CMS data in compliance with the Common Mapping Standard Interface Control Document. CDPS is composed of two subsystems, the CMS Preprocessing Subsystem (CPS) and the CMS Distribution Subsystem (CDS). This guide describes the operation of CPS. CPS is responsible for the management of source data and the production of CMS data from source data. The CPS system was developed for use on a workstation running Ultrix 4.2, the X Window System Version X11R4, and motif Version 1.1. This subsystem is organized into four major functional groups and supports production of CMS data from source chart, indose, and elevation data products.

  20. Analyzing ChIP-seq data: preprocessing, normalization, differential identification, and binding pattern characterization.

    PubMed

    Taslim, Cenny; Huang, Kun; Huang, Tim; Lin, Shili

    2012-01-01

    Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a high-throughput antibody-based method to study genome-wide protein-DNA binding interactions. ChIP-seq technology allows scientists to obtain more accurate data, providing genome-wide coverage with less starting material and in a shorter time compared to older ChIP-chip experiments. Herein we describe a step-by-step guideline for analyzing ChIP-seq data, including data preprocessing, nonlinear normalization to enable comparison between different samples and experiments, a statistics-based method to identify differential binding sites using mixture modeling and local false discovery rates (fdrs), and binding pattern characterization. In addition, we provide a sample analysis of ChIP-seq data using the steps provided in the guideline.
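
    As a rough illustration of the differential-binding step, the sketch below models per-window difference scores as a two-component mixture and reports the posterior probability of the null component as a local fdr. A Gaussian mixture from scikit-learn stands in for the chapter's exact model, so treat this as an assumption-laden simplification.

```python
# Sketch of mixture-model-based local fdr for differential binding scores.
# Assumption: the component whose mean is nearest zero plays the null role.
import numpy as np
from sklearn.mixture import GaussianMixture

def local_fdr(diff_scores):
    x = np.asarray(diff_scores, dtype=float).reshape(-1, 1)
    gm = GaussianMixture(n_components=2, random_state=0).fit(x)
    null = np.argmin(np.abs(gm.means_.ravel()))  # component nearest zero = null
    resp = gm.predict_proba(x)
    return resp[:, null]                         # P(null | score) = local fdr
```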

  1. Experimental design, preprocessing, normalization and differential expression analysis of small RNA sequencing experiments

    PubMed Central

    2011-01-01

    Prior to the advent of new, deep sequencing methods, small RNA (sRNA) discovery was dependent on Sanger sequencing, which was time-consuming and limited knowledge to only the most abundant sRNA. The innovation of large-scale, next-generation sequencing has exponentially increased knowledge of the biology, diversity and abundance of sRNA populations. In this review, we discuss issues involved in the design of sRNA sequencing experiments, including choosing a sequencing platform, inherent biases that affect sRNA measurements and replication. We outline the steps involved in preprocessing sRNA sequencing data and review both the principles behind and the current options for normalization. Finally, we discuss differential expression analysis in the absence and presence of biological replicates. While our focus is on sRNA sequencing experiments, many of the principles discussed are applicable to the sequencing of other RNA populations. PMID:21356093
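
    One of the simplest normalization options covered in such reviews is counts per million (CPM), sketched below; the genes-by-samples matrix orientation is an assumption.

```python
# Minimal counts-per-million (CPM) normalization for sRNA read counts.
import numpy as np

def cpm(counts):
    """counts: (n_sRNAs, n_samples) integer read-count matrix."""
    lib_sizes = counts.sum(axis=0, keepdims=True).astype(float)
    return counts / lib_sizes * 1e6

counts = np.array([[120, 90], [3, 10], [877, 640]])
print(cpm(counts))  # each column now sums to 1e6
```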

  2. Analog signal pre-processing for the Fermilab Main Injector BPM upgrade

    SciTech Connect

    Saewert, A.L.; Rapisarda, S.M.; Wendt, M.

    2006-05-01

    An analog signal pre-processing scheme was developed, in the framework of the Fermilab Main Injector Beam Position Monitor (BPM) Upgrade, to interface BPM pickup signals to the new digital receiver based read-out system. A key component is the 8-channel electronics module, which uses separate frequency selective gain stages to acquire 53 MHz bunched proton, and 2.5 MHz anti-proton signals. Related hardware includes a filter and combiner box to sum pickup electrode signals in the tunnel. A controller module allows local/remote control of gain settings and activation of gain stages, and supplies test signals. Theory of operation, system overview, and some design details are presented, as well as first beam measurements of the prototype hardware.

  3. Synthetic aperture radar image correlation by use of preprocessing for enhancement of scattering centers.

    PubMed

    Khoury, J; Gianino, P D; Woods, C L

    2000-10-15

    We demonstrate that a significant improvement can be obtained in the recognition of complicated synthetic aperture radar images taken from the Moving and Stationary Target Acquisition and Recognition database. These images typically have a low number of scattering centers and high noise. We first preprocess the images and the templates formed from them so that their scattering centers are enhanced. Our technique can produce high-quality performance under several correlation criteria. For realistic automatic target recognition systems, our approach should make it easy to implement optical recognition systems with binarized data for many different types of correlation filter and should have a great effect on feeding data-compressed (binarized) information into either digital or optical processors.
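
    A hedged sketch of the preprocessing idea: a power-law stretch suppresses diffuse clutter relative to bright scattering centers, and the brightest percentile is binarized. The exponent and percentile are illustrative; the paper's exact enhancement operator may differ.

```python
# Scattering-center enhancement sketch: gamma stretch, then binarize the
# top percentile of intensities. Parameters are illustrative assumptions.
import numpy as np

def enhance_scattering_centers(img, gamma=3.0, keep_percent=2.0):
    x = img.astype(float)
    x = (x - x.min()) / (np.ptp(x) + 1e-12)   # normalize to [0, 1]
    x = x ** gamma                            # emphasize bright scatterers
    thresh = np.percentile(x, 100.0 - keep_percent)
    return (x >= thresh).astype(np.uint8)     # binarized template/image
```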

  4. Advances in Software Tools for Pre-processing and Post-processing of Overset Grid Computations

    NASA Technical Reports Server (NTRS)

    Chan, William M.

    2004-01-01

    Recent developments in three pieces of software for performing pre-processing and post-processing work on numerical computations using overset grids are presented. The first is the OVERGRID graphical interface which provides a unified environment for the visualization, manipulation, generation and diagnostics of geometry and grids. Modules are also available for automatic boundary conditions detection, flow solver input preparation, multiple component dynamics input preparation and dynamics animation, simple solution viewing for moving components, and debris trajectory analysis input preparation. The second is a grid generation script library that enables rapid creation of grid generation scripts. A sample of recent applications will be described. The third is the OVERPLOT graphical interface for displaying and analyzing history files generated by the flow solver. Data displayed include residuals, component forces and moments, number of supersonic and reverse flow points, and various dynamics parameters.

  5. [Sample preprocessing method for residual quinolones in honey using immunoaffinity resin].

    PubMed

    Ihara, Yoshiharu; Kato, Mihoko; Kodaira, Tsukasa; Itoh, Shinji; Terakawa, Mika; Horie, Masakazu; Saito, Koichi; Nakazawa, Hiroyuki

    2009-06-01

    A sample preparation method was developed for determination of quinolones in honey using immunoaffinity resin. For this purpose, an immunoaffinity resin for quinolones was prepared by coupling a quinolone-specific monoclonal antibody to agarose resin. Honey samples diluted with phosphate buffer were reacted with immunoaffinity resin. After the resin was washed, quinolones were eluted with glycine-HCl. Quinolones in the eluate were determined by HPLC with fluorescence detection. No interfering peak was found on the chromatograms of honey samples. The recoveries of quinolones from samples were over 70% at fortification levels of 20 ng/g (for norfloxacin, ciprofloxacin and enrofloxacin) and 10 ng/g (for danofloxacin). The quantification limits of quinolones were 2 ng/g. This sample preprocessing method using immunoaffinity resin was found to be effective and suitable for determining residual quinolones in honey.

  6. Reductive leaching of low-grade manganese ore with pre-processed cornstalk

    NASA Astrophysics Data System (ADS)

    Yi, Ai-fei; Wu, Meng-ni; Liu, Peng-wei; Feng, Ya-li; Li, Hao-ran

    2015-12-01

    Cornstalk is often used directly as a reductant in the reductive leaching of manganese. However, the low utilization of cornstalk results in a low manganese dissolution ratio. In this research, a pretreatment for cornstalk was proposed to improve the manganese dissolution ratio. Cornstalk was preprocessed with a heated sulfuric acid solution (1.2 M) for 10 min at 80°C. Thereafter, both the pretreated solution and the residue were used as reductants for manganese leaching. This method not only exhibited superior activity for hydrolyzing cornstalk but also enhanced manganese dissolution. These effects were attributed to an increase in the amount of reducing sugars resulting from lignin hydrolysis. Through acid pretreatment of the cornstalk, the manganese dissolution ratio was improved from 50.14% to 83.46%. The present work demonstrates for the first time the effective acid pretreatment of cornstalk to provide a cost-effective reductant for manganese leaching.

  7. Intelligent Text Retrieval and Knowledge Acquisition from Texts for NASA Applications: Preprocessing Issues

    NASA Technical Reports Server (NTRS)

    2001-01-01

    In this contract, which is a component of a larger contract that we plan to submit in the coming months, we plan to study the preprocessing issues which arise in applying natural language processing techniques to NASA-KSC problem reports. The goals of this work will be to deal with the issues of: a) automatically obtaining the problem reports from NASA-KSC data bases, b) the format of these reports and c) the conversion of these reports to a format that will be adequate for our natural language software. At the end of this contract, we expect that these problems will be solved and that we will be ready to apply our natural language software to a text database of over 1000 KSC problem reports.

  8. 3D tissue culture substrates produced by microthermoforming of pre-processed polymer films.

    PubMed

    Giselbrecht, S; Gietzelt, T; Gottwald, E; Trautmann, C; Truckenmüller, R; Weibezahn, K F; Welle, A

    2006-09-01

    We describe a new technology based on thermoforming as a microfabrication process. It significantly enhances the tailoring of polymers for three-dimensional tissue engineering purposes since, for the first time, highly resolved surface and bulk modifications prior to a microstructuring process can be realised. In contrast to typical micro-moulding techniques, the melting phase is avoided, which allows the forming of pre-processed polymer films. The polymer is formed in a thermoelastic state without loss of material coherence. Therefore, previously generated modifications can be preserved. To prove the feasibility of our newly developed technique, called SMART (Substrate Modification And Replication by Thermoforming), polymer films treated by various polymer modification methods, such as UV-patterned films and films modified by bombardment with energetic heavy ions, were post-processed by microthermoforming. The preservation of locally applied specific surface and bulk features was demonstrated, e.g., by the selective adhesion of cells to patterned microcavity walls.

  9. Feasibility investigation of integrated optics Fourier transform devices. [holographic subtraction for multichannel data preprocessing

    NASA Technical Reports Server (NTRS)

    Verber, C. M.; Vahey, D. W.; Wood, V. E.; Kenan, R. P.; Hartman, N. F.

    1977-01-01

    The possibility of producing an integrated optics data processing device based upon Fourier transformations or other parallel processing techniques, and the ways in which such techniques may be used to upgrade the performance of present and projected NASA systems, were investigated. Activities toward this goal include: (1) production of near-diffraction-limited geodesic lenses in glass waveguides; (2) development of grinding and polishing techniques for the production of geodesic lenses in LiNbO3 waveguides; (3) development of a characterization technique for waveguide lenses; and (4) development of a theory for corrected aspheric geodesic lenses. A holographic subtraction system was devised which should be capable of rapid on-board preprocessing of a large number of parallel data channels. The principle involved is validated in three demonstrations.

  10. Image pre-processing method for near-wall PIV measurements over moving curved interfaces

    NASA Astrophysics Data System (ADS)

    Jia, L. C.; Zhu, Y. D.; Jia, Y. X.; Yuan, H. J.; Lee, C. B.

    2017-03-01

    PIV measurements near a moving interface are always difficult. This paper presents a PIV image pre-processing method that returns high-spatial-resolution velocity profiles near the interface. Instead of re-shaping or re-orientating the interrogation windows, interface tracking and an image transformation are used to stretch the particle image strips near a curved interface into rectangles. Then adaptive structured interrogation windows can be arranged at specified distances from the interface. Synthetic particles are also added into the solid region to minimize interfacial effects and to restrict particles on both sides of the interface. Since a high spatial resolution is only required in the high-velocity-gradient region, adaptive meshing and stretching of the image strips in the normal direction are used to improve the cross-correlation signal-to-noise ratio (SN) by reducing the velocity difference and the particle image distortion within the interrogation window. A two-dimensional Gaussian fit is used to compensate for the effects of stretching the particle images. The working hypothesis is that fluid motion near the interface is ‘quasi-tangential flow’, which is reasonable in most fluid-structure interaction scenarios. The method was validated against the window deformation iterative multi-grid scheme (WIDIM) using synthetic image pairs with different velocity profiles. It was then tested on measurements of a supersonic turbulent boundary layer on a flat plate, near a rotating blade, and near a flexible flapping flag. This image pre-processing method provides higher spatial resolution than conventional WIDIM and good robustness for measuring velocity profiles near moving interfaces.

  11. Evaluating the Effect of Image Preprocessing on an Information-Theoretic CAD System in Mammography

    PubMed Central

    Tourassi, Georgia D.; Ike, Robert; Singh, Swatee; Harrawood, Brian

    2008-01-01

    Rationale and Objectives In our earlier studies we reported an evidence-based Computer Assisted Decision (CAD) system for location-specific interrogation of mammograms. A content-based image retrieval framework with information-theoretic (IT) similarity measures serves as the foundation for this system. Specifically, the normalized mutual information (NMI) was shown to be the most effective similarity measure for reduction of false positive marks generated by other, prescreening mass detection schemes. The objective of this work was to investigate the importance of image filtering as a possible preprocessing step in our IT-CAD system. Materials and Methods Different filters were applied, each one aiming to compensate for known limitations of the NMI similarity measure. The study was based on a region-of-interest database that included true masses and false positive regions from digitized mammograms. Results Receiver Operating Characteristic (ROC) analysis showed that IT-CAD is affected slightly by image filtering. A modest yet statistically significant performance gain was observed with median filtering (overall ROC area index Az improved from 0.78 to 0.82). However, Gabor filtering improved performance for the high-sensitivity portion of the ROC curve where a typical false positive reduction scheme should operate (partial ROC area index above 90% sensitivity, 0.90Az, improved from 0.33 to 0.37). Fusion of IT-CAD decisions from different filtering schemes markedly improved performance (Az = 0.90 and 0.90Az = 0.55). At 95% sensitivity, the system's specificity improved by 36.6%. Conclusion Additional improvement in false positive reduction can be achieved by incorporating image filtering as a preprocessing step in our information-theoretic CAD system. PMID:18423320
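
    For reference, the NMI similarity named in the abstract can be computed from a joint gray-level histogram as NMI = (H(A) + H(B)) / H(A, B); the sketch below uses this common definition with an assumed bin count.

```python
# Normalized mutual information between two image regions, computed from a
# joint gray-level histogram. The bin count (32) is an assumption.
import numpy as np

def nmi(a, b, bins=32):
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    h_xy = -np.sum(pxy[nz] * np.log(pxy[nz]))     # joint entropy H(A,B)
    h_x = -np.sum(px[px > 0] * np.log(px[px > 0]))  # H(A)
    h_y = -np.sum(py[py > 0] * np.log(py[py > 0]))  # H(B)
    return (h_x + h_y) / h_xy
```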

  12. Forecasting of preprocessed daily solar radiation time series using neural networks

    SciTech Connect

    Paoli, Christophe; Muselli, Marc; Nivet, Marie-Laure; Voyant, Cyril

    2010-12-15

    In this paper, we present an application of Artificial Neural Networks (ANNs) in the renewable energy domain. We particularly look at the Multi-Layer Perceptron (MLP) network, which has been the most used ANN architecture both in the renewable energy domain and in time series forecasting. We have used an MLP and an ad hoc time series pre-processing to develop a methodology for the daily prediction of global solar radiation on a horizontal surface. First results are promising, with nRMSE ≈ 21% and RMSE ≈ 3.59 MJ/m². The optimized MLP presents predictions similar to or even better than conventional and reference methods such as ARIMA techniques, Bayesian inference, Markov chains and k-Nearest-Neighbors. Moreover, we found that the proposed data pre-processing approach can significantly reduce forecasting errors, by about 6%, compared to conventional prediction methods such as Markov chains or Bayesian inference. The simulator proposed has been obtained using 19 years of available data from the meteorological station of Ajaccio (Corsica Island, France, 41°55'N, 8°44'E, 4 m above mean sea level). The whole prediction methodology has been validated on a 1.175 kWc mono-Si PV power grid. Six prediction methods (ANN, clear sky model, combination, etc.) allow prediction of the best daily DC PV power production at horizon d + 1. The cumulated DC PV energy over a 6-month period shows great agreement between simulated and measured data (R² > 0.99 and nRMSE < 2%).
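
    The skill scores quoted above can be reproduced in a few lines; the sketch below uses one common definition of nRMSE (RMSE divided by the mean of the observations), which is an assumption since other normalizations (range, installed capacity) also appear in the literature.

```python
# nRMSE sketch: RMSE normalized by the mean of the observations.
import numpy as np

def nrmse(observed, predicted):
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return rmse / observed.mean()

# Toy daily radiation values (MJ/m^2), purely illustrative.
print(nrmse([20.1, 22.4, 18.7], [19.0, 23.1, 17.9]))  # ~0.04
```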

  13. A base composition analysis of natural patterns for the preprocessing of metagenome sequences

    PubMed Central

    2013-01-01

    Background On the premise that sequence reads and contigs often exhibit the same kinds of base usage observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs), which are permutations of bacterial restriction sites, and a base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Results Our method is able to differentiate organisms and their reads or contigs. The framework shows how to successfully determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By applying one of the four possible spectrum sets, encompassing all known restriction sites, we provide evidence that each set has a different ability to differentiate sequence data. Furthermore, we show that selecting a spectrum set with relevance to one organism, but not to the others of the data set, greatly improves the performance of sequence differentiation even if the fragment size of the read, contig or sequence is not lengthy. Conclusions We show proof of concept of our method by applying it to ten trials of two or three freshly selected sequence fragments (reads and contigs) per experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny, where it can be used to infer evolutionary distances between organisms based on the notion that related organisms
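
    A minimal sketch of the base-composition idea: measure the fraction of positions in a read covered by each motif of a spectrum set and compare reads by that profile. The toy motif set below merely stands in for the permutations of restriction sites the paper describes.

```python
# Proportional motif content of a sequence for a given "spectrum set".
def motif_proportions(seq, motifs):
    seq = seq.upper()
    return {m: sum(seq[i:i + len(m)] == m
                   for i in range(len(seq) - len(m) + 1)) /
               max(len(seq) - len(m) + 1, 1)
            for m in motifs}

spectrum = ["GAATTC", "GGATCC", "AAGCTT"]  # illustrative restriction sites
print(motif_proportions("GGAATTCCGGATCCAAGCTTGG", spectrum))
```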

  14. Chang'E-3 data pre-processing system based on scientific workflow

    NASA Astrophysics Data System (ADS)

    Tan, Xu; Liu, Jianjun; Wang, Yuanyuan; Yan, Wei; Zhang, Xiaoxia; Li, Chunlai

    2016-04-01

    The Chang'E-3 (CE3) mission has obtained a huge amount of lunar scientific data. Data pre-processing is an important segment of the CE3 ground research and application system. With a dramatic increase in the demand for data research and application, the Chang'E-3 data pre-processing system (CEDPS), based on scientific workflow, is proposed for the purpose of making scientists more flexible and productive by automating data-driven processing. The system should allow the planning, conduct and control of the data processing procedure with the following possibilities: • describe a data processing task, including: (1) define input data/output data, (2) define the data relationships, (3) define the sequence of tasks, (4) define the communication between tasks, (5) define mathematical formulas, (6) define the relationship between tasks and data; • automatic processing of tasks. Accordingly, describing a task is the key point in determining whether the system is flexible. We design a workflow designer, a visual environment for capturing processes as workflows; its three-level model is discussed: (1) the data relationships are established through a product tree; (2) the process model is constructed based on a directed acyclic graph (DAG); in particular, a set of process workflow constructs, including Sequence, Loop, Merge and Fork, are composable with one another; (3) to reduce the modeling complexity of the mathematical formulas using a DAG, semantic modeling based on MathML is adopted. Finally, we present how the CE3 data are processed with CEDPS.
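
    The DAG-based process model lends itself to automatic scheduling via a topological sort, as in the sketch below; the task names are illustrative, not CEDPS internals.

```python
# Sketch of the workflow-execution core: tasks and dependencies form a DAG,
# and a topological sort yields an automatic run order (Python 3.9+).
from graphlib import TopologicalSorter

dag = {
    "unpack":     [],
    "sort":       ["unpack"],
    "calibrate":  ["sort"],
    "geolocate":  ["calibrate"],
    "quality_qc": ["calibrate", "geolocate"],
}
for task in TopologicalSorter(dag).static_order():
    print("running", task)  # each step runs only after its inputs exist
```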

  15. An automated blood vessel segmentation algorithm using histogram equalization and automatic threshold selection.

    PubMed

    Saleh, Marwan D; Eswaran, C; Mueen, Ahmed

    2011-08-01

    This paper focuses on the detection of retinal blood vessels, which plays a vital role in reducing proliferative diabetic retinopathy and preventing the loss of visual capability. The proposed algorithm, which takes advantage of powerful preprocessing techniques such as contrast enhancement and thresholding, offers an automated segmentation procedure for retinal blood vessels. To evaluate the performance of the new algorithm, experiments are conducted on 40 images collected from the DRIVE database. The results show that the proposed algorithm performs better than the other known algorithms in terms of accuracy. Furthermore, the proposed algorithm, being simple and easy to implement, is best suited for fast processing applications.
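
    A hedged sketch pairing the two preprocessing steps the abstract names, contrast enhancement by histogram equalization followed by automatic (Otsu) threshold selection, using OpenCV; the authors' exact pipeline and threshold rule may differ.

```python
# Vessel-segmentation preprocessing sketch: equalize, then Otsu threshold.
import cv2

def segment_vessels(path):
    green = cv2.imread(path)[:, :, 1]   # green channel shows vessels best
    eq = cv2.equalizeHist(green)        # contrast enhancement
    _, mask = cv2.threshold(eq, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return mask                         # vessels are darker than background
```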

  16. Experimental Approaches to Microarray Analysis of Tumor Samples

    ERIC Educational Resources Information Center

    Furge, Laura Lowe; Winter, Michael B.; Meyers, Jacob I.; Furge, Kyle A.

    2008-01-01

    Comprehensive measurement of gene expression using high-density nucleic acid arrays (i.e. microarrays) has become an important tool for investigating the molecular differences in clinical and research samples. Consequently, inclusion of discussion in biochemistry, molecular biology, or other appropriate courses of microarray technologies has…

  17. Demonstrating a Multi-drug Resistant Mycobacterium tuberculosis Amplification Microarray

    PubMed Central

    Linger, Yvonne; Kukhtin, Alexander; Golova, Julia; Perov, Alexander; Qu, Peter; Knickerbocker, Christopher; Cooney, Christopher G.; Chandler, Darrell P.

    2014-01-01

    Simplifying microarray workflow is a necessary first step for creating MDR-TB microarray-based diagnostics that can be routinely used in lower-resource environments. An amplification microarray combines asymmetric PCR amplification, target size selection, target labeling, and microarray hybridization within a single solution and into a single microfluidic chamber. A batch processing method is demonstrated with a 9-plex asymmetric master mix and low-density gel element microarray for genotyping multi-drug resistant Mycobacterium tuberculosis (MDR-TB). The protocol described here can be completed in 6 hr and provide correct genotyping with at least 1,000 cell equivalents of genomic DNA. Incorporating on-chip wash steps is feasible, which will result in an entirely closed amplicon method and system. The extent of multiplexing with an amplification microarray is ultimately constrained by the number of primer pairs that can be combined into a single master mix and still achieve desired sensitivity and specificity performance metrics, rather than the number of probes that are immobilized on the array. Likewise, the total analysis time can be shortened or lengthened depending on the specific intended use, research question, and desired limits of detection. Nevertheless, the general approach significantly streamlines microarray workflow for the end user by reducing the number of manually intensive and time-consuming processing steps, and provides a simplified biochemical and microfluidic path for translating microarray-based diagnostics into routine clinical practice. PMID:24796567

  18. Demonstrating a multi-drug resistant Mycobacterium tuberculosis amplification microarray.

    PubMed

    Linger, Yvonne; Kukhtin, Alexander; Golova, Julia; Perov, Alexander; Qu, Peter; Knickerbocker, Christopher; Cooney, Christopher G; Chandler, Darrell P

    2014-04-25

    Simplifying microarray workflow is a necessary first step for creating MDR-TB microarray-based diagnostics that can be routinely used in lower-resource environments. An amplification microarray combines asymmetric PCR amplification, target size selection, target labeling, and microarray hybridization within a single solution and into a single microfluidic chamber. A batch processing method is demonstrated with a 9-plex asymmetric master mix and low-density gel element microarray for genotyping multi-drug resistant Mycobacterium tuberculosis (MDR-TB). The protocol described here can be completed in 6 hr and provide correct genotyping with at least 1,000 cell equivalents of genomic DNA. Incorporating on-chip wash steps is feasible, which will result in an entirely closed amplicon method and system. The extent of multiplexing with an amplification microarray is ultimately constrained by the number of primer pairs that can be combined into a single master mix and still achieve desired sensitivity and specificity performance metrics, rather than the number of probes that are immobilized on the array. Likewise, the total analysis time can be shortened or lengthened depending on the specific intended use, research question, and desired limits of detection. Nevertheless, the general approach significantly streamlines microarray workflow for the end user by reducing the number of manually intensive and time-consuming processing steps, and provides a simplified biochemical and microfluidic path for translating microarray-based diagnostics into routine clinical practice.

  19. The Importance of Normalization on Large and Heterogeneous Microarray Datasets

    EPA Science Inventory

    DNA microarray technology is a powerful functional genomics tool increasingly used for investigating global gene expression in environmental studies. Microarrays can also be used in identifying biological networks, as they give insight on the complex gene-to-gene interactions, ne...

  20. Diagnostic biomarkers for renal cell carcinoma: selection using novel bioinformatics systems for microarray data analysis

    PubMed Central

    Osunkoya, Adeboye O; Yin-Goen, Qiqin; Phan, John H; Moffitt, Richard A; Stokes, Todd H; Wang, May D; Young, Andrew N

    2009-01-01

    Summary The differential diagnosis of clear cell, papillary and chromophobe renal cell carcinoma is clinically important, because these tumor subtypes are associated with different pathobiology and clinical behavior. For cases in which histopathology is equivocal, immunohistochemistry and quantitative RT-PCR can assist in the differential diagnosis by measuring expression of subtype-specific biomarkers. Several renal tumor biomarkers have been discovered in expression microarray studies. However, due to heterogeneity of gene and protein expression, additional biomarkers are needed for reliable diagnostic classification. We developed novel bioinformatics systems to identify candidate renal tumor biomarkers from the microarray profiles of 45 clear cell, 16 papillary and 10 chromophobe renal cell carcinomas; the microarray data were derived from two independent published studies. The ArrayWiki biocomputing system merged the microarray datasets into a single file, so gene expression could be analyzed from a larger number of tumors. The caCORRECT system removed non-random sources of error from the microarray data, and the omniBioMarker system analyzed data with several gene-ranking algorithms, in order to identify algorithms effective at recognizing previously described renal tumor biomarkers. We predicted these algorithms would also be effective at identifying unknown biomarkers that could be verified by independent methods. We selected six novel candidate biomarkers from the omniBioMarker analysis, and verified their differential expression in formalin-fixed paraffin-embedded tissues by quantitative RT-PCR and immunohistochemistry. The candidate biomarkers were carbonic anhydrase IX, ceruloplasmin, schwannomin-interacting protein 1, E74-like factor 3, cytochrome c oxidase subunit 5a and acetyl-CoA acetyltransferase 1. Quantitative RT-PCR was performed on 17 clear cell, 13 papillary and 7 chromophobe renal cell carcinomas. Carbonic anhydrase IX and ceruloplasmin were

  1. Digital microarray analysis for digital artifact genomics

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger; Handley, James; Williams, Deborah

    2013-06-01

    We implement a Spatial Voting (SV) based analogy of microarray analysis for digital gene marker identification in malware code sections. We examine a famous set of malware formally analyzed by Mandiant and code named Advanced Persistent Threat (APT1). APT1 is a Chinese organization formed with the specific intent to infiltrate and exploit US resources. Mandiant provided a detailed behavior and string analysis report for the 288 malware samples available. We performed an independent analysis using a new alternative to the traditional dynamic analysis and static analysis that we call Spatial Analysis (SA). We perform unsupervised SA on the APT1 malware code sections and report our findings. We also show the results of SA performed on some members of the families identified by Mandiant. We conclude that SV-based SA is a practical, fast alternative to dynamic analysis and static analysis.

  2. Giant Magnetoresistive Sensors for DNA Microarray

    PubMed Central

    Xu, Liang; Yu, Heng; Han, Shu-Jen; Osterfeld, Sebastian; White, Robert L.; Pourmand, Nader; Wang, Shan X.

    2009-01-01

    Giant magnetoresistive (GMR) sensors are developed for a DNA microarray. Compared with conventional fluorescent sensors, GMR sensors are cheaper, more sensitive, can generate fully electronic signals, and can be easily integrated with electronics and microfluidics. The GMR sensor used in this work has a bottom spin valve structure with an MR ratio of 12%. The single-strand target DNA detected has a length of 20 bases. Assays with DNA concentrations down to 10 pM were performed, with a dynamic range of 3 logs. A double modulation technique was used in signal detection to reduce the 1/f noise in the sensor while circumventing electromagnetic interference. The logarithmic relationship between the magnetic signal and the target DNA concentration can be described by the Temkin isotherm. Furthermore, GMR sensors integrated with microfluidics have great potential for improving the sensitivity to 1 pM or below, and the total assay time can be reduced to less than 1 hour. PMID:20824116

  3. Uses of Dendrimers for DNA Microarrays

    PubMed Central

    Caminade, Anne-Marie; Padié, Clément; Laurent, Régis; Maraval, Alexandrine; Majoral, Jean-Pierre

    2006-01-01

    Biosensors such as DNA microarrays and microchips are gaining increasing importance in medicinal, forensic, and environmental analyses. Such devices are based on the detection of supramolecular interactions called hybridizations that occur between complementary oligonucleotides, one linked to a solid surface (the probe), and the other one to be analyzed (the target). This paper focuses on the improvements that hyperbranched and perfectly defined nanomolecules called dendrimers can provide to this methodology. Two main uses of dendrimers for this purpose have been described up to now; either the dendrimer is used as a linker between the solid surface and the probe oligonucleotide, or the dendrimer is used as a multilabeled entity linked to the target oligonucleotide. In the first case the dendrimer generally induces a higher loading of probes and an easier hybridization, by moving the probe away from the solid phase. In the second case the high number of localized labels (generally fluorescent) induces an increased sensitivity, allowing the detection of small quantities of biological entities.

  4. Meta-analysis of incomplete microarray studies.

    PubMed

    Zollinger, Alix; Davison, Anthony C; Goldstein, Darlene R

    2015-10-01

    Meta-analysis of microarray studies to produce an overall gene list is relatively straightforward when complete data are available. When some studies lack information (providing only a ranked list of genes, for example), it is common to reduce all studies to ranked lists prior to combining them. Since this entails a loss of information, we consider a hierarchical Bayes approach to meta-analysis using different types of information from different studies: the full data matrix, summary statistics, or ranks. The model uses an informative prior for the parameter of interest to aid the detection of differentially expressed genes. Simulations show that the new approach can give substantial power gains compared with classical meta-analysis and list aggregation methods. A meta-analysis of 11 published studies with different data types identifies genes known to be involved in ovarian cancer and shows significant enrichment.

  5. Software and tools for microarray data analysis.

    PubMed

    Mehta, Jai Prakash; Rani, Sweta

    2011-01-01

    A typical microarray experiment results in a series of images, depending on the experimental design and number of samples. Software analyses the images to obtain the intensity at each spot and quantify the expression of each transcript. This is followed by normalization, and then various data analysis techniques are applied to the data. The whole analysis pipeline requires a large number of software tools to accurately handle the massive amount of data. Fortunately, there are a large number of freely available and commercial software packages to churn the massive amount of data into manageable sets of differentially expressed genes, functions, and pathways. This chapter describes the software and tools which can be used to analyze gene expression data right from image analysis to gene lists, ontology, and pathways.

  6. Protein microarray applications: Autoantibody detection and posttranslational modification.

    PubMed

    Atak, Apurva; Mukherjee, Shuvolina; Jain, Rekha; Gupta, Shabarni; Singh, Vedita Anand; Gahoi, Nikita; K P, Manubhai; Srivastava, Sanjeeva

    2016-10-01

    The discovery of DNA microarrays was a major milestone in genomics; however, they could not adequately predict the structure or dynamics of the underlying protein entities, which are the ultimate effector molecules in a cell. Protein microarrays allow the simultaneous study of thousands of proteins/peptides, and various advancements in array technologies have made this platform suitable for several diagnostic and functional studies. Antibody arrays enable researchers to quantify the abundance of target proteins in biological fluids and assess PTMs by using the antibodies. Protein microarrays have been used to assess protein-protein interactions, protein-ligand interactions, and autoantibody profiling in various disease conditions. Here, we summarize different microarray platforms with a focus on their biological and clinical applications in autoantibody profiling and PTM studies. We also enumerate the potential of tissue microarrays to validate findings from protein arrays as well as other approaches, highlighting their significance in proteomics.

  7. DNA microarray-based mutation discovery and genotyping.

    PubMed

    Gresham, David

    2011-01-01

    DNA microarrays provide an efficient means of identifying single-nucleotide polymorphisms (SNPs) in DNA samples and characterizing their frequencies in individual and mixed samples. We have studied the parameters that determine the sensitivity of DNA probes to SNPs and found that the melting temperature (Tm) of the probe is the primary determinant of probe sensitivity. An isothermal-melting-temperature DNA microarray design, in which the Tm of all probes is tightly distributed, can be implemented by varying the length of the DNA probes within a single DNA microarray. I describe guidelines for designing isothermal-melting-temperature DNA microarrays and protocols for labeling and hybridizing DNA samples to DNA microarrays for SNP discovery, genotyping, and quantitative determination of allele frequencies in mixed samples.
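
    The design rule can be illustrated by trimming each probe until its predicted Tm falls in a narrow window; the Wallace rule Tm = 2(A+T) + 4(G+C) below is a simple stand-in for whatever thermodynamic model the protocol actually prescribes.

```python
# Isothermal-Tm design sketch: shorten a candidate probe from the 3' end
# until its Wallace-rule Tm enters the target window. Parameters are
# illustrative assumptions, not the protocol's values.
def wallace_tm(probe):
    at = sum(probe.count(b) for b in "AT")
    gc = sum(probe.count(b) for b in "GC")
    return 2 * at + 4 * gc

def trim_to_tm(template, target_tm=60, tol=2):
    probe = template
    while len(probe) > 10 and wallace_tm(probe) > target_tm + tol:
        probe = probe[:-1]          # shorten-only sketch; extension omitted
    return probe, wallace_tm(probe)

print(trim_to_tm("ATGCGTACGTTAGCGCATGCGT"))
```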

  8. Nonspecific hybridization scaling of microarray expression estimates: a physicochemical approach for chip-to-chip normalization.

    PubMed

    Binder, Hans; Brücker, Jan; Burden, Conrad J

    2009-03-05

    nonspecific background, which effectively amplifies specific binding. The results emphasize the importance of physicochemical approaches for improving heuristic normalization algorithms to proceed toward quantitative microarray data analysis.

  9. Fluorescence detection in (sub-)nanoliter microarrays

    NASA Astrophysics Data System (ADS)

    van den Doel, L. Richard; Vellekoop, Michael J.; Sarro, Pasqualina M.; Picioreanu, S.; Moerman, R.; Frank, J.; van Dedem, G. W. K.; Hjelt, Kari H.; van Vliet, Lucas J.; Young, Ian T.

    1999-06-01

    The goal of our TU Delft interfaculty research program is to develop intelligent molecular diagnostic systems (IMDS) that can analyze liquid samples that contain a variety of biochemical compounds such as those associated with fermentation processes. One specific project within the IMDS program focuses on photon sensors. In order to analyze the liquid samples we use dedicated microarrays. At this stage, these are basically miniaturized micro titre plates. Typical dimensions of a vial are 200 × 200 × 20 μm³. These dimensions may be varied and the shape of the vials can be modified, with the result that the volume of the vials varies from 0.5 to 1.6 nl. For all experiments, we have used vials with the shape of a truncated pyramid. These vials are fabricated in silicon by a wet etching process. For testing purposes the vials are filled with rhodamine solutions of various concentrations. To avoid evaporation, glycerol-water (1:1, v/v) with a viscosity of 8.3 times that of water is used as the solvent. We aim at wide field-of-view imaging at the expense of absolute sensitivity: the field-of-view increases quadratically with decreasing magnification. Small magnification, however, implies low Numerical Aperture (NA). The ability of a microscope objective to collect photons is proportional to the square of the NA. To image the entire microarray we have used an epi-illumination fluorescence microscope equipped with a low-magnification (2.5×/0.075) objective and a scientific CCD camera to integrate the photons emitted from the fluorescing particles in the solutions in the vials. From these experiments we found that for this setup the detection limit is on the order of micromolar concentrations of fluorescing particles. This translates to 10⁸ molecules per vial.

  10. Lipid Microarray Biosensor for Biotoxin Detection.

    SciTech Connect

    Singh, Anup K.; Throckmorton, Daniel J.; Moran-Mirabal, Jose C.; Edel, Joshua B.; Meyer, Grant D.; Craighead, Harold G.

    2006-05-01

    We present the use of micron-sized lipid domains, patterned onto planar substrates and within microfluidic channels, to assay the binding of bacterial toxins via total internal reflection fluorescence microscopy (TIRFM). The lipid domains were patterned using a polymer lift-off technique and consisted of ganglioside-populated DSPC:cholesterol supported lipid bilayers (SLBs). Lipid patterns were formed on the substrates by vesicle fusion followed by polymer lift-off, which revealed micron-sized SLBs containing either ganglioside GT1b or GM1. The ganglioside-populated SLB arrays were then exposed to either Cholera toxin subunit B (CTB) or Tetanus toxin fragment C (TTC). Binding was assayed on planar substrates by TIRFM down to 1 nM concentration for CTB and 100 nM for TTC. Apparent binding constants extracted from three different models applied to the binding curves suggest that binding of a protein to a lipid-based receptor is strongly affected by the lipid composition of the SLB and by the substrate on which the bilayer is formed. Patterning of SLBs inside microfluidic channels also allowed the preparation of lipid domains with different compositions on a single device. Arrays within microfluidic channels were used to achieve segregation and selective binding from a binary mixture of the toxin fragments in one device. The binding and segregation within the microfluidic channels was assayed with epifluorescence as proof of concept. We propose that the method used for patterning the lipid microarrays on planar substrates and within microfluidic channels can be easily adapted to proteins or nucleic acids and can be used for biosensor applications and cell stimulation assays under different flow conditions.

  11. The recursive combination filter approach of pre-processing for the estimation of standard deviation of RR series.

    PubMed

    Mishra, Alok; Swati, D

    2015-09-01

    Variation in the interval between the R-R peaks of the electrocardiogram represents the modulation of the cardiac oscillations by the autonomic nervous system. This variation is contaminated by anomalous signals called ectopic beats, artefacts or noise, which mask the true behaviour of heart rate variability. In this paper, we propose a combination filter of a recursive impulse rejection filter and a recursive 20% filter, with recursive application and a preference for replacement over removal of abnormal beats, to improve the pre-processing of the inter-beat intervals. We have tested this novel recursive combinational method, with median replacement, to estimate the standard deviation of normal-to-normal (SDNN) beat intervals of congestive heart failure (CHF) and normal sinus rhythm subjects. This work discusses in detail the improvement in pre-processing over single use of the impulse rejection filter and removal of abnormal beats for heart rate variability, for the estimation of SDNN and Poincaré plot descriptors (SD1, SD2, and SD1/SD2). We found the 22 ms value of SDNN and the 36 ms value of the SD2 descriptor of the Poincaré plot to be clinical indicators for discriminating normal cases from CHF cases. The pre-processing is also useful in the calculation of the Lyapunov exponent, a nonlinear index: exponents calculated after the proposed pre-processing change in a way that follows the notion of less complex behaviour in diseased states.
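
    A minimal sketch of the recursive replacement idea, assuming a local-median reference and the 20% deviation rule mentioned in the abstract; the authors' exact filter cascade may differ.

```python
# Recursive impulse-rejection sketch for RR intervals: beats deviating from
# the local median by more than 20% are replaced by the median (replacement
# preferred over removal), and the pass repeats until nothing changes.
import numpy as np

def recursive_impulse_rejection(rr, window=11, thresh=0.2, max_iter=10):
    rr = np.asarray(rr, dtype=float).copy()
    for _ in range(max_iter):
        med = np.array([np.median(rr[max(0, i - window // 2):
                                     i + window // 2 + 1])
                        for i in range(len(rr))])
        bad = np.abs(rr - med) / med > thresh   # the 20% rule
        if not bad.any():
            break
        rr[bad] = med[bad]                      # replace, don't remove
    return rr

sdnn = np.std(recursive_impulse_rejection([800, 810, 1500, 805, 790]))
```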

  12. Genetic algorithms

    NASA Technical Reports Server (NTRS)

    Wang, Lui; Bayer, Steven E.

    1991-01-01

    Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithm concepts and applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
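
    A bare-bones genetic algorithm on a toy bit-string problem illustrates the concepts named above (population, fitness, selection, crossover, mutation); all parameters are illustrative.

```python
# Minimal genetic algorithm: truncation selection, one-point crossover,
# bit-flip mutation, on the "one-max" problem (maximize the number of 1s).
import random

def ga(fitness, n_bits=20, pop_size=30, generations=50, p_mut=0.02):
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_bits)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = ga(fitness=sum)
print(best, sum(best))
```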

  13. Prognosis classification in glioblastoma multiforme using multimodal MRI derived heterogeneity textural features: impact of pre-processing choices

    NASA Astrophysics Data System (ADS)

    Upadhaya, Taman; Morvan, Yannick; Stindel, Eric; Le Reste, Pierre-Jean; Hatt, Mathieu

    2016-03-01

    Heterogeneity image-derived features of Glioblastoma multiforme (GBM) tumors from multimodal MRI sequences may provide higher prognostic value than standard parameters used in routine clinical practice. We previously developed a framework for automatic extraction and combination of image-derived features (also called "Radiomics") through support vector machines (SVM) for predictive model building. The results we obtained in a cohort of 40 GBM patients suggested these features could be used to identify patients with poorer outcome. However, extraction of these features is a delicate multi-step process and their values may therefore depend on the pre-processing of the images. The original workflow included skull removal, bias homogeneity correction, and multimodal tumor segmentation, followed by textural feature computation, and lastly ranking, selection and combination through an SVM-based classifier. The goal of the present work was to specifically investigate the potential benefit and respective impact of the addition of several MRI pre-processing steps (spatial resampling for isotropic voxels, intensity quantization and normalization) before textural feature computation, on the resulting accuracy of the classifier. Eighteen patient datasets were also added for the present work (58 patients in total). A classification accuracy of 83% (sensitivity 79%, specificity 85%) was obtained using the original framework. The addition of the new pre-processing steps increased it to 93% (sensitivity 93%, specificity 93%) in identifying patients with poorer survival (below the median of 12 months). Among the three considered pre-processing steps, spatial resampling was found to have the most important impact. This shows the crucial importance of investigating appropriate image pre-processing steps to be used for methodologies based on textural feature extraction in medical imaging.
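
    The two pre-processing steps reported to matter most, spatial resampling to isotropic voxels and gray-level quantization, can be sketched as follows; SciPy's zoom and the 64-bin quantization are assumptions standing in for the paper's actual choices.

```python
# Radiomics pre-processing sketch: resample to isotropic voxels, normalize
# intensities robustly, then quantize to a fixed number of gray levels.
import numpy as np
from scipy.ndimage import zoom

def preprocess_volume(vol, spacing, iso=1.0, n_bins=64):
    factors = [s / iso for s in spacing]         # to isotropic voxels
    vol = zoom(vol, factors, order=1)
    lo, hi = np.percentile(vol, [1, 99])         # robust normalization
    vol = np.clip((vol - lo) / (hi - lo + 1e-12), 0, 1)
    return np.digitize(vol, np.linspace(0, 1, n_bins + 1)[1:-1])
```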

  14. Scientific data products and the data pre-processing subsystem of the Chang'e-3 mission

    NASA Astrophysics Data System (ADS)

    Tan, Xu; Liu, Jian-Jun; Li, Chun-Lai; Feng, Jian-Qing; Ren, Xin; Wang, Fen-Fei; Yan, Wei; Zuo, Wei; Wang, Xiao-Qian; Zhang, Zhou-Bin

    2014-12-01

    The Chang'e-3 (CE-3) mission is China's first exploration mission on the surface of the Moon that uses a lander and a rover. Eight instruments that form the scientific payloads have the following objectives: (1) investigate the morphological features and geological structures at the landing site; (2) integrated in-situ analysis of minerals and chemical compositions; (3) integrated exploration of the structure of the lunar interior; (4) exploration of the lunar-terrestrial space environment and the lunar surface environment, and acquisition of Moon-based ultraviolet astronomical observations. The Ground Research and Application System (GRAS) is in charge of data acquisition and pre-processing, management of the payload in orbit, and managing the data products and their applications. The Data Pre-processing Subsystem (DPS) is a part of GRAS. The task of the DPS is the pre-processing of raw data from the eight instruments on CE-3, including channel processing, unpacking, package sorting, calibration and correction, identification of geographical location, calculation of the probe azimuth and zenith angles and the solar azimuth and zenith angles, and quality checks. These processes produce Level 0, Level 1 and Level 2 data. The computing platform of this subsystem is a high-performance computing cluster, including a real-time subsystem used for processing Level 0 data and a post-time subsystem for generating Level 1 and Level 2 data. This paper describes the CE-3 data pre-processing method, the data pre-processing subsystem, data classification, data validity and the data products that are used for scientific studies.

  15. Time-Frequency Analysis of Peptide Microarray Data: Application to Brain Cancer Immunosignatures

    PubMed Central

    O’Donnell, Brian; Maurer, Alexander; Papandreou-Suppappola, Antonia; Stafford, Phillip

    2015-01-01

    One of the gravest dangers facing cancer patients is an extended symptom-free lull between tumor initiation and the first diagnosis. Detection of tumors is critical for effective intervention. Using the body’s immune system to detect and amplify tumor-specific signals may enable detection of cancer using an inexpensive immunoassay. Immunosignatures are one such assay: they provide a map of antibody interactions with random-sequence peptides. They enable detection of disease-specific patterns using classic train/test methods. However, to date, very little effort has gone into extracting information from the sequence of peptides that interact with disease-specific antibodies. Because it is difficult to represent all possible antigen peptides in a microarray format, we chose to synthesize only 330,000 peptides on a single immunosignature microarray. The 330,000 random-sequence peptides on the microarray represent 83% of all tetramers and 27% of all pentamers, creating an unbiased but substantial gap in the coverage of total sequence space. We therefore chose to examine many relatively short motifs from these random-sequence peptides. Time-variant analysis of recurrent subsequences provided a means to dissect amino acid sequences from the peptides while simultaneously retaining the antibody–peptide binding intensities. We first used a simple experiment in which monoclonal antibodies with known linear epitopes were exposed to these random-sequence peptides, and their binding intensities were used to create our algorithm. We then demonstrated the performance of the proposed algorithm by examining immunosignatures from patients with Glioblastoma multiforme (GBM), an aggressive form of brain cancer. Eight different frameshift targets were identified from the random-sequence peptides using this technique. If immune-reactive antigens can be identified using a relatively simple immune assay, it might enable a diagnostic test with sufficient sensitivity to detect tumors

  16. The application of protein microarray assays in psychoneuroimmunology.

    PubMed

    Ayling, K; Bowden, T; Tighe, P; Todd, I; Dilnot, E M; Negm, O H; Fairclough, L; Vedhara, K

    2017-01-01

    Protein microarrays are miniaturized multiplex assays that exhibit many advantages over the commonly used enzyme-linked immunosorbent assay (ELISA). This article aims to introduce protein microarrays to readers of Brain, Behavior, and Immunity and demonstrate their utility and validity for use in psychoneuroimmunological research. As part of an ongoing investigation of psychological and behavioral influences on influenza vaccination responses, we optimized a novel protein microarray to quantify influenza-specific antibody levels in human sera. Reproducibility was assessed by calculating intra- and inter-assay coefficients of variation on serially diluted human IgG concentrations. A random selection of samples was analyzed by microarray and ELISA to establish the validity of the assay. For IgG concentrations, intra-assay and inter-assay precision profiles demonstrated mean coefficients of variation of 6.7% and 11.5%, respectively. Significant correlations were observed between microarray and ELISA for all antigens, demonstrating that the microarray is a valid alternative to ELISA. Protein microarrays are a highly robust, novel assay method that could be of significant benefit for researchers working in psychoneuroimmunology. They offer high throughput, require fewer resources per analyte and can examine concurrent neuro-immune-endocrine mechanisms.
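
    The precision metrics quoted above are straightforward to compute from replicate measurements; the sketch below uses the standard definition of CV% and illustrative replicate data.

```python
# Intra- and inter-assay coefficient of variation (CV%) from replicates.
import numpy as np

def cv_percent(x):
    x = np.asarray(x, dtype=float)
    return 100.0 * x.std(ddof=1) / x.mean()

# Three plates, three replicates each (illustrative values).
plates = [[1.02, 0.98, 1.05], [0.91, 0.91, 0.91], [1.10, 1.12, 1.08]]
intra = np.mean([cv_percent(p) for p in plates])   # within-plate spread
inter = cv_percent([np.mean(p) for p in plates])   # between-plate spread
print(f"intra-assay CV {intra:.1f}%, inter-assay CV {inter:.1f}%")
```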

  17. Unimodal transform of variables selected by interval segmentation purity for classification tree modeling of high-dimensional microarray data.

    PubMed

    Du, Wen; Gu, Ting; Tang, Li-Juan; Jiang, Jian-Hui; Wu, Hai-Long; Shen, Guo-Li; Yu, Ru-Qin

    2011-09-15

    As a greedy search algorithm, classification and regression tree (CART) modeling easily lapses into overfitting when applied to microarray gene expression data. A straightforward solution is to filter irrelevant genes by identifying significant ones. Considering that some significant genes with multi-modal expression patterns exhibiting systematic differences in within-class samples are difficult to identify by existing methods, a strategy of unimodal transform of variables selected by interval segmentation purity (UTISP) for CART modeling is proposed. First, significant genes exhibiting varied expression patterns are identified by a variable selection method based on interval segmentation purity. Then, a unimodal transform is implemented to offer unimodal-featured variables for CART modeling via feature extraction. Because significant genes with complex expression patterns can be properly identified and unimodal features extracted in advance, this strategy potentially improves the performance of CART in combating overfitting or underfitting while modeling microarray data. The developed strategy is demonstrated using two microarray data sets. The results reveal that UTISP-based CART provides superior performance to k-nearest neighbors or CARTs coupled with other gene identifying strategies, indicating UTISP-based CART holds great promise for microarray data analysis.

  18. Segment and fit thresholding: a new method for image analysis applied to microarray and immunofluorescence data.

    PubMed

    Ensink, Elliot; Sinha, Jessica; Sinha, Arkadeep; Tang, Huiyuan; Calderone, Heather M; Hostetter, Galen; Winter, Jordan; Cherba, David; Brand, Randall E; Allen, Peter J; Sempere, Lorenzo F; Haab, Brian B

    2015-10-06

    Experiments involving the high-throughput quantification of image data require algorithms for automation. A challenge in the development of such algorithms is to properly interpret signals over a broad range of image characteristics, without the need for manual adjustment of parameters. Here we present a new approach for locating signals in image data, called Segment and Fit Thresholding (SFT). The method assesses statistical characteristics of small segments of the image and determines the best-fit trends between the statistics. Based on the relationships, SFT identifies segments belonging to background regions; analyzes the background to determine optimal thresholds; and analyzes all segments to identify signal pixels. We optimized the initial settings for locating background and signal in antibody microarray and immunofluorescence data and found that SFT performed well over multiple, diverse image characteristics without readjustment of settings. When used for the automated analysis of multicolor, tissue-microarray images, SFT correctly found the overlap of markers with known subcellular localization, and it performed better than a fixed threshold and Otsu's method for selected images. SFT promises to advance the goal of full automation in image analysis.
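
    A hedged sketch of the segment-statistics idea: tile the image, treat low-variance tiles as background, and derive a global threshold from the background distribution. The real SFT method fits trends between segment statistics, so this is a deliberate simplification.

```python
# SFT-like thresholding sketch: per-tile statistics, background tiles
# identified by low variance, threshold from the background distribution.
import numpy as np

def sft_like_threshold(img, tile=16, k=3.0):
    h, w = (d - d % tile for d in img.shape)          # crop to tile multiple
    tiles = img[:h, :w].reshape(h // tile, tile, w // tile, tile)
    tiles = tiles.swapaxes(1, 2).reshape(-1, tile * tile)
    sds = tiles.std(axis=1)
    background = tiles[sds < np.median(sds)]          # flat tiles ~ background
    thresh = background.mean() + k * background.std()
    return img > thresh                               # signal mask
```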

  19. Imaging combined autoimmune and infectious disease microarrays

    NASA Astrophysics Data System (ADS)

    Ewart, Tom; Raha, Sandeep; Kus, Dorothy; Tarnopolsky, Mark

    2006-09-01

    Bacterial and viral pathogens are implicated in many severe autoimmune diseases, acting through such mechanisms as molecular mimicry and superantigen activation of T-cells. For example, Helicobacter pylori, a well-known cause of stomach ulcers and cancers, is also identified in ischaemic heart disease (mimicry of heat shock protein 65), autoimmune pancreatitis, systemic sclerosis, autoimmune thyroiditis (HLA DRB1*0301 allele susceptibility), and Crohn's disease. Successful antibiotic eradication of H. pylori often accompanies their remission. Yet current diagnostic devices, and test-limiting cost containment, impede recognition of the linkage, delaying both diagnosis and therapeutic intervention until the chronic debilitating stage. We designed a 15-minute, low-cost, 39-antigen microarray assay, combining autoimmune, viral and bacterial antigens [1]. This enables point-of-care serodiagnosis and cost-effective, narrowly targeted, concurrent antibiotic and monoclonal anti-T-cell and anti-cytokine immunotherapy. Arrays of 26 pathogen and 13 autoimmune antigens with IgG and IgM dilution series were printed in triplicate on epoxysilane covalent-binding slides with Teflon well masks. Sera diluted 1:20 were incubated for 10 minutes and washed off; anti-IgG-Cy3 (green) and anti-IgM-Dy647 (red) were incubated for 5 minutes and washed off, and the slide was read in an ArrayWoRx(e) scanning CCD imager (Applied Precision, Issaquah, WA). As a preliminary model for the combined infectious disease-autoimmune diagnostic microarray, we surveyed 98 unidentified, outdated sera that were discarded after Hepatitis B antibody testing. In these, significant IgG or IgM autoantibody levels were found: dsDNA 5, ssDNA 11, Ro 2, RNP 7, SSB 4, gliadin 2, thyroglobulin 13 cases. Since control sera showed no autoantibodies, the high frequency of anti-DNA and anti-thyroglobulin antibodies found in infected sera lends increased support to the linkage of infection to subsequent autoimmune disease. Expansion of the antigen

  20. Protein Microarrays with Novel Microfluidic Methods: Current Advances.

    PubMed

    Dixit, Chandra K; Aguirre, Gerson R

    2014-07-01

    Microfluidic-based micromosaic technology has allowed the patterning of recognition elements in restricted micrometer-scale areas with high precision. This controlled patterning enabled the development of highly multiplexed arrays for multiple analyte detection. This arraying technology was first introduced at the beginning of 2001 and holds tremendous potential to revolutionize microarray development and analyte detection. Later, several microfluidic methods were developed for microarray application. In this review we discuss these novel methods and approaches, which leverage the properties of microfluidic technologies to significantly improve various physical aspects of microarray technology, such as enhanced imprinting homogeneity, improved stability of the immobilized biomolecules, decreased assay times, and reduced costs and instrumentation bulk.

  1. Deciphering the glycosaminoglycan code with the help of microarrays.

    PubMed

    de Paz, Jose L; Seeberger, Peter H

    2008-07-01

    Carbohydrate microarrays have become a powerful tool to elucidate the biological role of complex sugars. Microarrays are particularly useful for the study of glycosaminoglycans (GAGs), a key class of carbohydrates. The high-throughput chip format enables rapid screening of large numbers of potential GAG sequences produced via a complex biosynthesis while consuming very little sample. Here, we briefly highlight the most recent advances involving GAG microarrays built with synthetic or naturally derived oligosaccharides. These chips are powerful tools for characterizing GAG-protein interactions and determining structure-activity relationships for specific sequences. Thereby, they contribute to decoding the information contained in specific GAG sequences.

  2. Effect of preprocessing olive storage conditions on virgin olive oil quality and composition.

    PubMed

    Inarejos-García, Antonio M; Gómez-Rico, Aurora; Desamparados Salvador, M; Fregapane, Giuseppe

    2010-04-28

    The quality of virgin olive oil (VOO) is intimately related to the characteristics and composition of the olive fruit at the moment of its milling. In this study, the determination of suitable olive storage conditions and feasibility of using this preprocessing operation to modulate the sensory taste of VOO are reported. Several olive batches were stored in different conditions (from monolayer up to 60 cm thickness, at 20 and 10 degrees C) for a period of up to three weeks, and the quality and composition of minor constituents, mainly phenols and volatiles, in the corresponding VOO were monitored. Cornicabra cultivar VOO obtained from drupes stored for 5 or 8 days at 20 or 10 degrees C, respectively, retained the "extra virgin" category, according to chemical quality indices, since only small increases in free acidity and peroxide values were observed, and the bitter index of this monovarietal oil was reduced by 30-40%. Storage under monolayer conditions at 10 degrees C for up to two weeks is also feasible because "off-odor" development was delayed, a 50% reduction in bitterness was obtained, and the overall good quality of the final product was preserved.

  3. Tools and Databases of the KOMICS Web Portal for Preprocessing, Mining, and Dissemination of Metabolomics Data

    PubMed Central

    Enomoto, Mitsuo; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome—the collection of comprehensive quantitative data on metabolites in an organism—has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data. PMID:24949426

  4. Tools and databases of the KOMICS web portal for preprocessing, mining, and dissemination of metabolomics data.

    PubMed

    Sakurai, Nozomu; Ara, Takeshi; Enomoto, Mitsuo; Motegi, Takeshi; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome--the collection of comprehensive quantitative data on metabolites in an organism--has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data.

  5. Pre-processing in sentence comprehension: Sensitivity to likely upcoming meaning and structure

    PubMed Central

    DeLong, Katherine A.; Troyer, Melissa; Kutas, Marta

    2016-01-01

    For more than a decade, views of sentence comprehension have been shifting toward wider acceptance of a role for linguistic pre-processing—that is, anticipation, expectancy, (neural) pre-activation, or prediction—of upcoming semantic content and syntactic structure. In this survey, we begin by examining the implications of each of these “brands” of predictive comprehension, including the issue of potential costs and consequences to not encountering highly constrained sentence input. We then describe a number of studies (many using online methodologies) that provide results consistent with prospective sensitivity to various grains and levels of semantic and syntactic information, acknowledging that such pre-processing is likely to occur in other linguistic and extralinguistic domains, as well. This review of anticipatory findings also includes some discussion on the relationship of priming to prediction. We conclude with a brief examination of some possible limits to prediction, and with a suggestion for future work to probe whether and how various strands of prediction may integrate during real-time comprehension. PMID:27525035

  6. The effects of physical and chemical preprocessing on the flowability of corn stover

    SciTech Connect

    Crawford, Nathan C.; Nagle, Nick; Sievers, David A.; Stickel, Jonathan J.

    2015-12-20

    Continuous and reliable feeding of biomass is essential for successful biofuel production. However, the challenges associated with biomass solids handling are commonly overlooked. In this study, we examine the effects of preprocessing (particle size reduction, moisture content, chemical additives, etc.) on the flow properties of corn stover. Compressibility, flow properties (interparticle friction, cohesion, unconfined yield stress, etc.), and wall friction were examined for five corn stover samples: ground, milled (dry and wet), acid impregnated, and deacetylated. The ground corn stover was found to be the least compressible and most flowable material. The water- and acid-impregnated stovers had similar compressibilities. Yet the wet corn stover was less flowable than the acid-impregnated sample, which displayed a flow index equivalent to the dry, milled corn stover. The deacetylated stover, on the other hand, was the most compressible and least flowable of the materials examined. However, all of the tested stover samples had internal friction angles >30°, which could present additional feeding and handling challenges. All of the "wetted" materials (water, acid, and deacetylated) displayed reduced flowabilities (excluding the acid-impregnated sample) and enhanced compressibilities and wall friction angles, indicating the potential for added handling issues, which was corroborated via theoretical hopper design calculations. All of the "wetted" corn stovers require larger theoretical hopper outlet diameters and steeper hopper walls than the examined "dry" stovers.

  7. The effects of physical and chemical preprocessing on the flowability of corn stover

    DOE PAGES

    Crawford, Nathan C.; Nagle, Nick; Sievers, David A.; ...

    2015-12-20

    Continuous and reliable feeding of biomass is essential for successful biofuel production. However, the challenges associated with biomass solids handling are commonly overlooked. In this study, we examine the effects of preprocessing (particle size reduction, moisture content, chemical additives, etc.) on the flow properties of corn stover. Compressibility, flow properties (interparticle friction, cohesion, unconfined yield stress, etc.), and wall friction were examined for five corn stover samples: ground, milled (dry and wet), acid impregnated, and deacetylated. The ground corn stover was found to be the least compressible and most flowable material. The water- and acid-impregnated stovers had similar compressibilities. Yet the wet corn stover was less flowable than the acid-impregnated sample, which displayed a flow index equivalent to the dry, milled corn stover. The deacetylated stover, on the other hand, was the most compressible and least flowable of the materials examined. However, all of the tested stover samples had internal friction angles >30°, which could present additional feeding and handling challenges. All of the "wetted" materials (water, acid, and deacetylated) displayed reduced flowabilities (excluding the acid-impregnated sample) and enhanced compressibilities and wall friction angles, indicating the potential for added handling issues, which was corroborated via theoretical hopper design calculations. All of the "wetted" corn stovers require larger theoretical hopper outlet diameters and steeper hopper walls than the examined "dry" stovers.

  8. The PREP pipeline: standardized preprocessing for large-scale EEG analysis.

    PubMed

    Bigdely-Shamlo, Nima; Mullen, Tim; Kothe, Christian; Su, Kyung-Min; Robbins, Kay A

    2015-01-01

    The technology to collect brain imaging and physiological measures has become portable and ubiquitous, opening the possibility of large-scale analysis of real-world human imaging. By its nature, such data is large and complex, making automated processing essential. This paper shows how lack of attention to the very early stages of an EEG preprocessing pipeline can reduce the signal-to-noise ratio and introduce unwanted artifacts into the data, particularly for computations done in single precision. We demonstrate that ordinary average referencing improves the signal-to-noise ratio, but that noisy channels can contaminate the results. We also show that identification of noisy channels depends on the reference and examine the complex interaction of filtering, noisy channel identification, and referencing. We introduce a multi-stage robust referencing scheme to deal with the noisy channel-reference interaction. We propose a standardized early-stage EEG processing pipeline (PREP) and discuss the application of the pipeline to more than 600 EEG datasets. The pipeline includes an automatically generated report for each dataset processed. Users can download the PREP pipeline as a freely available MATLAB library from http://eegstudy.org/prepcode.
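
    PREP itself is distributed as a MATLAB library; the following Python sketch only illustrates the robust-referencing idea described above: iteratively estimate a reference, flag channels that deviate strongly from it, and recompute the reference without them. The z-score cutoff and iteration count are assumptions, and the real pipeline additionally interpolates bad channels.

        import numpy as np

        def robust_average_reference(data, z_cut=5.0, n_iter=4):
            """data: channels x samples EEG array. Start from the channel-wise
            median as a provisional reference, flag channels whose deviation
            from it is extreme (robust z-score via the MAD), then recompute
            the reference from the remaining good channels."""
            ref = np.median(data, axis=0)
            good = np.ones(data.shape[0], dtype=bool)
            for _ in range(n_iter):
                dev = np.std(data - ref, axis=1)          # per-channel deviation
                med = np.median(dev)
                mad = 1.4826 * np.median(np.abs(dev - med)) + 1e-12
                good = (dev - med) / mad < z_cut
                ref = data[good].mean(axis=0)             # average of good channels
            return data - ref, good                       # re-referenced data, mask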

  9. Parafoveal preprocessing of word initial trigrams during reading in adults and children.

    PubMed

    Pagán, Ascensión; Blythe, Hazel I; Liversedge, Simon P

    2016-03-01

    Although previous research has shown that letter position information for the first letter of a parafoveal word is encoded less flexibly than internal word-beginning letters (Johnson, Perea & Rayner, 2007; White et al., 2008), it is not clear how positional encoding operates over the initial trigram in English. This experiment explored the preprocessing of letter identity and position information of a parafoveal word's initial trigram by adults and children, using the boundary paradigm during normal sentence reading. Seven previews were generated: identity (captain); transposed-letter and substituted-letter nonwords in Positions 1 and 2 (acptain-imptain), Positions 1 and 3 (pactain-gartain), and Positions 2 and 3 (cpatain-cgotain). Results showed a transposed letter effect (TLE) for Positions 1 and 3 in gaze duration on the pretarget word, and a TLE for Positions 1 and 2 and Positions 2 and 3, but not for Positions 1 and 3, on the target word for both adults and children. These findings suggest that children, like adults, extract letter identity and position information flexibly using a spatial coding mechanism, supporting isolated word recognition models such as the SOLAR (Davis, 1999, 2010) and SERIOL (Whitney, 2001) models.

  10. Preprocessing: Geocoding of AVIRIS data using navigation, engineering, DEM, and radar tracking system data

    NASA Technical Reports Server (NTRS)

    Meyer, Peter; Larson, Steven A.; Hansen, Earl G.; Itten, Klaus I.

    1993-01-01

    Remotely sensed data have geometric characteristics and representation which depend on the type of acquisition system used. To correlate such data over large regions with other real-world representation tools, such as conventional maps or a Geographic Information System (GIS), whether for verification purposes or for further treatment with different data sets, a coregistration has to be performed. In addition to the geometric characteristics of the sensor, there are two other dominating factors which affect the geometry: the stability of the platform and the topography. There are two basic approaches to a geometric correction on a pixel-by-pixel basis: (1) a parametric approach using the location of the airplane and inertial navigation system data to simulate the observation geometry; and (2) a non-parametric approach using tie points or ground control points. It is well known that the non-parametric approach is not reliable enough for the unstable flight conditions of airborne systems, and is not satisfactory in areas with significant topography, e.g. mountains and hills. The present work describes a parametric preprocessing procedure which corrects effects of flight line and attitude variation as well as topographic influences, and is described in more detail by Meyer.

  11. Microarrays (DNA chips) for the classroom laboratory.

    PubMed

    Barnard, Betsy; Sussman, Michael; Bondurant, Sandra Splinter; Nienhuis, James; Krysan, Patrick

    2006-09-01

    We have developed and optimized the necessary laboratory materials to make DNA microarray technology accessible to all high school students at a fraction of both cost and data size. The primary component is a DNA chip/array that students "print" by hand and then analyze using research tools that have been adapted for classroom use. The primary adaptation is the use of a simulated cDNA target. The low density DNA array we discuss here was used to demonstrate differential expression of several Arabidopsis thaliana genes related to photosynthesis and photomorphogenesis. The methods we present here can be used with any biological organism whose sequence is known. Furthermore, these methods can be adapted to exhibit a variety of differential gene expression patterns under different experimental conditions. The materials and tools we discuss have been applied in classrooms at West High School in Madison, WI. We have also shared these materials with high school teachers attending professional development courses at the University of Wisconsin-Madison.

  12. Tissue Microarray Analysis Applied to Bone Diagenesis

    PubMed Central

    Mello, Rafael Barrios; Silva, Maria Regina Regis; Alves, Maria Teresa Seixas; Evison, Martin Paul; Guimarães, Marco Aurelio; Francisco, Rafaella Arrabaca; Astolphi, Rafael Dias; Iwamura, Edna Sadayo Miazato

    2017-01-01

    Taphonomic processes affecting bone post mortem are important in forensic, archaeological and palaeontological investigations. In this study, the application of tissue microarray (TMA) analysis to a sample of femoral bone specimens from 20 exhumed individuals of known period of burial and age at death is described. TMA allows multiplexing of subsamples, permitting standardized comparative analysis of adjacent sections in 3-D and of representative cross-sections of a large number of specimens. Standard hematoxylin and eosin, periodic acid-Schiff and silver methenamine, and picrosirius red staining, and CD31 and CD34 immunohistochemistry were applied to TMA sections. Osteocyte and osteocyte lacuna counts, percent bone matrix loss, and fungal spheroid element counts could be measured and collagen fibre bundles observed in all specimens. Decalcification with 7% nitric acid proceeded more rapidly than with 0.5 M EDTA and may offer better preservation of histological and cellular structure. No endothelial cells could be detected using CD31 and CD34 immunohistochemistry. Correlation between osteocytes per lacuna and age at death may reflect reported age-related responses to microdamage. Methodological limitations and caveats, and results of the TMA analysis of post mortem diagenesis in bone are discussed, and implications for DNA survival and recovery considered. PMID:28051148

  13. Cell microarrays on photochemically modified polytetrafluoroethylene.

    PubMed

    Mikulikova, Regina; Moritz, Sieglinde; Gumpenberger, Thomas; Olbrich, Michael; Romanin, Christoph; Bacakova, Lucie; Svorcik, Vaclav; Heitz, Johannes

    2005-09-01

    We studied the adhesion, proliferation, and viability of human umbilical vein endothelial cells (HUVEC) and human embryonic kidney cells (HEK) on modified spots at polytetrafluoroethylene (PTFE) surfaces. The viability of the cells was assessed using an aqueous non-radioactive cell proliferation assay. Round spots with a diameter of 100 μm were modified by exposure to the ultraviolet (UV) light of a Xe2*-excimer lamp at a wavelength of 172 nm in an ammonia atmosphere, employing a contact mask. The spots were arranged in a quadratic pattern with 300 μm center-to-center spot distances. With an optimized degree of modification, the cells adhered to the modified spots with a high degree of selectivity (70-90%). The adhered cells on the spots proliferated. This resulted in a significant increase in the number of adhering HUVEC or HEK cells after seeding and in the formation of confluent cell clusters after 3-4 days. With a higher starting seeding density, these clusters were not only confined to the modified spots but extended several micrometers into the neighborhood. The high potential of the cell microarrays for gene analysis in living cells was demonstrated with HEK cells transfected with yellow fluorescent protein (YFP).

  14. Antibody microarrays for native toxin detection.

    PubMed

    Rucker, Victor C; Havenstrite, Karen L; Herr, Amy E

    2005-04-15

    We have developed antibody-based microarray techniques for the multiplexed detection of cholera toxin beta-subunit, diphtheria toxin, anthrax lethal factor and protective antigen, Staphylococcus aureus enterotoxin B, and tetanus toxin C fragment in spiked samples. Two detection schemes were investigated: (i) a direct assay in which fluorescently labeled toxins were captured directly by the antibody array and (ii) a competition assay that employed unlabeled toxins as reporters for the quantification of native toxin in solution. In the direct assay, fluorescence measured at each array element is correlated with labeled toxin concentration to yield baseline binding information (Langmuir isotherms and affinity constants). Extending from the direct assay, the competition assay yields information on the presence, identity, and concentration of toxins. A significant advantage of the competition assay over reported profiling assays is the minimal sample preparation required prior to analysis because the competition assay obviates the need to fluorescently label native proteins in the sample of interest. Sigmoidal calibration curves and detection limits were established for both assay formats. Although the sensitivity of the direct assay is superior to that of the competition assay, detection limits for unmodified toxins in the competition assay are comparable to values reported previously for sandwich-format immunoassays of antibodies arrayed on planar substrates. As a demonstration of the potential of the competition assay for unlabeled toxin detection, we conclude with a straightforward multiplexed assay for the differentiation and identification of both native S. aureus enterotoxin B and tetanus toxin C fragment in spiked dilute serum samples.
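
    A hedged sketch of fitting the kind of sigmoidal calibration curve described above with a four-parameter logistic model, then inverting the fit to estimate an unknown concentration from a measured signal. The toy concentrations and fluorescence values are invented for illustration, not taken from the study.

        import numpy as np
        from scipy.optimize import curve_fit

        def four_pl(x, a, b, c, d):
            """Four-parameter logistic: a = top, d = bottom,
            c = inflection point (EC50), b = slope."""
            return d + (a - d) / (1.0 + (x / c) ** b)

        # toy competition data: more unlabeled toxin -> less labeled signal
        conc = np.array([0.1, 0.3, 1, 3, 10, 30, 100])        # ng/mL (hypothetical)
        signal = np.array([980, 940, 820, 560, 300, 150, 90])  # fluorescence (a.u.)

        popt, _ = curve_fit(four_pl, conc, signal, p0=[1000, 1, 5, 80])
        a, b, c, d = popt

        # invert the fitted curve to estimate concentration from a new signal
        new_signal = 400.0
        est_conc = c * ((a - d) / (new_signal - d) - 1.0) ** (1.0 / b)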

  15. Microarray analysis of DNA replication timing.

    PubMed

    Karnani, Neerja; Taylor, Christopher M; Dutta, Anindya

    2009-01-01

    Although all of the DNA in a eukaryotic cell replicates during the S-phase of the cell cycle, there is a significant difference in the actual time in S-phase when a given chromosomal segment replicates. Methods are described here for the generation of high-resolution temporal maps of DNA replication in synchronized human cells. This method does not require amplification of DNA before microarray hybridization and so avoids errors introduced during PCR. A major advantage of using this procedure is that it facilitates finer dissection of replication time in S-phase. Also, it helps delineate chromosomal regions that undergo biallelic or asynchronous replication, which otherwise are difficult to detect at a genome-wide scale by existing methods. The continuous TR50 (time of completion of 50% replication) maps of replication across chromosomal segments identify regions that undergo acute transitions in replication timing. These transition zones can play a significant role in identifying insulators that separate chromosomal domains with different chromatin modifications.
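
    As an illustration of the TR50 quantity defined above, the following sketch interpolates the time at which one locus reaches 50% replication from cumulative replication fractions measured at a few S-phase time points. The numbers are hypothetical.

        import numpy as np

        def tr50(times, frac_replicated):
            """times: S-phase sampling times (hours); frac_replicated:
            cumulative fraction replicated at each time for one locus.
            Returns the interpolated time at which 50% is reached."""
            return float(np.interp(0.5, frac_replicated, times))

        # a hypothetical locus measured at five points across S-phase
        times = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        frac = np.array([0.05, 0.20, 0.55, 0.85, 1.00])
        print(tr50(times, frac))   # ~2.86 h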

  16. Improved document image segmentation algorithm using multiresolution morphology

    NASA Astrophysics Data System (ADS)

    Bukhari, Syed Saqib; Shafait, Faisal; Breuel, Thomas M.

    2011-01-01

    Page segmentation into text and non-text elements is an essential preprocessing step before an optical character recognition (OCR) operation. In the case of poor segmentation, an OCR classification engine produces garbage characters due to the presence of non-text elements. This paper describes modifications to the text/non-text segmentation algorithm presented by Bloomberg, which is also available in his open-source Leptonica library. The modifications result in significant improvements and achieved better segmentation accuracy than the original algorithm for the UW-III, UNLV, and ICDAR 2009 page segmentation competition test images and circuit diagram datasets.
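
    The paper's algorithm (following Bloomberg) is more elaborate, but the core morphological intuition can be sketched as follows: a large opening erases thin text strokes, while big solid components such as halftones and drawings survive, yielding a non-text mask. The structuring-element sizes are illustrative assumptions.

        import numpy as np
        from scipy import ndimage

        def nontext_mask(binary_page, open_size=15):
            """binary_page: boolean array, True where ink. A large opening
            removes thin text strokes; the surviving solid components
            approximate the non-text (image/halftone) layer."""
            st = np.ones((open_size, open_size), dtype=bool)
            solid = ndimage.binary_opening(binary_page, structure=st)
            # grow back so the mask covers whole non-text regions
            return ndimage.binary_dilation(solid, structure=np.ones((5, 5), dtype=bool))

        # text layer = binary_page & ~nontext_mask(binary_page)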

  17. A Novel Binarization Algorithm for Ballistics Firearm Identification

    NASA Astrophysics Data System (ADS)

    Li, Dongguang

    The identification of ballistics specimens from imaging systems is of paramount importance in criminal investigation. Binarization plays a key role in the preprocessing of cartridge images in ballistic imaging systems. Unfortunately, it is very difficult to obtain a satisfactory binary image using existing binarization algorithms. In this paper, we utilize global and local thresholds to enhance image binarization. Importantly, we present a novel criterion for effectively detecting edges in the images. Comprehensive experiments have been conducted over sample ballistic images. The empirical results demonstrate that the proposed method provides a better solution than existing binarization algorithms.
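
    A rough sketch of combining global and local thresholds in the spirit described above; this is not the paper's edge-detection criterion, and the window size and offset parameters are assumptions.

        import numpy as np
        from scipy import ndimage
        from skimage.filters import threshold_otsu

        def hybrid_binarize(img, win=31, c=5.0):
            """Combine a global Otsu threshold with a local mean threshold:
            a pixel is foreground only if it is dark relative to both the
            whole image and its local neighborhood."""
            g = threshold_otsu(img)
            local_mean = ndimage.uniform_filter(img.astype(float), size=win)
            return (img < g) & (img < local_mean - c)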

  18. Cell-Based Microarrays for In Vitro Toxicology.

    PubMed

    Wegener, Joachim

    2015-01-01

    DNA/RNA and protein microarrays have proven their outstanding bioanalytical performance throughout the past decades, given the unprecedented level of parallelization by which molecular recognition assays can be performed and analyzed. Cell microarrays (CMAs) make use of similar construction principles. They are applied to profile a given cell population with respect to the expression of specific molecular markers and also to measure functional cell responses to drugs and chemicals. This review focuses on the use of cell-based microarrays for assessing the cytotoxicity of drugs, toxins, or chemicals in general. It also summarizes CMA construction principles with respect to the cell types that are used for such microarrays, the readout parameters to assess toxicity, and the various formats that have been established and applied. The review ends with a critical comparison of CMAs and well-established microtiter plate (MTP) approaches.

  19. Cell-Based Microarrays for In Vitro Toxicology

    NASA Astrophysics Data System (ADS)

    Wegener, Joachim

    2015-07-01

    DNA/RNA and protein microarrays have proven their outstanding bioanalytical performance throughout the past decades, given the unprecedented level of parallelization by which molecular recognition assays can be performed and analyzed. Cell microarrays (CMAs) make use of similar construction principles. They are applied to profile a given cell population with respect to the expression of specific molecular markers and also to measure functional cell responses to drugs and chemicals. This review focuses on the use of cell-based microarrays for assessing the cytotoxicity of drugs, toxins, or chemicals in general. It also summarizes CMA construction principles with respect to the cell types that are used for such microarrays, the readout parameters to assess toxicity, and the various formats that have been established and applied. The review ends with a critical comparison of CMAs and well-established microtiter plate (MTP) approaches.

  20. An examination of the regulatory mechanism of Pxdn mutation-induced eye disorders using microarray analysis

    PubMed Central

    YANG, YANG; XING, YIQIAO; LIANG, CHAOQUN; HU, LIYA; XU, FEI; MEI, QI

    2016-01-01

    The present study aimed to identify biomarkers for peroxidasin (Pxdn) mutation-induced eye disorders and to study the underlying mechanisms involved in this process. The microarray dataset GSE49704 was used, which encompasses 4 mouse samples from embryos with a Pxdn mutation and 4 samples from normal tissues. After data preprocessing, the differentially expressed genes (DEGs) between Pxdn-mutant and normal tissues were identified using the t-test in the limma package, followed by functional enrichment analysis. The protein-protein interaction (PPI) network was constructed based on the STRING database, and the transcriptional regulatory (TR) network was established using the GeneCodis database. Subsequently, the overlapping DEGs with high degrees in the two networks were identified, as well as the sub-network extracted from the TR network. In total, 121 (75 upregulated and 46 downregulated) DEGs were identified, and these DEGs play important roles in biological processes (BPs) including neuron development and differentiation. A PPI network containing 25 nodes, such as actin, alpha 1, skeletal muscle (Acta1) and troponin C type 2 (fast) (Tnnc2), and a TR network including 120 nodes were built. By comparing the two networks, seven crucial overlapping genes were identified, including cyclin-dependent kinase inhibitor 1B (Cdkn1b), Acta1 and troponin T type 3 (Tnnt3). In the sub-network, Cdkn1b was predicted as the target of miRNAs such as mmu-miR-24 and of transcription factors (TFs) including forkhead box O4 (FOXO4) and activating enhancer binding protein 4 (AP4). Thus, we suggest that seven crucial genes, including Cdkn1b, Acta1 and Tnnt3, play important roles in the progression of eye disorders such as glaucoma, and that Cdkn1b exerts its effects via the inhibition of proliferation, mediated by mmu-miR-24 and targeted by the TFs FOXO4 and AP4. PMID:27121343

  1. Screening for key genes associated with atopic dermatitis with DNA microarrays.

    PubMed

    Zhang, Zhong-Kui; Yang, Yong; Bai, Shu-Rong; Zhang, Gui-Zhen; Liu, Tai-Hua; Zhou, Zhou; Wang, Chun-Mei; Tang, Li-Jun; Wang, Jun; He, Si-Xian

    2014-03-01

    The aim of the present study was to identify key genes associated with atopic dermatitis (AD) using microarray data and bioinformatic analyses. The dataset GSE6012, downloaded from the Gene Expression Omnibus (GEO) database, contains gene expression data from 10 AD skin samples and 10 healthy skin samples. Following data preprocessing, differentially expressed genes (DEGs) were identified using the limma package of the R project. Interaction networks were constructed comprising DEGs with a node degree of >3, >5 and >10, using the Osprey software. Functional enrichment and pathway enrichment analyses of the network comprising all DEGs and of the network comprising DEGs with a high node degree were performed with the DAVID and WebGestalt toolkits, respectively. A total of 337 DEGs were identified. The functional enrichment analysis revealed that the list of DEGs was significantly enriched for proteins related to epidermis development (P=2.95E-07), including loricrin (LOR), keratin 17 (KRT17), small proline-rich repeat proteins (SPRRs) and involucrin (IVL). The chemokine signaling pathway was the most significantly enriched pathway (P=0.0490978) in the network of all DEGs and in the network consisting of high-node-degree DEGs (>10), which comprised the genes coding for chemokine receptor 7 (CCR7), chemokine ligand 19 (CCL19), signal transducer and activator of transcription 1 (STAT1), and phosphoinositide-3-kinase regulatory subunit 1 (PIK3R1). In conclusion, the list of AD-associated proteins identified in this study, including LOR, KRT17, SPRRs, IVL, CCR7, CCL19, PIK3R1 and STAT1, may prove useful for the development of methods to treat AD. Of these proteins, PIK3R1 and KRT17 are novel and promising targets for AD therapy.
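
    This study (like several in this collection) used R's limma for DEG identification; as a language-consistent stand-in, here is a sketch of per-gene differential expression testing with Welch t-tests and Benjamini-Hochberg FDR control. It illustrates only the DEG-identification step, not limma's moderated statistics.

        import numpy as np
        from scipy import stats

        def find_degs(expr_case, expr_ctrl, alpha=0.05):
            """expr_*: genes x samples expression matrices. Per-gene Welch
            t-tests with Benjamini-Hochberg correction; returns indices of
            genes called differentially expressed and their q-values."""
            _, p = stats.ttest_ind(expr_case, expr_ctrl, axis=1, equal_var=False)
            m = len(p)
            order = np.argsort(p)
            bh = p[order] * m / (np.arange(m) + 1)       # BH adjustment
            bh = np.minimum.accumulate(bh[::-1])[::-1]   # enforce monotonicity
            q = np.empty(m)
            q[order] = np.minimum(bh, 1.0)
            return np.where(q < alpha)[0], q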

  2. Quantum Algorithms

    NASA Technical Reports Server (NTRS)

    Abrams, D.; Williams, C.

    1999-01-01

    This thesis describes several new quantum algorithms. These include a polynomial-time algorithm that uses a quantum fast Fourier transform to find eigenvalues and eigenvectors of a Hamiltonian operator, and that can be applied in cases for which all known classical algorithms require exponential time.

  3. Emerging Use of Gene Expression Microarrays in Plant Physiology

    DOE PAGES

    Wullschleger, Stan D.; Difazio, Stephen P.

    2003-01-01

    Microarrays have become an important technology for the global analysis of gene expression in humans, animals, plants, and microbes. Implemented in the context of a well-designed experiment, cDNA and oligonucleotide arrays can provide high-throughput, simultaneous analysis of transcript abundance for hundreds, if not thousands, of genes. However, despite widespread acceptance, the use of microarrays as a tool to better understand processes of interest to the plant physiologist is still being explored. To help illustrate current uses of microarrays in the plant sciences, several case studies that we believe demonstrate the emerging application of gene expression arrays in plant physiology were selected from among the many posters and presentations at the 2003 Plant and Animal Genome XI Conference. Based on this survey, microarrays are being used to assess gene expression in plants exposed to the experimental manipulation of air temperature, soil water content and aluminium concentration in the root zone. Analysis often includes characterizing transcript profiles for multiple post-treatment sampling periods and categorizing genes with common patterns of response using hierarchical clustering techniques. In addition, microarrays are also providing insights into developmental changes in gene expression associated with fibre and root elongation in cotton and maize, respectively. Technical and analytical limitations of microarrays are discussed, and projects attempting to advance areas of microarray design and data analysis are highlighted. Finally, although much work remains, we conclude that microarrays are a valuable tool for the plant physiologist interested in the characterization and identification of individual genes and gene families with potential application in the fields of agriculture, horticulture and forestry.
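
    The hierarchical clustering of response patterns mentioned above can be sketched in a few lines: correlation distance groups genes by the shape of their post-treatment profiles rather than their absolute levels. The data here are random placeholders.

        import numpy as np
        from scipy.spatial.distance import pdist
        from scipy.cluster.hierarchy import linkage, fcluster

        # rows = genes, columns = post-treatment sampling times (toy data)
        profiles = np.random.rand(200, 6)

        # correlation distance keys on response *pattern*, not magnitude
        d = pdist(profiles, metric='correlation')
        tree = linkage(d, method='average')
        clusters = fcluster(tree, t=5, criterion='maxclust')   # 5 response groups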

  4. DNA Microarrays for Aptamer Identification and Structural Characterization

    DTIC Science & Technology

    2012-09-01

    Report AFRL-RH-WP-TR-2013-0130 (interim, September 2010 to September 2012); author Jennifer A. Martin, National Research Council. Aptamers are ideal recognition elements, but integrating aptamers onto a sensor platform has two main challenges: (1) aptamers are selected in

  5. Plant-pathogen interactions: what microarray tells about it?

    PubMed

    Lodha, T D; Basak, J

    2012-01-01

    Plant defense responses are mediated by elementary regulatory proteins that affect expression of thousands of genes. Over the last decade, microarray technology has played a key role in deciphering the underlying networks of gene regulation in plants that lead to a wide variety of defence responses. Microarray is an important tool to quantify and profile the expression of thousands of genes simultaneously, with two main aims: (1) gene discovery and (2) global expression profiling. Several microarray technologies are currently in use; most include a glass slide platform with spotted cDNA or oligonucleotides. To date, microarray technology has been used in the identification of regulatory genes and end-point defence genes, and to understand the signal transduction processes underlying disease resistance and its intimate links to other physiological pathways. Microarray technology can be used for in-depth, simultaneous profiling of host/pathogen genes as the disease progresses from infection to resistance/susceptibility at different developmental stages of the host, which can be done in different environments, for a clearer understanding of the processes involved. A thorough knowledge of plant disease resistance, using a successful combination of microarray and other high-throughput techniques as well as biochemical, genetic, and cell biological experiments, is needed for practical application to secure and stabilize the yield of many crop plants. This review starts with a brief introduction to microarray technology, followed by the basics of plant-pathogen interaction, the use of DNA microarrays over the last decade to unravel the mysteries of plant-pathogen interaction, and ends with the future prospects of this technology.

  6. Analysis of Microarray and RNA-seq Expression Profiling Data.

    PubMed

    Hung, Jui-Hung; Weng, Zhiping

    2017-03-01

    Gene expression profiling refers to the simultaneous measurement of the expression levels of a large number of genes (often all genes in a genome), typically in multiple experiments spanning a variety of cell types, treatments, or environmental conditions. Expression profiling is accomplished by assaying mRNA levels with microarrays or next-generation sequencing technologies (RNA-seq). This introduction describes normalization and analysis of data generated from microarray or RNA-seq experiments.
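
    The normalization mentioned here can take many forms; one common choice for microarray data is quantile normalization, sketched below. Ties are handled naively; this is an illustration, not the method prescribed by the unit.

        import numpy as np

        def quantile_normalize(X):
            """X: genes x arrays. Force every array (column) to share the
            same empirical distribution: replace each column's values with
            the mean of the sorted values across columns, matched by rank."""
            ranks = np.argsort(np.argsort(X, axis=0), axis=0)
            mean_sorted = np.sort(X, axis=0).mean(axis=1)   # reference distribution
            return mean_sorted[ranks]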

  7. Multipathogen oligonucleotide microarray for environmental and biodefense applications.

    PubMed

    Sergeev, Nikolay; Distler, Margaret; Courtney, Shannon; Al-Khaldi, Sufian F; Volokhov, Dmitriy; Chizhikov, Vladimir; Rasooly, Avraham

    2004-11-01

    Food-borne pathogens are a major health problem. The large and diverse number of microbial pathogens and their virulence factors has fueled interest in technologies capable of detecting multiple pathogens and multiple virulence factors simultaneously. Some of these pathogens and their toxins have potential use as bioweapons. DNA microarray technology allows the simultaneous analysis of thousands of sequences of DNA in a relatively short time, making it appropriate for biodefense and for public health uses. This paper describes methods for using DNA microarrays to detect and analyze microbial pathogens. The FDA-1 microarray was developed for the simultaneous detection of several food-borne pathogens and their virulence factors including Listeria spp., Campylobacter spp., Staphylococcus aureus enterotoxin genes and Clostridium perfringens toxin genes. Three elements were incorporated to increase confidence in the microarray detection system: redundancy of genes, redundancy of oligonucleotide probes (oligoprobes) for a specific gene, and quality control oligoprobes to monitor array spotting and target DNA hybridization. These elements enhance the reliability of detection and reduce the chance of erroneous results due to the genetic variability of microbes or technical problems with the microarray. The results presented demonstrate the potential of oligonucleotide microarrays for detection of environmental and biodefense relevant microbial pathogens.

  8. Assessing Bacterial Interactions Using Carbohydrate-Based Microarrays

    PubMed Central

    Flannery, Andrea; Gerlach, Jared Q.; Joshi, Lokesh; Kilcoyne, Michelle

    2015-01-01

    Carbohydrates play a crucial role in host-microorganism interactions and many host glycoconjugates are receptors or co-receptors for microbial binding. Host glycosylation varies with species and location in the body, and this contributes to species specificity and tropism of commensal and pathogenic bacteria. Additionally, bacterial glycosylation is often the first bacterial molecular species encountered and responded to by the host system. Accordingly, characterising and identifying the exact structures involved in these critical interactions is an important priority in deciphering microbial pathogenesis. Carbohydrate-based microarray platforms have been an underused tool for screening bacterial interactions with specific carbohydrate structures, but they have grown in popularity in recent years. In this review, we discuss carbohydrate-based microarrays that have been profiled with whole bacteria, recombinantly expressed adhesins or serum antibodies. Three main types of carbohydrate-based microarray platform are considered: (i) conventional carbohydrate or glycan microarrays; (ii) whole mucin microarrays; and (iii) microarrays constructed from bacterial polysaccharides or their components. Determining the nature of the interactions between bacteria and host can help clarify the molecular mechanisms of carbohydrate-mediated interactions in microbial pathogenesis, infectious disease and host immune response, and may lead to new strategies to boost therapeutic treatments. PMID:27600247

  9. A comparative analysis of DNA barcode microarray feature size

    PubMed Central

    Ammar, Ron; Smith, Andrew M; Heisler, Lawrence E; Giaever, Guri; Nislow, Corey

    2009-01-01

    Background: Microarrays are an invaluable tool in many modern genomic studies. It is generally perceived that decreasing the size of microarray features leads to arrays with higher resolution (due to greater feature density), but this increase in resolution can compromise sensitivity. Results: We demonstrate that barcode microarrays with smaller features are equally capable of detecting variation in DNA barcode intensity when compared to larger feature sizes within a specific microarray platform. The barcodes used in this study are the well-characterized set derived from the Yeast KnockOut (YKO) collection used for screens of pooled yeast (Saccharomyces cerevisiae) deletion mutants. We treated these pools with the glycosylation inhibitor tunicamycin as a test compound. Three generations of barcode microarrays at 30, 8 and 5 μm feature sizes independently identified the primary target of tunicamycin to be ALG7. Conclusion: We show that the data obtained with the 5 μm feature size is of comparable quality to the 30 μm size and propose that further shrinking of features could yield barcode microarrays with equal or greater resolving power and, more importantly, higher density. PMID:19825181

  10. [Future aspect of cytogenetics using chromosomal microarray testing].

    PubMed

    Yamamoto, Toshiyuki

    2014-01-01

    With the advent of chromosomal microarray testing, microdeletions can be detected in approximately 17% of cases without any abnormality detectable by conventional karyotyping. Structural abnormalities frequently occur at the terminal regions of the chromosomes, called the subtelomeres, because of their structural features. Subtelomere deletions and unbalanced translocations between chromosomes are frequently observed. However, most microdeletions observed by chromosomal microarray testing are microdeletions in intermediate regions. Submicroscopic duplications reciprocal to the deletions seen in the microdeletion syndromes, such as the 16p11.2 region, have been revealed. Discovery of multi-hit chromosomal abnormalities is another achievement of chromosomal microarray testing. Chromosomal microarray testing can determine the extent of chromosomal structural abnormalities at the DNA level. Thus, the effects of a specific gene deletion on symptoms can be revealed by comparing multiple patients with slightly different chromosomal deletions in the same region (genotype/phenotype correlation). Chromosomal microarray testing comprehensively determines genomic copy number but reveals nothing about the underlying structural rearrangement, which requires verification by cytogenetic methods such as FISH. To interpret the results, familial or benign copy number variations (CNVs) should be taken into consideration. An appropriate system should be constructed to provide opportunities for chromosomal microarray testing to patients who need this examination and to facilitate the use of results in medical practice.

  11. Evaluating concentration estimation errors in ELISA microarray experiments

    SciTech Connect

    Daly, Don S.; White, Amanda M.; Varnum, Susan M.; Anderson, Kevin K.; Zangar, Richard C.

    2005-01-26

    Enzyme-linked immunosorbent assay (ELISA) is a standard immunoassay to predict a protein concentration in a sample. Deploying ELISA in a microarray format permits simultaneous prediction of the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Evaluating prediction error is critical to interpreting biological significance and improving the ELISA microarray process. Evaluating prediction error must be automated to realize a reliable high-throughput ELISA microarray system. In this paper, we present a statistical method based on propagation of error to evaluate prediction errors in the ELISA microarray process. Although propagation of error is central to this method, it is effective only when comparable data are available. Therefore, we briefly discuss the roles of experimental design, data screening, normalization and statistical diagnostics when evaluating ELISA microarray prediction errors. We use an ELISA microarray investigation of breast cancer biomarkers to illustrate the evaluation of prediction errors. The illustration begins with a description of the design and resulting data, followed by a brief discussion of data screening and normalization. In our illustration, we fit a standard curve to the screened and normalized data, review the modeling diagnostics, and apply propagation of error.
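
    A minimal sketch of first-order propagation of error through an inverted linear standard curve, the core idea of the method described above. Real ELISA curves are usually nonlinear (e.g., four-parameter logistic), and parameter covariances should not be ignored as they are here.

        import numpy as np

        def predict_conc_with_error(y, var_y, m, b, var_m, var_b):
            """Invert a linear standard curve y = m*x + b to predict
            concentration x, and propagate the variances of the signal
            and the fitted parameters to a standard error on x."""
            x = (y - b) / m
            # partial derivatives of x with respect to y, b, and m
            dx_dy = 1.0 / m
            dx_db = -1.0 / m
            dx_dm = -(y - b) / m**2
            var_x = dx_dy**2 * var_y + dx_db**2 * var_b + dx_dm**2 * var_m
            return x, np.sqrt(var_x)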

  12. Design and analysis of mismatch probes for long oligonucleotide microarrays

    SciTech Connect

    Deng, Ye; He, Zhili; Van Nostrand, Joy D.; Zhou, Jizhong

    2008-08-15

    Nonspecific hybridization is currently a major concern with microarray technology. One of the most effective approaches to estimating nonspecific hybridization in oligonucleotide microarrays is the utilization of mismatch probes; however, this approach has not been used for longer oligonucleotide probes. Here, an oligonucleotide microarray was constructed to evaluate and optimize parameters for 50-mer mismatch probe design. A perfect match (PM) and 28 mismatch (MM) probes were designed for each of ten target genes selected from three microorganisms. The microarrays were hybridized with synthesized complementary oligonucleotide targets at different temperatures (e.g., 42, 45 and 50 °C). In general, the probes with evenly distributed mismatches were more distinguishable than those with randomly distributed mismatches. MM probes with 3, 4 and 5 mismatched nucleotides were differentiated for 50-mer oligonucleotide probes hybridized at 50, 45 and 42 °C, respectively. Based on the experimental data generated from this study, a modified positional-dependent nearest neighbor (MPDNN) model was constructed to adjust the thermodynamic parameters of matched and mismatched dimer nucleotides in the microarray environment. The MM probes with four flexible positional mismatches were designed using the newly established MPDNN model, and the experimental results demonstrated that the redesigned MM probes could yield more consistent hybridizations. Conclusions: This study provides guidance on the design of MM probes for long oligonucleotides (e.g., 50-mers). The novel MPDNN model has improved the consistency of long MM probes, and this modeling method can potentially be used for the prediction of oligonucleotide microarray hybridizations.
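
    A small sketch of generating the evenly distributed mismatches that the study found most distinguishable. The position formula and the base-substitution rule are our assumptions for illustration, not the paper's design code.

        def evenly_mismatched(probe, n_mm, alphabet="ACGT"):
            """Return a mismatch probe with n_mm substitutions spread
            evenly along the sequence."""
            seq = list(probe)
            L = len(seq)
            # evenly spaced positions, offset away from the probe ends
            positions = [int((i + 0.5) * L / n_mm) for i in range(n_mm)]
            for p in positions:
                for base in alphabet:
                    if base != seq[p]:
                        seq[p] = base      # substitute first non-matching base
                        break
            return "".join(seq)

        # e.g. a 50-mer with 5 evenly distributed mismatches:
        # evenly_mismatched("ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTAC", 5)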

  13. DNA Microarray Characterization of Pathogens Associated with Sexually Transmitted Diseases

    PubMed Central

    Cao, Boyang; Wang, Suwei; Tian, Zhenyang; Hu, Pinliang; Feng, Lu; Wang, Lei

    2015-01-01

    This study established a multiplex PCR-based microarray to simultaneously detect a diverse panel of 17 sexually transmitted disease (STD)-associated pathogens, including Neisseria gonorrhoeae, Chlamydia trachomatis, Mycoplasma genitalium, Mycoplasma hominis, Ureaplasma, Herpes simplex virus (HSV) types 1 and 2, and Human papillomavirus (HPV) types 6, 11, 16, 18, 31, 33, 35, 39, 54 and 58. The target genes are the 16S rRNA gene for N. gonorrhoeae, M. genitalium, M. hominis, and Ureaplasma; the major outer membrane protein gene (ompA) for C. trachomatis; the glycoprotein B gene (gB) for HSV; and the L1 gene for HPV. A total of 34 probes were selected for the microarray, including 31 specific probes, one positive control, one negative control, and one positional control probe as a printing reference. The microarray is specific, as the commensal and pathogenic microbes (and closely related organisms) in the genitourinary tract did not cross-react with the microarray probes. The microarray is 10 times more sensitive than the multiplex PCR. Among the 158 suspected HPV specimens examined, the microarray showed that 49 samples contained HPV, 21 contained Ureaplasma, 15 contained M. hominis, four contained C. trachomatis, and one contained N. gonorrhoeae. This work reports the development of the first high-throughput detection system that identifies common pathogens associated with STDs from clinical samples, and paves the way for establishing a time-saving, accurate and high-throughput diagnostic tool for STDs. PMID:26208181

  14. DNA Microarray Characterization of Pathogens Associated with Sexually Transmitted Diseases.

    PubMed

    Cao, Boyang; Wang, Suwei; Tian, Zhenyang; Hu, Pinliang; Feng, Lu; Wang, Lei

    2015-01-01

    This study established a multiplex PCR-based microarray to simultaneously detect a diverse panel of 17 sexually transmitted disease (STD)-associated pathogens, including Neisseria gonorrhoeae, Chlamydia trachomatis, Mycoplasma genitalium, Mycoplasma hominis, Ureaplasma, Herpes simplex virus (HSV) types 1 and 2, and Human papillomavirus (HPV) types 6, 11, 16, 18, 31, 33, 35, 39, 54 and 58. The target genes are the 16S rRNA gene for N. gonorrhoeae, M. genitalium, M. hominis, and Ureaplasma; the major outer membrane protein gene (ompA) for C. trachomatis; the glycoprotein B gene (gB) for HSV; and the L1 gene for HPV. A total of 34 probes were selected for the microarray, including 31 specific probes, one positive control, one negative control, and one positional control probe as a printing reference. The microarray is specific, as the commensal and pathogenic microbes (and closely related organisms) in the genitourinary tract did not cross-react with the microarray probes. The microarray is 10 times more sensitive than the multiplex PCR. Among the 158 suspected HPV specimens examined, the microarray showed that 49 samples contained HPV, 21 contained Ureaplasma, 15 contained M. hominis, four contained C. trachomatis, and one contained N. gonorrhoeae. This work reports the development of the first high-throughput detection system that identifies common pathogens associated with STDs from clinical samples, and paves the way for establishing a time-saving, accurate and high-throughput diagnostic tool for STDs.

  15. Basic Concepts of Microarrays and Potential Applications in Clinical Microbiology

    PubMed Central

    Miller, Melissa B.; Tang, Yi-Wei

    2009-01-01

    Summary: The introduction of in vitro nucleic acid amplification techniques, led by real-time PCR, into the clinical microbiology laboratory has transformed the laboratory detection of viruses and select bacterial pathogens. However, the progression of the molecular diagnostic revolution currently relies on the ability to efficiently and accurately offer multiplex detection and characterization for a variety of infectious disease pathogens. Microarray analysis has the capability to offer robust multiplex detection but has just started to enter the diagnostic microbiology laboratory. Multiple microarray platforms exist, including printed double-stranded DNA and oligonucleotide arrays, in situ-synthesized arrays, high-density bead arrays, electronic microarrays, and suspension bead arrays. One aim of this paper is to review microarray technology, highlighting technical differences between them and each platform's advantages and disadvantages. Although the use of microarrays to generate gene expression data has become routine, applications pertinent to clinical microbiology continue to rapidly expand. This review highlights uses of microarray technology that impact diagnostic microbiology, including the detection and identification of pathogens, determination of antimicrobial resistance, epidemiological strain typing, and analysis of microbial infections using host genomic expression and polymorphism profiles. PMID:19822891

  16. Conservative Patch Algorithm and Mesh Sequencing for PAB3D

    NASA Technical Reports Server (NTRS)

    Pao, S. P.; Abdol-Hamid, K. S.

    2005-01-01

    A mesh-sequencing algorithm and a conservative patched-grid-interface algorithm (hereafter "patch algorithm") have been incorporated into the PAB3D code, which is a computer program that solves the Navier-Stokes equations for the simulation of subsonic, transonic, or supersonic flows surrounding an aircraft or other complex aerodynamic shapes. These algorithms are efficient and flexible, and have added tremendously to the capabilities of PAB3D. The mesh-sequencing algorithm makes it possible to perform preliminary computations using only a fraction of the grid cells (provided the original cell count is divisible by an integer) along any grid coordinate axis, independently of the other axes. The patch algorithm addresses another critical need in multi-block grid situations, where the cell faces of adjacent grid blocks may not coincide, leading to errors in calculating fluxes of conserved physical quantities across interfaces between the blocks. The patch algorithm, based on the Stokes integral formulation of the applicable conservation laws, effectively matches each of the interfacial cells on one side of the block interface to the corresponding fractional cell-area pieces on the other side. This approach is comprehensive and unified, such that all interface topology is automatically processed without user intervention. The algorithm is implemented in a preprocessing code that creates a cell-by-cell database that will maintain flux conservation at any level of full or reduced grid density the user may choose by way of the mesh-sequencing algorithm. These two algorithms have enhanced the numerical accuracy of the code, reduced the time and effort for grid preprocessing, and provided users with the flexibility of performing computations at any desired full or reduced grid resolution to suit their specific computational requirements.
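
    One simple way to emulate the mesh-sequencing reduction described above is index slicing that keeps every n-th cell along a chosen axis, subject to the divisibility requirement. PAB3D's actual implementation is not reproduced here; this only illustrates the per-axis, independent coarsening idea.

        import numpy as np

        def coarsen(field, factor, axis):
            """Keep every `factor`-th cell along one axis, independently of
            the other axes. Requires the cell count along that axis to be
            divisible by the factor."""
            n = field.shape[axis]
            if n % factor:
                raise ValueError("cell count must be divisible by the factor")
            idx = [slice(None)] * field.ndim
            idx[axis] = slice(0, n, factor)
            return field[tuple(idx)]

        # e.g. halve the resolution of a 64x32x16 block along axis 0 only:
        # coarse = coarsen(np.zeros((64, 32, 16)), 2, axis=0)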

  17. Directional-cosine and related pre-processing techniques - Possibilities and problems in earth-resources surveys

    NASA Technical Reports Server (NTRS)

    Quiel, F.

    1975-01-01

    The possibilities of using various pre-processing techniques (directional-cosine, ratios and ratio/sum) have been investigated in relation to an urban land-use problem in Marion County, Indiana (USA) and for geologic applications in the San Juan Mountains of Colorado. For Marion County, it proved possible to classify directional-cosine data from September 1972 into different land uses by applying statistics developed with data from a May 1973 ERTS frame, thereby demonstrating the possibilities of using this type of data for signature-extension purposes. In the Silverton (Colorado) area pre-processed data proved superior to original data when extracting useful information in mountainous areas without corresponding ground observations. This approach allowed meaningful classification and interpretation of the data. The main problems encountered as a result of atmospheric effects, mixing of different surface materials, and the performance characteristics of ERTS are elucidated.
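
    The directional-cosine transform named in this record is standard: each pixel's band vector is scaled to unit length so that classification depends on spectral shape rather than overall brightness. A sketch, with a simple band ratio for comparison (the array layout is our assumption):

        import numpy as np

        def directional_cosines(bands):
            """bands: n_bands x rows x cols multispectral image. Normalize
            each pixel's band vector to unit length, suppressing overall
            brightness (illumination) differences."""
            b = bands.astype(float)
            norm = np.sqrt((b ** 2).sum(axis=0)) + 1e-12
            return b / norm

        def band_ratio(bands, i, j):
            """Simple band ratio, another illumination-suppressing transform."""
            b = bands.astype(float)
            return b[i] / (b[j] + 1e-12)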

  18. Studies of patterned surfaces for biological microarrays

    NASA Astrophysics Data System (ADS)

    Gillmor, Susan Dale

    Over the past 10 years, biological microarrays have developed into an invaluable tool for genetic and protein research. The task to draw meaningful conclusions between variations of genes and their expression requires millions of comparisons between standard and stressed samples, usually the cDNA, RNA, or proteins within cells. For such a project, high-information-density, highly pure arrays are required. In fabricating an array on a uniform or unpatterned substrate, droplets of solution, if placed too closely, can bleed into each other and can cross-contaminate several array sites. Therefore, a uniform surface limits the density of droplets that can be placed to create an array. When the surface is patterned with a barrier between the droplets, then the density of array sites can be significantly larger (uniform surface, ~200-500 μm center-to-center; patterned surface, 100 μm center-to-center and less with present loading technology). We have explored the patterning of surfaces to construct biological microarrays, via altering the surface chemically to create array sites with gold-thiol chemistry, and via a template placed on the surface to outline the elements. In the template strategy, we have investigated poly(dimethyl siloxane) (PDMS) films (5-10 μm) with holes in a regular array. However, the hydrophobic PDMS repels water to such an extent that the droplets do not wet the template and cannot travel down the wall of the PDMS hole to interact with the surface. As a consequence, if not accurately placed in the array sites, they also do not load into the holes to form filled features. Our current studies focus on altering the surface of the PDMS to allow the droplets to fall into the PDMS holes. To alter the surface and not the bulk, we have experimented with plasma chemistry. To create a temporary contact angle change, oxygen plasma has been employed. However, the PDMS recovers and reverts to its characteristically hydrophobic surface. When we expose PDMS

  19. Photopatterning of Hydrogel Microarrays in Closed Microchips.

    PubMed

    Gumuscu, Burcu; Bomer, Johan G; van den Berg, Albert; Eijkel, Jan C T

    2015-12-14

    To date, optical lithography has been extensively used for in situ patterning of hydrogel structures in a scale range from hundreds of microns to a few millimeters. The two main limitations which prevent smaller feature sizes of hydrogel structures are (1) the upper glass layer of a microchip maintains a large spacing (typically 525 μm) between the photomask and hydrogel precursor, leading to diffraction of UV light at the edges of mask patterns, (2) diffusion of free radicals and monomers results in irregular polymerization near the illumination interface. In this work, we present a simple approach to enable the use of optical lithography to fabricate hydrogel arrays with a minimum feature size of 4 μm inside closed microchips. To achieve this, we combined two different techniques. First, the upper glass layer of the microchip was thinned by mechanical polishing to reduce the spacing between the photomask and hydrogel precursor, and thereby the diffraction of UV light at the edges of mask patterns. The polishing process reduces the upper layer thickness from ∼525 to ∼100 μm, and the mean surface roughness from 20 to 3 nm. Second, we developed an intermittent illumination technique consisting of short illumination periods followed by relatively longer dark periods, which decrease the diffusion of monomers. Combination of these two methods allows for fabrication of 0.4 × 10^6 sub-10 μm sized hydrogel patterns over large areas (cm^2) with high reproducibility (∼98.5% patterning success). The patterning method is tested with two different types of photopolymerizing hydrogels: polyacrylamide and polyethylene glycol diacrylate. This method enables in situ fabrication of well-defined hydrogel patterns and presents a simple approach to fabricate 3-D hydrogel matrices for biomolecule separation, biosensing, tissue engineering, and immobilized protein microarray applications.

  20. An intelligent pre-processing framework for standardizing medical images for CAD and other post-processing applications

    NASA Astrophysics Data System (ADS)

    Raghupathi, Lakshminarasimhan; Devarakota, Pandu R.; Wolf, Matthias

    2012-03-01

    There is an increasing need to provide end-users with seamless and secure access to healthcare information acquired from a diverse range of sources. This might include local and remote hospital sites equipped with scanners from different vendors and practicing varied acquisition protocols, and also heterogeneous external sources such as the Internet cloud. In such scenarios, image post-processing tools such as CAD (computer-aided diagnosis), which were hitherto developed using a smaller set of images, may not always work optimally on newer sets of images having entirely different characteristics. In this paper, we propose a framework that assesses the quality of a given input image and automatically applies an appropriate pre-processing method in such a manner that the image characteristics are normalized regardless of their source. We focus mainly on medical images, and the objective of the preprocessing is to enable various image processing and workflow applications, such as CAD, to perform in a consistent manner. First, our system consists of an assessment step wherein an image is evaluated based on criteria such as noise, image sharpness, etc. Depending on the measured characteristics, we then apply an appropriate normalization technique, thus giving way to our overall pre-processing framework. A systematic evaluation of the proposed scheme is carried out on a large set of CT images acquired from various vendors, including images reconstructed with next-generation iterative methods. Results demonstrate that the images are normalized and thus suitable for an existing LungCAD prototype.
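
    A minimal sketch of the assess-then-normalize loop described above, with illustrative quality metrics and thresholds of my own choosing (noise from a high-pass residual, sharpness from gradient magnitude); this is not the authors' framework:

```python
import numpy as np
from scipy import ndimage

def assess(image):
    """Crude quality metrics: noise from the high-pass residual,
    sharpness from the mean gradient magnitude."""
    noise = np.std(image - ndimage.gaussian_filter(image, sigma=1.0))
    gy, gx = np.gradient(image)
    sharpness = np.mean(np.hypot(gx, gy))
    return noise, sharpness

def normalize(image, noise_cap=20.0):
    """Smooth only when the noise estimate exceeds a threshold, so
    already-clean images pass through unchanged."""
    noise, _ = assess(image)
    if noise > noise_cap:
        return ndimage.gaussian_filter(image, sigma=0.8)
    return image

ct_slice = np.random.normal(0, 30, (512, 512))  # stand-in for a noisy CT slice
out = normalize(ct_slice)
```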

  1. Hardware Design and Implementation of a Wavelet De-Noising Procedure for Medical Signal Preprocessing

    PubMed Central

    Chen, Szi-Wen; Chen, Yuan-Ho

    2015-01-01

    In this paper, a discrete wavelet transform (DWT) based de-noising with its applications into the noise reduction for medical signal preprocessing is introduced. This work focuses on the hardware realization of a real-time wavelet de-noising procedure. The proposed de-noising circuit mainly consists of three modules: DWT, thresholding, and inverse DWT (IDWT) modular circuits. We also proposed a novel adaptive thresholding scheme and incorporated it into our wavelet de-noising procedure. Performance was then evaluated on the architectural designs of both the software and the hardware. In addition, the de-noising circuit was also implemented by downloading the Verilog codes to a field programmable gate array (FPGA) based platform so that its ability in noise reduction may be further validated in actual practice. Simulation experiment results produced by applying a set of simulated noise-contaminated electrocardiogram (ECG) signals to the de-noising circuit showed that the circuit could not only desirably meet the requirement of real-time processing, but also achieve satisfactory performance for noise reduction, while the sharp features of the ECG signals are well preserved. The proposed de-noising circuit was further synthesized using the Synopsys Design Compiler with an Artisan Taiwan Semiconductor Manufacturing Company (TSMC, Hsinchu, Taiwan) 40 nm standard cell library. The integrated circuit (IC) synthesis simulation results showed that the proposed design can achieve a clock frequency of 200 MHz with a power consumption of only 17.4 mW when operated at 200 MHz. PMID:26501290
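
    A software sketch of the DWT -> threshold -> IDWT chain that the circuit realizes in hardware, using the PyWavelets package; the universal (VisuShrink) threshold below is a common textbook choice, not necessarily the paper's adaptive scheme:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_denoise(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise scale estimated from the finest detail coefficients.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(signal)))      # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)

t = np.linspace(0, 1, 1024)
ecg_like = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(t.size)
clean = wavelet_denoise(ecg_like)
```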

  2. Novel low-power ultrasound digital preprocessing architecture for wireless display.

    PubMed

    Levesque, Philippe; Sawan, Mohamad

    2010-03-01

    A complete hardware-based ultrasound preprocessing unit (PPU) is presented as an alternative to available power-hungry devices. Intended to expand ultrasonic applications, the proposed unit allows replacement of the cable of the ultrasonic probe by a wireless link to transfer data from the probe to a remote monitor. The digital back-end architecture of this PPU is fully pipelined, which permits sampling of ultrasonic signals at a frequency equal to the field-programmable gate array-based system clock, up to 100 MHz. Experimental results show that the proposed processing unit has excellent performance, an equivalent 53.15 Dhrystone 2.1 MIPS/MHz (DMIPS/MHz), compared with other software-based architectures that allow a maximum of 1.6 DMIPS/MHz. In addition, an adaptive subsampling method is proposed to operate the pixel compressor, which allows real-time image zooming and, by removing high-frequency noise, enhances the lateral and axial resolutions by 25% and 33%, respectively. Real-time images, acquired from a reference phantom, validated the feasibility of the proposed architecture. For a display rate of 15 frames per second and a 5-MHz single-element piezoelectric transducer, the proposed digital PPU requires a dynamic power of only 242 mW, which represents around 20% of the best-available software-based system. Furthermore, composed of the ultrasound processor and the image interpolation unit, the digital processing core of the PPU presents good power-performance ratios of 26 DMIPS/mW and 43.9 DMIPS/mW at 20-MHz and 100-MHz sample frequencies, respectively.

  3. FITPix data preprocessing pipeline for the Timepix single particle pixel detector

    NASA Astrophysics Data System (ADS)

    Kraus, V.; Holik, M.; Jakubek, J.; Georgiev, V.

    2012-04-01

    The semiconductor pixel detector Timepix contains an array of 256 × 256 square pixels with a pitch of 55 μm. The single quantum counting detector Timepix can also provide information about the energy or arrival time of a particle from every single pixel. This device is a powerful tool for radiation imaging and ionizing particle tracking. The Timepix device can be read out via a serial or parallel interface enabling speeds of 100 fps or 3200 fps, respectively. The device can be connected to a PC via the USB 2.0 based interface FITPix, which currently supports the serial output of Timepix reaching a speed of 90 fps. FITPix supports adjustable clock frequency and hardware triggering, which is a useful tool for the synchronized operation of multiple devices. The FITPix interface can handle up to 16 detectors in a daisy chain. The complete system, including the FITPix interface and Timepix detector, is controlled from the PC by the Pixelman software package. A pipeline structure is now implemented in the new version of the readout interface of FITPix. This version also supports parallel Timepix readout. The pipeline architecture brings the possibility of data preprocessing directly in the hardware. The first pipeline stage converts the raw Timepix data into the form of a matrix or stream of pixel values. Another stage performs further data processing such as event thresholding and data compression. Complex data processing currently performed by Pixelman in the PC is significantly reduced in this way. The described architecture, together with the parallel readout, increases data throughput, reaching a higher frame rate and reducing the dead time. Significant data compression is performed directly in the hardware, especially for sparse data sets from particle tracking applications. The data frame size is typically compressed by a factor of 10-100.
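
    The two preprocessing stages map naturally onto a few array operations; a hedged sketch with an assumed per-pixel threshold, showing why sparse track data compresses so well:

```python
import numpy as np

# Illustrative sketch of the pipeline stages described above: build a
# raw 256x256 frame, then threshold and compress the sparse result
# into (column, row, value) triples.

frame = np.zeros((256, 256), dtype=np.uint16)
frame[100:103, 40] = (512, 730, 601)                 # a short particle track
frame += np.random.randint(0, 3, frame.shape).astype(np.uint16)  # noise floor

THRESHOLD = 10                                       # assumed event threshold
rows, cols = np.nonzero(frame > THRESHOLD)
triples = np.column_stack((cols, rows, frame[rows, cols]))

# Compression factor relative to shipping the full matrix.
print(frame.size / max(triples.size, 1))             # large for sparse frames
```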

  4. Hardware design and implementation of a wavelet de-noising procedure for medical signal preprocessing.

    PubMed

    Chen, Szi-Wen; Chen, Yuan-Ho

    2015-10-16

    In this paper, a discrete wavelet transform (DWT) based de-noising with its applications into the noise reduction for medical signal preprocessing is introduced. This work focuses on the hardware realization of a real-time wavelet de-noising procedure. The proposed de-noising circuit mainly consists of three modules: DWT, thresholding, and inverse DWT (IDWT) modular circuits. We also proposed a novel adaptive thresholding scheme and incorporated it into our wavelet de-noising procedure. Performance was then evaluated on the architectural designs of both the software and the hardware. In addition, the de-noising circuit was also implemented by downloading the Verilog codes to a field programmable gate array (FPGA) based platform so that its ability in noise reduction may be further validated in actual practice. Simulation experiment results produced by applying a set of simulated noise-contaminated electrocardiogram (ECG) signals to the de-noising circuit showed that the circuit could not only desirably meet the requirement of real-time processing, but also achieve satisfactory performance for noise reduction, while the sharp features of the ECG signals are well preserved. The proposed de-noising circuit was further synthesized using the Synopsys Design Compiler with an Artisan Taiwan Semiconductor Manufacturing Company (TSMC, Hsinchu, Taiwan) 40 nm standard cell library. The integrated circuit (IC) synthesis simulation results showed that the proposed design can achieve a clock frequency of 200 MHz with a power consumption of only 17.4 mW when operated at 200 MHz.

  5. Probabilistic non-negative matrix factorization: theory and application to microarray data analysis.

    PubMed

    Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman

    2014-02-01

    Non-negative matrix factorization (NMF) has proven to be a useful decomposition technique for multivariate data, where the non-negativity constraint is necessary to have a meaningful physical interpretation. NMF reduces the dimensionality of non-negative data by decomposing it into two smaller non-negative factors with physical interpretation for class discovery. The NMF algorithm, however, assumes a deterministic framework. In particular, the effects of data noise on the stability of the factorization and the convergence of the algorithm are unknown. Collected data, on the other hand, is stochastic in nature due to measurement noise and sometimes inherent variability in the physical process. This paper presents new theoretical and applied developments to the problem of non-negative matrix factorization (NMF). First, we generalize the deterministic NMF algorithm to include a general class of update rules that converges towards an optimal non-negative factorization. Second, we extend the NMF framework to the probabilistic case (PNMF). We show that the maximum a posteriori (MAP) estimate of the non-negative factors is the solution to a weighted regularized non-negative matrix factorization problem. We subsequently derive update rules that converge towards an optimal solution. Third, we apply the PNMF to cluster and classify DNA microarray data. The proposed PNMF is shown to outperform the deterministic NMF and the sparse NMF algorithms in clustering stability and classification accuracy.
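
    For context, the deterministic baseline that the paper generalizes is the standard Lee-Seung multiplicative-update NMF; a compact sketch on a stand-in expression matrix (this is not the paper's PNMF itself):

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9):
    """Frobenius-norm NMF via the classic multiplicative updates."""
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H, keeps H >= 0
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W, keeps W >= 0
    return W, H

V = np.abs(np.random.randn(100, 40))           # stand-in non-negative data
W, H = nmf(V, rank=3)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative residual
```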

  6. Rough-fuzzy clustering for grouping functionally similar genes from microarray data.

    PubMed

    Maji, Pradipta; Paul, Sushmita

    2013-01-01

    Gene expression data clustering is one of the important tasks of functional genomics as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes represents the basic challenge in the gene clustering problem. In this regard, a gene clustering algorithm, termed robust rough-fuzzy c-means, is proposed, judiciously integrating the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environments. The concept of possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to optimum or near-optimum solutions and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets.
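
    The probabilistic-membership core that rough-fuzzy variants build on is classical fuzzy c-means; a brief sketch of its two alternating updates (the rough lower/upper approximations and possibilistic terms of the paper are omitted):

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Classical fuzzy c-means: alternate membership and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), c, replace=False)]
    for _ in range(iters):
        # Distances of every point to every center, shape (N, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Membership u[n, k] = 1 / sum_j (d[n, k] / d[n, j])^(2/(m-1)).
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
        # Centers are membership^m-weighted means of the data.
        centers = (u.T ** m @ X) / np.sum(u.T ** m, axis=1, keepdims=True)
    return u, centers

X = np.vstack([np.random.randn(50, 10) + s for s in (0, 4)])  # toy profiles
u, centers = fcm(X, c=2)
```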

  7. SU-E-J-261: The Importance of Appropriate Image Preprocessing to Augment the Information of Radiomics Image Features

    SciTech Connect

    Zhang, L; Fried, D; Fave, X; Mackin, D; Yang, J; Court, L

    2015-06-15

    Purpose: To investigate how different image preprocessing techniques, their parameters, and different boundary handling techniques can augment the information content of features and improve the features' differentiating capability. Methods: Twenty-seven NSCLC patients with a solid tumor volume and no visually obvious necrotic regions in the simulation CT images were identified. Fourteen of these patients had a necrotic region visible in their pre-treatment PET images (necrosis group), and thirteen had no visible necrotic region in the pre-treatment PET images (non-necrosis group). We investigated how image preprocessing can impact the ability of radiomics image features extracted from the CT to differentiate between the two groups. It is expected that the histogram in the necrosis group is more negatively skewed and that the uniformity in the necrosis group is lower. Therefore, we analyzed two first-order features, skewness and uniformity, on the image inside the GTV in the intensity range [−20 HU, 180 HU] under combinations of several image preprocessing techniques: (1) applying an isotropic Gaussian or anisotropic diffusion smoothing filter over a range of parameters (Gaussian smoothing: size = 11, sigma = 0:0.1:2.3; anisotropic smoothing: iterations = 4, kappa = 0:10:110); (2) applying a boundary-adapted Laplacian filter; and (3) applying an adaptive upper threshold for the intensity range. A two-tailed t-test was used to evaluate the differentiating capability of CT features with respect to pre-treatment PET necrosis. Results: Without any preprocessing, no differences in either skewness or uniformity were observed between the two groups. After applying appropriate Gaussian filters (sigma >= 1.3) or anisotropic filters (kappa >= 60) with the adaptive upper threshold, skewness was significantly more negative in the necrosis group (p < 0.05). By applying boundary-adapted Laplacian filtering after appropriate Gaussian filters (0.5 <= sigma <= 1.1) or anisotropic filters (20 <= kappa <= 50), the uniformity was
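
    A sketch of the feature-extraction side of the study: smooth the CT volume, then compute skewness and histogram uniformity of GTV voxels inside the [−20, 180] HU window; the bin count and the toy data are assumptions of this example:

```python
import numpy as np
from scipy import ndimage, stats

def first_order_features(volume, mask, sigma=1.3, hu_range=(-20, 180)):
    """Gaussian smoothing, then skewness and uniformity of masked voxels."""
    smoothed = ndimage.gaussian_filter(volume, sigma=sigma)
    vox = smoothed[mask]
    vox = vox[(vox >= hu_range[0]) & (vox <= hu_range[1])]
    hist, _ = np.histogram(vox, bins=64, range=hu_range)
    p = hist / max(hist.sum(), 1)
    uniformity = np.sum(p ** 2)       # a.k.a. energy of the histogram
    return stats.skew(vox), uniformity

vol = np.random.normal(60, 40, (40, 64, 64))     # stand-in CT block (HU)
gtv = np.zeros_like(vol, dtype=bool)
gtv[10:30, 20:44, 20:44] = True                  # toy GTV mask
skewness, uniformity = first_order_features(vol, gtv)
```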

  8. DNA Microarray Detection of 18 Important Human Blood Protozoan Species

    PubMed Central

    Chen, Jun-Hu; Feng, Xin-Yu; Chen, Shao-Hong; Cai, Yu-Chun; Lu, Yan; Zhou, Xiao-Nong; Chen, Jia-Xu; Hu, Wei

    2016-01-01

    Background Accurate detection of blood protozoa from clinical samples is important for the diagnosis, treatment and control of related diseases, and a rapid, simple, and convenient detection method is an urgent need. In this preliminary study, a novel DNA microarray system was assessed for the detection of Plasmodium, Leishmania, Trypanosoma, Toxoplasma gondii and Babesia in humans, animals, and vectors, in comparison with microscopy and PCR data. Methodology/Principal Findings The microarray assay simultaneously identified 18 species of common blood protozoa based on differences in their respective target genes. A total of 20 specific primer pairs and 107 microarray probes were selected according to conserved regions, designed to identify 18 species in 5 blood protozoan genera. The positive detection rate of the microarray assay was 91.78% (402/438). Sensitivity and specificity for blood protozoan detection ranged from 82.4% (95% CI: 65.9%-98.8%) to 100.0% and from 95.1% (95% CI: 93.2%-97.0%) to 100.0%, respectively. Positive predictive value (PPV) and negative predictive value (NPV) ranged from 20.0% (95% CI: 2.5%-37.5%) to 100.0% and from 96.8% (95% CI: 95.0%-98.6%) to 100.0%, respectively. The Youden index varied from 0.82 to 0.98. The detection limit of the DNA microarrays ranged from 200 to 500 copies/reaction, similar to PCR findings. The concordance rate between microarray data and DNA sequencing results was 100%. Conclusions/Significance Overall, the newly developed microarray platform provides a convenient, highly accurate, and reliable clinical assay for the determination of blood protozoan species. PMID:27911895
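
    How the reported figures relate to each other: sensitivity, specificity, PPV, NPV, and the Youden index all follow from the 2x2 confusion counts; the counts below are illustrative, not the study's:

```python
# Illustrative 2x2 confusion counts: true/false positives and negatives.
tp, fn, fp, tn = 28, 6, 4, 78

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                 # positive predictive value
npv = tn / (tn + fn)                 # negative predictive value
youden = sensitivity + specificity - 1

print(f"Se={sensitivity:.3f} Sp={specificity:.3f} "
      f"PPV={ppv:.3f} NPV={npv:.3f} J={youden:.2f}")
```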

  9. A critical comparison of protein microarray fabrication technologies.

    PubMed

    Romanov, Valentin; Davidoff, S Nikki; Miles, Adam R; Grainger, David W; Gale, Bruce K; Brooks, Benjamin D

    2014-03-21

    Of the diverse analytical tools used in proteomics, protein microarrays possess the greatest potential for providing fundamental information on protein, ligand, analyte, receptor, and antibody affinity-based interactions, binding partners and high-throughput analysis. Microarrays have been used to develop tools for drug screening, disease diagnosis, biochemical pathway mapping, protein-protein interaction analysis, vaccine development, enzyme-substrate profiling, and immuno-profiling. While the promise of the technology is intriguing, it is yet to be realized. Many challenges remain to be addressed to allow these methods to meet technical and research expectations, provide reliable assay answers, and to reliably diversify their capabilities. Critical issues include: (1) inconsistent printed microspot morphologies and uniformities, (2) low signal-to-noise ratios due to factors such as complex surface capture protocols, contamination, and static or no-flow mass transport conditions, (3) inconsistent quantification of captured signal due to spot uniformity issues, (4) non-optimal protocol conditions such as pH, temperature, and drying that promote variability in assay kinetics, and lastly (5) poor protein (e.g., antibody) printing, storage, or shelf-life compatibility with common microarray assay fabrication methods, directly related to microarray protocols. Conventional printing approaches, including contact (e.g., quill and solid pin), non-contact (e.g., piezo and inkjet), microfluidics-based, microstamping, lithography, and cell-free protein expression microarrays, have all been used with varying degrees of success with figures of merit often defined arbitrarily without comparisons to standards, or analytical or fiduciary controls. Many microarray performance reports use bench-top analyte preparations lacking real-world relevance, akin to "fishing in a barrel", for proof of concept and determinations of figures of merit. This review critiques current protein

  10. Microarray studies of psychostimulant-induced changes in gene expression.

    PubMed

    Yuferov, Vadim; Nielsen, David; Butelman, Eduardo; Kreek, Mary Jeanne

    2005-03-01

    Alterations in the expression of multiple genes in many brain regions are likely to contribute to psychostimulant-induced behaviours. Microarray technology provides a powerful tool for the simultaneous interrogation of gene expression levels of a large number of genes. Several recent experimental studies, reviewed here, demonstrate the power, limitations and progress of microarray technology in the field of psychostimulant addiction. These studies vary in the paradigms of cocaine or amphetamine administration, drug doses, route and mode of administration, duration of treatment, animal species, brain regions studied and time of tissue collection after final drug administration. The studies also utilize different microarray platforms and statistical techniques for analysis of differentially expressed genes. These variables influence substantially the results of these studies. It is clear that current microarray techniques cannot reliably detect small changes in the expression of genes with low expression levels, including functionally significant changes in components of major neurotransmission systems such as glutamate, dopamine, opioid and GABA receptors, especially those that may occur after chronic drug administration or drug withdrawal. However, the microarray studies reviewed here showed cocaine- or amphetamine-induced alterations in the expression of numerous genes involved in the modulation of neuronal growth, cytoskeletal structures, synaptogenesis, signal transduction, apoptosis and cell metabolism. Application of laser capture microdissection and single-cell cDNA amplification may greatly enhance microarray studies of gene expression profiling. The combination of rapidly evolving microarray technology with established methods of neuroscience, molecular biology and genetics, as well as appropriate behavioural models of drug reinforcement, may provide a productive approach for delineating the neurobiological underpinnings of drug responses that lead to

  11. A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly ...
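
    A minimal example of this kind of pretreatment, using SciPy's Savitzky-Golay filter on a synthetic spectrum; the window length and polynomial order are illustrative choices:

```python
import numpy as np
from scipy.signal import savgol_filter

wavelengths = np.linspace(1100, 2500, 700)          # synthetic NIR axis, nm
spectrum = (np.exp(-((wavelengths - 1900) / 80) ** 2)   # absorption band
            + 0.002 * wavelengths / 1000                # sloped baseline
            + 0.01 * np.random.randn(700))              # noise

smoothed = savgol_filter(spectrum, window_length=21, polyorder=2)
first_deriv = savgol_filter(spectrum, 21, 2, deriv=1)   # removes offsets
second_deriv = savgol_filter(spectrum, 21, 2, deriv=2)  # removes sloped baselines
```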

  12. Multicategory classification of 11 neuromuscular diseases based on microarray data using support vector machine.

    PubMed

    Choi, Soo Beom; Park, Jee Soo; Chung, Jai Won; Yoo, Tae Keun; Kim, Deok Won

    2014-01-01

    We applied multicategory machine learning methods to classify 11 neuromuscular disease groups and one control group based on microarray data. To develop multicategory classification models with optimal parameters and features, we performed a systematic evaluation of three machine learning algorithms and four feature selection methods using three-fold cross validation and a grid search. This study included 114 subjects of 11 neuromuscular diseases and 31 subjects of a control group using microarray data with 22,283 probe sets from the National Center for Biotechnology Information (NCBI). We obtained an accuracy of 100%, relative classifier information (RCI) of 1.0, and a kappa index of 1.0 by applying the models of support vector machines one-versus-one (SVM-OVO), SVM one-versus-rest (OVR), and directed acyclic graph SVM (DAGSVM), using the ratio of between-category to within-category sums of squares (BW) feature selection method. Each of these three models selected only four features to categorize the 12 groups, resulting in a time-saving and cost-effective strategy for diagnosing neuromuscular diseases. In addition, the gene SPP1 was selected as the top-ranked gene by the BW method. We confirmed the relationship between this gene (SPP1) and Duchenne muscular dystrophy (DMD) reported in a previous study. With our models as clinically helpful tools, neuromuscular diseases could be classified quickly using a computer, thereby giving a time-saving, cost-effective, and accurate diagnosis.
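
    A hedged sketch of the pipeline on synthetic stand-in data: rank probe sets by the between-category to within-category sum-of-squares (BW) ratio, keep the top four, and fit a one-versus-one SVM; all parameter choices here are assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def bw_ratio(X, y):
    """Per-feature ratio of between-class to within-class sum of squares."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for k in np.unique(y):
        Xk = X[y == k]
        between += len(Xk) * (Xk.mean(axis=0) - overall) ** 2
        within += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(145, 2000))          # samples x probe sets (synthetic)
y = np.arange(145) % 12                   # 11 diseases + control, balanced
top = np.argsort(bw_ratio(X, y))[::-1][:4]  # keep four features, as reported

clf = SVC(kernel="linear", decision_function_shape="ovo")
print(cross_val_score(clf, X[:, top], y, cv=3).mean())
```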

  13. Molecular characterization of multidrug resistant hospital isolates using the antimicrobial resistance determinant microarray.

    PubMed

    Leski, Tomasz A; Vora, Gary J; Barrows, Brian R; Pimentel, Guillermo; House, Brent L; Nicklasson, Matilda; Wasfy, Momtaz; Abdel-Maksoud, Mohamed; Taitt, Chris Rowe

    2013-01-01

    Molecular methods that enable the detection of antimicrobial resistance determinants are critical surveillance tools that are necessary to aid in curbing the spread of antibiotic resistance. In this study, we describe the use of the Antimicrobial Resistance Determinant Microarray (ARDM), which targets 239 unique genes that confer resistance to 12 classes of antimicrobial compounds, quaternary amines and streptothricin, for the determination of multidrug resistance (MDR) gene profiles. Fourteen reference MDR strains, which were either genome-sequenced or possessed well-characterized drug resistance profiles, were used to optimize detection algorithms and threshold criteria to ensure the microarray's effectiveness for unbiased characterization of antimicrobial resistance determinants in MDR strains. The subsequent testing of Acinetobacter baumannii, Escherichia coli and Klebsiella pneumoniae hospital isolates revealed the presence of several antibiotic resistance genes [e.g. belonging to TEM, SHV, OXA and CTX-M classes (and OXA and CTX-M subfamilies) of β-lactamases] and their assemblages, which were confirmed by PCR and DNA sequence analysis. When combined with results from the reference strains, ~25% of the ARDM content was confirmed as effective for representing allelic content from both Gram-positive and -negative species. Taken together, the ARDM identified MDR assemblages containing six to 18 unique resistance genes in each strain tested, demonstrating its utility as a powerful tool for molecular epidemiological investigations of antimicrobial resistance in clinically relevant bacterial pathogens.

  14. Identification of Iron Homeostasis Genes Dysregulation Potentially Involved in Retinopathy of Prematurity Pathogenicity by Microarray Analysis

    PubMed Central

    Luo, Xian-qiong; Zhang, Chun-yi; Zhang, Jia-wen; Jiang, Jing-bo; Yin, Ai-hua; Guo, Li; Nie, Chuan; Lu, Xu-zai; Deng, Hua; Zhang, Liang

    2015-01-01

    Retinopathy of prematurity (ROP) is a serious disease of preterm neonates and there are limited systematic studies of the molecular mechanisms underlying ROP. Therefore, here we performed global gene expression profiling in human fetal retinal microvascular endothelial cells (RMECs) under hypoxic conditions in vitro. Aborted fetuses were enrolled and primary RMECs were isolated from eyeballs. Cultivated cells were treated with CoCl2 to induce hypoxia. The dual-color microarray approach was adopted to compare gene expression profiling between treated RMECs and the paired untreated control. The one-class algorithm in significance analysis of microarray (SAM) software was used to screen the differentially expressed genes (DEGs) and quantitative RT-PCR (qRT-PCR) was conducted to validate the results. Gene Ontology was employed for functional enrichment analysis. There were 326 DEGs between the hypoxia-induced group and untreated group. Of these genes, 198 were upregulated in hypoxic RMECs, while the other 128 hits were downregulated. In particular, genes in the iron ion homeostasis pathway were highly enriched under hypoxic conditions. Our study indicates that dysregulation of genes involved in iron homeostasis mediating oxidative damage may be responsible for the mechanisms underlying ROP. The “oxygen plus iron” hypothesis may improve our understanding of ROP pathogenesis. PMID:26557385

  15. Classification of serous ovarian tumors based on microarray data using multicategory support vector machines.

    PubMed

    Park, Jee Soo; Choi, Soo Beom; Chung, Jai Won; Kim, Sung Woo; Kim, Deok Won

    2014-01-01

    Ovarian cancer, the most fatal of reproductive cancers, is the fifth leading cause of death in women in the United States. Serous borderline ovarian tumors (SBOTs) are considered to be earlier or less malignant forms of serous ovarian carcinomas (SOCs). SBOTs are asymptomatic and progression to advanced stages is common. Using DNA microarray technology, we designed multicategory classification models to discriminate ovarian cancer subclasses. To develop multicategory classification models with optimal parameters and features, we systematically evaluated three machine learning algorithms and three feature selection methods using five-fold cross validation and a grid search. The study included 22 subjects with normal ovarian surface epithelial cells, 12 with SBOTs, and 79 with SOCs according to microarray data with 54,675 probe sets obtained from the National Center for Biotechnology Information gene expression omnibus repository. Application of the optimal model of support vector machines one-versus-rest with signal-to-noise as a feature selection method gave an accuracy of 97.3%, relative classifier information of 0.916, and a kappa index of 0.941. In addition, 5 features, including the expression of putative biomarkers SNTN and AOX1, were selected to differentiate between normal, SBOT, and SOC groups. An accurate diagnosis of ovarian tumor subclasses by application of multicategory machine learning would be cost-effective and simple to perform, and would ensure more effective subclass-targeted therapy.

  16. The end of the microarray Tower of Babel: will universal standards lead the way?

    PubMed

    Kawasaki, Ernest S

    2006-07-01

    Microarrays are the most common method of studying global gene expression, and may soon enter the realm of FDA-approved clinical/diagnostic testing of cancer and other diseases. However, the acceptance of array data has been made difficult by the proliferation of widely different array platforms with gene probes ranging in size from 25 bases (oligonucleotides) to several kilobases (complementary DNAs or cDNAs). The algorithms applied for image and data analysis are also as varied as the microarray platforms, perhaps more so. In addition, there is a total lack of universally accepted standards for use among the different platforms and even within the same array types. Due to this lack of coherency in array technologies, confusion in interpretation of data within and across platforms has often been the norm, and studies of the same biological phenomena have, in many cases, led to contradictory results. In this commentary/review, some of the causes of this confusion will be summarized, and progress in overcoming these obstacles will be described, with the goal of providing an optimistic view of the future for the use of array technologies in global expression profiling and other applications.

  17. High-content single-cell analysis on-chip using a laser microarray scanner.

    PubMed

    Zhou, Jing; Wu, Yu; Lee, Sang-Kwon; Fan, Rong

    2012-12-07

    High-content cellomic analysis is a powerful tool for rapid screening of cellular responses to extracellular cues and examination of intracellular signal transduction pathways at the single-cell level. In conjunction with microfluidics technology that provides unique advantages in sample processing and precise control of fluid delivery, it holds great potential to transform lab-on-a-chip systems for high-throughput cellular analysis. However, high-content imaging instruments are expensive, sophisticated, and not readily accessible. Herein, we report on a laser scanning cytometry approach that exploits a bench-top microarray scanner as an end-point reader to perform rapid and automated fluorescence imaging of cells cultured on a chip. Using high-content imaging analysis algorithms, we demonstrated multiplexed measurements of morphometric and proteomic parameters from all single cells. Our approach shows the improvement of both sensitivity and dynamic range by two orders of magnitude as compared to conventional epifluorescence microscopy. We applied this technology to high-throughput analysis of mesenchymal stem cells on an extracellular matrix protein array and characterization of heterotypic cell populations. This work demonstrates the feasibility of a laser microarray scanner for high-content cellomic analysis and opens up new opportunities to conduct informative cellular analysis and cell-based screening in the lab-on-a-chip systems.

  18. Molecular Characterization of Multidrug Resistant Hospital Isolates Using the Antimicrobial Resistance Determinant Microarray

    PubMed Central

    Leski, Tomasz A.; Vora, Gary J.; Barrows, Brian R.; Pimentel, Guillermo; House, Brent L.; Nicklasson, Matilda; Wasfy, Momtaz; Abdel-Maksoud, Mohamed; Taitt, Chris Rowe

    2013-01-01

    Molecular methods that enable the detection of antimicrobial resistance determinants are critical surveillance tools that are necessary to aid in curbing the spread of antibiotic resistance. In this study, we describe the use of the Antimicrobial Resistance Determinant Microarray (ARDM), which targets 239 unique genes that confer resistance to 12 classes of antimicrobial compounds, quaternary amines and streptothricin, for the determination of multidrug resistance (MDR) gene profiles. Fourteen reference MDR strains, which were either genome-sequenced or possessed well-characterized drug resistance profiles, were used to optimize detection algorithms and threshold criteria to ensure the microarray's effectiveness for unbiased characterization of antimicrobial resistance determinants in MDR strains. The subsequent testing of Acinetobacter baumannii, Escherichia coli and Klebsiella pneumoniae hospital isolates revealed the presence of several antibiotic resistance genes [e.g. belonging to TEM, SHV, OXA and CTX-M classes (and OXA and CTX-M subfamilies) of β-lactamases] and their assemblages, which were confirmed by PCR and DNA sequence analysis. When combined with results from the reference strains, ∼25% of the ARDM content was confirmed as effective for representing allelic content from both Gram-positive and -negative species. Taken together, the ARDM identified MDR assemblages containing six to 18 unique resistance genes in each strain tested, demonstrating its utility as a powerful tool for molecular epidemiological investigations of antimicrobial resistance in clinically relevant bacterial pathogens. PMID:23936031

  19. Robust and efficient synthetic method for forming DNA microarrays.

    PubMed

    Dolan, P L; Wu, Y; Ista, L K; Metzenberg, R L; Nelson, M A; Lopez, G P

    2001-11-01

    The field of DNA microarray technology has necessitated the cooperative efforts of interdisciplinary scientific teams to achieve its primary goal of rapidly measuring global gene expression patterns. A collaborative effort was established to produce a chemically reactive surface on glass slide substrates to which unmodified DNA will covalently bind for improvement of cDNA microarray technology. Using the p-aminophenyl trimethoxysilane (ATMS)/diazotization chemistry that was developed, microarrays were fabricated and analyzed. This immobilization method produced uniform spots containing equivalent or greater amounts of DNA than commercially available immobilization techniques. In addition, hybridization analyses of microarrays made with ATMS/diazotization chemistry showed very sensitive detection of the target sequence, two to three orders of magnitude more sensitive than the commercial chemistries. Repeated stripping and re-hybridization of these slides showed that DNA loss was minimal, allowing multiple rounds of hybridization. Thus, the ATMS/diazotization chemistry facilitated covalent binding of unmodified DNA, and the reusable microarrays that were produced showed enhanced levels of hybridization and very low background fluorescence.

  20. Revealing Transcriptome Landscape of Mouse Spermatogonial Cells by Tiling Microarray

    PubMed Central

    Lee, Tin-Lap.; Rennert, Owen M.; Chan, Wai-Yee.

    2014-01-01

    Summary Spermatogenesis is a highly regulated developmental process by which spermatogonia develop into mature spermatozoa. This process involves many testis- or male germ cell-specific events through tightly regulated gene expression programs. In the past decade the advent of microarray technologies has allowed functional genomic studies of male germ cell development, resulting in the identification of genes governing various processes. A major limitation of conventional gene expression microarrays is the bias introduced by gene probe design: a transcript is usually represented by a small number of probes located at its 3' end. Tiling microarrays eliminate this issue by interrogating the genome in an unbiased fashion through probes tiled across the entire genome. These arrays provide a higher genomic resolution and allow identification of novel transcripts. To reveal the complexity of the genomic landscape of developing male germ cells, we applied tiling microarrays to evaluate the transcriptome in spermatogonial cells. Over 50% of the mouse and rat genomes are expressed during testicular development. More than 47% of transcripts are uncharacterized. The results suggest that the transcription machinery in spermatogonial cells is more complex than previously envisioned. PMID:22144238

  1. Do DNA Microarrays Tell the Story of Gene Expression?

    PubMed Central

    Rosenfeld, Simon

    2010-01-01

    Poor reproducibility of microarray measurements is a major obstacle to their application as an instrument for clinical diagnostics. In this paper, several aspects of poor reproducibility are analyzed. All of them belong to the category of interpretive weaknesses of DNA microarray technology. First, the attention is drawn to the fact that absence of the information regarding post-transcriptional mRNA stability makes it impossible to evaluate the level of gene activity from the relative mRNA abundances, the quantities available from microarray measurements. Second, irreducible intracellular variability with persistent patterns of stochasticity and burstiness put natural limits to reproducibility. Third, strong interactions within intracellular biomolecular networks make it highly problematic to build a bridge between transcription rates of individual genes and structural fidelity of their genetic codes. For these reasons, the microarray measurements of relative mRNA abundances are more appropriate in laboratory settings as a tool for scientific research, hypotheses generating and producing the leads for subsequent validation through more sophisticated technologies. As to clinical settings, where firm conclusive diagnoses, not the leads for further experimentation, are required, microarrays still have a long way to go until they become a reliable instrument in patient-related decision making. PMID:20628535

  2. PNA microarrays for hybridisation of unlabelled DNA samples

    PubMed Central

    Brandt, Ole; Feldner, Julia; Stephan, Achim; Schröder, Markus; Schnölzer, Martina; Arlinghaus, Heinrich F.; Hoheisel, Jörg D.; Jacob, Anette

    2003-01-01

    Several strategies have been developed for the production of peptide nucleic acid (PNA) microarrays by parallel probe synthesis and selective coupling of full-length molecules. Such microarrays were used for direct detection of the hybridisation of unlabelled DNA by time-of-flight secondary ion mass spectrometry. PNAs were synthesised by an automated process on filter-bottom microtitre plates. The resulting molecules were released from the solid support and attached without any purification to microarray surfaces via the terminal amino group itself or via modifications, which had been chemically introduced during synthesis. Thus, only full-length PNA oligomers were attached whereas truncated molecules, produced during synthesis because of incomplete condensation reactions, did not bind. Different surface chemistries and fitting modifications of the PNA terminus were tested. For an examination of coupling selectivity, bound PNAs were cleaved off microarray surfaces and analysed by MALDI-TOF mass spectrometry. Additionally, hybridisation experiments were performed to compare the attachment chemistries, with fully acetylated PNAs spotted as controls. Upon hybridisation of unlabelled DNA to such microarrays, binding events could be detected by visualisation of phosphates, which are an integral part of nucleic acids but missing entirely in PNA probes. Overall best results in terms of selectivity and sensitivity were obtained with thiol-modified PNAs on maleimide surfaces. PMID:14500847

  3. Optimized T7 amplification system for microarray analysis.

    PubMed

    Pabón, C; Modrusan, Z; Ruvolo, M V; Coleman, I M; Daniel, S; Yue, H; Arnold, L J

    2001-10-01

    Glass cDNA microarray technologies offer a highly parallel approach for profiling expressed gene sequences in disease-relevant tissues. However, standard hybridization and detection protocols are insufficient for milligram quantities of tissue, such as those derived from needle biopsies. Amplification systems utilizing T7 RNA polymerase can provide multiple cRNA copies from mRNA transcripts, permitting microarray studies with reduced sample inputs. Here, we describe an optimized T7-based amplification system for microarray analysis that yields between 200- and 700-fold amplification. This system was evaluated with both mRNA and total RNA samples and provided microarray sensitivity and precision that are comparable to our standard production process without amplification. The size distributions of amplified cRNA ranged from 200 bp to 4 kb and were similar to original mRNA profiles. These amplified cRNA samples were fluorescently labeled by reverse transcription and hybridized to microarrays comprising approximately 10,000 cDNA targets using a dual-channel format. Replicate hybridization experiments were conducted with the same and different tissues in each channel to assess the sensitivity and precision of differential expression ratios. Statistical analysis of differential expression ratios showed the lower limit of detection to be about 2-fold within and between amplified data sets, and about 3-fold when comparing amplified data to unamplified data (99.5% confidence).

  4. A facile method for the construction of oligonucleotide microarrays.

    PubMed

    Sethi, Dalip; Kumar, A; Gupta, K C; Kumar, P

    2008-11-19

    In recent years, the oligonucleotide-based microarray technique has emerged as a powerful and promising tool for various molecular biological studies. Here, a facile protocol for the construction of an oligonucleotide microarray is demonstrated that involves immobilization of oligonucleotide-trimethoxysilyl conjugates onto virgin glass microslides. The projected immobilization strategy reflects high immobilization efficiency (approximately 36-40%), a high signal-to-noise ratio (approximately 98), and high hybridization efficiency (approximately 32-35%). Using the proposed protocol, aminoalkyl, mercaptoalkyl, and phosphorylated oligonucleotides were immobilized onto virgin glass microslides. Briefly, modified oligonucleotides were reacted first with 3-glycidyloxypropyltriethoxysilane (GOPTS), and subsequently, the resultant conjugates were directly immobilized onto the virgin glass surface by making use of silanization chemistry. The constructed microarrays were then used for discrimination of base mismatches. On subjecting to different pH and thermal conditions, the microarray showed sufficient stability. Application of this chemistry to manufacture oligonucleotide probe-based microarrays for detection of bacterial meningitis is demonstrated. Single-step reaction for the formation of conjugates with the commercially available reagent (GOPTS), omission of the capping step and surface modification, and efficient immobilization of oligonucleotides onto the virgin glass surface are the key features of the proposed strategy.

  5. Identification of immunodominant antigens of Chlamydia trachomatis using proteome microarrays

    PubMed Central

    Molina, Douglas M.; Pal, Sukumar; Kayala, Mathew A.; Teng, Andy; Kim, Paul J.; Baldi, Pierre; Felgner, Philip L.; Liang, Xiaowu; de la Maza, Luis M.

    2011-01-01

    Chlamydia trachomatis is the most common bacterial sexually transmitted pathogen in the world. In order to control this infection, there is an urgent need to formulate a vaccine. Identification of protective antigens is required to implement a subunit vaccine. To identify potential antigen vaccine candidates, three strains of mice, BALB/c, C3H/HeN and C57BL/6, were inoculated with live and inactivated C. trachomatis mouse pneumonitis (MoPn) by different routes of immunization. Using a protein microarray, serum samples collected after immunization were tested for the presence of antibodies against specific chlamydial antigens. A total of 225 open reading frames (ORF) of the C. trachomatis genome were cloned, expressed, and printed in the microarray. Using this protein microarray, a total of seven C. trachomatis dominant antigens were identified (TC0052, TC0189, TC0582, TC0660, TC0726, TC0816, and TC0828) as recognized by IgG antibodies from all three strains of animals after immunization. In addition, the microarray was probed to determine if the antibody response exhibited a Th1 or Th2 bias. Animals immunized with live organisms mounted a predominant Th1 response against most of the chlamydial antigens while mice immunized with inactivated Chlamydia mounted a Th2-biased response. In conclusion, using a high throughput protein microarray we have identified a set of novel proteins that can be tested for their ability to protect against a chlamydial infection. PMID:20044059

  6. Polymer microfluidic chip for online monitoring of microarray hybridizations.

    PubMed

    Noerholm, Mikkel; Bruus, Henrik; Jakobsen, Mogens H; Telleman, Pieter; Ramsing, Niels B

    2004-02-01

    A disposable, single-use polymer microfluidic chip has been developed and manufactured by micro injection molding. The chip has the same outer dimensions as a standard microscope slide (25 x 76 x 1.1 mm) and is designed to be compatible with existing microscope slide handling equipment such as microarray scanners. The chip contains an inlet, a 10 microL hybridization chamber capable of holding a 1000-spot array, a waste chamber and a vent to allow air to escape when sample is injected. The hybridization chamber ensures highly homogeneous hybridization conditions across the microarray. We describe the use of this chip in a flexible setup with fluorescence-based detection, temperature control and liquid handling by computer-controlled syringe pumps. The chip and the setup presented in this article provide a powerful tool for highly parallel studies of kinetics and thermodynamics of duplex formation in DNA microarrays. The experimental setup presented in this article enables the on-chip microarray to be hybridized and monitored at several different stringency conditions during a single assay. The performance of the chip and the setup is demonstrated by on-line measurements of a hybridization of a DNA target solution to a microarray. A presented numerical model indicates that the hybridization process in microfluidic hybridization assays is diffusion limited, due to the low values of the diffusion coefficients D of the DNA and RNA molecules involved.
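
    A back-of-envelope check of the diffusion-limited claim: the time for a target to diffuse across the chamber height scales roughly as t ≈ h²/(2D); the values below are illustrative, not taken from the paper:

```python
# Rough diffusion time across the hybridization chamber.
h = 100e-6   # assumed chamber height scale, m
D = 1e-11    # typical diffusion coefficient of a short DNA strand, m^2/s

t = h ** 2 / (2 * D)
print(f"{t:.0f} s (~{t / 60:.0f} min)")   # ~8 min at these assumed values
```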

  7. Supervised and unsupervised discretization methods for evolutionary algorithms

    SciTech Connect

    Cantu-Paz, E

    2001-01-24

    This paper introduces simple model-building evolutionary algorithms (EAs) that operate on continuous domains. The algorithms are based on supervised and unsupervised discretization methods that have been used as preprocessing steps in machine learning. The basic idea is to discretize the continuous variables and use the discretization as a simple model of the solutions under consideration. The model is then used to generate new solutions directly, instead of using the usual operators based on sexual recombination and mutation. The algorithms presented here have fewer parameters than traditional and other model-building EAs. The proposed algorithms that use multivariate models are expected to scale up better with the dimensionality of the problem than existing EAs.
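
    Two of the textbook discretizations such EAs build on, sketched in a few lines: unsupervised equal-width binning and a supervised entropy-based split score; the evolutionary machinery itself is omitted:

```python
import numpy as np

def equal_width(x, bins=8):
    """Unsupervised: map each value to one of `bins` equal-width intervals."""
    edges = np.linspace(x.min(), x.max(), bins + 1)
    return np.digitize(x, edges[1:-1])          # labels 0 .. bins-1

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(x, y, threshold):
    """Supervised: entropy reduction from splitting x at `threshold`."""
    left, right = y[x <= threshold], y[x > threshold]
    h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
    return entropy(y) - h

x = np.random.randn(200)
y = (x + 0.3 * np.random.randn(200) > 0).astype(int)
print(equal_width(x)[:10], info_gain(x, y, 0.0))
```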

  8. Comparing Binaural Pre-processing Strategies II: Speech Intelligibility of Bilateral Cochlear Implant Users.

    PubMed

    Baumgärtel, Regina M; Hu, Hongmei; Krawczyk-Becker, Martin; Marquardt, Daniel; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Bomke, Katrin; Plotz, Karsten; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

    2015-12-30

    Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary, complex, and included a significant amount of reverberation. Other aspects, such as the perfectly frontal target position, were idealized laboratory settings, allowing the algorithms to perform better than in corresponding real-world conditions. Eight bilaterally implanted CI users, wearing devices from three manufacturers, participated in the study. In all noise conditions, a substantial improvement in SRT50 compared to the unprocessed signal was observed for most of the algorithms tested, with the largest improvements generally provided by binaural minimum variance distortionless response (MVDR) beamforming algorithms. The largest overall improvement in speech intelligibility was achieved by an adaptive binaural MVDR in a spatially separated, single competing talker noise scenario. A no-pre-processing condition and adaptive differential microphones without a binaural link served as the two baseline conditions. SRT50 improvements provided by the binaural MVDR beamformers surpassed the performance of the adaptive differential microphones in most cases. Speech intelligibility improvements predicted by instrumental measures were shown to account for some but not all aspects of the perceptually obtained SRT50 improvements measured in bilaterally implanted CI users.
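
    The core of an MVDR beamformer, the family that performed best here, is the closed-form weight w = R⁻¹d / (dᴴ R⁻¹ d), which minimizes output noise power while passing the target direction undistorted; a toy two-channel sketch with assumed quantities:

```python
import numpy as np

M = 2                                   # two microphones (one per ear)
rng = np.random.default_rng(1)
noise = rng.standard_normal((M, 4000)) + 1j * rng.standard_normal((M, 4000))
R = noise @ noise.conj().T / noise.shape[1]   # noise covariance estimate

d = np.ones(M, dtype=complex)           # steering vector, frontal target
Rinv_d = np.linalg.solve(R, d)
w = Rinv_d / (d.conj() @ Rinv_d)        # MVDR weights

target = 0.5 * d[:, None] * np.exp(1j * 2 * np.pi * 0.01 * np.arange(4000))
output = w.conj() @ (noise + target)    # beamformed single-channel signal
```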

  9. Visual natural feature tracking for autonomous spacecraft guidance by symbolic preprocessing and associative memories

    NASA Astrophysics Data System (ADS)

    Poelzleitner, W.; Paar, G.; Schwingshakl, G.

    1991-10-01

    The current status of the development of an autonomous vision system for spacecraft guidance in special scenarios is described. The algorithms developed are especially promising as a novel approach to detecting potential landmark areas to guide a spacecraft in the descent and landing phases. The overall structure of the system was designed, and special break points for crucial parts have been determined in simulation.

  10. Research on registration algorithm for check seal verification

    NASA Astrophysics Data System (ADS)

    Wang, Shuang; Liu, Tiegen

    2008-03-01

    Nowadays seals play an important role in China. With the development of the social economy, the traditional method of manual check seal identification can no longer meet the needs of banking transactions. This paper focuses on pre-processing and registration algorithms for check seal verification using the theory of image processing and pattern recognition. First, the complex characteristics of check seals are analyzed. To eliminate the difference in producing conditions and the disturbance caused by background and writing in the check image, many methods are used in the pre-processing stage of check seal verification, such as color component transformation, linear transformation to a gray-scale image, median filtering, Otsu thresholding, and the closing and labeling operations of mathematical morphology. After these processes, a clean binary seal image is obtained. Building on traditional registration algorithms, a two-level registration method comprising rough and precise stages is proposed. The precise registration stage resolves the deflection angle to 0.1°. This paper introduces the concepts of difference inside and difference outside and uses the percentages of difference inside and difference outside to judge whether a seal is genuine or forged. Experimental results on a large set of check seals are satisfactory. They show that the methods and algorithms presented are robust to noisy sealing conditions and tolerate within-class differences well.
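
    A hedged sketch of the pre-processing chain described above using standard OpenCV calls; the red-minus-green seal extraction and the file name are assumptions of this example, not the paper's exact color transformation:

```python
import cv2
import numpy as np

img = cv2.imread("check.png")                       # hypothetical BGR check image
b, g, r = cv2.split(img)
seal = cv2.subtract(r, g)                           # red stamp stands out

seal = cv2.medianBlur(seal, 3)                      # median filtering
_, binary = cv2.threshold(seal, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)      # closing
```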

  11. A Versatile Microarray Platform for Capturing Rare Cells

    NASA Astrophysics Data System (ADS)

    Brinkmann, Falko; Hirtz, Michael; Haller, Anna; Gorges, Tobias M.; Vellekoop, Michael J.; Riethdorf, Sabine; Müller, Volkmar; Pantel, Klaus; Fuchs, Harald

    2015-10-01

    Analyses of rare events occurring at extremely low frequencies in body fluids are still challenging. We established a versatile microarray-based platform able to capture single target cells from large background populations. As a use case we chose the challenging application of detecting circulating tumor cells (CTCs) - about one cell in a billion normal blood cells. After incubation with an antibody cocktail, targeted cells are extracted on a microarray in a microfluidic chip. The accessibility of our platform allows for subsequent recovery of targets for further analysis. The microarray facilitates exclusion of false positive capture events by co-localization, allowing for detection without fluorescent labelling. Analyzing blood samples from cancer patients with our platform matched and partly exceeded gold-standard performance, demonstrating feasibility for clinical application. The clinical researcher's free choice of antibody cocktail, without any need to alter chip manufacturing or the incubation protocol, allows virtually arbitrary targeting of capture species and therefore widespread applications in the biomedical sciences.

  12. Short time-series microarray analysis: Methods and challenges

    PubMed Central

    Wang, Xuewei; Wu, Ming; Li, Zheng; Chan, Christina

    2008-01-01

    The detection and analysis of steady-state gene expression has become routine. Time-series microarrays are of growing interest to systems biologists for deciphering the dynamic nature and complex regulation of biosystems. Most temporal microarray data only contain a limited number of time points, giving rise to short-time-series data, which imposes challenges for traditional methods of extracting meaningful information. To obtain useful information from the wealth of short-time series data requires addressing the problems that arise due to limited sampling. Current efforts have shown promise in improving the analysis of short time-series microarray data, although challenges remain. This commentary addresses recent advances in methods for short-time series analysis including simplification-based approaches and the integration of multi-source information. Nevertheless, further studies and development of computational methods are needed to provide practical solutions to fully exploit the potential of this data. PMID:18605994

  13. A novel ensemble machine learning for robust microarray data classification.

    PubMed

    Peng, Yonghong

    2006-06-01

    Microarray data classification has convincingly demonstrated its effectiveness as a methodology for the diagnosis of diseases and cancers. Although much research has been devoted to applying machine learning techniques to microarray data classification in recent years, conventional machine learning techniques have been shown to have intrinsic drawbacks in achieving accurate and robust classifications. This paper presents a novel ensemble machine learning approach for the development of robust microarray data classification. Unlike conventional ensemble learning techniques, the presented approach first generates a pool of candidate base classifiers by gene sub-sampling and then selects, through classifier clustering, a subset of appropriate base classifiers to construct the classification committee. Experimental results demonstrate that the classifiers constructed by the proposed method outperform not only classifiers generated by conventional machine learning but also classifiers generated by two widely used ensemble learning methods (bagging and boosting).
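
    A hedged sketch of the two-stage ensemble idea described above: generate a pool of base classifiers on random gene subsets, cluster them by their prediction behaviour on held-out data, and keep one representative per cluster for the committee. The base learner, cluster count, and binary 0/1 labels are illustrative assumptions, not the paper's exact configuration.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.cluster import KMeans

        def build_committee(X_train, y_train, X_val,
                            n_pool=50, n_genes=100, n_committee=7, seed=0):
            rng = np.random.default_rng(seed)
            pool, gene_sets, val_preds = [], [], []
            for _ in range(n_pool):
                # Gene sub-sampling: each base classifier sees a random gene subset.
                genes = rng.choice(X_train.shape[1], size=n_genes, replace=False)
                clf = DecisionTreeClassifier(random_state=0)
                clf.fit(X_train[:, genes], y_train)
                pool.append(clf)
                gene_sets.append(genes)
                val_preds.append(clf.predict(X_val[:, genes]))
            # Classifier clustering: group base classifiers whose validation
            # predictions agree, then pick the first member of each cluster.
            labels = KMeans(n_clusters=n_committee, n_init=10,
                            random_state=0).fit_predict(np.array(val_preds))
            chosen = [int(np.flatnonzero(labels == c)[0]) for c in range(n_committee)]
            return [(pool[i], gene_sets[i]) for i in chosen]

        def committee_predict(committee, X):
            votes = np.array([clf.predict(X[:, genes]) for clf, genes in committee])
            return (votes.mean(axis=0) >= 0.5).astype(int)   # majority vote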

  14. Protein Microarrays with Novel Microfluidic Methods: Current Advances

    PubMed Central

    Dixit, Chandra K.; Aguirre, Gerson R.

    2014-01-01

    Microfluidic-based micromosaic technology has allowed the patterning of recognition elements in restricted micrometer-scale areas with high precision. This controlled patterning enabled the development of highly multiplexed arrays for multiple-analyte detection. The arraying technology was first introduced in early 2001 and holds tremendous potential to revolutionize microarray development and analyte detection. Several microfluidic methods have since been developed for microarray applications. In this review we discuss these novel methods and approaches, which leverage the properties of microfluidic technologies to significantly improve physical aspects of microarray technology, such as imprinting homogeneity, stability of the immobilized biomolecules, assay times, costs, and the bulk of the required instrumentation. PMID:27600343

  15. Comparative analysis of genomic signal processing for microarray data clustering.

    PubMed

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next-generation healthcare systems, in particular in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods (linear predictive coding, wavelet decomposition, and fractal dimension) are studied to provide a comparative evaluation of their clustering performance on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to the other digital signal processing methods and to well-known statistical methods.
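
    As an illustration of the fractal approach, the sketch below computes the Higuchi fractal dimension of a single gene's expression profile; one such value per profile can then be fed to any standard clustering algorithm. The kmax default is an assumption, and this is not the paper's implementation.

        import numpy as np

        def higuchi_fd(x, kmax=8):
            """Higuchi fractal dimension of a 1-D expression profile."""
            n = len(x)
            log_inv_k, log_len = [], []
            for k in range(1, kmax + 1):
                lengths = []
                for m in range(k):
                    idx = np.arange(m, n, k)   # subsampled curve x[m], x[m+k], ...
                    if len(idx) < 2:
                        continue
                    # Normalized curve length for this offset (Higuchi, 1988).
                    dist = np.abs(np.diff(x[idx])).sum()
                    lengths.append(dist * (n - 1) / ((len(idx) - 1) * k) / k)
                log_inv_k.append(np.log(1.0 / k))
                log_len.append(np.log(np.mean(lengths)))
            # The fractal dimension is the slope of log L(k) versus log(1/k).
            return float(np.polyfit(log_inv_k, log_len, 1)[0])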

  16. A Protein Microarray ELISA for Screening Biological Fluids

    SciTech Connect

    Varnum, Susan M.; Woodbury, Ronald L.; Zangar, Richard C.

    2004-02-01

    Protein microarrays permit the simultaneous measurement of many proteins in a small sample volume and therefore provide an attractive approach for the quantitative measurement of proteins in biological fluids, including serum. This chapter describes a microarray ELISA. Capture antibodies are immobilized onto a glass surface; the covalently attached antibodies bind a specific antigen from a sample overlaying the array. A second, biotinylated antibody that recognizes the same antigen as the first antibody, but at a different epitope, is then used for detection. Detection is based upon an enzymatic signal enhancement method known as tyramide signal amplification (TSA). By coupling the microarray ELISA format with the signal amplification of tyramide deposition, the assay achieves sub-pg/ml sensitivity.

  17. Hydrogel micro-arrays for multi-analyte detection

    NASA Astrophysics Data System (ADS)

    Rounds, Rebecca M.; Lee, Seungjoon; Jeffords, Sarah; Ibey, Bennett L.; Pishko, Michael V.; Coté, Gerard L.

    2007-02-01

    Fluorescent microarrays have the ability to detect and monitor multiple analytes simultaneously and, after initial placement, noninvasively. This versatility is advantageous for several biological applications, including drug discovery, biohazard detection, transplant organ preservation, and cell culture monitoring. In this work, poly(ethylene glycol) (PEG) hydrogel microarrays are described that can be used to measure multiple analytes, including H+ and dissolved oxygen. The array elements are created by filling micro-channels with a hydrogel precursor solution containing analyte-specific fluorescent sensors. A photomask is used to create the microarray through UV polymerization of the PEG precursor solution. A compact imaging system composed of a CCD camera, a high-power LED, and two optical filters is used to measure the change in fluorescence emission corresponding to analyte concentration. The proposed system was tested in aqueous solution by varying relevant analyte concentrations across their biological ranges.

  18. Microarray-based maps of copy-number variant regions in European and sub-Saharan populations.

    PubMed

    Vogler, Christian; Gschwind, Leo; Röthlisberger, Benno; Huber, Andreas; Filges, Isabel; Miny, Peter; Auschra, Bianca; Stetak, Attila; Demougin, Philippe; Vukojevic, Vanja; Kolassa, Iris-Tatjana; Elbert, Thomas; de Quervain, Dominique J-F; Papassotiropoulos, Andreas

    2010-12-16

    The genetic basis of phenotypic variation can be partially explained by the presence of copy-number variations (CNVs). Currently available methods for CNV assessment include high-density single-nucleotide polymorphism (SNP) microarrays, which have become an indispensable tool in genome-wide association studies (GWAS). However, insufficient concordance rates between different CNV assessment methods call for cautious interpretation of results from CNV-based genetic association studies. Here we provide a cross-population, microarray-based map of copy-number variant regions (CNVRs) to enable reliable interpretation of CNV association findings. We used the Affymetrix Genome-Wide Human SNP Array 6.0 to scan the genomes of 1167 individuals from two ethnically distinct populations (Europe, N=717; Rwanda, N=450). Three different CNV-finding algorithms were tested and compared for sensitivity, specificity, and feasibility. Two algorithms were subsequently used to construct CNVR maps, which were also validated by processing subsamples with additional microarray platforms (Illumina 1M-Duo BeadChip, Nimblegen 385K aCGH array) and by comparing our data with publicly available information. Both algorithms detected a total of 42669 CNVs, 74% of which clustered in 385 CNVRs of a cross-population map. These CNVRs overlap with 862 annotated genes and account for approximately 3.3% of the haploid human genome. The resulting comprehensive cross-population CNVR maps represent an extendable framework that can support the detection of common CNVs and additionally assist in interpreting CNV-based association studies.
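
    A hedged sketch of the CNVR-construction step described above: per-sample CNV calls are merged into copy-number variant regions wherever they overlap on a chromosome. The simple union rule used here is an illustrative assumption; the published map may apply stricter reciprocal-overlap criteria.

        def merge_cnvs_to_cnvrs(cnvs):
            """cnvs: iterable of (chrom, start, end) tuples -> merged CNVR list."""
            cnvrs = []
            for chrom, start, end in sorted(cnvs):
                if cnvrs and cnvrs[-1][0] == chrom and start <= cnvrs[-1][2]:
                    # Overlapping call on the same chromosome: extend the region.
                    cnvrs[-1][2] = max(cnvrs[-1][2], end)
                else:
                    cnvrs.append([chrom, start, end])
            return [tuple(r) for r in cnvrs]

        # Two overlapping calls collapse into one CNVR; the distant call stays separate.
        print(merge_cnvs_to_cnvrs([("chr1", 100, 500), ("chr1", 400, 900),
                                   ("chr1", 2000, 2100)]))
        # -> [('chr1', 100, 900), ('chr1', 2000, 2100)]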

  19. Component Labeling Algorithm For Video Rate Processing

    NASA Astrophysics Data System (ADS)

    Gotoh, Toshiyuki; Ohta, Yoshiyuki; Yoshida, Masumi; Shirai, Yoshio

    1987-10-01

    In this paper, we propose a raster-scanning algorithm for component labeling that enables processing under a pipeline architecture. In the raster-scanning algorithm, provisional labels are assigned to each pixel of the components and, at the same time, the connectivities of labels are detected during the first scan. These labels are then classified into groups based on the connectivities. Finally, the provisional labels are updated using the result of the classification, and a unique label is assigned to each pixel of each component. In the conventional algorithm, however, the classification process requires a vast number of operations, which prevents pipeline processing. We have developed a preprocessing method that reduces the number of provisional labels and thereby limits the number of label connectivities. We have also developed a new classification method whose cost is proportional only to the number of label connectivities. We verified the algorithm through computer simulation; the experimental results show that 512 x 512 x 8-bit images can be processed at video rate (1/30 s per image) when the algorithm is implemented in hardware.
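
    A minimal sketch of the two-scan labeling scheme described above, with union-find recording label connectivities during the first raster scan (the classification step whose cost the paper's preprocessing reduces). The 4-connectivity and data layout are illustrative assumptions, not the authors' hardware design.

        import numpy as np

        def label_components(binary):
            labels = np.zeros(binary.shape, dtype=np.int32)
            parent = [0]                           # union-find forest over labels

            def find(a):
                while parent[a] != a:
                    parent[a] = parent[parent[a]]  # path halving
                    a = parent[a]
                return a

            h, w = binary.shape
            next_label = 1
            # First scan: assign provisional labels and record connectivities.
            for y in range(h):
                for x in range(w):
                    if not binary[y, x]:
                        continue
                    up = labels[y - 1, x] if y > 0 else 0
                    left = labels[y, x - 1] if x > 0 else 0
                    if up == 0 and left == 0:
                        parent.append(next_label)  # new provisional label
                        labels[y, x] = next_label
                        next_label += 1
                    elif up and left:
                        ru, rl = find(up), find(left)
                        labels[y, x] = min(ru, rl)
                        parent[max(ru, rl)] = min(ru, rl)  # merge label classes
                    else:
                        labels[y, x] = up or left
            # Second scan: replace provisional labels with class representatives.
            for y in range(h):
                for x in range(w):
                    if labels[y, x]:
                        labels[y, x] = find(labels[y, x])
            return labels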

  20. Identification of moisture content in tobacco plant leaves using outlier sample eliminating algorithms and hyperspectral data.

    PubMed

    Sun, Jun; Zhou, Xin; Wu, Xiaohong; Zhang, Xiaodong; Li, Qinglin

    2016-02-26

    Fast identification of moisture content in tobacco plant leaves plays a key role in the tobacco cultivation industry and benefits the management of tobacco plants on the farm. To identify the moisture content of tobacco plant leaves in a fast and nondestructive way, a method involving Mahalanobis distance coupled with Monte Carlo cross validation (MD-MCCV) was proposed in this study to eliminate outlier samples. Hyperspectral data of 200 tobacco plant leaf samples spanning 20 moisture gradients were obtained using a FieldSpec® 3 spectrometer. Savitzky-Golay smoothing (SG), roughness penalty smoothing (RPS), kernel smoothing (KS), and median smoothing (MS) were used to preprocess the raw spectra. In addition, Mahalanobis distance (MD), Monte Carlo cross validation (MCCV), and MD-MCCV were applied to detect outlier samples in the raw spectrum and the four smoothed spectra. The successive projections algorithm (SPA) was used to extract the most influential wavelengths, and multiple linear regression (MLR) was applied to build prediction models based on the preprocessed spectral features at the characteristic wavelengths. The results showed that the four best prediction models were MD-MCCV-SG (Rp² = 0.8401, RMSEP = 0.1355), MD-MCCV-RPS (Rp² = 0.8030, RMSEP = 0.1274), MD-MCCV-KS (Rp² = 0.8117, RMSEP = 0.1433), and MD-MCCV-MS (Rp² = 0.9132, RMSEP = 0.1162). The MD-MCCV algorithm outperformed MD alone, MCCV alone, and no sample pretreatment in eliminating outlier samples across the 20 moisture gradients of tobacco plant leaves, and MD-MCCV can therefore be used to eliminate outlier samples during spectral preprocessing.
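
    A hedged sketch of the Mahalanobis-distance half of MD-MCCV: score each spectrum against the mean and covariance of the sample set (after PCA so the covariance stays invertible) and flag the most distant samples as outlier candidates. The PCA step, component count, and cut-off rule are illustrative assumptions.

        import numpy as np
        from sklearn.decomposition import PCA

        def mahalanobis_outliers(spectra, n_components=10, n_drop=5):
            """spectra: samples x wavelengths matrix -> indices of outlier candidates."""
            scores = PCA(n_components=n_components).fit_transform(spectra)
            diff = scores - scores.mean(axis=0)
            cov_inv = np.linalg.inv(np.cov(scores, rowvar=False))
            # Mahalanobis distance of every sample to the data centroid.
            md = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
            # MD-MCCV would additionally confirm these candidates by Monte Carlo
            # cross validation before removing them.
            return np.argsort(md)[-n_drop:]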

  1. Cross-platform normalization of microarray and RNA-seq data for machine learning applications

    PubMed Central

    Thompson, Jeffrey A.; Tan, Jie

    2016-01-01

    Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language. PMID:26844019
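
    A minimal sketch of one baseline the study compares against: quantile normalization, which forces every sample (column) to share the same empirical distribution. NumPy-only and illustrative; this is not the TDM method itself.

        import numpy as np

        def quantile_normalize(expr):
            """expr: genes x samples matrix -> quantile-normalized copy."""
            order = np.argsort(expr, axis=0)                 # per-sample ranks
            reference = np.sort(expr, axis=0).mean(axis=1)   # mean at each rank
            out = np.empty_like(expr, dtype=float)
            for j in range(expr.shape[1]):
                out[order[:, j], j] = reference              # assign by rank
            return out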

  2. Cross-platform normalization of microarray and RNA-seq data for machine learning applications.

    PubMed

    Thompson, Jeffrey A; Tan, Jie; Greene, Casey S

    2016-01-01

    Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.

  3. Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data

    PubMed Central

    Zhao, Xin; Cheung, Leo Wang-Kit

    2007-01-01

    Background Designing appropriate machine learning methods for identifying genes that have significant discriminating power for disease outcomes has become increasingly important for our understanding of diseases at the genomic level. Although many machine learning methods have been developed and applied to microarray gene expression data analysis, the majority are based on linear models, which are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear-model-based methods are also more prone to introducing false-positive significant features. Furthermore, linear-model-based algorithms often involve calculating the inverse of a matrix that may be singular when the number of potentially important genes is relatively large, leading to numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods share two critical problems, model selection and model parameter tuning, that remain unsolved or even unaddressed. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is preferred in real-world applications. Kernel-induced learning methods form a class of approaches with promising potential to achieve this goal. Results A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences.
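
    A minimal sketch of the Gibbs-sampler core in a plain Bayesian probit setting (Albert-Chib data augmentation), the building block the KIGP model extends. The flat prior and fixed design matrix are illustrative assumptions; the full KIGP additionally samples kernel and gene-selection parameters.

        import numpy as np
        from scipy.stats import truncnorm

        def probit_gibbs(X, y, n_iter=1000, seed=0):
            """X: n x p design matrix (n > p, full rank); y: 0/1 labels."""
            rng = np.random.default_rng(seed)
            n, p = X.shape
            V = np.linalg.inv(X.T @ X)          # posterior covariance, flat prior
            L = np.linalg.cholesky(V)
            beta, draws = np.zeros(p), []
            for _ in range(n_iter):
                # Latent utilities z_i ~ N(x_i'beta, 1), truncated by the labels.
                mu = X @ beta
                lo = np.where(y == 1, -mu, -np.inf)
                hi = np.where(y == 1, np.inf, -mu)
                z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
                # Conjugate Gaussian update of the regression coefficients.
                beta = V @ X.T @ z + L @ rng.standard_normal(p)
                draws.append(beta.copy())
            return np.array(draws)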

  4. Understanding the physics of oligonucleotide microarrays: the Affymetrix spike-in data reanalysed

    NASA Astrophysics Data System (ADS)

    Burden, Conrad J.

    2008-03-01

    The Affymetrix U95 and U133 Latin-Square spike-in datasets are reanalysed, together with a dataset from a version of the U95 spike-in experiment without a complex non-specific background. The approach uses a physico-chemical model that includes the effects of specific and non-specific hybridization and probe folding at the microarray surface, target folding and hybridization in the bulk RNA target solution, and duplex dissociation during the post-hybridization washing phase. The model predicts a three-parameter hyperbolic response function that fits the fluorescence intensity data from all three datasets well. The importance of the various hybridization and washing effects in determining each of the three parameters is examined, and some guidance is given as to how a practical algorithm for determining specific target concentrations might be developed.
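
    A hedged sketch of fitting a three-parameter hyperbolic (Langmuir-like) response of the kind described above: intensity I(c) = b + A·c/(c + K) for spike-in concentration c. The parameter names and synthetic data are illustrative, not the paper's notation or results.

        import numpy as np
        from scipy.optimize import curve_fit

        def hyperbolic(c, b, A, K):
            # b: non-specific background, A: saturation amplitude,
            # K: effective dissociation constant of the probe-target duplex.
            return b + A * c / (c + K)

        c = np.array([0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128])  # pM, illustrative
        rng = np.random.default_rng(1)
        I = hyperbolic(c, 50.0, 4000.0, 20.0) * rng.normal(1.0, 0.05, c.size)
        params, _ = curve_fit(hyperbolic, c, I, p0=[I.min(), I.max(), np.median(c)])
        print(dict(zip(["b", "A", "K"], params)))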

  5. An efficient coding algorithm for the compression of ECG signals using the wavelet transform.

    PubMed

    Rajoub, Bashar A

    2002-04-01

    A wavelet-based electrocardiogram (ECG) data compression algorithm is proposed in this paper. The ECG signal is first preprocessed, and the discrete wavelet transform (DWT) is then applied to the preprocessed signal. Preprocessing guarantees that the magnitudes of the wavelet coefficients are less than one and reduces the reconstruction errors near both ends of the compressed signal. The DWT coefficients are divided into three groups, and each group is thresholded using a threshold based on a desired energy packing efficiency. A binary significance map is then generated by scanning the wavelet decomposition coefficients and outputting a binary one if the scanned coefficient is significant and a binary zero if it is insignificant. Compression is achieved by (1) compressing the significance map with a variable-length code based on run-length encoding and (2) representing the significant coefficients directly in binary. The ability of the coding algorithm to compress ECG signals was investigated by compressing and decompressing test signals. The proposed algorithm was compared with direct and wavelet-based compression algorithms and showed superior performance. A compression ratio of 24:1 was achieved for MIT-BIH record 117 with a percent root-mean-square difference as low as 1.08%.
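
    A hedged sketch of the thresholding and significance-map stage described above: keep the largest DWT coefficients up to a target energy packing efficiency (EPE) and emit the binary map that run-length encoding would then compress. It uses PyWavelets; the wavelet choice and the single global threshold are simplifications of the paper's per-group scheme.

        import numpy as np
        import pywt

        def significance_map(signal, epe=0.99, wavelet="db4", level=5):
            coeffs = np.concatenate(pywt.wavedec(signal, wavelet, level=level))
            energy = np.sort(coeffs ** 2)[::-1]              # descending energies
            # Smallest number of coefficients whose cumulative energy reaches EPE.
            k = int(np.searchsorted(np.cumsum(energy), epe * energy.sum())) + 1
            threshold = np.sqrt(energy[k - 1])
            significant = np.abs(coeffs) >= threshold
            # Binary map (1 = significant) plus the retained coefficient values.
            return significant.astype(np.uint8), coeffs[significant]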

  6. Nanodroplet chemical microarrays and label-free assays.

    PubMed

    Gosalia, Dhaval; Diamond, Scott L

    2010-01-01

    The microarraying of chemicals or biomolecules on a glass surface allows for dense storage and miniaturized screening experiments and can be deployed in chemical-biology research or drug discovery. Microarraying allows the production of scores of replicate slides. Small-molecule libraries are typically stored as 10 mM DMSO stock solutions, whereas libraries of biomolecules are typically stored in high percentages of glycerol. Thus, a method is required to print such libraries on microarrays and then assay them against biological targets. By printing either small-molecule or biomolecule libraries in an aqueous solvent containing glycerol, each adherent nanodroplet remains fixed at a position on the microarray by surface tension, without the use of wells, without evaporating, and without the need to chemically link the compound to the surface. Importantly, glycerol is a high-boiling-point solvent that is fully miscible with DMSO and water and has the additional property of stabilizing various enzymes. The nanoliter volume of the droplet forms the reaction compartment once additional reagents are metered onto the microarray, either by aerosol spray deposition or by addressable acoustic dispensing. Incubation of the nanodroplet microarray in a high-humidity environment controls the final water content of the reaction. This platform has been validated for fluorescent HTS assays of proteases and kinases as well as for fluorogenic substrate profiling of proteases. Label-free HTS is also possible by running nanoliter HTS reactions on a MALDI target for mass spectrometry (MS) analysis without the need to desalt the samples. A method is described for running nanoliter-scale multicomponent homogeneous reactions followed by label-free MALDI MS analysis of the reactions.

  7. Karyotype versus Microarray Testing for Genetic Abnormalities after Stillbirth

    PubMed Central

    Reddy, Uma M.; Page, Grier P.; Saade, George R.; Silver, Robert M.; Thorsten, Vanessa R.; Parker, Corette B.; Pinar, Halit; Willinger, Marian; Stoll, Barbara J.; Heim-Hall, Josefine; Varner, Michael W.; Goldenberg, Robert L.; Bukowski, Radek; Wapner, Ronald J.; Drews-Botsch, Carolyn D.; O’Brien, Barbara M.; Dudley, Donald J.; Levy, Brynn

    2015-01-01

    Background Genetic abnormalities have been associated with 6 to 13% of stillbirths, but the true prevalence may be higher. Unlike karyotype analysis, microarray analysis does not require live cells, and it detects small deletions and duplications called copy-number variants. Methods The Stillbirth Collaborative Research Network conducted a population-based study of stillbirth in five geographic catchment areas. Standardized postmortem examinations and karyotype analyses were performed. A single-nucleotide polymorphism array was used to detect copy-number variants of at least 500 kb in placental or fetal tissue. Variants that were not identified in any of three databases of apparently unaffected persons were then classified into three groups: probably benign, clinical significance unknown, or pathogenic. We compared the results of karyotype and microarray analyses of samples obtained after delivery. Results In our analysis of samples from 532 stillbirths, microarray analysis yielded results more often than did karyotype analysis (87.4% vs. 70.5%, P<0.001) and provided better detection of genetic abnormalities (aneuploidy or pathogenic copy-number variants, 8.3% vs. 5.8%; P = 0.007). Microarray analysis also identified more genetic abnormalities among 443 antepartum stillbirths (8.8% vs. 6.5%, P = 0.02) and 67 stillbirths with congenital anomalies (29.9% vs. 19.4%, P = 0.008). As compared with karyotype analysis, microarray analysis provided a relative increase in the diagnosis of genetic abnormalities of 41.9% in all stillbirths, 34.5% in antepartum stillbirths, and 53.8% in stillbirths with anomalies. Conclusions Microarray analysis is more likely than karyotype analysis to provide a genetic diagnosis, primarily because of its success with nonviable tissue, and is especially valuable in analyses of stillbirths with congenital anomalies or in cases in which karyotype results cannot be obtained. (Funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development.)

  8. Are glycan biosensors an alternative to glycan microarrays?

    PubMed Central

    Hushegyi, A.

    2016-01-01

    Complex carbohydrates (glycans) play an important role in nature, and the study of their interactions with proteins or intact cells can be useful for understanding many physiological and pathological processes. Such interactions have been successfully interrogated in a highly parallel way using glycan microarrays, but this technique has some limitations. In recent years, glycan biosensors in numerous progressive configurations have therefore been developed, offering distinct advantages over glycan microarrays. This review discusses advances achieved in the field of label-free glycan biosensors. PMID:27231487

  9. Reply to 'Linking probe thermodynamics to microarray quantification'