Sample records for gene selection based

  1. A Cancer Gene Selection Algorithm Based on the K-S Test and CFS.

    PubMed

    Su, Qiang; Wang, Yina; Jiang, Xiaobing; Chen, Fuxue; Lu, Wen-Cong

    2017-01-01

    To address the challenging problem of selecting distinguished genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm selects distinguished genes first using the K-S test, and then, it uses CFS to select genes from those selected by the K-S test. We adopted support vector machines (SVM) as the classification tool and used the criteria of accuracy to evaluate the performance of the classifiers on the selected gene subsets. This approach compared the proposed gene subset selection algorithm with the K-S test, CFS, minimum-redundancy maximum-relevancy (mRMR), and ReliefF algorithms. The average experimental results of the aforementioned gene selection algorithms for 5 gene expression datasets demonstrate that, based on accuracy, the performance of the new K-S and CFS-based algorithm is better than those of the K-S test, CFS, mRMR, and ReliefF algorithms. The experimental results show that the K-S test-CFS gene selection algorithm is a very effective and promising approach compared to the K-S test, CFS, mRMR, and ReliefF algorithms.
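
    The first stage described above, ranking genes by a two-sample K-S test, can be sketched roughly as follows. This is only an illustration under stated assumptions: the gene names and expression values are made up, the labels are assumed to be binary, and the CFS and SVM stages of the paper are not reproduced.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def rank_genes_by_ks(expression, labels):
    """expression: {gene: [value per sample]}; labels: 0/1 class per sample (assumed)."""
    scores = {}
    for gene, values in expression.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = ks_statistic(class0, class1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: gene_a separates the classes, gene_b does not.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],
    "gene_b": [1.0, 3.0, 2.0, 1.1, 2.9, 2.1],
}
ranking = rank_genes_by_ks(expression, labels)
```

    Genes at the top of `ranking` would then be passed to a second-stage selector; which cut-off to use is a design choice the abstract does not specify.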

  2. Statistical approach for selection of biologically informative genes.

    PubMed

    Das, Samarendra; Rai, Anil; Mishra, D C; Rai, Shesh N

    2018-05-20

    Selection of informative genes from high dimensional gene expression data has emerged as an important research area in genomics. Many of the gene selection techniques proposed so far are based on either a relevancy or a redundancy measure. Further, the performance of these techniques has been adjudged through post selection classification accuracy computed through a classifier using the selected genes. This performance metric may be statistically sound but may not be biologically relevant. A statistical approach, i.e. Boot-MRMR, was proposed based on a composite measure of maximum relevance and minimum redundancy, which is both statistically sound and biologically relevant for informative gene selection. For comparative evaluation of the proposed approach, we developed two biological sufficient criteria, i.e. Gene Set Enrichment with QTL (GSEQ) and biological similarity score based on Gene Ontology (GO). Further, a systematic and rigorous evaluation of the proposed technique with 12 existing gene selection techniques was carried out using five gene expression datasets. This evaluation was based on a broad spectrum of statistically sound (e.g. subject classification) and biologically relevant (based on QTL and GO) criteria under a multiple criteria decision-making framework. The performance analysis showed that the proposed technique selects informative genes which are more biologically relevant. The proposed technique is also found to be quite competitive with the existing techniques with respect to subject classification and computational time. Our results also showed that under the multiple criteria decision-making setup, the proposed technique is best for informative gene selection over the available alternatives. Based on the proposed approach, an R package, BootMRMR, has been developed and is available at https://cran.r-project.org/web/packages/BootMRMR. This study will provide a practical guide to select statistical techniques for selecting informative genes from high dimensional expression data for breeding and system biology studies. Published by Elsevier B.V.

  3. Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification

    PubMed Central

    2012-01-01

    Background: Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. Results: This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, an HBSA-based ensemble classifier is constructed using majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. Conclusions: It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, they are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network. PMID:22830977
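
    The two ensemble ideas in this abstract, ranking genes by how often they occur in the selected subsets and combining per-subset classifiers by majority vote, can be illustrated with a small sketch. The subsets and predictions below are invented placeholders; the HBSA search itself is not reproduced.

```python
from collections import Counter

def rank_by_frequency(gene_subsets):
    """Rank genes by the number of selected subsets in which they occur."""
    counts = Counter(g for subset in gene_subsets for g in set(subset))
    # Sort by descending count, then by name so that ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def majority_vote(predictions):
    """Return the most common label; on a tie, the label seen first wins."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical gene subsets, standing in for the output of a subset search.
subsets = [["TP53", "MYC"], ["TP53", "EGFR"], ["TP53", "MYC", "KRAS"]]
ranked = rank_by_frequency(subsets)

# Hypothetical predictions from three classifiers, one per subset.
ensemble_label = majority_vote(["tumor", "normal", "tumor"])
```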

  4. Recursive feature selection with significant variables of support vectors.

    PubMed

    Tsai, Chen-An; Huang, Chien-Hsun; Chang, Ching-Wei; Chen, Chun-Houh

    2012-01-01

    The development of DNA microarrays allows researchers to screen thousands of genes simultaneously and also helps identify genes with high and low expression levels in normal and diseased tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and arbitrarily choose a threshold for selecting genes. However, the parameter setting may not be compatible with the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in support vector machine. We compared the performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and is capable of attaining good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.

  5. The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection.

    PubMed

    Sun, Yingqiang; Lu, Chengbo; Li, Xiaobo

    2018-05-17

    The gene expression profile has the characteristics of a high dimension, low sample, and continuous type, and it is a great challenge to use gene expression profile data for the classification of tumor samples. This paper proposes a cross-entropy based multi-filter ensemble (CEMFE) method for microarray data classification. Firstly, multiple filters are used to select the microarray data in order to obtain a plurality of the pre-selected feature subsets with a different classification ability. The top N genes with the highest rank of each subset are integrated so as to form a new data set. Secondly, the cross-entropy algorithm is used to remove the redundant data in the data set. Finally, the wrapper method, which is based on forward feature selection, is used to select the best feature subset. The experimental results show that the proposed method is more efficient than other gene selection methods and that it can achieve a higher classification accuracy under fewer characteristic genes.
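
    The first step of this pipeline, pooling the top N genes from several filters, might look like the sketch below. The two scoring functions are simple stand-ins chosen for illustration (the abstract does not name its filters), the data are invented, and the cross-entropy redundancy removal and the forward wrapper are not shown.

```python
from statistics import mean, pvariance

def split_by_class(values, labels):
    c0 = [v for v, y in zip(values, labels) if y == 0]
    c1 = [v for v, y in zip(values, labels) if y == 1]
    return c0, c1

def mean_difference(values, labels):
    """Filter 1 (assumed): absolute difference of the class means."""
    c0, c1 = split_by_class(values, labels)
    return abs(mean(c0) - mean(c1))

def fisher_like_score(values, labels):
    """Filter 2 (assumed): squared mean difference over summed class variances."""
    c0, c1 = split_by_class(values, labels)
    return (mean(c0) - mean(c1)) ** 2 / (pvariance(c0) + pvariance(c1) + 1e-12)

def pooled_candidates(expression, labels, filters, n):
    """Union of each filter's top-n genes, in order of first appearance."""
    pooled = []
    for score_fn in filters:
        ranked = sorted(expression, key=lambda g: score_fn(expression[g], labels), reverse=True)
        for gene in ranked[:n]:
            if gene not in pooled:
                pooled.append(gene)
    return pooled

# Hypothetical toy data: the two filters disagree on the best gene.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "g1": [1.0, 1.1, 0.9, 2.0, 2.1, 1.9],    # small but very clean shift
    "g2": [0.0, 10.0, 5.0, 4.0, 14.0, 9.0],  # large but noisy shift
    "g3": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],    # no shift
}
pooled = pooled_candidates(expression, labels, [mean_difference, fisher_like_score], n=1)
```

    Because different filters favor different genes, the pooled set is larger than any single top-N list, which is why a later redundancy-removal step is needed.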

  6. Gene Selection and Cancer Classification: A Rough Sets Based Approach

    NASA Astrophysics Data System (ADS)

    Sun, Lijun; Miao, Duoqian; Zhang, Hongyun

    Identification of informative gene subsets responsible for discerning between available samples of gene expression data is an important task in bioinformatics. Reducts, from rough sets theory, correspond to minimal sets of essential genes for discerning samples and are an efficient tool for gene selection. Due to the computational complexity of the existing reduct algorithms, feature ranking is usually used to narrow down the gene space as a first step, and the top-ranked genes are selected. In this paper, we define a novel criterion for scoring genes, based on the expression level difference between classes and the gene's contribution to classification, and present an algorithm for generating all possible reducts from informative genes. The algorithm takes the whole attribute set into account and finds short reducts with a significant reduction in computational complexity. An exploration of this approach on benchmark gene expression data sets demonstrates that it is successful in selecting highly discriminative genes and that the classification accuracy is impressive.

  7. Gene selection for tumor classification using neighborhood rough sets and entropy measures.

    PubMed

    Chen, Yumin; Zhang, Zunjun; Zheng, Jianzhong; Ma, Ying; Xue, Yu

    2017-03-01

    With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Since gene expression data often contain thousands of genes and a small number of samples, gene selection from gene expression data becomes a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to gene selection, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy data are therefore usually discretized in preprocessing, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data whilst maintaining the original gene classification information. Moreover, this paper addresses an entropy measure under the frame of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. The utilization of this measure can bring about a discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression datasets show that the proposed gene selection is an effective method for improving the accuracy of tumor classification. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Multi-level gene/MiRNA feature selection using deep belief nets and active learning.

    PubMed

    Ibrahim, Rania; Yousri, Noha A; Ismail, Mohamed A; El-Makky, Nagwa M

    2014-01-01

    Selecting the most discriminative genes/miRNAs has been raised as an important task in bioinformatics to enhance disease classifiers and to mitigate the dimensionality curse problem. Original feature selection methods choose genes/miRNAs based on their individual features regardless of how they perform together. Considering group features instead of individual ones provides a better view for selecting the most informative genes/miRNAs. Recently, deep learning has proven its ability in representing the data in multiple levels of abstraction, allowing for better discrimination between different classes. However, the idea of using deep learning for feature selection is not widely used in the bioinformatics field yet. In this paper, a novel multi-level feature selection approach named MLFS is proposed for selecting genes/miRNAs based on expression profiles. The approach is based on both deep and active learning. Moreover, an extension to use the technique for miRNAs is presented by considering the biological relation between miRNAs and genes. Experimental results show that the approach was able to outperform classical feature selection methods in hepatocellular carcinoma (HCC) by 9%, lung cancer by 6% and breast cancer by around 10% in F1-measure. Results also show the enhancement in F1-measure of our approach over recently related work in [1] and [2].

  9. MicroRNA-integrated and network-embedded gene selection with diffusion distance.

    PubMed

    Huang, Di; Zhou, Xiaobo; Lyon, Christopher J; Hsueh, Willa A; Wong, Stephen T C

    2010-10-29

    Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based both on their expression and the coordinate expression of genes within their gene network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address the limitations of microarray data. Microarray data, although widely used for gene selection, measures only mRNA abundance, which does not always reflect the ultimate gene phenotype, since it does not account for post-transcriptional effects. To overcome this important (critical in certain cases) but ignored-in-almost-all-existing-studies limitation, we design a new strategy to integrate together microarray data with the information of microRNA, the major post-transcriptional regulatory factor. We also handle the challenges led by gene collaboration mechanism. To incorporate the biological facts that genes without direct interactions may work closely due to signal transduction and that two genes may be functionally connected through multi paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, directly or indirectly-connected, according to multi paths connecting them. We demonstrate, using type 2 diabetes (DM2) as an example, that the proposed strategies can enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2-association; and 3) are involved in many well-known DM2-associated pathways.
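
    To make the diffusion-distance idea concrete, the sketch below uses a common random-walk formulation: two nodes are close if t-step random walks started from them reach similar distributions over the network. The abstract does not give the paper's exact formula, so this definition, the toy network, and the choice of t are all assumptions.

```python
def transition_matrix(adj):
    """Row-normalize a weighted adjacency matrix (every node needs degree > 0)."""
    return [[w / sum(row) for w in row] for row in adj]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def diffusion_distance(adj, i, j, steps):
    """Distance between the t-step walk distributions of nodes i and j,
    with each coordinate weighted by the inverse stationary probability."""
    p = transition_matrix(adj)
    pt = p
    for _ in range(steps - 1):
        pt = mat_mul(pt, p)
    total_degree = sum(sum(row) for row in adj)
    stationary = [sum(row) / total_degree for row in adj]
    squared = sum((pt[i][k] - pt[j][k]) ** 2 / stationary[k] for k in range(len(adj)))
    return squared ** 0.5

# Hypothetical 5-gene network: a triangle (0, 1, 2) with a tail 2-3-4.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
d_near = diffusion_distance(adj, 0, 1, steps=2)  # same tightly connected module
d_far = diffusion_distance(adj, 0, 4, steps=2)   # opposite ends of the network
```

    Unlike shortest-path length, this distance accounts for all paths between two nodes, which matches the abstract's point that genes may be functionally connected through multiple paths.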

  10. Identification of Genes Involved in Breast Cancer Metastasis by Integrating Protein-Protein Interaction Information with Expression Data.

    PubMed

    Tian, Xin; Xin, Mingyuan; Luo, Jian; Liu, Mingyao; Jiang, Zhenran

    2017-02-01

    The selection of relevant genes for breast cancer metastasis is critical for the treatment and prognosis of cancer patients. Although much effort has been devoted to the gene selection procedures by use of different statistical analysis methods or computational techniques, the interpretation of the variables in the resulting survival models has been limited so far. This article proposes a new Random Forest (RF)-based algorithm to identify important variables highly related to breast cancer metastasis, which is based on the importance scores of two variable selection algorithms, including the mean decrease Gini (MDG) criterion of Random Forest and the GeneRank algorithm with protein-protein interaction (PPI) information. The new gene selection algorithm can be called PPIRF. The improved prediction accuracy illustrated the reliability and high interpretability of the gene list selected by the PPIRF approach.

  11. A Filter Feature Selection Method Based on MFA Score and Redundancy Excluding and It's Application to Tumor Gene Expression Data Analysis.

    PubMed

    Li, Jiangeng; Su, Lei; Pang, Zenan

    2015-12-01

    Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score) which is based on graph embedding has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we proposed a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used support vector machine as the classifier to classify the samples. Compared with MFA score, t test and Fisher score, it achieved higher classification accuracy.

  12. A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.

    PubMed

    Baur, Brittany; Bozdag, Serdar

    2016-01-01

    DNA methylation is an important epigenetic event that affects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
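
    Sequential forward selection wrapped around a nearest-neighbor classifier can be sketched as below. This is a generic illustration, not the authors' implementation: it assumes a 1-nearest-neighbor classifier, leave-one-out accuracy as the selection criterion, and invented probe values with binary expression labels.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on the chosen features."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

def sequential_forward_selection(rows, labels, max_features):
    """Greedily add the feature that most improves accuracy; stop when nothing helps."""
    selected, best_score = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda s: s[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

# Hypothetical data: 6 samples x 3 probes; only probe 1 tracks the label.
rows = [
    [0.5, 0.10, 0.9],
    [0.1, 0.15, 0.2],
    [0.9, 0.20, 0.5],
    [0.2, 0.80, 0.8],
    [0.6, 0.85, 0.3],
    [0.8, 0.90, 0.6],
]
labels = [0, 0, 0, 1, 1, 1]  # e.g. low vs. high expression (assumed encoding)
selected, score = sequential_forward_selection(rows, labels, max_features=2)
```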

  13. Classification of early-stage non-small cell lung cancer by weighing gene expression profiles with connectivity information.

    PubMed

    Zhang, Ao; Tian, Suyan

    2018-05-01

    Pathway-based feature selection algorithms, which utilize biological information contained in pathways to guide which features/genes should be selected, have evolved quickly and become widespread in the field of bioinformatics. Based on how the pathway information is incorporated, we classify pathway-based feature selection algorithms into three major categories: penalty, stepwise forward, and weighting. Compared to the first two categories, the weighting methods have been underutilized even though they are usually the simplest ones. In this article, we constructed three different genes' connectivity information-based weights for each gene and then conducted feature selection upon the resulting weighted gene expression profiles. Using both simulations and a real-world application, we have demonstrated that when the data-driven connectivity information constructed from the data of specific disease under study is considered, the resulting weighted gene expression profiles slightly outperform the original expression profiles. In summary, a big challenge faced by the weighting method is how to estimate pathway knowledge-based weights more accurately and precisely. Only once this issue is resolved will wide use of the weighting methods be possible. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm.

    PubMed

    Mao, Yong; Zhou, Xiao-Bo; Pi, Dao-Ying; Sun, You-Xian; Wong, Stephen T C

    2005-10-01

    In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables and small number of samples as well as its non-linearity. It is difficult to get satisfying results by using conventional linear statistical methods. Recursive feature elimination based on support vector machine (SVM RFE) is an effective algorithm for gene selection and cancer classification, which are integrated into a consistent framework. In this paper, we propose a new method to select parameters of the aforementioned algorithm implemented with Gaussian kernel SVMs, as a better alternative to the common practice of selecting the apparently best parameters, by using a genetic algorithm to search for a pair of optimal parameters. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative hereditary breast cancer and acute leukaemia datasets. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.

  15. A Comparison of Selective Pressures in Plant X-Linked and Autosomal Genes.

    PubMed

    Krasovec, Marc; Nevado, Bruno; Filatov, Dmitry A

    2018-05-03

    Selection is expected to work differently in autosomal and X-linked genes because of their ploidy difference and the exposure of recessive X-linked mutations to haploid selection in males. However, it is not clear whether these expectations apply to recently evolved sex chromosomes, where many genes retain functional X- and Y-linked gametologs. We took advantage of the recently evolved sex chromosomes in the plant Silene latifolia and its closely related species to compare the selective pressures between hemizygous and non-hemizygous X-linked genes as well as between X-linked genes and autosomal genes. Our analysis, based on over 1000 genes, demonstrated that, similar to animals, X-linked genes in Silene evolve significantly faster than autosomal genes (the so-called faster-X effect). Contrary to expectations, faster-X divergence was detectable only for non-hemizygous X-linked genes. Our phylogeny-based analyses of selection revealed no evidence for faster adaptation in X-linked genes compared to autosomal genes. On the other hand, partial relaxation of purifying selection was apparent on the X-chromosome compared to the autosomes, consistent with a smaller genetic diversity in S. latifolia X-linked genes (πx = 0.016; πaut = 0.023). Thus, the faster-X divergence in S. latifolia appears to be a consequence of the smaller effective population size rather than of a faster adaptive evolution on the X-chromosome. We argue that this may be a general feature of “young” sex chromosomes, where the majority of X-linked genes are not hemizygous, preventing haploid selection in heterogametic sex.

  16. Entropy-based gene ranking without selection bias for the predictive classification of microarray data.

    PubMed

    Furlanello, Cesare; Serafini, Maria; Merler, Stefano; Jurman, Giuseppe

    2003-11-06

    We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.

  17. A Comparison of Selective Pressures in Plant X-Linked and Autosomal Genes

    PubMed Central

    Krasovec, Marc; Filatov, Dmitry A.

    2018-01-01

    Selection is expected to work differently in autosomal and X-linked genes because of their ploidy difference and the exposure of recessive X-linked mutations to haploid selection in males. However, it is not clear whether these expectations apply to recently evolved sex chromosomes, where many genes retain functional X- and Y-linked gametologs. We took advantage of the recently evolved sex chromosomes in the plant Silene latifolia and its closely related species to compare the selective pressures between hemizygous and non-hemizygous X-linked genes as well as between X-linked genes and autosomal genes. Our analysis, based on over 1000 genes, demonstrated that, similar to animals, X-linked genes in Silene evolve significantly faster than autosomal genes—the so-called faster-X effect. Contrary to expectations, faster-X divergence was detectable only for non-hemizygous X-linked genes. Our phylogeny-based analyses of selection revealed no evidence for faster adaptation in X-linked genes compared to autosomal genes. On the other hand, partial relaxation of purifying selection was apparent on the X-chromosome compared to the autosomes, consistent with a smaller genetic diversity in S. latifolia X-linked genes (πx = 0.016; πaut = 0.023). Thus, the faster-X divergence in S. latifolia appears to be a consequence of the smaller effective population size rather than of a faster adaptive evolution on the X-chromosome. We argue that this may be a general feature of “young” sex chromosomes, where the majority of X-linked genes are not hemizygous, preventing haploid selection in heterogametic sex. PMID:29751495

  18. Identifying osteosarcoma metastasis associated genes by weighted gene co-expression network analysis (WGCNA).

    PubMed

    Tian, Honglai; Guan, Donghui; Li, Jianmin

    2018-06-01

    Osteosarcoma (OS), the most common malignant bone tumor, is a serious health threat in children and adolescents. OS occurrence usually correlates with early metastasis and a high death rate. This study aimed to better understand the mechanism of OS metastasis. From the Gene Expression Omnibus (GEO) database, we downloaded 4 expression profile data sets associated with OS metastasis and selected differentially expressed genes. The weighted gene co-expression network analysis (WGCNA) approach allowed us to investigate the module most correlated with OS metastasis. Gene Ontology functional and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to annotate the selected OS metastasis-associated genes. We selected 897 differentially expressed genes between the OS metastasis and OS non-metastasis groups. Based on these selected genes, WGCNA further identified 142 genes included in the module most correlated with OS metastasis. Gene Ontology functional and KEGG pathway enrichment analyses showed that genes significantly associated with OS metastasis were involved in a pathway correlated with insulin-like growth factor binding. Our research identified several potential molecules participating in the metastasis process and factors acting as biomarkers. With this study, we could better explore the mechanism of OS metastasis and discover more therapy targets.

  19. On-the-fly selection of cell-specific enhancers, genes, miRNAs and proteins across the human body using SlideBase

    PubMed Central

    Ienasescu, Hans; Li, Kang; Andersson, Robin; Vitezic, Morana; Rennie, Sarah; Chen, Yun; Vitting-Seerup, Kristoffer; Lagoni, Emil; Boyd, Mette; Bornholdt, Jette; de Hoon, Michiel J. L.; Kawaji, Hideya; Lassmann, Timo; Hayashizaki, Yoshihide; Forrest, Alistair R. R.; Carninci, Piero; Sandelin, Albin

    2016-01-01

    Genomics consortia have produced large datasets profiling the expression of genes, micro-RNAs, enhancers and more across human tissues or cells. There is a need for intuitive tools to select the subsets of such data that are most relevant for specific studies. To this end, we present SlideBase, a web tool which offers a new way of selecting genes, promoters, enhancers and microRNAs that are preferentially expressed/used in a specified set of cells/tissues, based on the use of interactive sliders. With the help of sliders, SlideBase enables users to define custom expression thresholds for individual cell types/tissues, producing sets of genes, enhancers etc. which satisfy these constraints. Changes in slider settings result in simultaneous changes in the selected sets, updated in real time. SlideBase is linked to major databases from genomics consortia, including FANTOM, GTEx, The Human Protein Atlas and BioGPS. Database URL: http://slidebase.binf.ku.dk PMID:28025337
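
    The slider-based selection described here amounts to filtering features by a per-tissue expression range. The sketch below illustrates that idea only; it is not SlideBase code, and the data layout, gene names, and threshold values are hypothetical.

```python
def select_by_sliders(expression, sliders):
    """expression: {gene: {tissue: level}}; sliders: {tissue: (low, high)}.
    A gene is kept only if its level lies inside every slider's range.
    Tissues missing for a gene are treated as level 0.0 (an assumption)."""
    return sorted(
        gene
        for gene, levels in expression.items()
        if all(low <= levels.get(tissue, 0.0) <= high for tissue, (low, high) in sliders.items())
    )

# Hypothetical expression levels in two tissues.
expression = {
    "gene_liver": {"liver": 120.0, "brain": 2.0},
    "gene_brain": {"liver": 1.0, "brain": 90.0},
    "gene_broad": {"liver": 80.0, "brain": 75.0},
}

# High in liver, nearly absent in brain.
liver_specific = select_by_sliders(
    expression, {"liver": (50.0, float("inf")), "brain": (0.0, 5.0)}
)

# Relaxing the brain slider changes the selected set.
liver_any_brain = select_by_sliders(
    expression, {"liver": (50.0, float("inf")), "brain": (0.0, 100.0)}
)
```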

  20. A genomic scan for selection reveals candidates for genes involved in the evolution of cultivated sunflower (Helianthus annuus).

    PubMed

    Chapman, Mark A; Pashley, Catherine H; Wenzler, Jessica; Hvala, John; Tang, Shunxue; Knapp, Steven J; Burke, John M

    2008-11-01

    Genomic scans for selection are a useful tool for identifying genes underlying phenotypic transitions. In this article, we describe the results of a genome scan designed to identify candidates for genes targeted by selection during the evolution of cultivated sunflower. This work involved screening 492 loci derived from ESTs on a large panel of wild, primitive (i.e., landrace), and improved sunflower (Helianthus annuus) lines. This sampling strategy allowed us to identify candidates for selectively important genes and investigate the likely timing of selection. Thirty-six genes showed evidence of selection during either domestication or improvement based on multiple criteria, and a sequence-based test of selection on a subset of these loci confirmed this result. In view of what is known about the structure of linkage disequilibrium across the sunflower genome, these genes are themselves likely to have been targeted by selection, rather than being merely linked to the actual targets. While the selection candidates showed a broad range of putative functions, they were enriched for genes involved in amino acid synthesis and protein catabolism. Given that a similar pattern has been detected in maize (Zea mays), this finding suggests that selection on amino acid composition may be a general feature of the evolution of crop plants. In terms of genomic locations, the selection candidates were significantly clustered near quantitative trait loci (QTL) that contribute to phenotypic differences between wild and cultivated sunflower, and specific instances of QTL colocalization provide some clues as to the roles that these genes may have played during sunflower evolution.

  1. A Gene-Oriented Haplotype Comparison Reveals Recently Selected Genomic Regions in Temperate and Tropical Maize Germplasm

    PubMed Central

    Zhang, Jie; Li, Yongxiang; Zheng, Jun; Zhang, Hongwei; Yang, Xiaohong; Wang, Jianhua; Wang, Guoying

    2017-01-01

    The extensive genetic variation present in maize (Zea mays) germplasm makes it possible to detect signatures of positive artificial selection that occurred during temperate and tropical maize improvement. Here we report an analysis of 532,815 polymorphisms from a maize association panel consisting of 368 diverse temperate and tropical inbred lines. We developed a gene-oriented approach adapting exonic polymorphisms to identify recently selected alleles by comparing haplotypes across the maize genome. This analysis revealed evidence of selection for more than 1100 genomic regions during recent improvement, and included regulatory genes and key genes with visible mutant phenotypes. We find that selected candidate target genes in temperate maize are enriched in biosynthetic processes, and further examination of these candidates highlights two cases, sucrose flux and oil storage, in which multiple genes in a common pathway can be cooperatively selected. Finally, based on available parallel gene expression data, we hypothesize that some genes were selected for regulatory variations, resulting in altered gene expression. PMID:28099470

  2. Possible Diversifying Selection in the Imprinted Gene, MEDEA, in Arabidopsis

    PubMed Central

    Miyake, Takashi; Takebayashi, Naoki

    2009-01-01

    Coevolutionary conflict among imprinted genes that influence traits such as offspring growth may arise when maternal and paternal genomes have different evolutionary optima. This conflict is expected in outcrossing taxa with multiple paternity, but not self-fertilizing taxa. MEDEA (MEA) is an imprinted plant gene that influences seed growth. Disagreement exists regarding the type of selection acting on this gene. We present new data and analyses of sequence diversity of MEA in self-fertilizing and outcrossing Arabidopsis and its relatives, to help clarify the form of selection acting on this gene. Codon-based branch analysis among taxa (PAML) suggests that selection on the coding region is changing over time, and nonsynonymous substitution is elevated in at least one outcrossing branch. Codon-based analysis of diversity within outcrossing Arabidopsis lyrata ssp. petraea (OmegaMap) suggests that diversifying selection is acting on a portion of the gene, to cause elevated nonsynonymous polymorphism. Providing further support for balancing selection in A. lyrata, Hudson, Kreitman and Aguadé analysis indicates that diversity/divergence at silent sites in the MEA promoter and genic region is elevated relative to reference genes, and there are deviations from the neutral frequency spectrum. This combination of positive selection as well as balancing and diversifying selection in outcrossing lineages is consistent with other genes influenced by evolutionary conflict, such as disease resistance genes. Consistent with predictions that conflict would be eliminated in self-fertilizing taxa, we found no evidence of positive, balancing, or diversifying selection in A. thaliana promoter or genic region. PMID:19126870

  3. Selection of low-variance expressed Malus x domestica (apple) genes for use as quantitative PCR reference genes (housekeepers)

    USDA-ARS's Scientific Manuscript database

    To accurately measure gene expression using PCR-based approaches, there is the need for reference genes that have low variance in expression (housekeeping genes) to normalise the data for RNA quantity and quality. For non-model species such as Malus x domestica (apples), previously, the selection of...

  4. Mutation-profile-based methods for understanding selection forces in cancer somatic mutations: a comparative analysis.

    PubMed

    Zhou, Zhan; Zou, Yangyun; Liu, Gangbiao; Zhou, Jingqi; Wu, Jingcheng; Zhao, Shimin; Su, Zhixi; Gu, Xun

    2017-08-29

    Human genes exhibit different effects on fitness in cancer and normal cells. Here, we present an evolutionary approach to measure the selection pressure on human genes, using the well-known ratio of the nonsynonymous to synonymous substitution rate in both cancer genomes (CN/CS) and normal populations (pN/pS). A new mutation-profile-based method that adopts sample-specific mutation rate profiles instead of conventional substitution models was developed. We found that cancer-specific selection pressure is quite different from the selection pressure at the species and population levels. Both the relaxation of purifying selection on passenger mutations and the positive selection of driver mutations may contribute to the increased CN/CS values of human genes in cancer genomes compared with the pN/pS values in human populations. The CN/CS values also contribute to the improved classification of cancer genes and a better understanding of the onco-functionalization of cancer genes during oncogenesis. The use of our computational pipeline to identify cancer-specific positively and negatively selected genes may provide useful information for understanding the evolution of cancers and identifying possible targets for therapeutic intervention.
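
    As a rough illustration of the nonsynonymous-to-synonymous ratio used in this abstract, the sketch below computes the ratio from observed mutation counts normalized by the number of available sites. This is the generic per-site form of the ratio, not the authors' mutation-profile-based estimator; the function name and all counts are hypothetical.

```python
def selection_ratio(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Generic nonsynonymous/synonymous rate ratio.

    Each count is divided by the number of sites at which that kind of
    mutation could occur, so the ratio is not biased by the fact that most
    coding positions are nonsynonymous. Values near 1 are consistent with
    neutrality, values below 1 with purifying selection, and values above
    1 with positive selection.
    """
    if n_syn == 0:
        raise ValueError("ratio is undefined without synonymous mutations")
    rate_nonsyn = n_nonsyn / nonsyn_sites
    rate_syn = n_syn / syn_sites
    return rate_nonsyn / rate_syn


# Hypothetical counts for a single gene, for illustration only.
neutral_like = selection_ratio(n_nonsyn=30, n_syn=10, nonsyn_sites=1500, syn_sites=500)
elevated = selection_ratio(n_nonsyn=60, n_syn=10, nonsyn_sites=1500, syn_sites=500)
print(neutral_like, elevated)
```

    A sample-specific approach such as the one in the abstract would replace the fixed site counts with expected counts derived from each sample's mutation rate profile.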

  5. Gene-assisted selection: applications of association genetics for forest tree breeding

    Treesearch

    Philip L. Wilcox; Craig E. Echt; Rowland D. Burdon

    2007-01-01

    This chapter describes application of association genetics in forest tree species for the purposes of selection. We use the term gene-assisted selection (GAS) to denote application of marker-trait associations determined via association genetics, which we anticipate will be based on polymorphisms associated with expressed genes. The salient features of forest trees...

  6. Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data.

    PubMed

    Mallik, Saurav; Bhadra, Tapas; Maulik, Ujjwal

    2017-01-01

    Epigenetic Biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework of identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics dataset. Firstly, we determine the genes that have both expression as well as methylation values, and follow normal distribution. Similarly, we identify the genes which consist of both expression and methylation values, but do not follow normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant genes as top ranked genes. For statistical validation, we apply t-test on both the expression and methylation data consisting of only the normally distributed top ranked genes to determine how many of them are both differentially expressed and methylated. Similarly, we utilize Limma package for performing non-parametric Empirical Bayes test on both expression and methylation data comprising only the non-normally distributed top ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene-markers with biological validation. Moreover, our framework improves positive predictive rate and reduces false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method as well as other methods based on classification performances obtained using several well-known classifiers.

  7. Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data

    PubMed Central

    2013-01-01

    Background High–throughput (HT) technologies provide a huge amount of gene expression data that can be used to identify biomarkers useful in clinical practice. The most frequently used approaches first select a set of genes (i.e. gene signature) able to characterize differences between two or more phenotypical conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis, based on biological knowledge. However, this approach comes with some drawbacks. First, the gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis is based on mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, such mapping is typically used in the assessment of the coverage of gene signature by biological concepts, that is either score–based or requires tunable parameters as well, limiting its power. Results We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices, easier to analyze and to interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant for the biological question under investigation. Differently from the standard approach where gene selection and functional assessment are applied independently, KDVS embeds these two steps into a unified statistical framework, decreasing the variability derived from the threshold–dependent selection, the mapping to the biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method. Conclusions We showed that KDVS not only enables the selection of known biological functionalities with accuracy, but also identification of new ones. An efficient implementation of KDVS was devised to obtain results in a fast and robust way. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of results. Overall, KDVS approach can be considered as a viable alternative to enrichment–based approaches. PMID:23302187

  8. An ensemble of SVM classifiers based on gene pairs.

    PubMed

    Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin

    2013-07-01

    In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets. Copyright © 2013 Elsevier Ltd. All rights reserved.
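
    The top scoring pair (TSP) criterion mentioned above can be sketched as follows, assuming the commonly used definition in which a gene pair is scored by how differently the ordering of its two expression values behaves in the two classes. This is a sketch of the pair-scoring step only, not of the GA-ESP ensemble; the function names and the toy data are hypothetical.

```python
from itertools import combinations


def tsp_score(xi, xj, labels):
    """|P(xi < xj | class 0) - P(xi < xj | class 1)| for one gene pair."""
    probs = []
    for cls in (0, 1):
        idx = [k for k, y in enumerate(labels) if y == cls]
        # Fraction of samples in this class where gene i is below gene j.
        probs.append(sum(xi[k] < xj[k] for k in idx) / len(idx))
    return abs(probs[0] - probs[1])


def top_scoring_pairs(data, labels, n_pairs=1):
    """Score every gene pair and return the n_pairs best as (score, a, b)."""
    scored = [(tsp_score(data[a], data[b], labels), a, b)
              for a, b in combinations(sorted(data), 2)]
    # Highest score first; ties are broken by gene name for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:n_pairs]


# Hypothetical expression values for six samples (three per class).
labels = [0, 0, 0, 1, 1, 1]
data = {
    "g1": [1, 2, 1, 5, 6, 5],
    "g2": [3, 4, 3, 2, 1, 2],
    "g3": [2, 2, 2, 2, 2, 2],
}
print(top_scoring_pairs(data, labels))  # [(1.0, 'g1', 'g2')]
```

    Each selected pair defines a 2-D view of the data (the two genes' expression values), which is the space a base classifier would then be trained on.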

  9. TINAGL1 and B3GALNT1 are potential therapy target genes to suppress metastasis in non-small cell lung cancer

    PubMed Central

    2014-01-01

    Background Non-small cell lung cancer (NSCLC) remains lethal despite the development of numerous drug therapy technologies. About 85% to 90% of lung cancers are NSCLC and the 5-year survival rate is at best still below 50%. Thus, it is important to find druggable target genes for NSCLC to develop an effective therapy for NSCLC. Results Integrated analysis of publicly available gene expression and promoter methylation patterns of two highly aggressive NSCLC cell lines generated by in vivo selection was performed. We selected eleven critical genes that may mediate metastasis using recently proposed principal component analysis based unsupervised feature extraction. The eleven selected genes were significantly related to cancer diagnosis. The tertiary protein structure of the selected genes was inferred by Full Automatic Modeling System, a profile-based protein structure inference software, to determine protein functions and to specify genes that could be potential drug targets. Conclusions We identified eleven potentially critical genes that may mediate NSCLC metastasis using bioinformatic analysis of publicly available data sets. These genes are potential target genes for the therapy of NSCLC. Among the eleven genes, TINAGL1 and B3GALNT1 are possible candidates for drug compounds that inhibit their gene expression. PMID:25521548

  10. Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification

    PubMed Central

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated in-between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection against the single ranking methods. Furthermore, the combination of AHP-FSAM shows high accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice. PMID:25823003

  11. Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification.

    PubMed

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated in-between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection against the single ranking methods. Furthermore, the combination of AHP-FSAM shows high accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.

  12. Expression analysis in response to drought stress in soybean: Shedding light on the regulation of metabolic pathway genes.

    PubMed

    Guimarães-Dias, Fábia; Neves-Borges, Anna Cristina; Viana, Antonio Americo Barbosa; Mesquita, Rosilene Oliveira; Romano, Eduardo; de Fátima Grossi-de-Sá, Maria; Nepomuceno, Alexandre Lima; Loureiro, Marcelo Ehlers; Alves-Ferreira, Márcio

    2012-06-01

    Metabolomics analysis of wild type Arabidopsis thaliana plants under control and drought stress conditions revealed several metabolic pathways that are induced under water deficit. The metabolic response to drought stress is also associated with ABA dependent and independent pathways, allowing a better understanding of the molecular mechanisms in this model plant. Through combining an in silico approach and gene expression analysis by quantitative real-time PCR, the present work aims at identifying genes of soybean metabolic pathways potentially associated with water deficit. Digital expression patterns of Arabidopsis genes, which were selected on the basis of literature reports, were evaluated under drought stress conditions using Genevestigator. Genes that showed strong induction under drought stress were selected and used as bait to identify orthologs in the soybean genome. This allowed us to select 354 genes of putative soybean orthologs of 79 Arabidopsis genes belonging to 38 distinct metabolic pathways. The expression pattern of the selected genes was verified in the subtractive libraries available in the GENOSOJA project. Subsequently, 13 genes from different metabolic pathways were selected for validation by qPCR experiments. The expression of six genes was validated in plants undergoing drought stress in both pot-based and hydroponic cultivation systems. The results suggest that the metabolic response to drought stress is conserved in Arabidopsis and soybean plants.

  13. Advances in metaheuristics for gene selection and classification of microarray data.

    PubMed

    Duval, Béatrice; Hao, Jin-Kao

    2010-01-01

    Gene selection aims at identifying a (small) subset of informative genes from the initial data in order to obtain high predictive accuracy for classification. Gene selection can be considered as a combinatorial search problem and thus be conveniently handled with optimization methods. In this article, we summarize some recent developments of using metaheuristic-based methods within an embedded approach for gene selection. In particular, we put forward the importance and usefulness of integrating problem-specific knowledge into the search operators of such a method. To illustrate the point, we explain how ranking coefficients of a linear classifier such as support vector machine (SVM) can be profitably used to reinforce the search efficiency of Local Search and Evolutionary Search metaheuristic algorithms for gene selection and classification.

  14. Reranking candidate gene models with cross-species comparison for improved gene prediction

    PubMed Central

    Liu, Qian; Crammer, Koby; Pereira, Fernando CN; Roos, David S

    2008-01-01

    Background Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models. PMID:18854050

  15. Polymorphism and selection in the major histocompatibility complex DRA and DQA genes in the family Equidae.

    PubMed

    Janova, Eva; Matiasovic, Jan; Vahala, Jiri; Vodicka, Roman; Van Dyk, Enette; Horin, Petr

    2009-07-01

    The major histocompatibility complex genes coding for antigen binding and presenting molecules are the most polymorphic genes in the vertebrate genome. We studied the DRA and DQA gene polymorphism of the family Equidae. In addition to 11 previously reported DRA and 24 DQA alleles, six new DRA sequences and 13 new DQA alleles were identified in the genus Equus. Phylogenetic analysis of both DRA and DQA sequences provided evidence for trans-species polymorphism in the family Equidae. The phylogenetic trees differed from species relationships defined by standard taxonomy of Equidae and from trees based on mitochondrial or neutral gene sequence data. Analysis of selection showed differences between the less variable DRA and more variable DQA genes. DRA alleles were more often shared by more species. The DQA sequences analysed showed strong amongst-species positive selection; the selected amino acid positions mostly corresponded to selected positions in rodent and human DQA genes.

  16. Detecting short spatial scale local adaptation and epistatic selection in climate-related candidate genes in European beech (Fagus sylvatica) populations.

    PubMed

    Csilléry, Katalin; Lalagüe, Hadrien; Vendramin, Giovanni G; González-Martínez, Santiago C; Fady, Bruno; Oddou-Muratorio, Sylvie

    2014-10-01

    Detecting signatures of selection in tree populations threatened by climate change is currently a major research priority. Here, we investigated the signature of local adaptation over a short spatial scale using 96 European beech (Fagus sylvatica L.) individuals originating from two pairs of populations on the northern and southern slopes of Mont Ventoux (south-eastern France). We performed both single and multilocus analysis of selection based on 53 climate-related candidate genes containing 546 SNPs. FST outlier methods at the SNP level revealed a weak signal of selection, with three marginally significant outliers in the northern populations. At the gene level, considering haplotypes as alleles, two additional marginally significant outliers were detected, one on each slope. To account for the uncertainty of haplotype inference, we averaged the Bayes factors over many possible phase reconstructions. Epistatic selection offers a realistic multilocus model of selection in natural populations. Here, we used a test suggested by Ohta based on the decomposition of the variance of linkage disequilibrium. Over all populations, 0.23% of the SNP pairs (haplotypes) showed evidence of epistatic selection, with nearly 80% of them being within genes. One of the between-gene epistatic selection signals arose between an FST outlier and a nonsynonymous mutation in a drought response gene. Additionally, we identified haplotypes containing selectively advantageous allele combinations which were unique to high or low elevations and northern or southern populations. Several haplotypes contained nonsynonymous mutations situated in genes with known functional importance for adaptation to climatic factors. © 2014 John Wiley & Sons Ltd.

  17. Genome-wide evidence for divergent selection between populations of a major agricultural pathogen.

    PubMed

    Hartmann, Fanny E; McDonald, Bruce A; Croll, Daniel

    2018-06-01

    The genetic and environmental homogeneity in agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using a XtX-based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host-driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  18. Compartmentalized partnered replication for the directed evolution of genetic parts and circuits.

    PubMed

    Abil, Zhanar; Ellefson, Jared W; Gollihar, Jimmy D; Watkins, Ella; Ellington, Andrew D

    2017-12-01

    Compartmentalized partnered replication (CPR) is an emulsion-based directed evolution method based on a robust and modular phenotype-genotype linkage. In contrast to other in vivo directed evolution approaches, CPR largely mitigates host fitness effects due to a relatively short expression time of the gene of interest. CPR is based on gene circuits in which the selection of a 'partner' function from a library leads to the production of a thermostable polymerase. After library preparation, bacteria produce partner proteins that can potentially lead to enhancement of transcription, translation, gene regulation, and other aspects of cellular metabolism that reinforce thermostable polymerase production. Individual cells are then trapped in water-in-oil emulsion droplets in the presence of primers and dNTPs, followed by the recovery of the partner genes via emulsion PCR. In this step, droplets with cells expressing partner proteins that promote polymerase production will produce higher copy numbers of the improved partner gene. The resulting partner genes can subsequently be recloned for the next round of selection. Here, we present a step-by-step guideline for the procedure by providing examples of (i) selection of T7 RNA polymerases that recognize orthogonal promoters and (ii) selection of tRNA for enhanced amber codon suppression. A single round of CPR should take ∼3-5 d, whereas a whole directed evolution can be performed in 3-10 rounds, depending on selection efficiency.

  19. A comparative analysis of swarm intelligence techniques for feature selection in cancer classification.

    PubMed

    Gunavathi, Chellamuthu; Premalatha, Kandasamy

    2014-01-01

    Feature selection in cancer classification is a central area of research in the field of bioinformatics and is used to select the informative genes from the thousands of genes on a microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of the shuffled frog leaping (SFL) algorithm. The SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection, which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from the k-NN classifier through the SFLLF feature selection method outperform PSO, CS, and SFL.
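
    The filter step described above (rank genes, then keep the top-m) can be sketched as follows, assuming the widely used two-class signal-to-noise ratio, i.e. the difference of class means divided by the sum of class standard deviations, taken here in absolute value. The swarm-based search over the top-m genes is not shown; the function names and data are hypothetical.

```python
from statistics import mean, stdev


def snr(values, labels):
    """Absolute signal-to-noise ratio of one gene between classes 0 and 1."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    # Sample standard deviations; each class needs at least two samples.
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b))


def top_m_genes(data, labels, m):
    """Rank genes by decreasing SNR and return the m highest-ranked names."""
    ranked = sorted(data, key=lambda g: snr(data[g], labels), reverse=True)
    return ranked[:m]


# Hypothetical expression values for six samples (three per class).
labels = [0, 0, 0, 1, 1, 1]
data = {
    "g1": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],  # SNR = 6 / (1 + 1) = 3.0
    "g2": [1.0, 5.0, 3.0, 2.0, 6.0, 4.0],  # SNR = 1 / (2 + 2) = 0.25
    "g3": [4.0, 5.0, 6.0, 5.0, 6.0, 7.0],  # SNR = 1 / (1 + 1) = 0.5
}
print(top_m_genes(data, labels, m=2))  # ['g1', 'g3']
```

    The reduced gene list returned here is the search space that a wrapper method, such as the swarm techniques in the abstract, would then explore.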

  20. Development of a Plant Transformation Selection System Based on Expression of Genes Encoding Gentamicin Acetyltransferases

    PubMed Central

    Hayford, Maria B.; Medford, June I.; Hoffman, Nancy L.; Rogers, Stephen G.; Klee, Harry J.

    1988-01-01

    The development of selectable markers for transformation has been a major factor in the successful genetic manipulation of plants. A new selectable marker system has been developed based on bacterial gentamicin-3-N-acetyltransferases [AAC(3)]. These enzymes inactivate aminoglycoside antibiotics by acetylation. Two examples of AAC(3) enzymes have been manipulated to be expressed in plants. Chimeric AAC(3)-III and AAC(3)-IV genes were assembled using the constitutively expressed cauliflower mosaic virus 35S promoter and the nopaline synthase 3′ nontranslated region. These chimeric genes were engineered into vectors for Agrobacterium-mediated plant transformation. Petunia hybrida and Arabidopsis thaliana tissue transformed with these vectors grew in the presence of normally lethal levels of gentamicin. The transformed nature of regenerated Arabidopsis plants was confirmed by DNA hybridization analysis and inheritance of the selectable phenotype in progeny. The chimeric AAC(3)-IV gene has also been used to select transformants in several additional plant species. These results show that the bacterial AAC(3) genes will serve as useful selectable markers in plant tissue culture. PMID:16666057

  1. Microarray-based cancer prediction using soft computing approach.

    PubMed

    Wang, Xiaosheng; Gotoh, Osamu

    2009-05-26

    One of the difficulties in using gene expression profiles to predict cancer is how to effectively select a few informative genes to construct accurate prediction models from thousands or tens of thousands of genes. We screen highly discriminative genes and gene pairs to create simple prediction models involving single genes or gene pairs on the basis of a soft computing approach and rough set theory. Accurate cancer prediction is obtained when we apply the simple prediction models to four cancer gene expression datasets: CNS tumor, colon tumor, lung cancer and DLBCL. Some genes closely correlated with the pathogenesis of specific or general cancers are identified. In contrast with other models, our models are simple, effective and robust. Meanwhile, our models are interpretable because they are based on decision rules. Our results demonstrate that very simple models may perform well on molecular cancer prediction and that important gene markers of cancer can be detected if the gene selection approach is chosen reasonably.

  2. ptxD gene in combination with phosphite serves as a highly effective selection system to generate transgenic cotton (Gossypium hirsutum L.).

    PubMed

    Pandeya, Devendra; Campbell, LeAnne M; Nunes, Eugenia; Lopez-Arredondo, Damar L; Janga, Madhusudhana R; Herrera-Estrella, Luis; Rathore, Keerti S

    2017-12-01

    This report demonstrates the usefulness of ptxD/phosphite as a selection system that not only provides a highly efficient and simple means to generate transgenic cotton plants, but also helps address many of the concerns related to the use of antibiotic and herbicide resistance genes in the production of transgenic crops. Two of the most popular dominant selectable marker systems for plant transformation are based on either antibiotic or herbicide resistance genes. Due to concerns regarding their safety and in order to stack multiple traits in a single plant, there is a need for alternative selectable marker genes. The ptxD gene, derived from Pseudomonas stutzeri WM88, that confers to cells the ability to convert phosphite (Phi) into orthophosphate (Pi) offers an alternative selectable marker gene as demonstrated for tobacco and maize. Here, we show that the ptxD gene in combination with a protocol based on selection medium containing Phi, as the sole source of phosphorus (P), can serve as an effective and efficient system to select for transformed cells and generate transgenic cotton plants. Fluorescence microscopy examination of the cultures under selection and molecular analyses on the regenerated plants demonstrate the efficacy of the system in recovering cotton transformants following Agrobacterium-mediated transformation. Under the ptxD/Phi selection, an average of 3.43 transgenic events per 100 infected explants were recovered as opposed to only 0.41% recovery when bar/phosphinothricin (PPT) selection was used. The event recovery rates for nptII/kanamycin and hpt/hygromycin systems were 2.88 and 2.47%, respectively. Molecular analysis on regenerated events showed a selection efficiency of ~ 97% under the ptxD/Phi system. Thus, ptxD/Phi has proven to be a very efficient, positive selection system for the generation of transgenic cotton plants with equal or higher transformation efficiencies compared to the commonly used, negative selection systems.

  3. Inference of Evolutionary Forces Acting on Human Biological Pathways

    PubMed Central

    Daub, Josephine T.; Dupanloup, Isabelle; Robinson-Rechavi, Marc; Excoffier, Laurent

    2015-01-01

    Because natural selection is likely to act on multiple genes underlying a given phenotypic trait, we study here the potential effect of ongoing and past selection on the genetic diversity of human biological pathways. We first show that genes included in gene sets are generally under stronger selective constraints than other genes and that their evolutionary response is correlated. We then introduce a new procedure to detect selection at the pathway level based on a decomposition of the classical McDonald–Kreitman test extended to multiple genes. This new test, called 2DNS, detects outlier gene sets and takes into account past demographic effects and evolutionary constraints specific to gene sets. Selective forces acting on gene sets can be easily identified by a mere visual inspection of the position of the gene sets relative to their two-dimensional null distribution. We thus find several outlier gene sets that show signals of positive, balancing, or purifying selection but also others showing an ancient relaxation of selective constraints. The principle of the 2DNS test can also be applied to other genomic contrasts. For instance, the comparison of patterns of polymorphisms private to African and non-African populations reveals that most pathways show a higher proportion of nonsynonymous mutations in non-Africans than in Africans, potentially due to different demographic histories and selective pressures. PMID:25971280
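
    As a point of reference for the test named above, the classical single-gene McDonald–Kreitman comparison can be sketched as follows. This is only a minimal illustration and not the 2DNS procedure, whose multi-gene decomposition is not spelled out in the abstract. The function name `mk_test` and the counts in the usage lines are illustrative assumptions.

```python
from math import comb


def mk_test(dn, ds, pn, ps):
    """Classical McDonald-Kreitman 2x2 comparison (hypothetical helper).

    dn, ds: fixed nonsynonymous / synonymous differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    Returns (neutrality index, alpha, two-sided Fisher exact p-value).
    Assumes ps > 0 and dn > 0 so that the ratios are defined.
    """
    ni = (pn / ps) / (dn / ds)      # neutrality index
    alpha = 1.0 - ni                # rough share of adaptive substitutions

    # Two-sided Fisher exact test on the table [[dn, ds], [pn, ps]].
    row1, row2 = dn + ds, pn + ps
    col1 = dn + pn
    n = row1 + row2

    def prob(a):
        # Hypergeometric probability of seeing `a` in the dn cell.
        return comb(row1, a) * comb(row2, col1 - a) / comb(n, col1)

    p_obs = prob(dn)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_value = sum(prob(a) for a in range(lo, hi + 1)
                  if prob(a) <= p_obs * (1 + 1e-9))
    return ni, alpha, p_value


# Illustrative counts only.
ni, alpha, p_value = mk_test(dn=7, ds=17, pn=2, ps=42)
print(f"NI={ni:.3f} alpha={alpha:.3f} p={p_value:.4f}")
```

    A neutrality index below 1 (alpha above 0) indicates an excess of fixed nonsynonymous differences, which is the pattern usually read as positive selection; an index above 1 points toward segregating, mildly deleterious variants.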

  4. Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data.

    PubMed

    Tong, Dong Ling; Schierz, Amanda C

    2011-09-01

    Suitable techniques for microarray analysis have been widely researched, particularly for the study of marker genes expressed to a specific type of cancer. Most of the machine learning methods that have been applied to significant gene selection focus on the classification ability rather than the selection ability of the method. These methods also require the microarray data to be preprocessed before analysis takes place. The objective of this study is to develop a hybrid genetic algorithm-neural network (GANN) model that emphasises feature selection and can operate on unpreprocessed microarray data. The GANN is a hybrid model where the fitness value of the genetic algorithm (GA) is based upon the number of samples correctly labelled by a standard feedforward artificial neural network (ANN). The model is evaluated by using two benchmark microarray datasets with different array platforms and differing number of classes (a 2-class oligonucleotide microarray data for acute leukaemia and a 4-class complementary DNA (cDNA) microarray dataset for SRBCTs (small round blue cell tumours)). The underlying concept of the GANN algorithm is to select highly informative genes by co-evolving both the GA fitness function and the ANN weights at the same time. The novel GANN selected approximately 50% of the same genes as the original studies. This may indicate that these common genes are more biologically significant than other genes in the datasets. The remaining 50% of the significant genes identified were used to build predictive models and for both datasets, the models based on the set of genes extracted by the GANN method produced more accurate results. The results also suggest that the GANN method not only can detect genes that are exclusively associated with a single cancer type but can also explore the genes that are differentially expressed in multiple cancer types. 
    The results show that the GANN model has successfully extracted statistically significant genes from the unpreprocessed microarray data as well as extracting known biologically significant genes. We also show that assessing the biological significance of genes based on classification accuracy may be misleading and though the GANN's set of extra genes prove to be more statistically significant than those selected by other methods, a biological assessment of these genes is highly recommended to confirm their functionality. Copyright © 2011 Elsevier B.V. All rights reserved.

  5. A Tightly Regulated Genetic Selection System with Signaling-Active Alleles of Phytochrome B.

    PubMed

    Hu, Wei; Lagarias, J Clark

    2017-01-01

    Selectable markers derived from plant genes circumvent the potential risk of antibiotic/herbicide-resistance gene transfer into neighboring plant species, endophytic bacteria, and mycorrhizal fungi. Toward this goal, we have engineered and validated signaling-active alleles of phytochrome B (eYHB) as plant-derived selection marker genes in the model plant Arabidopsis (Arabidopsis thaliana). By probing the relationship of construct size and induction conditions to optimal phenotypic selection, we show that eYHB-based alleles are robust substitutes for antibiotic/herbicide-dependent marker genes as well as surprisingly sensitive reporters of off-target transgene expression. © 2017 American Society of Plant Biologists. All Rights Reserved.

  6. A Tightly Regulated Genetic Selection System with Signaling-Active Alleles of Phytochrome B

    PubMed Central

    2017-01-01

    Selectable markers derived from plant genes circumvent the potential risk of antibiotic/herbicide-resistance gene transfer into neighboring plant species, endophytic bacteria, and mycorrhizal fungi. Toward this goal, we have engineered and validated signaling-active alleles of phytochrome B (eYHB) as plant-derived selection marker genes in the model plant Arabidopsis (Arabidopsis thaliana). By probing the relationship of construct size and induction conditions to optimal phenotypic selection, we show that eYHB-based alleles are robust substitutes for antibiotic/herbicide-dependent marker genes as well as surprisingly sensitive reporters of off-target transgene expression. PMID:27881727

  7. confFuse: High-Confidence Fusion Gene Detection across Tumor Entities.

    PubMed

    Huang, Zhiqin; Jones, David T W; Wu, Yonghe; Lichter, Peter; Zapatka, Marc

    2017-01-01

    Background: Fusion genes play an important role in the tumorigenesis of many cancers. Next-generation sequencing (NGS) technologies have been successfully applied in fusion gene detection for the last several years, and a number of NGS-based tools have been developed for identifying fusion genes during this period. Most fusion gene detection tools based on RNA-seq data report a large number of candidates (mostly false positives), making it hard to prioritize candidates for experimental validation and further analysis. Selection of reliable fusion genes for downstream analysis becomes very important in cancer research. We therefore developed confFuse, a scoring algorithm to reliably select high-confidence fusion genes which are likely to be biologically relevant. Results: confFuse takes multiple parameters into account in order to assign each fusion candidate a confidence score, of which score ≥8 indicates high-confidence fusion gene predictions. These parameters were manually curated based on our experience and on certain structural motifs of fusion genes. Compared with alternative tools, based on 96 published RNA-seq samples from different tumor entities, our method can significantly reduce the number of fusion candidates (301 high-confidence from 8,083 total predicted fusion genes) and keep high detection accuracy (recovery rate 85.7%). Validation of 18 novel, high-confidence fusions detected in three breast tumor samples resulted in a 100% validation rate. Conclusions: confFuse is a novel downstream filtering method that allows selection of highly reliable fusion gene candidates for further downstream analysis and experimental validations. confFuse is available at https://github.com/Zhiqin-HUANG/confFuse.

  8. In silico selection of expression reference genes with demonstrated stability in barley among a diverse set of tissues and cultivars

    USDA-ARS's Scientific Manuscript database

    Premise of the study: Reference genes are selected based on the assumption of temporal and spatial expression stability and on their widespread use in model species. They are often used in new target species without validation, presumed as stable. For barley, reference gene validation is lacking, bu...

  9. A genome-wide scan for signatures of directional selection in domesticated pigs.

    PubMed

    Moon, Sunjin; Kim, Tae-Hun; Lee, Kyung-Tai; Kwak, Woori; Lee, Taeheon; Lee, Si-Woo; Kim, Myung-Jick; Cho, Kyuho; Kim, Namshin; Chung, Won-Hyong; Sung, Samsun; Park, Taesung; Cho, Seoae; Groenen, Martien Am; Nielsen, Rasmus; Kim, Yuseob; Kim, Heebal

    2015-02-25

    Animal domestication involved drastic phenotypic changes driven by strong artificial selection and also resulted in new populations of breeds, established by humans. This study aims to identify genes that show evidence of recent artificial selection during pig domestication. Whole-genome resequencing of 30 individual pigs from domesticated breeds, Landrace and Yorkshire, and 10 Asian wild boars at ~16-fold coverage was performed resulting in over 4.3 million SNPs for 19,990 genes. We constructed a comprehensive genome map of directional selection by detecting selective sweeps using an FST-based approach that detects directional selection in lineages leading to the domesticated breeds and using a haplotype-based test that detects ongoing selective sweeps within the breeds. We show that candidate genes under selection are significantly enriched for loci implicated in quantitative traits important to pig reproduction and production. The candidate gene with the strongest signals of directional selection belongs to group III of the metabotropic glutamate receptors, known to affect brain functions associated with eating behavior, suggesting that loci under strong selection include loci involved in behavioral traits in domesticated pigs including tameness. We show that a significant proportion of selection signatures coincide with loci that were previously inferred to affect phenotypic variation in pigs. We further identify functional enrichment related to behavior, such as signal transduction and neuronal activities, for those targets of selection during domestication in pigs.

  10. Biomarkers of Exposure to Toxic Substances. Volume 2: Genomics: Unique Patterns of Differential Gene Expression and Pathway Perturbation Resulting from Exposure to Nephrotoxins with Regional Specific Toxicity

    DTIC Science & Technology

    2009-05-01

    of chemical agents. Changes in gene expression are among the most sensitive indicators of chemical exposure. Toxicogenomics, which is based on DNA...assessing gene expression changes and subsequently the mechanism of renal injury following exposure to nephrotoxins selected for their regional...Serine Treatment on Selected Serum Chemistry Parameters ........................ 8 Table 4: Effect of PUR Treatment on Selected Serum Chemistry

  11. AUCTSP: an improved biomarker gene pair class predictor.

    PubMed

    Kagaris, Dimitri; Khamesipour, Alireza; Yiannoutsos, Constantin T

    2018-06-26

    The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information which can improve classification accuracy and is vulnerable to selecting genes which do not have differential expression in the two conditions ("pivot" genes). We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large B-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting and being less prone to selecting non-informative (pivot) genes. The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects as opposed to each individual subject results in significant performance gains in classification accuracy. In addition, the proposed method tends to avoid selection of non-informative (pivot) genes as members of the top-scoring pair.
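
    The "pivot" gene problem described above can be made concrete with a small sketch. It shows the original within-subject TSP score for a gene pair and, separately, a Mann-Whitney style AUC for a single gene across all subjects. How AUCTSP combines AUC information into a pair score is not given in the abstract, so the second function is only a building block; the function names and toy expression values are assumptions.

```python
def tsp_score(x_i, x_j, labels):
    """Original TSP score: |P(x_i < x_j | class 1) - P(x_i < x_j | class 0)|."""
    def frac(cls):
        idx = [k for k, y in enumerate(labels) if y == cls]
        return sum(x_i[k] < x_j[k] for k in idx) / len(idx)
    return abs(frac(1) - frac(0))


def auc(values, labels):
    """AUC of one gene as a class-1 vs class-0 separator (ties count 0.5)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: gene_a differs between classes, gene_b is flat (a "pivot").
labels = [0, 0, 0, 1, 1, 1]
gene_a = [1, 2, 3, 7, 8, 9]
gene_b = [5, 5, 5, 5, 5, 5]

print(tsp_score(gene_a, gene_b, labels))   # pair looks perfect to TSP
print(auc(gene_a, labels), auc(gene_b, labels))
```

    In this toy case the pair reaches the maximum TSP score even though `gene_b` carries no class information on its own, which is the situation an AUC-based view across all subjects is meant to expose.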

  12. A novel feature extraction approach for microarray data based on multi-algorithm fusion

    PubMed Central

    Jiang, Zhu; Xu, Rong

    2015-01-01

    Feature extraction is one of the most important and effective methods for reducing dimensionality in data mining, particularly with the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms, ranking-based feature extraction and set-based feature extraction, are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, generally without considering inter-relationships between features, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependencies between features. As with learning methods, feature extraction has a problem with its generalization ability, namely robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested against gene expression datasets including colon cancer, CNS, DLBCL, and leukemia data. The testing results show that the performance of this algorithm is better than existing solutions. PMID:25780277

  13. A novel feature extraction approach for microarray data based on multi-algorithm fusion.

    PubMed

    Jiang, Zhu; Xu, Rong

    2015-01-01

    Feature extraction is one of the most important and effective methods for reducing dimensionality in data mining, particularly with the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms, ranking-based feature extraction and set-based feature extraction, are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, generally without considering inter-relationships between features, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependencies between features. As with learning methods, feature extraction has a problem with its generalization ability, namely robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested against gene expression datasets including colon cancer, CNS, DLBCL, and leukemia data. The testing results show that the performance of this algorithm is better than existing solutions.

  14. Adaptive molecular evolution of the Major Histocompatibility Complex genes, DRA and DQA, in the genus Equus

    PubMed Central

    2011-01-01

    Background: Major Histocompatibility Complex (MHC) genes are central to vertebrate immune response and are believed to be under balancing selection by pathogens. This hypothesis has been supported by observations of extremely high polymorphism, elevated nonsynonymous to synonymous base pair substitution rates and trans-species polymorphisms at these loci. In equids, the organization and variability of this gene family has been described; however, the full extent of diversity and selection is unknown. As selection is not expected to act uniformly on a functional gene, maximum likelihood codon-based models of selection that allow heterogeneity in selection across codon positions can be valuable for examining MHC gene evolution and the molecular basis for species adaptations. Results: We investigated the evolution of two class II MHC genes of the Equine Lymphocyte Antigen (ELA), DRA and DQA, in the genus Equus with the addition of novel alleles identified in plains zebra (E. quagga, formerly E. burchelli). We found that both genes exhibited a high degree of polymorphism and inter-specific sharing of allele lineages. To our knowledge, DRA allelic diversity was discovered to be higher than has ever been observed in vertebrates. Evidence was also found to support a duplication of the DQA locus. Selection analyses, evaluated in terms of relative rates of nonsynonymous to synonymous mutations (dN/dS) averaged over the gene region, indicated that the majority of codon sites were conserved and under purifying selection (dN

  15. Neuroplasticity of selective attention: Research foundations and preliminary evidence for a gene by intervention interaction

    PubMed Central

    Stevens, Courtney; Pakulak, Eric; Hampton Wray, Amanda; Bell, Theodore A.; Neville, Helen J.

    2017-01-01

    This article reviews the trajectory of our research program on selective attention, which has moved from basic research on the neural processes underlying selective attention to translational studies using selective attention as a neurobiological target for evidence-based interventions. We use this background to present a promising preliminary investigation of how genetic and experiential factors interact during development (i.e., gene × intervention interactions). Our findings provide evidence on how exposure to a family-based training can modify the associations between genotype (5-HTTLPR) and the neural mechanisms of selective attention in preschool children from lower socioeconomic status backgrounds. PMID:28819066

  16. Neuroplasticity of selective attention: Research foundations and preliminary evidence for a gene by intervention interaction.

    PubMed

    Isbell, Elif; Stevens, Courtney; Pakulak, Eric; Hampton Wray, Amanda; Bell, Theodore A; Neville, Helen J

    2017-08-29

    This article reviews the trajectory of our research program on selective attention, which has moved from basic research on the neural processes underlying selective attention to translational studies using selective attention as a neurobiological target for evidence-based interventions. We use this background to present a promising preliminary investigation of how genetic and experiential factors interact during development (i.e., gene × intervention interactions). Our findings provide evidence on how exposure to a family-based training can modify the associations between genotype (5-HTTLPR) and the neural mechanisms of selective attention in preschool children from lower socioeconomic status backgrounds.

  17. Candidate gene identification of ovulation-inducing genes by RNA sequencing with an in vivo assay in zebrafish.

    PubMed

    Klangnurak, Wanlada; Fukuyo, Taketo; Rezanujjaman, M D; Seki, Masahide; Sugano, Sumio; Suzuki, Yutaka; Tokumoto, Toshinobu

    2018-01-01

    We previously reported the microarray-based selection of three ovulation-related genes in zebrafish. In this study, we used a different selection method, RNA sequencing analysis. An additional eight candidates were found to be specifically up-regulated in ovulation-induced samples. Changes in gene expression were confirmed by qPCR analysis. Furthermore, up-regulation prior to ovulation during natural spawning was verified in samples from natural pairing. Gene knock-out zebrafish strains of one of the candidates, the starmaker gene (stm), were established by CRISPR genome editing techniques. Unexpectedly, homozygous mutants were fertile and could spawn eggs. However, a high percentage of unfertilized eggs and abnormal embryos were produced from these homozygous females. The results suggest that the stm gene is necessary for fertilization. In this study, we selected additional ovulation-inducing candidate genes, and a novel function of the stm gene was investigated.

  18. Successful recovery of transgenic cowpea (Vigna unguiculata) using the 6-phosphomannose isomerase gene as the selectable marker.

    PubMed

    Bakshi, Souvika; Saha, Bedabrata; Roy, Nand Kishor; Mishra, Sagarika; Panda, Sanjib Kumar; Sahoo, Lingaraj

    2012-06-01

    A new method for obtaining transgenic cowpea was developed using positive selection based on the Escherichia coli 6-phosphomannose isomerase gene as the selectable marker and mannose as the selective agent. Only transformed cells were capable of utilizing mannose as a carbon source. Cotyledonary node explants from 4-day-old in vitro-germinated seedlings of cultivar Pusa Komal were inoculated with Agrobacterium tumefaciens strain EHA105 carrying the vector pNOV2819. Regenerating transformed shoots were selected on medium supplemented with a combination of 20 g/l mannose and 5 g/l sucrose as carbon source. The transformed shoots were rooted on medium devoid of mannose. Transformation efficiency based on PCR analysis of individual putative transformed shoots was 3.6%. Southern blot analysis on five randomly chosen PCR-positive plants confirmed the integration of the pmi transgene. Qualitative reverse transcription PCR (qRT-PCR) analysis demonstrated the expression of pmi in T₀ transgenic plants. Chlorophenol red (CPR) assays confirmed the activity of PMI in transgenic plants, and the gene was transmitted to progeny in a Mendelian fashion. The transformation method presented here for cowpea using mannose selection is efficient and reproducible, and could be used to introduce a desirable gene(s) into cowpea for biotic and abiotic stress tolerance.

  19. AQUATIC PLANT SPECIATION AFFECTED BY DIVERSIFYING SELECTION OF ORGANELLE DNA REGIONS.

    PubMed

    Kato, Syou; Misawa, Kazuharu; Takahashi, Fumio; Sakayama, Hidetoshi; Sano, Satomi; Kosuge, Keiko; Kasai, Fumie; Watanabe, Makoto M; Tanaka, Jiro; Nozaki, Hisayoshi

    2011-10-01

    Many of the genes that control photosynthesis are carried in the chloroplast. These genes differ among species. However, evidence has yet to be reported revealing the involvement of organelle genes in the initial stages of plant speciation. To elucidate the molecular basis of aquatic plant speciation, we focused on the unique plant species Chara braunii C. C. Gmel. that inhabits both shallow and deep freshwater habitats and exhibits habitat-based dimorphism of chloroplast DNA (cpDNA). Here, we examined the "shallow" and "deep" subpopulations of C. braunii using two nuclear DNA (nDNA) markers and cpDNA. Genetic differentiation between the two subpopulations was measured in both nDNA and cpDNA regions, although phylogenetic analyses suggested nuclear gene flow between subpopulations. Neutrality tests based on Tajima's D demonstrated diversifying selection acting on organelle DNA regions. Furthermore, both "shallow" and "deep" haplotypes of cpDNA detected in cultures originating from bottom soils of three deep environments suggested that migration of oospores (dormant zygotes) between the two habitats occurs irrespective of the complete habitat-based dimorphism of cpDNA from field-collected vegetative thalli. Therefore, the two subpopulations are highly selected by their different aquatic habitats and show prezygotic isolation, which represents an initial process of speciation affected by ecologically based divergent selection of organelle genes. © 2011 Phycological Society of America.
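
    For readers unfamiliar with the statistic used in the neutrality tests above, Tajima's D can be computed from an alignment as sketched below. This is a minimal illustration using the standard formula; the function name, the toy sequences, and the choice to count a multi-allelic site as one segregating site are assumptions, and nothing here reproduces the study's data.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (hypothetical helper)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences for a nonzero variance")
    length = len(seqs[0])

    # S: number of segregating (polymorphic) sites.
    s = sum(len({seq[i] for seq in seqs}) > 1 for i in range(length))
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")

    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(a != b for a, b in zip(x, y))
             for x, y in combinations(seqs, 2)) / n_pairs

    # Standard normalising constants.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Two toy alignments: intermediate-frequency variants vs. singletons only.
print(tajimas_d(["AA", "AA", "TT", "TT"]))
print(tajimas_d(["AAA", "TAA", "ATA", "AAT"]))
```

    A positive value reflects an excess of intermediate-frequency variants, as expected when distinct haplotypes are maintained, whereas a negative value reflects an excess of rare variants.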

  20. The genomics of selection in dogs and the parallel evolution between dogs and humans.

    PubMed

    Wang, Guo-dong; Zhai, Weiwei; Yang, He-chuan; Fan, Ruo-xi; Cao, Xue; Zhong, Li; Wang, Lu; Liu, Fei; Wu, Hong; Cheng, Lu-guang; Poyarkov, Andrei D; Poyarkov, Nikolai A; Tang, Shu-sheng; Zhao, Wen-ming; Gao, Yun; Lv, Xue-mei; Irwin, David M; Savolainen, Peter; Wu, Chung-I; Zhang, Ya-ping

    2013-01-01

    The genetic bases of demographic changes and artificial selection underlying domestication are of great interest in evolutionary biology. Here we perform whole-genome sequencing of multiple grey wolves, Chinese indigenous dogs and dogs of diverse breeds. Demographic analysis shows that the split between wolves and Chinese indigenous dogs occurred 32,000 years ago and that the subsequent bottlenecks were mild. Therefore, dogs may have been under human selection over a much longer time than previously concluded, based on molecular data, perhaps by initially scavenging with humans. Population genetic analysis identifies a list of genes under positive selection during domestication, which overlaps extensively with the corresponding list of positively selected genes in humans. Parallel evolution is most apparent in genes for digestion and metabolism, neurological process and cancer. Our study, for the first time, draws together humans and dogs in their recent genomic evolution.

  1. Differentially Coexpressed Disease Gene Identification Based on Gene Coexpression Network.

    PubMed

    Jiang, Xue; Zhang, Han; Quan, Xiongwen

    2016-01-01

    Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods always focus on identifying differentially expressed genes between case samples and a control group. These traditional methods may not fully consider the changes of interactions between genes at different cell states and the dynamic processes of gene expression levels during the disease progression. However, in order to understand the mechanism of disease, it is important to explore the dynamic changes of interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on gene coexpression network (DCGN) to screen differentially coexpressed genes. We first constructed phase-specific gene coexpression networks using time-series gene expression data and defined the concept of differential coexpression of genes in a coexpression network. Then, we designed two metrics to measure the value of gene differential coexpression according to the change of local topological structures between different phase-specific networks. Finally, we conducted meta-analysis of gene differential coexpression based on the rank-product method. Experimental results demonstrated the feasibility and effectiveness of DCGN and the superior performance of DCGN over other popular disease-related gene selection methods on real-world gene expression data sets.
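
    The final meta-analysis step mentioned above can be illustrated with a generic rank-product calculation: each gene is ranked under every metric, and the geometric mean of its ranks gives a combined score. This is a sketch of the general rank-product idea only; the two DCGN metrics are not defined in the abstract, so the score dictionaries and function names below are hypothetical.

```python
from math import prod


def ranks_desc(scores):
    """Map gene -> rank, where rank 1 is the highest score.

    Ties are broken by the stable sort order (an assumption of this sketch).
    """
    ordered = sorted(scores, key=lambda g: -scores[g])
    return {g: r for r, g in enumerate(ordered, start=1)}


def rank_product(score_lists):
    """Geometric mean of each gene's ranks across all score lists."""
    tables = [ranks_desc(s) for s in score_lists]
    k = len(tables)
    return {g: prod(t[g] for t in tables) ** (1.0 / k) for g in tables[0]}


# Hypothetical differential-coexpression scores under two metrics.
metric_1 = {"g1": 0.9, "g2": 0.5, "g3": 0.1}
metric_2 = {"g1": 0.7, "g2": 0.8, "g3": 0.2}

rp = rank_product([metric_1, metric_2])
for gene in sorted(rp, key=rp.get):   # smaller rank product = stronger candidate
    print(gene, round(rp[gene], 3))
```

    Because ranks rather than raw values are combined, metrics on different scales can be merged without normalisation.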

  2. Surprisingly Low Limits of Selection in Plant Domestication

    PubMed Central

    Allaby, Robin G.; Kitchen, James L.; Fuller, Dorian Q.

    2015-01-01

    Current debate concerns the pace at which domesticated plants emerged from cultivated wild populations and how many genes were involved. Using an individual-based model, based on the assumptions of Haldane and Maynard Smith, respectively, we estimate that a surprisingly low number of 50–100 loci are the most that could be under selection in a cultivation regime at the selection strengths observed in the archaeological record. This finding is robust to attempts to rescue populations from extinction through selection from high standing genetic variation, gene flow, and the Maynard Smith-based model of threshold selection. Selective sweeps come at a cost, reducing the capacity of plants to adapt to new environments, which may contribute to the explanation of why selective sweeps have not been detected more frequently and why expansion of the agrarian package during the Neolithic was so frequently associated with collapse. PMID:27081302

  3. Whole-genome scanning for the litter size trait associated genes and SNPs under selection in dairy goat (Capra hircus)

    PubMed Central

    Lai, Fang-Nong; Zhai, Hong-Li; Cheng, Ming; Ma, Jun-Yu; Cheng, Shun-Feng; Ge, Wei; Zhang, Guo-Liang; Wang, Jun-Jie; Zhang, Rui-Qian; Wang, Xue; Min, Ling-Jiang; Song, Jiu-Zhou; Shen, Wei

    2016-01-01

    Dairy goats are one of the most utilized domesticated animals in China. Here, we selected extreme populations based on differential fecundity in two Laoshan dairy goat populations. Utilizing deep sequencing, we generated 68.7 and 57.8 gigabases of sequencing data and identified 12,458,711 and 12,423,128 SNPs in the low fecundity and high fecundity groups, respectively. Following selective sweep analyses, a number of loci and candidate genes in the two populations were scanned independently. The reproduction-related genes CCNB2, AR, ADCY1, DNMT3B, SMAD2, AMHR2, ERBB2, FGFR1, MAP3K12 and THEM4 were specifically selected in the high fecundity group, whereas KDM6A, TENM1, SWI5 and CYM were specifically selected in the low fecundity group. A subset of genes including SYCP2, SOX5 and POU3F4 were localized in both the high and low fecundity selection windows, suggesting that these particular genes experienced strong selection with lower genetic diversity. From the genome data, the rare nonsense mutations may not contribute to fecundity, whereas nonsynonymous SNPs likely play a predominant role. The nonsynonymous exonic SNPs in SETDB2 and CDH26, which were co-localized in the selected region, may take part in fecundity traits. These observations bring us new insights into the genetic variation influencing fecundity traits within dairy goats. PMID:27905513

  4. Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data.

    PubMed

    Ooi, Chia Huey; Chetty, Madhu; Teng, Shyh Wei

    2006-06-23

    Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures.
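
    The trade-off that a parameter such as DDP controls can be sketched as follows. The abstract does not give the scoring formula, so the product form used here (mean relevance raised to alpha, times antiredundancy raised to 1 - alpha), the greedy search, and all names and values are assumptions for illustration rather than the authors' method.

```python
def select_genes(relevance, corr, k, alpha):
    """Greedy forward selection trading relevance against redundancy.

    relevance: dict gene -> relevance score in [0, 1]
    corr:      dict frozenset({a, b}) -> absolute correlation in [0, 1]
    alpha:     1.0 scores by relevance only; smaller values penalize redundancy more
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def score(gene):
            trial = selected + [gene]
            mean_rel = sum(relevance[g] for g in trial) / len(trial)
            pairs = [(a, b) for i, a in enumerate(trial) for b in trial[i + 1:]]
            mean_corr = (sum(corr[frozenset(p)] for p in pairs) / len(pairs)) if pairs else 0.0
            antiredundancy = 1.0 - mean_corr
            # Assumed scoring form: alpha weights relevance against antiredundancy.
            return (mean_rel ** alpha) * (antiredundancy ** (1.0 - alpha))
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical toy data: genes a and b are both relevant but highly correlated.
relevance = {"a": 0.9, "b": 0.85, "c": 0.6}
corr = {
    frozenset({"a", "b"}): 0.95,
    frozenset({"a", "c"}): 0.10,
    frozenset({"b", "c"}): 0.20,
}

relevance_only = select_genes(relevance, corr, k=2, alpha=1.0)
balanced = select_genes(relevance, corr, k=2, alpha=0.5)
```

    With alpha = 1.0 the sketch picks the two most relevant genes (a and b) despite their correlation; with alpha = 0.5 it swaps b for the less relevant but less redundant gene c.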

  5. Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.

    PubMed

    Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao

    2015-04-01

    Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnosis and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select a good subset of informative genes relevant to the classification. In the proposed algorithm, first, the Fisher-Markov selector is used to choose a fixed number of genes. Second, to make biogeography based optimization suitable for the feature selection problem, a discrete migration model and a discrete mutation model are proposed to balance the exploration and exploitation abilities. Then, discrete biogeography based optimization, which we call DBBO, is proposed by integrating the discrete migration model and the discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are evaluated with 10-fold cross-validation. To show the effectiveness and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer benchmark datasets. In comparison with the genetic algorithm, particle swarm optimization, the differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better than or at least comparable with previous methods from the literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Confident difference criterion: a new Bayesian differentially expressed gene selection algorithm with applications.

    PubMed

    Yu, Fang; Chen, Ming-Hui; Kuo, Lynn; Talbott, Heather; Davis, John S

    2015-08-07

    Recently, the Bayesian method has become more popular for analyzing high dimensional gene expression data as it allows us to borrow information across different genes and provides powerful estimators for evaluating gene expression levels. It is crucial to develop a simple but efficient gene selection algorithm for detecting differentially expressed (DE) genes based on the Bayesian estimators. In this paper, by extending the two-criterion idea of Chen et al. (Chen M-H, Ibrahim JG, Chi Y-Y. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Inference. 2008;138:387-404), we propose two new gene selection algorithms for general Bayesian models and name these new methods the confident difference criterion methods. One is based on the standardized differences between two mean expression values among genes; the other adds the differences between two variances to it. The proposed confident difference criterion methods first evaluate the posterior probability of a gene having different gene expressions between competitive samples and then declare a gene to be DE if the posterior probability is large. The theoretical connection between the proposed first method based on the means and the Bayes factor approach proposed by Yu et al. (Yu F, Chen M-H, Kuo L. Detecting differentially expressed genes using calibrated Bayes factors. Statistica Sinica. 2008;18:783-802) is established under the normal-normal model with equal variances between two samples. The empirical performance of the proposed methods is examined and compared to those of several existing methods via several simulations. The results from these simulation studies show that the proposed confident difference criterion methods outperform the existing methods when comparing gene expressions across different conditions for both microarray studies and sequence-based high-throughput studies. A real dataset is used to further demonstrate the proposed methodology. In the real data application, the confident difference criterion methods successfully identified more clinically important DE genes than the other methods. The confident difference criterion method proposed in this paper provides a new efficient approach for both microarray studies and sequence-based high-throughput studies to identify differentially expressed genes.
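
    The two-step decision rule described above (estimate a posterior probability of difference, then call the gene DE when that probability is large) can be sketched with Monte Carlo draws. The thresholds delta and gamma, the function name, and the toy draws are assumptions for illustration; the abstract does not specify how the posterior samples are obtained or which cutoffs are used.

```python
def confident_difference(mu1_draws, mu2_draws, sigma_draws, delta, gamma):
    """Monte Carlo estimate of P(|mu1 - mu2| / sigma > delta | data).

    Returns (posterior probability, DE call). delta and gamma are
    assumed, user-chosen thresholds, not values from the paper.
    """
    draws = list(zip(mu1_draws, mu2_draws, sigma_draws))
    hits = sum(1 for m1, m2, s in draws if abs(m1 - m2) / s > delta)
    prob = hits / len(draws)
    return prob, prob >= gamma

# Hypothetical posterior draws for one clearly shifted gene and one null gene.
shifted = confident_difference(
    [2.0, 2.1, 1.9, 2.2], [0.0, 0.1, -0.1, 0.0], [1.0, 1.0, 1.0, 1.0],
    delta=1.0, gamma=0.9,
)
null = confident_difference(
    [0.1, -0.2, 0.3, 0.0], [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0],
    delta=1.0, gamma=0.9,
)
```

    In practice the draws would come from an MCMC sampler for the chosen Bayesian model, and a real analysis would use far more than four draws per gene.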

  7. Training set selection for the prediction of essential genes.

    PubMed

    Cheng, Jian; Xu, Zhao; Wu, Wenwu; Zhao, Li; Li, Xiangchen; Liu, Yanlin; Tao, Shiheng

    2014-01-01

    Various computational models have been developed to transfer annotations of gene essentiality between organisms. However, despite the increasing number of microorganisms with well-characterized sets of essential genes, selection of appropriate training sets for predicting the essential genes of poorly-studied or newly sequenced organisms remains challenging. In this study, a machine learning approach was applied reciprocally to predict the essential genes in 21 microorganisms. Results showed that training set selection greatly influenced predictive accuracy. We determined four criteria for training set selection: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as training set should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. We then analyzed the performance of an incomplete training set and an integrated training set with multiple organisms. We found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Additionally, the integrated training sets exhibited remarkable increase in stability and accuracy compared with single sets. Finally, we compared the performance of the integrated training sets with the four criteria and with random selection. The results revealed that a rational selection of training sets based on our criteria yields better performance than random selection. Thus, our results provide empirical guidance on training set selection for the identification of essential genes on a genome-wide scale.

  8. Computational Selection of Transcriptomics Experiments Improves Guilt-by-Association Analyses

    PubMed Central

    Bhat, Prajwal; Yang, Haixuan; Bögre, László; Devoto, Alessandra; Paccanaro, Alberto

    2012-01-01

    The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. However, the use of such large collections could hamper GBA functional analysis for genes whose expression is condition specific. In these cases a smaller set of condition related experiments should instead be used, but identifying such functionally relevant experiments from large collections based on literature knowledge alone is an impractical task. We begin this paper by analyzing, both from a mathematical and a biological point of view, why only condition specific experiments should be used in GBA functional analysis. We are able to show that this phenomenon is independent of the functional categorization scheme and of the organisms being analyzed. We then present a semi-supervised algorithm that can select functionally relevant experiments from large collections of transcriptomics experiments. Our algorithm is able to select experiments relevant to a given GO term, MIPS FunCat term or even KEGG pathways. We extensively test our algorithm on large dataset collections for yeast and Arabidopsis. We demonstrate that: using the selected experiments there is a statistically significant improvement in correlation between genes in the functional category of interest; the selected experiments improve GBA-based gene function prediction; the effectiveness of the selected experiments increases with annotation specificity; our algorithm can be successfully applied to GBA-based pathway reconstruction. Importantly, the set of experiments selected by the algorithm reflects the existing literature knowledge about the experiments. [A MATLAB implementation of the algorithm and all the data used in this paper can be downloaded from the paper website: http://www.paccanarolab.org/papers/CorrGene/]. PMID:22879875
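
    The effect of restricting guilt-by-association to condition-relevant experiments can be shown with a small sketch. The expression values and the choice of "relevant" experiment indices are invented for illustration, and the semi-supervised selection algorithm itself is not reproduced here; the code only compares Pearson correlation over all experiments with correlation over a chosen subset.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def coexpression(gene_x, gene_y, experiments=None):
    """Correlation of two expression profiles over a subset of experiments."""
    if experiments is None:
        experiments = range(len(gene_x))
    xs = [gene_x[i] for i in experiments]
    ys = [gene_y[i] for i in experiments]
    return pearson(xs, ys)

# Hypothetical profiles: the genes co-vary only in the first four experiments.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 4.0, 2.0]
gene_y = [2.0, 4.0, 6.0, 8.0, 1.0, 5.0, 2.0, 4.0]
relevant = [0, 1, 2, 3]  # assumed condition-specific experiments

r_all = coexpression(gene_x, gene_y)
r_selected = coexpression(gene_x, gene_y, relevant)
```

    Over all eight experiments the unrelated conditions mask the association, whereas over the four relevant experiments the two profiles are perfectly correlated.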

  9. Detection of Pathways Affected by Positive Selection in Primate Lineages Ancestral to Humans

    PubMed Central

    Moretti, S.; Davydov, I.I.; Excoffier, L.

    2017-01-01

    Gene set enrichment approaches have been increasingly successful in finding signals of recent polygenic selection in the human genome. In this study, we aim at detecting biological pathways affected by positive selection in more ancient human evolutionary history. Focusing on four branches of the primate tree that lead to modern humans, we tested all available protein coding gene trees of the Primates clade for signals of adaptation in these branches, using the likelihood-based branch site test of positive selection. The results of these locus-specific tests were then used as input for a gene set enrichment test, where whole pathways are globally scored for a signal of positive selection, instead of focusing only on outlier “significant” genes. We identified signals of positive selection in several pathways that are mainly involved in immune response, sensory perception, metabolism, and energy production. These pathway-level results are highly significant, even though there is no functional enrichment when only focusing on top scoring genes. Interestingly, several gene sets are found significant at multiple levels in the phylogeny, but different genes are responsible for the selection signal in the different branches. This suggests that the same function has been optimized in different ways at different times in primate evolution. PMID:28333345
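
    Scoring a whole pathway rather than only outlier genes can be sketched as a sum of per-gene statistics compared against random gene sets of the same size. This is a generic permutation-style enrichment test under assumed inputs; the per-gene scores, gene names, and parameters are hypothetical, and the abstract does not state which enrichment statistic or null model the authors used.

```python
import random

def enrichment_p_value(scores, gene_set, n_perm=200, seed=0):
    """Empirical p-value for the summed score of gene_set.

    scores:   dict gene -> per-gene test statistic (e.g. from a selection test)
    gene_set: genes in the pathway, all present in scores
    The null distribution comes from random gene sets of equal size.
    """
    rng = random.Random(seed)
    genes = sorted(scores)
    observed = sum(scores[g] for g in gene_set)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(gene_set))
        if sum(scores[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical per-gene statistics for 20 genes.
scores = {f"gene{i}": float(i) for i in range(20)}
p_top = enrichment_p_value(scores, ["gene17", "gene18", "gene19"])
p_low = enrichment_p_value(scores, ["gene0", "gene1", "gene2"])
```

    A pathway made of the highest-scoring genes receives a small p-value, while a pathway of the lowest-scoring genes receives a p-value of 1, since every random set scores at least as high.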

  10. Using variable rate models to identify genes under selection in sequence pairs: their validity and limitations for EST sequences.

    PubMed

    Church, Sheri A; Livingstone, Kevin; Lai, Zhao; Kozik, Alexander; Knapp, Steven J; Michelmore, Richard W; Rieseberg, Loren H

    2007-02-01

    Using likelihood-based variable selection models, we determined if positive selection was acting on 523 EST sequence pairs from two lineages of sunflower and lettuce. Variable rate models are generally not used for comparisons of sequence pairs due to the limited information and the inaccuracy of estimates of specific substitution rates. However, previous studies have shown that the likelihood ratio test (LRT) is reliable for detecting positive selection, even with low numbers of sequences. These analyses identified 56 genes that show a signature of selection, of which 75% were not identified by simpler models that average selection across codons. Subsequent mapping studies in sunflower show four of five of the positively selected genes identified by these methods mapped to domestication QTLs. We discuss the validity and limitations of using variable rate models for comparisons of sequence pairs, as well as the limitations of using ESTs for identification of positively selected genes.

  11. Grouped gene selection and multi-classification of acute leukemia via new regularized multinomial regression.

    PubMed

    Li, Juntao; Wang, Yanyan; Jiang, Tao; Xiao, Huimin; Song, Xuekun

    2018-05-09

    Diagnosing acute leukemia is the necessary prerequisite to treating it. Multi-classification of acute leukemia gene expression data, covering B-cell acute lymphoblastic leukemia (BALL), T-cell acute lymphoblastic leukemia (TALL) and acute myeloid leukemia (AML), is helpful for diagnosis. However, selecting cancer-causing genes is a challenging problem in performing multi-classification. In this paper, weighted gene co-expression networks are employed to divide the genes into groups. Based on the resulting groups, a new regularized multinomial regression with overlapping group lasso penalty (MROGL) is presented to simultaneously perform multi-classification and select gene groups. By implementing this method on three-class acute leukemia data, the grouped genes which work synergistically are identified, and the overlapped genes shared by different groups are also highlighted. Moreover, MROGL outperforms five other methods on multi-classification accuracy. Copyright © 2017. Published by Elsevier B.V.

  12. Identifying Genetic Signatures of Natural Selection Using Pooled Population Sequencing in Picea abies

    PubMed Central

    Chen, Jun; Källman, Thomas; Ma, Xiao-Fei; Zaina, Giusi; Morgante, Michele; Lascoux, Martin

    2016-01-01

    The joint inference of selection and past demography remains a costly and demanding task. We used next generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain, and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. The nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones. These results suggest the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, a time similar to a recent phylogenetic estimate of 6 million yr, but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and that considers both FST and difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main targets for selection, such as PaPRR3 and PaGI. PMID:27172202
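
    The two per-SNP quantities mentioned above, FST and the allele frequency difference between domains, can be computed from pooled allele frequencies as in the sketch below. It uses the textbook heterozygosity-based definition FST = (H_T - H_S) / H_T for a biallelic SNP with two equally weighted pools; the estimator actually used in the paper, including any correction for pool size or sequencing depth, may differ, and the function name is hypothetical.

```python
def fst_and_delta(p1, p2):
    """Return (FST, |p1 - p2|) for one biallelic SNP.

    p1, p2: frequency of the same allele in each of two pools,
    assumed equally weighted. Uses FST = (H_T - H_S) / H_T.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    delta = abs(p1 - p2)
    if h_total == 0.0:
        # Allele fixed or absent in both pools: no variation to partition.
        return 0.0, delta
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total, delta

differentiated = fst_and_delta(0.8, 0.2)
identical = fst_and_delta(0.5, 0.5)
fixed_difference = fst_and_delta(1.0, 0.0)
```

    For allele frequencies of 0.8 and 0.2 this gives FST of about 0.36 with a frequency difference of 0.6; identical frequencies give 0, and alternately fixed alleles give 1.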

  13. Identifying Genetic Signatures of Natural Selection Using Pooled Population Sequencing in Picea abies.

    PubMed

    Chen, Jun; Källman, Thomas; Ma, Xiao-Fei; Zaina, Giusi; Morgante, Michele; Lascoux, Martin

    2016-07-07

    The joint inference of selection and past demography remains a costly and demanding task. We used next generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain, and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. The nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones. These results suggest the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, a time similar to a recent phylogenetic estimate of 6 million yr, but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and that considers both FST and difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main targets for selection, such as PaPRR3 and PaGI. Copyright © 2016 Chen et al.

  14. Positive selection on D-lactate dehydrogenases of Lactobacillus delbrueckii subspecies bulgaricus.

    PubMed

    Zhang, Jifeng; Gong, Guangyu; Wang, Xiao; Zhang, Hao; Tian, Weidong

    2015-08-01

    Lactobacillus delbrueckii has been widely used for yogurt fermentation. It has genes encoding both D- and L-type lactate dehydrogenases (LDHs) that catalyse the production of L(+) or D(-) stereoisomer of lactic acid. D-lactic acid is the primary lactate product by L. delbrueckii, yet it cannot be metabolised by human intestine. Since it has been domesticated for long time, an interesting question arises regarding to whether the selection pressure has affected the evolution of both L-LDH and D-LDH genes in the genome. To answer this question, in this study the authors first investigated the evolution of these two genes by constructing phylogenetic trees. They found that D-LDH-based phylogenetic tree could better represent the phylogenetic relationship in the acidophilus complex than L-LDH-based tree. They next investigated the evolutions of LDH genes of L. delbrueckii at amino acid level, and found that D-LDH gene in L. delbrueckii is positively selected, possibly a consequence of long-term domestication. They further identified four amino acids that are under positive selection. One of them, V261, is located at the centre of three catalytic active sites, indicating likely functional effects on the enzyme activity. The selection from the domestication process thus provides direction for future engineering of D-LDH.

  15. Evolution and the complexity of bacteriophages.

    PubMed

    Serwer, Philip

    2007-03-13

    The genomes of both long-genome (> 200 Kb) bacteriophages and long-genome eukaryotic viruses have cellular gene homologs whose selective advantage is not explained. These homologs add genomic and possibly biochemical complexity. Understanding their significance requires a definition of complexity that is more biochemically oriented than past empirically based definitions. Initially, I propose two biochemistry-oriented definitions of complexity: either decreased randomness or increased encoded information that does not serve immediate needs. Then, I make the assumption that these two definitions are equivalent. This assumption and recent data lead to the following four-part hypothesis that explains the presence of cellular gene homologs in long bacteriophage genomes and also provides a pathway for complexity increases in prokaryotic cells: (1) Prokaryotes underwent evolutionary increases in biochemical complexity after the eukaryote/prokaryote splits. (2) Some of the complexity increases occurred via multi-step, weak selection that was both protected from strong selection and accelerated by embedding evolving cellular genes in the genomes of bacteriophages and, presumably, also archaeal viruses (first tier selection). (3) The mechanisms for retaining cellular genes in viral genomes evolved under additional, longer-term selection that was stronger (second tier selection). (4) The second tier selection was based on increased access by prokaryotic cells to improved biochemical systems. This access was achieved when DNA transfer moved to prokaryotic cells both the more evolved genes and their more competitive and complex biochemical systems. I propose testing this hypothesis by controlled evolution in microbial communities to (1) determine the effects of deleting individual cellular gene homologs on the growth and evolution of long genome bacteriophages and hosts, (2) find the environmental conditions that select for the presence of cellular gene homologs, (3) determine which, if any, bacteriophage genes were selected for maintaining the homologs and (4) determine the dynamics of homolog evolution. This hypothesis is an explanation of evolutionary leaps in general. If accurate, it will assist both understanding and influencing the evolution of microbes and their communities. Analysis of evolutionary complexity increase for at least prokaryotes should include analysis of genomes of long-genome bacteriophages.

  16. A computational approach to candidate gene prioritization for X-linked mental retardation using annotation-based binary filtering and motif-based linear discriminatory analysis

    PubMed Central

    2011-01-01

    Background: Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches: the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, in accompaniment with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known. Results: The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification were achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA are integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly-ranked XLMR methods. Conclusions: The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlights an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR. Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi). PMID:21668950

  17. Communicative genes in the evolution of empathy and altruism.

    PubMed

    Buck, Ross

    2011-11-01

    This paper discusses spontaneous communication and its implications for understanding empathy and altruism. The question of the possibility of "true" altruism, that is, giving up one's genetic potential in favor of the genetic potential of another, is a fundamental issue common to the biological, behavioral, and social sciences. Darwin regarded "social instincts and sympathies" to be critical to the social order, but the possibility of biologically-based prosocial motives and emotions was questioned when selection was interpreted as operating at the level of the gene. In the selfish gene hypothesis, Dawkins argued that the unit of evolutionary selection must be an active, germ-line replicator: a unit whose activities determine whether copies of it are made across evolutionary timescales. He argued that the only active replicator existing across evolutionary timescales is the gene, so that the "selfish gene" is a replicator motivated only to make copies of itself. The communicative gene hypothesis notes that genes function by communicating, and the phenotype communication involves not only the individual sending and receiving abilities of the individual genes involved, but also the relationship between them relative to other genes. Therefore the selection of communication as phenotype involves the selection of individual genes and also their relationship. Relationships become replicators, and are selected across evolutionary timescales including social relationships (e.g., sex, nurturance, dominance-submission). An interesting implication of this view: apparent altruism has been interpreted by selfish gene theorists as due to kin selection and reciprocity, in which the survival of kin and comrade indirectly favor the genetic potential of the altruist. From the viewpoint of the communicative gene hypothesis, rather than underlying altruism, kin selection and reciprocity are ways of restricting altruism to kin and comrade: they are mechanisms not of altruism but of xenophobia.

  18. Determining Cutoff Point of Ensemble Trees Based on Sample Size in Predicting Clinical Dose with DNA Microarray Data.

    PubMed

    Yılmaz Isıkhan, Selen; Karabulut, Erdem; Alpar, Celal Reha

    2016-01-01

    Background/Aim. Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Materials and Methods. Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results. The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM). Conclusion. Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. Another result of the study was to identify the sample size of n = 25 as a cutoff point for RT bagging to outperform a single RT.

  19. Comparison of genome-wide selection strategies to identify furfural tolerance genes in Escherichia coli.

    PubMed

    Glebes, Tirzah Y; Sandoval, Nicholas R; Gillis, Jacob H; Gill, Ryan T

    2015-01-01

    Engineering both feedstock and product tolerance is important for transitioning towards next-generation biofuels derived from renewable sources. Tolerance to chemical inhibitors typically results in complex phenotypes, for which multiple genetic changes must often be made to confer tolerance. Here, we performed a genome-wide search for furfural-tolerant alleles using the TRackable Multiplex Recombineering (TRMR) method (Warner et al. (2010), Nature Biotechnology), which uses chromosomally integrated mutations directed towards increased or decreased expression of virtually every gene in Escherichia coli. We employed various growth selection strategies to assess the role of selection design towards growth enrichments. We also compared genes with increased fitness from our TRMR selection to those from a previously reported genome-wide identification study of furfural tolerance genes using a plasmid-based genomic library approach (Glebes et al. (2014) PLOS ONE). In several cases, growth improvements were observed for the chromosomally integrated promoter/RBS mutations but not for the plasmid-based overexpression constructs. Through this assessment, four novel tolerance genes, ahpC, yhjH, rna, and dicA, were identified and confirmed for their effect on improving growth in the presence of furfural. © 2014 Wiley Periodicals, Inc.

  20. Integrative functional analyses using rainbow trout selected for tolerance to plant diets reveal nutrigenomic signatures for soy utilization without the concurrence of enteritis.

    PubMed

    Abernathy, Jason; Brezas, Andreas; Snekvik, Kevin R; Hardy, Ronald W; Overturf, Ken

    2017-01-01

    Finding suitable alternative protein sources for diets of carnivorous fish species remains a major concern for sustainable aquaculture. Through genetic selection, we created a strain of rainbow trout that outperforms parental lines in utilizing an all-plant protein diet and does not develop enteritis in the distal intestine, as is typical with salmonids on long-term plant protein-based feeds. By incorporating this strain into functional analyses, we set out to determine which genes are critical to plant protein utilization in the absence of gut inflammation. After a 12-week feeding trial with our selected strain and a control trout strain fed either a fishmeal-based diet or an all-plant protein diet, high-throughput RNA sequencing was completed on both liver and muscle tissues. Differential gene expression analyses, weighted correlation network analyses and further functional characterization were performed. A strain-by-diet design revealed differential expression ranging from a few dozen to over one thousand genes among the various comparisons and tissues. Major gene ontology groups identified between comparisons included those encompassing central, intermediary and foreign molecule metabolism, associated biosynthetic pathways as well as immunity. A systems approach indicated that genes involved in purine metabolism were highly perturbed. Systems analysis among the tissues tested further suggests the interplay between selection for growth, dietary utilization and protein tolerance may also have implications for nonspecific immunity. By combining data from differential gene expression and co-expression networks using selected trout, along with ontology and pathway analyses, a set of 63 candidate genes for plant diet tolerance was found. Risk loci in human inflammatory bowel diseases were also found in our datasets, indicating rainbow trout selected for plant-diet tolerance may have added utility as a potential biomedical model.

  1. Identification of Single- and Multiple-Class Specific Signature Genes from Gene Expression Profiles by Group Marker Index

    PubMed Central

    Tsai, Yu-Shuen; Aguan, Kripamoy; Pal, Nikhil R.; Chung, I-Fang

    2011-01-01

    Informative genes from microarray data can be used to construct prediction model and investigate biological mechanisms. Differentially expressed genes, the main targets of most gene selection methods, can be classified as single- and multiple-class specific signature genes. Here, we present a novel gene selection algorithm based on a Group Marker Index (GMI), which is intuitive, of low-computational complexity, and efficient in identification of both types of genes. Most gene selection methods identify only single-class specific signature genes and cannot identify multiple-class specific signature genes easily. Our algorithm can detect de novo certain conditions of multiple-class specificity of a gene and makes use of a novel non-parametric indicator to assess the discrimination ability between classes. Our method is effective even when the sample size is small as well as when the class sizes are significantly different. To compare the effectiveness and robustness we formulate an intuitive template-based method and use four well-known datasets. We demonstrate that our algorithm outperforms the template-based method in difficult cases with unbalanced distribution. Moreover, the multiple-class specific genes are good biomarkers and play important roles in biological pathways. Our literature survey supports that the proposed method identifies unique multiple-class specific marker genes (not reported earlier to be related to cancer) in the Central Nervous System data. It also discovers unique biomarkers indicating the intrinsic difference between subtypes of lung cancer. We also associate the pathway information with the multiple-class specific signature genes and cross-reference to published studies. We find that the identified genes participate in the pathways directly involved in cancer development in leukemia data. 
Our method offers a promising way to find genes that may be involved in pathways of multiple diseases, and hence opens up the possibility of using an existing drug on other diseases as well as designing a single drug for multiple diseases. PMID:21909426

  2. Evaluation of two outlier-detection-based methods for detecting tissue-selective genes from microarray data.

    PubMed

    Kadota, Koji; Konishi, Tomokazu; Shimizu, Kentaro

    2007-05-01

    Large-scale expression profiling using DNA microarrays enables identification of tissue-selective genes for which expression is considerably higher and/or lower in some tissues than in others. Among numerous possible methods, only two outlier-detection-based methods (an AIC-based method and Sprent's non-parametric method) can treat equally various types of selective patterns, but they produce substantially different results. We investigated the performance of these two methods for different parameter settings and for a reduced number of samples. We focused on their ability to detect selective expression patterns robustly. We applied them to public microarray data collected from 36 normal human tissue samples and analyzed the effects of both changing the parameter settings and reducing the number of samples. The AIC-based method was more robust in both cases. The findings confirm that the use of the AIC-based method in the recently proposed ROKU method for detecting tissue-selective expression patterns is correct and that Sprent's method is not suitable for ROKU.
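The outlier-detection idea above can be illustrated with a minimal sketch: a tissue is flagged as selective for a gene when its expression lies far from the median, measured in units of the median absolute deviation (MAD). This is a median/MAD rule in the spirit of Sprent's non-parametric method; the exact statistic, the cutoff of 5, the function name `sprent_outliers` and the expression values are assumptions for illustration and are not taken from the abstract.

```python
from statistics import median


def sprent_outliers(values, threshold=5.0):
    """Return indices of values whose distance from the median exceeds
    `threshold` times the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: more than half the values are identical,
        # so anything that differs from the median is flagged.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if abs(v - med) / mad > threshold]


# Hypothetical expression of one gene across eight tissues.
expression = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 12.5, 5.1]
selective_tissues = sprent_outliers(expression)
print(selective_tissues)
```

Because the rule is symmetric around the median, the same function flags both over-expressed and under-expressed tissues, which matches the "higher and/or lower" notion of tissue selectivity described above.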

  3. An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

    PubMed Central

    Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055

  4. An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways.

    PubMed

    Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.

  5. Framework for reanalysis of publicly available Affymetrix® GeneChip® data sets based on functional regions of interest.

    PubMed

    Saka, Ernur; Harrison, Benjamin J; West, Kirk; Petruska, Jeffrey C; Rouchka, Eric C

    2017-12-06

    Since the introduction of microarrays in 1995, researchers world-wide have used both commercial and custom-designed microarrays for understanding differential expression of transcribed genes. Public databases such as ArrayExpress and the Gene Expression Omnibus (GEO) have made millions of samples readily available. One main drawback to microarray data analysis involves the selection of probes to represent a specific transcript of interest, particularly in light of the fact that transcript-specific knowledge (notably alternative splicing) is dynamic in nature. We therefore developed a framework for reannotating and reassigning probe groups for Affymetrix® GeneChip® technology based on functional regions of interest. This framework addresses three issues of Affymetrix® GeneChip® data analyses: removing nonspecific probes, updating probe target mapping based on the latest genome knowledge and grouping probes into gene, transcript and region-based (UTR, individual exon, CDS) probe sets. Updated gene and transcript probe sets provide more specific analysis results based on current genomic and transcriptomic knowledge. The framework selects unique probes, aligns them to gene annotations and generates a custom Chip Description File (CDF). The analysis reveals only 87% of the Affymetrix® GeneChip® HG-U133 Plus 2 probes uniquely align to the current hg38 human assembly without mismatches. We also tested new mappings on the publicly available data series using rat and human data from GSE48611 and GSE72551 obtained from GEO, and illustrate that functional grouping allows for the subtle detection of regions of interest likely to have phenotypical consequences. Through reanalysis of the publicly available data series GSE48611 and GSE72551, we profiled the contribution of UTR and CDS regions to the gene expression levels globally. 
The comparison between region-based and gene-based results indicated that the expressed genes detected by gene-based and region-based CDFs show high consistency, and that region-based results allow the detection of changes in transcript formation.

  6. Ensemble Feature Learning of Genomic Data Using Support Vector Machine

    PubMed Central

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.

    2016-01-01

    The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. PMID:27304923
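The ensemble idea described above (balanced bootstrap samples, one feature ranking per model, aggregation of the rankings, then backward elimination) can be sketched as follows. This is not the authors' ESVM-RFE: to stay dependency-free, the per-model ranking uses the absolute difference of class means as a stand-in for linear SVM weights, and it assumes exactly two classes. All names (`balanced_bootstrap`, `feature_scores`, `ensemble_rfe`) and the toy data are hypothetical.

```python
import random


def balanced_bootstrap(labels, rng):
    """Draw the same number of rows (with replacement) from every class."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n = min(len(rows) for rows in by_class.values())
    sample = []
    for rows in by_class.values():
        sample.extend(rng.choice(rows) for _ in range(n))
    return sample


def feature_scores(X, labels, rows, features):
    """Stand-in for SVM weights: |mean(class a) - mean(class b)| per feature."""
    a, b = sorted(set(labels))  # assumes exactly two classes
    scores = {}
    for f in features:
        vals_a = [X[r][f] for r in rows if labels[r] == a]
        vals_b = [X[r][f] for r in rows if labels[r] == b]
        scores[f] = abs(sum(vals_a) / len(vals_a) - sum(vals_b) / len(vals_b))
    return scores


def ensemble_rfe(X, labels, n_keep, n_models=25, seed=0):
    """Backward elimination driven by rankings aggregated over many models."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        rank_sum = {f: 0 for f in features}
        for _ in range(n_models):
            rows = balanced_bootstrap(labels, rng)
            scores = feature_scores(X, labels, rows, features)
            ordered = sorted(features, key=scores.get)  # worst feature first
            for rank, f in enumerate(ordered):
                rank_sum[f] += rank
        # Drop the feature with the lowest aggregated rank.
        features.remove(min(features, key=rank_sum.get))
    return features


# Toy imbalanced data: feature 0 separates the classes, 1 and 2 do not.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X = [[(10.0 if y == 1 else 0.0) + i % 2, float(i % 3), float((i * 7) % 5)]
     for i, y in enumerate(labels)]
selected = ensemble_rfe(X, labels, n_keep=1)
print(selected)
```

The elimination decision rests on the summed ranks of all bootstrap models rather than on any single model, which is the point the abstract makes; swapping the stand-in scorer for real SVM weights would not change the surrounding loop.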

  7. Detection of biomarkers for Hepatocellular Carcinoma using a hybrid univariate gene selection methods

    PubMed Central

    2012-01-01

    Background Discovering new biomarkers plays a major role in improving early diagnosis of Hepatocellular carcinoma (HCC). The experimental determination of biomarkers requires a lot of time and money. This motivates the use of in-silico prediction of biomarkers to reduce the number of experiments required for detecting new ones. This is achieved by extracting the most representative genes in microarrays of HCC. Results In this work, we provide a method for extracting the differentially expressed genes, specifically the up-regulated ones, that can be considered candidate biomarkers in high-throughput microarrays of HCC. We examine the power of several gene selection methods (such as Pearson’s correlation coefficient, Cosine coefficient, Euclidean distance, Mutual information and Entropy with different estimators) in selecting informative genes. A biological interpretation of the highly ranked genes is done using KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, ENTREZ and DAVID (Database for Annotation, Visualization, and Integrated Discovery) databases. The top ten genes selected using Pearson’s correlation coefficient and Cosine coefficient contained six genes that have been implicated in cancer (often multiple cancers) genesis in previous studies. Fewer such genes were obtained by the other methods (4 genes using Mutual information, 3 genes using Euclidean distance and only one gene using Entropy). A better result was obtained by the utilization of a hybrid approach based on intersecting the highly ranked genes in the output of all investigated methods. This hybrid combination yielded seven genes (2 genes for HCC and 5 genes in different types of cancer) in the top ten genes of the list of intersected genes. Conclusions To strengthen the effectiveness of the univariate selection methods, we propose a hybrid approach by intersecting several of these methods in a cascaded manner. This approach surpasses all of the univariate selection methods when used individually, according to biological interpretation and the examination of gene expression signal profiles. PMID:22867264
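The hybrid idea above, keeping only genes that are highly ranked by several univariate scores, can be sketched with two of the measures the abstract names (Pearson's correlation and the cosine coefficient). This is a simplified illustration, not the published pipeline: it takes a plain intersection of top-k lists rather than a specific cascade order, ranks by absolute score, and uses hypothetical gene names, expression values and function names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(x, y):
    """Cosine coefficient; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def top_k(profiles, labels, score, k):
    """Genes ranked by |score(profile, labels)|, best first, cut at k."""
    ranked = sorted(profiles,
                    key=lambda g: abs(score(profiles[g], labels)),
                    reverse=True)
    return ranked[:k]


def hybrid_selection(profiles, labels, scorers, k):
    """Keep genes that appear in the top k under every scorer."""
    rankings = [top_k(profiles, labels, s, k) for s in scorers]
    common = set(rankings[0]).intersection(*rankings[1:])
    return [g for g in rankings[0] if g in common]  # order of first scorer


# Hypothetical data: label 1 = tumour sample, 0 = normal sample.
labels = [1, 1, 1, 0, 0, 0]
profiles = {
    "G1": [9, 8, 9, 1, 2, 1],   # up-regulated in tumour
    "G2": [5, 5, 5, 5, 5, 5],   # flat
    "G3": [2, 1, 2, 9, 8, 9],   # down-regulated in tumour
    "G4": [7, 6, 8, 3, 2, 3],   # up-regulated in tumour
}
candidates = hybrid_selection(profiles, labels, [pearson, cosine], k=3)
print(candidates)
```

In this toy case the flat gene is ranked highly only by the cosine coefficient and the down-regulated gene only by the absolute Pearson correlation, so the intersection keeps just the two up-regulated genes.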

  8. Selective elimination of long INterspersed element-1 expressing tumour cells by targeted expression of the HSV-TK suicide gene

    PubMed Central

    Chendeb, Mariam; Schneider, Robert; Davidson, Irwin; Fadloun, Anas

    2017-01-01

    In gene therapy, effective and selective suicide gene expression is crucial. We exploited the endogenous Long INterspersed Element-1 (L1) machinery often reactivated in human cancers to integrate the Herpes Simplex Virus Thymidine Kinase (HSV-TK) suicide gene selectively into the genome of cancer cells. We developed a plasmid-based system directing HSV-TK expression only when reverse transcribed and integrated in the host genome via the endogenous L1 ORF1/2 proteins and an Alu element. Delivery of these new constructs into cells followed by Ganciclovir (GCV) treatment selectively induced mortality of L1 ORF1/2 protein expressing cancer cells, but had no effect on primary cells that do not express L1 ORF1/2. This novel strategy for selective targeting of tumour cells provides high tolerability as the HSV-TK gene cannot be expressed without reverse transcription and integration, and high selectivity as these processes take place only in cancer cells expressing high levels of functional L1 ORF1/2. PMID:28415677

  9. A power set-based statistical selection procedure to locate susceptible rare variants associated with complex traits with sequencing data.

    PubMed

    Sun, Hokeun; Wang, Shuang

    2014-08-15

    Existing association methods for rare variants from sequencing data have focused on aggregating variants in a gene or a genetic region because analysing individual rare variants is underpowered. However, these existing rare variant detection methods are not able to identify which rare variants in a gene or a genetic region of all variants are associated with the complex diseases or traits. Once phenotypic associations of a gene or a genetic region are identified, the natural next step in the association study with sequencing data is to locate the susceptible rare variants within the gene or the genetic region. In this article, we propose a power set-based statistical selection procedure that is able to identify the locations of the potentially susceptible rare variants within a disease-related gene or a genetic region. The selection performance of the proposed selection procedure was evaluated through simulation studies, where we demonstrated the feasibility and superior power over several comparable existing methods. In particular, the proposed method is able to handle the mixed effects when both risk and protective variants are present in a gene or a genetic region. The proposed selection procedure was also applied to the sequence data on the ANGPTL gene family from the Dallas Heart Study to identify potentially susceptible rare variants within the trait-related genes. An R package 'rvsel' can be downloaded from http://www.columbia.edu/~sw2206/ and http://statsun.pusan.ac.kr. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Exploring signatures of positive selection in pigmentation candidate genes in populations of East Asian ancestry

    PubMed Central

    2013-01-01

    Background Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection including: 1) Tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima’s D, Fay and Wu’s H and Fu and Li’s D* and F*), 2) Tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) Tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution. Results Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1,) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies. Conclusions We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia. 
Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions. However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation. PMID:23848512
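The outlier step described above (build an empirical distribution of a statistic over all genomic windows, then flag candidate genes that fall in its extreme tail) can be sketched as follows. The window identifiers, statistic values, gene-to-window mapping and function names are hypothetical; the sketch treats large values as extreme, so for a statistic whose lower tail is of interest the values would need to be negated first.

```python
def empirical_outliers(window_stats, top_fraction):
    """Return the windows whose statistic lies in the top `top_fraction`
    of the empirical distribution over all windows."""
    ranked = sorted(window_stats, key=window_stats.get, reverse=True)
    n_keep = max(1, round(len(ranked) * top_fraction))
    return set(ranked[:n_keep])


def flag_genes(gene_to_window, outlier_windows):
    """Return candidate genes located in an outlier window."""
    return sorted(g for g, w in gene_to_window.items() if w in outlier_windows)


# Hypothetical genome scan: 1000 windows, one statistic value per window.
window_stats = {f"win{i}": float(i) for i in range(1000)}
gene_to_window = {"GENE_A": "win999", "GENE_B": "win995", "GENE_C": "win500"}

extreme = flag_genes(gene_to_window, empirical_outliers(window_stats, 0.001))
top_1pct = flag_genes(gene_to_window, empirical_outliers(window_stats, 0.01))
print(extreme, top_1pct)
```

Running the same two calls once per statistic and taking the union of flagged genes would correspond to the "outlier for at least one statistic" criterion used in the abstract.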

  11. Exploring signatures of positive selection in pigmentation candidate genes in populations of East Asian ancestry.

    PubMed

    Hider, Jessica L; Gittelman, Rachel M; Shah, Tapan; Edwards, Melissa; Rosenbloom, Arnold; Akey, Joshua M; Parra, Esteban J

    2013-07-12

    Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection including: 1) Tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima's D, Fay and Wu's H and Fu and Li's D* and F*), 2) Tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) Tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution. Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1,) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies. We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia. Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions. 
However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation.

  12. Population Level Purifying Selection and Gene Expression Shape Subgenome Evolution in Maize.

    PubMed

    Pophaly, Saurabh D; Tellier, Aurélien

    2015-12-01

    The maize ancestor experienced a recent whole-genome duplication (WGD) followed by gene erosion which generated two subgenomes, the dominant subgenome (maize1) experiencing fewer deletions than maize2. We take advantage of available extensive polymorphism and gene expression data in maize to study purifying selection and gene expression divergence between WGD retained paralog pairs. We first report a strong correlation in nucleotide diversity between duplicate pairs, except for upstream regions. We then show that maize1 genes are under stronger purifying selection than maize2. WGD retained genes have higher gene dosage and biased Gene Ontologies consistent with previous studies. The relative gene expression of paralogs across tissues demonstrates that 98% of duplicate pairs have either subfunctionalized in a tissuewise manner or have diverged consistently in their expression thereby preventing functional complementation. Tissuewise subfunctionalization seems to be a hallmark of transcription factors, whereas consistent repression occurs for macromolecular complexes. We show that dominant gene expression is a strong determinant of the strength of purifying selection, explaining the inferred stronger negative selection on maize1 genes. We propose a novel expression-based classification of duplicates which is more robust to explain observed polymorphism patterns than the subgenome location. Finally, upstream regions of repressed genes exhibit an enrichment in transposable elements which indicates a possible mechanism for expression divergence. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  13. High degree of genetic differentiation in marine three-spined sticklebacks (Gasterosteus aculeatus).

    PubMed

    Defaveri, Jacquelin; Shikano, Takahito; Shimada, Yukinori; Merilä, Juha

    2013-09-01

    Populations of widespread marine organisms are typically characterized by a low degree of genetic differentiation in neutral genetic markers, but much less is known about differentiation in genes whose functional roles are associated with specific selection regimes. To uncover possible adaptive population divergence and heterogeneous genomic differentiation in marine three-spined sticklebacks (Gasterosteus aculeatus), we used a candidate gene-based genome-scan approach to analyse variability in 138 microsatellite loci located within/close to (<6 kb) functionally important genes in samples collected from ten geographic locations. The degree of genetic differentiation in markers classified as neutral or under balancing selection-as determined with several outlier detection methods-was low (F(ST) = 0.033 or 0.011, respectively), whereas average FST for directionally selected markers was significantly higher (F(ST) = 0.097). Clustering analyses provided support for genomic and geographic heterogeneity in selection: six genetic clusters were identified based on allele frequency differences in the directionally selected loci, whereas four were identified with the neutral loci. Allelic variation in several loci exhibited significant associations with environmental variables, supporting the conjecture that temperature and salinity, but not optic conditions, are important drivers of adaptive divergence among populations. In general, these results suggest that in spite of the high degree of physical connectivity and gene flow as inferred from neutral marker genes, marine stickleback populations are strongly genetically structured in loci associated with functionally relevant genes. © 2013 John Wiley & Sons Ltd.

  14. Phylogenetic analysis of Rutaceous plants based on single nucleotide polymorphism in chloroplast and nuclear gene sequences

    USDA-ARS's Scientific Manuscript database

    The family Rutaceae encompasses several genera including the economically important genus Citrus. In this study, we selected 22 citrus relatives belonging to the various sub groups of Rutaceae and compared the sequences of three gene fragments. The accessions selected belong to the subfamily Rutoide...

  15. Multilocus patterns of polymorphism and selection across the X chromosome of Caenorhabditis remanei.

    PubMed

    Cutter, Asher D

    2008-03-01

    Natural selection and neutral processes such as demography, mutation, and gene conversion all contribute to patterns of polymorphism within genomes. Identifying the relative importance of these varied components in evolution provides the principal challenge for population genetics. To address this issue in the nematode Caenorhabditis remanei, I sampled nucleotide polymorphism at 40 loci across the X chromosome. The site-frequency spectrum for these loci provides no evidence for population size change, and one locus presents a candidate for linkage to a target of balancing selection. Selection for codon usage bias leads to the non-neutrality of synonymous sites, and despite its weak magnitude of effect (N(e)s approximately 0.1), is responsible for profound patterns of diversity and divergence in the C. remanei genome. Although gene conversion is evident for many loci, biased gene conversion is not identified as a significant evolutionary process in this sample. No consistent association is observed between synonymous-site diversity and linkage-disequilibrium-based estimators of the population recombination parameter, despite theoretical predictions about background selection or widespread genetic hitchhiking, but genetic map-based estimates of recombination are needed to rigorously test for a diversity-recombination relationship. Coalescent simulations also illustrate how a spurious correlation between diversity and linkage-disequilibrium-based estimators of recombination can occur, due in part to the presence of unbiased gene conversion. These results illustrate the influence that subtle natural selection can exert on polymorphism and divergence, in the form of codon usage bias, and demonstrate the potential of C. remanei for detecting natural selection from genomic scans of polymorphism.

  16. A compendium and functional characterization of mammalian genes involved in adaptation to Arctic or Antarctic environments.

    PubMed

    Yudin, Nikolay S; Larkin, Denis M; Ignatieva, Elena V

    2017-12-28

    Many mammals are well adapted to surviving in extremely cold environments. These species have likely accumulated genetic changes that help them efficiently cope with low temperatures. It is not known whether the same genes related to cold adaptation in one species would be under selection in another species. The aims of this study therefore were: to create a compendium of mammalian genes related to adaptations to a low temperature environment; to identify genes related to cold tolerance that have been subjected to independent positive selection in several species; to determine promising candidate genes/pathways/organs for further empirical research on cold adaptation in mammals. After a search for publications containing keywords: "whole genome", "transcriptome or exome sequencing data", and "genome-wide genotyping array data" authors looked for information related to genetic signatures ascribable to positive selection in Arctic or Antarctic mammalian species. Publications related to Human, Arctic fox, Yakut horse, Mammoth, Polar bear, and Minke whale were chosen. The compendium of genes that potentially underwent positive selection in >1 of these six species consisted of 416 genes. Twelve of them showed traces of positive selection in three species. Gene ontology term enrichment analysis of 416 genes from the compendium has revealed 13 terms relevant to the scope of this study. We found that enriched terms were relevant to three major groups: terms associated with collagen proteins and the extracellular matrix; terms associated with the anatomy and physiology of cilium; terms associated with docking. We further revealed that genes from compendium were over-represented in the lists of genes expressed in the lung and liver. A compendium combining mammalian genes involved in adaptation to cold environment was designed, based on the intersection of positively selected genes from six Arctic and Antarctic species. 
The compendium contained 416 genes that have been positively selected in at least two species. However, we did not find any positively selected genes related to cold adaptation in all species from our list. Nevertheless, our work points to several strong candidate genes involved in mechanisms and biochemical pathways related to cold adaptation response in different species.

  17. [A comparison study of hpt and bar as selection marker gene of transgenic rice].

    PubMed

    Zhang, Chun-Yu; Li, Hong-Yu; Liu, Bin

    2012-12-01

    The choice of selection marker is one of the key factors for the success of plant genetic transformation and offspring screening. As two commonly used selection markers, the hpt and bar genes are widely used in tissue culture-based rice transformation. To experimentally compare their performance, we investigated the efficiency of two transformation systems using Hygromycin and Bialaphos as the selection agents, respectively. The results indicated that the system using the hpt gene as the selection marker saved 10 days and had double the transformation efficiency and a lower transgene copy number in comparison to the system using the bar gene. Then, we assessed the feasibility of screening transgenic rice in the field by soaking the wild-type and transgenic seeds in a series of solutions containing step-diluted hygromycin for two days. We found that the suitable concentration for distinguishing the transgenic seeds from WT Kitaake seeds was 167 mg L(-1). However, the cost of screening by hygromycin is still much higher than that of Basta in field tests. Therefore, this study experimentally demonstrated the advantages and disadvantages of the hpt and bar genes as selection markers and thus provides a reference for choosing an appropriate selection marker according to the practical application.

  18. Selection signatures in Shetland ponies.

    PubMed

    Frischknecht, M; Flury, C; Leeb, T; Rieder, S; Neuditschko, M

    2016-06-01

    Shetland ponies were selected for numerous traits including small stature, strength, hardiness and longevity. Despite the different selection criteria, Shetland ponies are well known for their small stature. We performed a selection signature analysis including genome-wide SNPs of 75 Shetland ponies and 76 large-sized horses. Based upon this dataset, we identified a selection signature on equine chromosome (ECA) 1 between 103.8 Mb and 108.5 Mb. A total of 33 annotated genes are located within this interval including the IGF1R gene at 104.2 Mb and the ADAMTS17 gene at 105.4 Mb. These two genes are well known to have a major impact on body height in numerous species including humans. Homozygosity mapping in the Shetland ponies identified a region with increased homozygosity between 107.4 Mb and 108.5 Mb. None of the annotated genes in this region have so far been associated with height. Thus, we cannot exclude the possibility that the identified selection signature on ECA1 is associated with some trait other than height, for which Shetland ponies were selected. © 2016 Stichting International Foundation for Animal Genetics.

  19. LS Bound based gene selection for DNA microarray data.

    PubMed

    Zhou, Xin; Mao, K Z

    2005-04-15

    One problem with discriminant analysis of DNA microarray data is that each sample is represented by quite a large number of genes, and many of them are irrelevant, insignificant or redundant to the discriminant problem at hand. Methods for selecting important genes are, therefore, of much significance in microarray data analysis. In the present study, a new criterion, called LS Bound measure, is proposed to address the gene selection problem. The LS Bound measure is derived from leave-one-out procedure of LS-SVMs (least squares support vector machines), and as the upper bound for leave-one-out classification results it reflects to some extent the generalization performance of gene subsets. We applied this LS Bound measure for gene selection on two benchmark microarray datasets: colon cancer and leukemia. We also compared the LS Bound measure with other evaluation criteria, including the well-known Fisher's ratio and Mahalanobis class separability measure, and other published gene selection algorithms, including Weighting factor and SVM Recursive Feature Elimination. The strength of the LS Bound measure is that it provides gene subsets leading to more accurate classification results than the filter method while its computational complexity is at the level of the filter method. A companion website can be accessed at http://www.ntu.edu.sg/home5/pg02776030/lsbound/. The website contains: (1) the source code of the gene selection algorithm; (2) the complete set of tables and figures regarding the experimental study; (3) proof of the inequality (9). ekzmao@ntu.edu.sg.

  20. A dual selection based, targeted gene replacement tool for Magnaporthe grisea and Fusarium oxysporum.

    PubMed

    Khang, Chang Hyun; Park, Sook-Young; Lee, Yong-Hwan; Kang, Seogchan

    2005-06-01

    Rapid progress in fungal genome sequencing presents many new opportunities for functional genomic analysis of fungal biology through the systematic mutagenesis of the genes identified through sequencing. However, the lack of efficient tools for targeted gene replacement is a limiting factor for fungal functional genomics, as it often necessitates the screening of a large number of transformants to identify the desired mutant. We developed an efficient method of gene replacement and evaluated factors affecting the efficiency of this method using two plant pathogenic fungi, Magnaporthe grisea and Fusarium oxysporum. This method is based on Agrobacterium tumefaciens-mediated transformation with a mutant allele of the target gene flanked by the herpes simplex virus thymidine kinase (HSVtk) gene as a conditional negative selection marker against ectopic transformants. The HSVtk gene product converts 5-fluoro-2'-deoxyuridine to a compound toxic to diverse fungi. Because ectopic transformants express HSVtk, while gene replacement mutants lack HSVtk, growing transformants on a medium amended with 5-fluoro-2'-deoxyuridine facilitates the identification of targeted mutants by counter-selecting against ectopic transformants. In addition to M. grisea and F. oxysporum, the method and associated vectors are likely to be applicable to manipulating genes in a broad spectrum of fungi, thus potentially serving as an efficient, universal functional genomic tool for harnessing the growing body of fungal genome sequence data to study fungal biology.

  1. Development of Transcriptomics-based Biomarkers for Selected Endocrine Disrupting Chemicals in Zebrafish (Danio rerio)

    EPA Science Inventory

    Genome-wide transcriptional profiling by microarrays provides a powerful platform for gene expression-based biomarker discovery. After their wide acceptance in human disease diagnosis, prognosis, and drug discovery, these gene signatures are increasingly being adopted for environ...

  2. Development of transcriptomics-based biomarkers for selected endocrine disrupting chemicals in zebrafish (Danio rerio)

    EPA Science Inventory

    Genome-wide transcriptional profiling by microarrays provides a powerful platform for gene expression-based biomarker discovery. After their wide acceptance in human disease diagnosis, prognosis, and drug discovery, these gene signatures are increasingly being adopted for environ...

  3. Gene-Based Multiclass Cancer Diagnosis with Class-Selective Rejections

    PubMed Central

    Jrad, Nisrine; Grall-Maës, Edith; Beauseroy, Pierre

    2009-01-01

    Supervised learning of microarray data has received much attention in recent years. Multiclass cancer diagnosis, based on selected gene profiles, is used as an adjunct to clinical diagnosis. However, supervised diagnosis may hinder patient care, add expense or confound a result. To avoid such misleading outcomes, a multiclass cancer diagnosis with class-selective rejection is proposed. It rejects some patients from one, some, or all classes in order to ensure a higher reliability while reducing time and expense costs. Moreover, this classifier takes into account asymmetric penalties dependent on each class and on each wrong or partially correct decision. It is based on ν-1-SVM coupled with its regularization path and minimizes a general loss function defined in the class-selective rejection scheme. State-of-the-art multiclass algorithms can be considered a particular case of the proposed algorithm in which the number of decisions is given by the classes and the loss function is defined by the Bayesian risk. Two experiments are carried out in the Bayesian and the class-selective rejection frameworks. Five gene-selected datasets are used to assess the performance of the proposed method. Results are discussed and accuracies are compared with those computed by the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machines classifiers. PMID:19584932
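
    The idea of class-selective rejection can be illustrated with a much simpler rule than the ν-1-SVM classifier described in the abstract. The sketch below is only an assumption-laden toy: it supposes that posterior probabilities per class are already available, and the function name `select_classes` and the threshold value are hypothetical, not taken from the paper.

```python
def select_classes(posteriors, threshold=0.3):
    """Toy class-selective rejection rule (illustrative, not the paper's method).

    posteriors: dict mapping class label -> estimated posterior probability.
    Returns the sorted list of classes that are retained for this patient:
      - one class retained      -> ordinary decision
      - several classes retained -> partial rejection (ambiguous between them)
      - no class reaches the threshold -> every class is retained,
        i.e. the classifier declines to decide (total rejection)
    """
    kept = [label for label, p in posteriors.items() if p >= threshold]
    return sorted(kept) if kept else sorted(posteriors)


# Hypothetical posteriors for three patients
print(select_classes({"A": 0.90, "B": 0.05, "C": 0.05}))
print(select_classes({"A": 0.45, "B": 0.45, "C": 0.10}))
print(select_classes({"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}))
```

    A fixed threshold ignores the asymmetric penalties mentioned in the abstract; in a fuller treatment each decision subset would carry its own cost.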

  4. An efficient ensemble learning method for gene microarray classification.

    PubMed

    Osareh, Alireza; Shadgar, Bita

    2013-01-01

    Gene microarray analysis and classification have proven effective for the diagnosis of diseases and cancers. However, it has also been revealed that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

  5. Integrative approach for inference of gene regulatory networks using lasso-based random featuring and application to psychiatric disorders.

    PubMed

    Kim, Dongchul; Kang, Mingon; Biswas, Ashis; Liu, Chunyu; Gao, Jean

    2016-08-10

    Inferring gene regulatory networks is one of the most interesting research areas in systems biology. Many inference methods have been developed using a variety of computational models and approaches. However, there are two issues to solve. First, depending on the structural or computational model of the inference method, the results tend to be inconsistent due to the innately different advantages and limitations of the methods. Therefore, the combination of dissimilar approaches is needed as an alternative way to overcome the limitations of standalone methods through complementary integration. Second, sparse linear regression penalized by a regularization parameter (lasso) and bootstrapping-based sparse linear regression methods have been suggested in state-of-the-art methods for network inference, but they are not effective for data with a small sample size, and a true regulator could be missed if the target gene is strongly affected by an indirect regulator with high correlation or by another true regulator. We present two novel network inference methods based on the integration of three different criteria, (i) z-score to measure the variation of gene expression from knockout data, (ii) mutual information for the dependency between two genes, and (iii) linear regression-based feature selection. Based on these criteria, we propose a lasso-based random feature selection algorithm (LARF) to achieve better performance, overcoming the limitations of bootstrapping mentioned above. In this work, there are three main contributions. First, our z score-based method to measure gene expression variations from knockout data is more effective than similar criteria of related works. Second, we confirmed that the true regulator selection can be effectively improved by LARF. Lastly, we verified that an integrative approach can clearly outperform a single method when two different methods are effectively joined. In the experiments, our methods were validated by outperforming the state-of-the-art methods on DREAM challenge data, and then LARF was applied to inference of gene regulatory networks associated with psychiatric disorders.

  6. Identification of Conflicting Selective Effects on Highly Expressed Genes

    PubMed Central

    Higgs, Paul G.; Hao, Weilong; Golding, G. Brian

    2007-01-01

    Many different selective effects on DNA and proteins influence the frequency of codons and amino acids in coding sequences. Selection is often stronger on highly expressed genes. Hence, by comparing high- and low-expression genes it is possible to distinguish the factors that are selected by evolution. It has been proposed that highly expressed genes should (i) preferentially use codons matching abundant tRNAs (translational efficiency), (ii) preferentially use amino acids with low cost of synthesis, (iii) be under stronger selection to maintain the required amino acid content, and (iv) be selected for translational robustness. These effects act simultaneously and can be contradictory. We develop a model that combines these factors, and use Akaike’s Information Criterion for model selection. We consider pairs of paralogues that arose by whole-genome duplication in Saccharomyces cerevisiae. A codon-based model is used that includes asymmetric effects due to selection on highly expressed genes. The largest effect is translational efficiency, which is found to strongly influence synonymous, but not non-synonymous rates. Minimization of the cost of amino acid synthesis is implicated. However, when a more general measure of selection for amino acid usage is used, the cost minimization effect becomes redundant. Small effects that we attribute to selection for translational robustness can be identified as an improvement in the model fit on top of the effects of translational efficiency and amino acid usage. PMID:19430600
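
    Model selection by Akaike's Information Criterion, as used in the abstract above, compares nested or competing models through AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood; the model with the lowest AIC is preferred. The following is a minimal sketch. The model names, log-likelihoods, and parameter counts are invented purely for illustration and are not results from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(models):
    """models: dict name -> (maximized log-likelihood, number of parameters).

    Returns the name of the model with the lowest AIC.
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical fits: adding parameters must improve the likelihood
# by more than one log-unit per parameter to be worthwhile.
candidates = {
    "efficiency only": (-1050.0, 3),
    "efficiency + amino acid usage": (-1040.0, 5),
    "efficiency + usage + cost": (-1039.5, 8),
}
for name, (lnl, k) in candidates.items():
    print(name, aic(lnl, k))
print("preferred:", best_model(candidates))
```

    With these made-up numbers the third model fits marginally better but is penalized for its extra parameters, which mirrors how an effect can become redundant once a more general term is already in the model.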

  7. When is hub gene selection better than standard meta-analysis?

    PubMed

    Langfelder, Peter; Mischel, Paul S; Horvath, Steve

    2013-01-01

    Since hub nodes have been found to play important roles in many networks, highly connected hub genes are expected to play an important role in biology as well. However, the empirical evidence remains ambiguous. An open question is whether (or when) hub gene selection leads to more meaningful gene lists than a standard statistical analysis based on significance testing when analyzing genomic data sets (e.g., gene expression or DNA methylation data). Here we address this question for the special case when multiple genomic data sets are available. This is of great practical importance since for many research questions multiple data sets are publicly available. In this case, the data analyst can decide between a standard statistical approach (e.g., based on meta-analysis) and a co-expression network analysis approach that selects intramodular hubs in consensus modules. We assess the performance of these two types of approaches according to two criteria. The first criterion evaluates the biological insights gained and is relevant in basic research. The second criterion evaluates the validation success (reproducibility) in independent data sets and often applies in clinical diagnostic or prognostic applications. We compare meta-analysis with consensus network analysis based on weighted correlation network analysis (WGCNA) in three comprehensive and unbiased empirical studies: (1) finding genes predictive of lung cancer survival, (2) finding methylation markers related to age, and (3) finding mouse genes related to total cholesterol. The results demonstrate that intramodular hub gene status with respect to consensus modules is more useful than a meta-analysis p-value when identifying biologically meaningful gene lists (reflecting criterion 1). However, standard meta-analysis methods perform as well as (if not better than) a consensus network approach in terms of validation success (criterion 2). 
The article also reports a comparison of meta-analysis techniques applied to gene expression data and presents novel R functions for carrying out consensus network analysis, network based screening, and meta analysis.

  8. A polynomial based model for cell fate prediction in human diseases.

    PubMed

    Ma, Lichun; Zheng, Jie

    2017-12-21

    Cell fate regulation directly affects tissue homeostasis and human health. Research on cell fate decision sheds light on key regulators, facilitates understanding the mechanisms, and suggests novel strategies to treat human diseases that are related to abnormal cell development. In this study, we proposed a polynomial based model to predict cell fate. This model was derived from Taylor series. As a case study, gene expression data of pancreatic cells were adopted to test and verify the model. As numerous features (genes) are available, we employed two kinds of feature selection methods, i.e. correlation based and apoptosis pathway based. Then polynomials of different degrees were used to refine the cell fate prediction function. 10-fold cross-validation was carried out to evaluate the performance of our model. In addition, we analyzed the stability of the resultant cell fate prediction model by evaluating the ranges of the parameters, as well as assessing the variances of the predicted values at randomly selected points. Results show that, within both the two considered gene selection methods, the prediction accuracies of polynomials of different degrees show little differences. Interestingly, the linear polynomial (degree 1 polynomial) is more stable than others. When comparing the linear polynomials based on the two gene selection methods, it shows that although the accuracy of the linear polynomial that uses correlation analysis outcomes is a little higher (achieves 86.62%), the one within genes of the apoptosis pathway is much more stable. Considering both the prediction accuracy and the stability of polynomial models of different degrees, the linear model is a preferred choice for cell fate prediction with gene expression data of pancreatic cells. The presented cell fate prediction model can be extended to other cells, which may be important for basic research as well as clinical study of cell development related diseases.

  9. Reference gene selection for quantitative gene expression studies during biological invasions: A test on multiple genes and tissues in a model ascidian Ciona savignyi.

    PubMed

    Huang, Xuena; Gao, Yangchun; Jiang, Bei; Zhou, Zunchun; Zhan, Aibin

    2016-01-15

    As invasive species have successfully colonized a wide range of dramatically different local environments, they offer a good opportunity to study interactions between species and rapidly changing environments. Gene expression represents one of the primary and crucial mechanisms for rapid adaptation to local environments. Here, we aim to select reference genes for quantitative gene expression analysis based on quantitative Real-Time PCR (qRT-PCR) for a model invasive ascidian, Ciona savignyi. We analyzed the stability of ten candidate reference genes in three tissues (siphon, pharynx and intestine) under two key environmental stresses (temperature and salinity) in the marine realm based on three programs (geNorm, NormFinder and delta Ct method). Our results demonstrated only minor differences in stability rankings among the three methods. The use of different single reference genes might influence the data interpretation, while multiple reference genes could minimize possible errors. Therefore, reference gene combinations were recommended for different tissues: the optimal reference gene combination for siphon was RPS15 and RPL17 under temperature stress, and RPL17, UBQ and TubA under salinity treatment; for pharynx, TubB, TubA and RPL17 were the most stable genes under temperature stress, while TubB, TubA and UBQ were the best under salinity stress; for intestine, UBQ, RPS15 and RPL17 were the most reliable reference genes under both treatments. Our results suggest the necessity of selecting and testing reference genes for different tissues under varying environmental stresses. The results obtained here are expected to help reveal mechanisms of gene expression-mediated invasion success using C. savignyi as a model species. Copyright © 2015 Elsevier B.V. All rights reserved.
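
    Of the three stability programs named above, the delta Ct method is simple enough to sketch directly. In its usual comparative form, each candidate gene is paired with every other candidate, the standard deviation of the pairwise Ct difference across samples is computed, and a gene's stability score is the mean of those standard deviations (lower means more stable). The sketch below assumes that form; the gene labels are borrowed from the abstract, but the Ct values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from statistics import mean, stdev


def delta_ct_stability(ct):
    """ct: dict gene -> list of Ct values, one per sample, same sample order.

    Returns dict gene -> mean standard deviation of its pairwise delta Ct
    against every other candidate gene. Lower values indicate a more
    stable reference gene.
    """
    sds = {gene: [] for gene in ct}
    for a, b in combinations(ct, 2):
        diffs = [x - y for x, y in zip(ct[a], ct[b])]
        sd = stdev(diffs)          # sample SD of delta Ct across samples
        sds[a].append(sd)
        sds[b].append(sd)
    return {gene: mean(values) for gene, values in sds.items()}


# Hypothetical Ct values for four samples (not data from the study)
ct_values = {
    "RPL17": [20, 21, 22, 23],
    "UBQ":   [18, 19, 20, 21],   # tracks RPL17 exactly -> delta Ct is constant
    "TubA":  [25, 22, 27, 24],   # fluctuates independently
}
scores = delta_ct_stability(ct_values)
for gene in sorted(scores, key=scores.get):
    print(gene, round(scores[gene], 3))
```

    At least two samples and two candidate genes are needed, since the sample standard deviation is undefined otherwise.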

  10. Modified signal-to-noise: a new simple and practical gene filtering approach based on the concept of projective adaptive resonance theory (PART) filtering method.

    PubMed

    Takahashi, Hiro; Honda, Hiroyuki

    2006-07-01

    Considering the recent advances in and the benefits of DNA microarray technologies, many gene filtering approaches have been employed for the diagnosis and prognosis of diseases. In our previous study, we developed a new filtering method, namely, the projective adaptive resonance theory (PART) filtering method. This method was effective in subclass discrimination. In the PART algorithm, the genes with a low variance in gene expression in either class, not both classes, were selected as important genes for modeling. Based on this concept, we developed novel simple filtering methods such as modified signal-to-noise (S2N') in the present study. The discrimination model constructed using these methods showed higher accuracy with higher reproducibility as compared with many conventional filtering methods, including the t-test, S2N, NSC and SAM. The reproducibility of prediction was evaluated based on the correlation between the sets of U-test p-values on randomly divided datasets. With respect to leukemia, lymphoma and breast cancer, the correlation was high; a difference of >0.13 was obtained by the constructed model by using <50 genes selected by S2N'. Improvement was higher in the smaller genes and such higher correlation was observed when t-test, NSC and SAM were used. These results suggest that these modified methods, such as S2N', have high potential to function as new methods for marker gene selection in cancer diagnosis using DNA microarray data. Software is available upon request.
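
    For orientation, the conventional signal-to-noise (S2N) statistic that the abstract uses as a baseline is commonly written as (mean_A - mean_B) / (sd_A + sd_B) for a gene measured in classes A and B. The sketch below implements only this conventional form; the abstract does not define the modification behind S2N', so no attempt is made to reproduce it. The use of the sample standard deviation and the toy expression values are assumptions made here for illustration.

```python
from statistics import mean, stdev


def s2n(class_a, class_b):
    """Conventional signal-to-noise ratio for one gene.

    class_a, class_b: expression values of the gene in each class.
    Raises ZeroDivisionError if both classes have zero spread.
    """
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def rank_genes(expr_a, expr_b):
    """expr_a, expr_b: dict gene -> list of expression values per class.

    Returns gene names ordered from most to least discriminative (|S2N|).
    """
    scores = {gene: s2n(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Hypothetical expression values
expr_a = {"g1": [10, 11, 12], "g2": [5, 9, 1]}
expr_b = {"g1": [1, 2, 3],    "g2": [4, 8, 3]}
print(rank_genes(expr_a, expr_b))
```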

  11. The prospect of gene therapy for prostate cancer: update on theory and status.

    PubMed

    Koeneman, K S; Hsieh, J T

    2001-09-01

    Molecularly based novel therapeutic agents are needed to address the problem of locally recurrent, or metastatic, advanced hormone-refractory prostate cancer. Recent basic science advances in mechanisms of gene expression, vector delivery, and targeting have rendered clinically relevant gene therapy to the prostatic fossa and distant sites feasible in the near future. Current research and clinical investigative efforts involving methods for more effective vector delivery and targeting, with enhanced gene expression to selected (specific) sites, are reviewed. These areas of research involve tissue-specific promoters, transgene exploration, vector design and delivery, and selective vector targeting. The 'vectorology' involved mainly addresses selective tissue homing with ligands, mechanisms of innate immune system evasion for durable transgene expression, and the possibility of repeat administration.

  12. Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification

    PubMed Central

    Wang, Yun; Huang, Fangzhou

    2018-01-01

    The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible. PMID:29666661
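
    The second stage described above, using Spearman's rank correlation to remove coexpressed genes, can be sketched independently of the embedding step. The code below is a simplified illustration under stated assumptions: genes arrive already ordered by some relevance score, a gene is dropped when its absolute Spearman correlation with any gene kept so far reaches a cutoff, and the cutoff of 0.9 and the helper names are hypothetical rather than taken from the paper.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def drop_coexpressed(expr, ordered_genes, cutoff=0.9):
    """Greedy filter: keep a gene only if |rho| < cutoff with all kept genes."""
    kept = []
    for gene in ordered_genes:
        if all(abs(spearman(expr[gene], expr[k])) < cutoff for k in kept):
            kept.append(gene)
    return kept


# Hypothetical expression profiles across five samples
expr = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 11],   # monotone in g1 -> rho = 1, redundant
    "g3": [5, 1, 4, 2, 3],
}
print(drop_coexpressed(expr, ["g1", "g2", "g3"]))
```

    Because Spearman's coefficient works on ranks, g2 is flagged as redundant even though its relationship to g1 is not exactly linear.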

  13. Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification.

    PubMed

    Xu, Jiucheng; Mu, Huiyu; Wang, Yun; Huang, Fangzhou

    2018-01-01

    The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.

  14. Evaluation of Two Outlier-Detection-Based Methods for Detecting Tissue-Selective Genes from Microarray Data

    PubMed Central

    Kadota, Koji; Konishi, Tomokazu; Shimizu, Kentaro

    2007-01-01

    Large-scale expression profiling using DNA microarrays enables identification of tissue-selective genes for which expression is considerably higher and/or lower in some tissues than in others. Among numerous possible methods, only two outlier-detection-based methods (an AIC-based method and Sprent’s non-parametric method) can treat equally various types of selective patterns, but they produce substantially different results. We investigated the performance of these two methods for different parameter settings and for a reduced number of samples. We focused on their ability to detect selective expression patterns robustly. We applied them to public microarray data collected from 36 normal human tissue samples and analyzed the effects of both changing the parameter settings and reducing the number of samples. The AIC-based method was more robust in both cases. The findings confirm that the use of the AIC-based method in the recently proposed ROKU method for detecting tissue-selective expression patterns is correct and that Sprent’s method is not suitable for ROKU. PMID:19936074

  15. Marine natural products for multi-targeted cancer treatment: A future insight.

    PubMed

    Kumar, Maushmi S; Adki, Kaveri M

    2018-05-30

    Cancer is the world's second most alarming disease; it involves abnormal cell growth and has the potential to spread to other parts of the body. Most of the available anticancer drugs are designed to act on specific targets by altering the activity of the transporters and genes involved. As cancer cells exhibit complex cellular machinery, the regeneration of cancer tissues and chemoresistance to therapy have been the main obstacles in cancer treatment. This fact encourages researchers to explore the multitargeted use of existing medicines to overcome the shortcomings of chemotherapy and to find alternative and safer treatment strategies. Recent developments in genomics-proteomics and an understanding of the molecular pharmacology of cancer have also challenged researchers to come up with target-based drugs. The literature supports the evidence of natural compounds exhibiting antioxidant, antimitotic, anti-inflammatory, antibiotic as well as anticancer activity. In this review, we have selected marine sponges as a prolific source of bioactive compounds which can be explored for their possible use in cancer and have tried to link their role in cancer pathways. To this end, we revisited the literature for the selection of cancer genes for the multitargeted use of existing drugs and natural products. We used Cytoscape network analysis and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) to study the possible interactions and to show the links between the antioxidants, antibiotics, anti-inflammatory and antimitotic agents and their targets for their possible use in cancer. We included a total of 78 pathways, their genes and natural compounds from the above four pharmacological classes used in cancer treatment for a multitargeted approach. Based on the Cytoscape network analysis results, we shortlisted 22 genes based on their average shortest path length connecting one node to all other nodes in a network. 
These selected genes are CDKN2A, FH, VHL, STK11, SUFU, RB1, MEN1, HRPT2, EXT1, 2, CDK4, p14, p16, TSC1, 2, AXIN2, SDBH C, D, NF1, 2, BHD, PTCH, GPC3, CYLD and WT1. The selected genes were analysed using STRING for their protein-protein interactions. Based on the above findings, we propose the selected genes to be considered as major targets and are suggested to be studied for discovering marine natural products as drug lead in cancer treatment. Copyright © 2018 Elsevier Masson SAS. All rights reserved.
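
    The shortlisting criterion mentioned above, the average shortest path length from one node to all other nodes, can be computed with a breadth-first search on an unweighted network. The sketch below is an illustration only: the four-node graph uses abstract labels rather than real genes, its edges are invented, and averaging over reachable nodes is an assumption about how the measure is defined.

```python
from collections import deque


def avg_shortest_path_length(graph, source):
    """graph: dict node -> iterable of neighbours (undirected, unweighted).

    Returns the mean number of edges on the shortest paths from `source`
    to every other node reachable from it (inf if none is reachable).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")


# Toy network: a triangle A-B-C with a pendant node D attached to C
toy = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
for node in toy:
    print(node, round(avg_shortest_path_length(toy, node), 3))
```

    Nodes with a low value, such as C here, sit close to everything else in the network, which is the sense in which such nodes are treated as central targets.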

  16. Improved site-specific recombinase-based method to produce selectable marker- and vector-backbone-free transgenic cells

    NASA Astrophysics Data System (ADS)

    Yu, Yuan; Tong, Qi; Li, Zhongxia; Tian, Jinhai; Wang, Yizhi; Su, Feng; Wang, Yongsheng; Liu, Jun; Zhang, Yong

    2014-02-01

    PhiC31 integrase-mediated gene delivery has been extensively used in gene therapy and animal transgenesis. However, random integration events are observed in phiC31-mediated integration in different types of mammalian cells; as a result, the efficiencies of pseudo attP site integration and evaluation of site-specific integration are compromised. To improve this system, we used an attB-TK fusion gene as a negative selection marker, thereby eliminating random integration during phiC31-mediated transfection. We also excised the selection system and plasmid bacterial backbone by using two other site-specific recombinases, Cre and Dre. Thus, we generated clean transgenic bovine fetal fibroblast cells free of selectable marker and plasmid bacterial backbone. These clean cells were used as donor nuclei for somatic cell nuclear transfer (SCNT), indicating a similar developmental competence of SCNT embryos to that of non-transgenic cells. Therefore, the present gene delivery system facilitated the development of gene therapy and agricultural biotechnology.

  17. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

    PubMed Central

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. PMID:25961028
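
    The mRMR filter stage referred to above can be sketched as a greedy search that repeatedly adds the gene with the best trade-off between relevance to the class labels and redundancy with genes already chosen. The code below assumes the common difference form (relevance minus mean redundancy) with mutual information computed on discretized expression values; the ABC and SVM stages of the abstract are not reproduced, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedy mRMR selection.

    features: dict gene -> list of discretized expression values.
    labels:   list of class labels, same sample order.
    Returns up to k gene names in the order they were selected.
    """
    relevance = {g: mutual_information(v, labels) for g, v in features.items()}
    selected = []

    def score(gene):
        if not selected:
            return relevance[gene]
        redundancy = sum(
            mutual_information(features[gene], features[s]) for s in selected
        ) / len(selected)
        return relevance[gene] - redundancy

    while len(selected) < min(k, len(features)):
        remaining = (g for g in features if g not in selected)
        selected.append(max(remaining, key=score))
    return selected


# Hypothetical binarized expression for eight samples
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],   # informative
    "g2": [0, 0, 0, 1, 1, 1, 1, 1],   # exact copy of g1 -> fully redundant
    "g3": [0, 0, 1, 0, 1, 1, 0, 1],   # weaker, but adds new information
}
print(mrmr(features, labels, 2))
```

    Although g2 is as relevant as g1, it is passed over in favour of g3 at the second step because it contributes nothing beyond what g1 already provides.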

  18. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling.

    PubMed

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  19. Sexual selection and sex linkage.

    PubMed

    Kirkpatrick, Mark; Hall, David W

    2004-04-01

    Some animal groups, such as birds, seem prone to extreme forms of sexual selection. One contributing factor may be sex linkage of genes affecting male displays and female preferences. Here we show that sex linkage can have substantial effects on the genetic correlation between these traits and consequently for Fisher's runaway and the good-genes mechanisms of sexual selection. Under some kinds of sex linkage (e.g. Z-linked preferences), a runaway is more likely than under autosomal inheritance, while under others (e.g., X-linked preferences and autosomal displays), the good-genes mechanism is particularly powerful. These theoretical results suggest empirical tests based on the comparative method.

  20. Construction of human antibody gene libraries and selection of antibodies by phage display.

    PubMed

    Schirrmann, Thomas; Hust, Michael

    2010-01-01

    Recombinant antibodies as therapeutics offer new opportunities for the treatment of many tumor diseases. To date, 18 antibody-based drugs are approved for cancer treatment and hundreds of anti-tumor antibodies are under development. The first clinically approved antibodies were of murine origin or human-mouse chimeric. However, since murine antibody domains are immunogenic in human patients and could result in human anti-mouse antibody (HAMA) responses, currently mainly humanized and fully human antibodies are developed for therapeutic applications. Here, in vitro antibody selection technologies directly allow the selection of human antibodies and the corresponding genes from human antibody gene libraries. Antibody phage display is the most common way to generate human antibodies and has already yielded thousands of recombinant antibodies for research, diagnostics and therapy. Here, we describe methods for the construction of human scFv gene libraries and the antibody selection.

  1. Gene Regulatory Network Inferences Using a Maximum-Relevance and Maximum-Significance Strategy

    PubMed Central

    Liu, Wei; Zhu, Wen; Liao, Bo; Chen, Xiangtao

    2016-01-01

    Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the “large p, small n” problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Eventually, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We applied our method to five different datasets and compared it with five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method. PMID:27829000

  2. Traditional and modern plant breeding methods with examples in rice (Oryza sativa L.).

    PubMed

    Breseghello, Flavio; Coelho, Alexandre Siqueira Guedes

    2013-09-04

    Plant breeding can be broadly defined as alterations caused in plants as a result of their use by humans, ranging from unintentional changes resulting from the advent of agriculture to the application of molecular tools for precision breeding. The vast diversity of breeding methods can be simplified into three categories: (i) plant breeding based on observed variation by selection of plants based on natural variants appearing in nature or within traditional varieties; (ii) plant breeding based on controlled mating by selection of plants presenting recombination of desirable genes from different parents; and (iii) plant breeding based on monitored recombination by selection of specific genes or marker profiles, using molecular tools for tracking within-genome variation. The continuous application of traditional breeding methods in a given species could lead to the narrowing of the gene pool from which cultivars are drawn, rendering crops vulnerable to biotic and abiotic stresses and hampering future progress. Several methods have been devised for introducing exotic variation into elite germplasm without undesirable effects. Cases in rice are given to illustrate the potential and limitations of different breeding approaches.

  3. Genome-wide detection of selection signatures in Chinese indigenous Laiwu pigs revealed candidate genes regulating fat deposition in muscle.

    PubMed

    Chen, Minhui; Wang, Jiying; Wang, Yanping; Wu, Ying; Fu, Jinluan; Liu, Jian-Feng

    2018-05-18

    Genome-wide scans for positive selection signatures have been carried out in commercial breeds. However, few studies have focused on selection footprints of indigenous breeds. The Laiwu pig is an invaluable Chinese indigenous pig breed with an extremely high proportion of intramuscular fat (IMF), and an excellent model for detecting footprints of natural and artificial selection for fat deposition in muscle. In this study, based on GeneSeek Genomic profiler Porcine HD data, three complementary methods, FST, iHS (integrated haplotype homozygosity score) and CLR (composite likelihood ratio), were implemented to detect selection signatures in the whole genome of Laiwu pigs. In total, 175 candidate selected regions were identified by at least two of the three methods; these covered 43.75 Mb and corresponded to 1.79% of the genome sequence. Gene annotation of the selected regions revealed a list of functionally important genes for feed intake and fat deposition, reproduction, and immune response. In particular, in accordance with the phenotypic features of Laiwu pigs, we identified several candidate genes, NPY1R, NPY5R, PIK3R1 and JAKMIP1, involved in the actions of two sets of neurons that are central regulators in maintaining the balance between food intake and energy expenditure. Our results identified a number of regions showing signatures of selection, as well as a list of functional candidate genes with potential effects on phenotypic traits, especially fat deposition in muscle. Our findings provide insights into the mechanisms of artificial selection of fat deposition and further facilitate follow-up functional studies.
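
    As a rough illustration of the FST statistic mentioned in this abstract, the sketch below computes Wright's textbook FST for a single biallelic SNP from per-population allele frequencies. It assumes equally weighted populations and is not the estimator or windowing scheme used by the authors; the function name wright_fst is hypothetical.

```python
def wright_fst(freqs):
    """Wright's F_ST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population
    (populations are weighted equally, an assumption of this sketch).
    """
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity in the pooled (total) population
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within populations
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    if h_total == 0.0:
        return 0.0  # monomorphic site: no differentiation to measure
    return (h_total - h_within) / h_total
```

    Identical frequencies give 0, and populations fixed for different alleles give 1; values in between indicate partial differentiation at that site.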

  4. Integrative functional analyses using rainbow trout selected for tolerance to plant diets reveal nutrigenomic signatures for soy utilization without the concurrence of enteritis

    PubMed Central

    Brezas, Andreas; Snekvik, Kevin R.; Hardy, Ronald W.; Overturf, Ken

    2017-01-01

    Finding suitable alternative protein sources for diets of carnivorous fish species remains a major concern for sustainable aquaculture. Through genetic selection, we created a strain of rainbow trout that outperforms parental lines in utilizing an all-plant protein diet and does not develop enteritis in the distal intestine, as is typical with salmonids on long-term plant protein-based feeds. By incorporating this strain into functional analyses, we set out to determine which genes are critical to plant protein utilization in the absence of gut inflammation. After a 12-week feeding trial with our selected strain and a control trout strain fed either a fishmeal-based diet or an all-plant protein diet, high-throughput RNA sequencing was completed on both liver and muscle tissues. Differential gene expression analyses, weighted correlation network analyses and further functional characterization were performed. A strain-by-diet design revealed differential expression ranging from a few dozen to over one thousand genes among the various comparisons and tissues. Major gene ontology groups identified between comparisons included those encompassing central, intermediary and foreign molecule metabolism, associated biosynthetic pathways as well as immunity. A systems approach indicated that genes involved in purine metabolism were highly perturbed. Systems analysis among the tissues tested further suggests the interplay between selection for growth, dietary utilization and protein tolerance may also have implications for nonspecific immunity. By combining data from differential gene expression and co-expression networks using selected trout, along with ontology and pathway analyses, a set of 63 candidate genes for plant diet tolerance was found. Risk loci in human inflammatory bowel diseases were also found in our datasets, indicating rainbow trout selected for plant-diet tolerance may have added utility as a potential biomedical model. PMID:28723948

  5. Identifying positive selection candidate loci for high-altitude adaptation in Andean populations

    PubMed Central

    2009-01-01

    High-altitude environments (>2,500 m) provide scientists with a natural laboratory to study the physiological and genetic effects of low ambient oxygen tension on human populations. One approach to understanding how life at high altitude has affected human metabolism is to survey genome-wide datasets for signatures of natural selection. In this work, we report on a study to identify selection-nominated candidate genes involved in adaptation to hypoxia in one highland group, Andeans from the South American Altiplano. We analysed dense microarray genotype data using four test statistics that detect departures from neutrality. Using a candidate gene, single nucleotide polymorphism-based approach, we identified genes exhibiting preliminary evidence of recent genetic adaptation in this population. These included genes that are part of the hypoxia-inducible transcription factor (HIF) pathway, a biochemical pathway involved in oxygen homeostasis, as well as three other genomic regions previously not known to be associated with high-altitude phenotypes. In addition to identifying selection-nominated candidate genes, we also tested whether the HIF pathway shows evidence of natural selection. Our results indicate that the genes of this biochemical pathway as a group show no evidence of having evolved in response to hypoxia in Andeans. Results from particular HIF-targeted genes, however, suggest that genes in this pathway could play a role in Andean adaptation to high altitude, even if the pathway as a whole does not show higher relative rates of evolution. These data suggest a genetic role in high-altitude adaptation and provide a basis for genotype/phenotype association studies that are necessary to confirm the role of putative natural selection candidate genes and gene regions in adaptation to altitude. PMID:20038496

  6. The complexity of selection at the major primate beta-defensin locus.

    PubMed

    Semple, Colin A M; Maxwell, Alison; Gautier, Philippe; Kilanowski, Fiona M; Eastwood, Hayden; Barran, Perdita E; Dorin, Julia R

    2005-05-18

    We have examined the evolution of the genes at the major human beta-defensin locus and the orthologous loci in a range of other primates and mouse. For the first time these data allow us to examine selective episodes in the more recent evolutionary history of this locus as well as the ancient past. We have used a combination of maximum likelihood based tests and a maximum parsimony based sliding window approach to give a detailed view of the varying modes of selection operating at this locus. We provide evidence for strong positive selection soon after the duplication of these genes within an ancestral mammalian genome. Consequently variable selective pressures have acted on beta-defensin genes in different evolutionary lineages, with episodes both of negative, and more rarely positive selection, during the divergence of primates. Positive selection appears to have been more common in the rodent lineage, accompanying the birth of novel, rodent-specific beta-defensin genes. These observations allow a fuller understanding of the evolution of mammalian innate immunity. In both the rodent and primate lineages, sites in the second exon have been subject to positive selection and by implication are important in functional diversity. A small number of sites in the mature human peptides were found to have undergone repeated episodes of selection in different primate lineages. Particular sites were consistently implicated by multiple methods at positions throughout the mature peptides. These sites are clustered at positions predicted to be important for the specificity of the antimicrobial or chemoattractant properties of beta-defensins. Surprisingly, sites within the prepropeptide region were also implicated as being subject to significant positive selection, suggesting previously unappreciated functional significance for this region. Identification of these putatively functional sites has important implications for our understanding of beta-defensin function and for novel antibiotic design.

  7. Transient dominant host-range selection using Chinese hamster ovary cells to generate marker-free recombinant viral vectors from vaccinia virus.

    PubMed

    Liu, Liang; Cooper, Tamara; Eldi, Preethi; Garcia-Valtanen, Pablo; Diener, Kerrilyn R; Howley, Paul M; Hayball, John D

    2017-04-01

    Recombinant vaccinia viruses (rVACVs) are promising antigen-delivery systems for vaccine development that are also useful as research tools. Two common methods for selection during construction of rVACV clones are (i) co-insertion of drug resistance or reporter protein genes, which requires the use of additional selection drugs or detection methods, and (ii) dominant host-range selection. The latter uses VACV variants rendered replication-incompetent in host cell lines by the deletion of host-range genes. Replicative ability is restored by co-insertion of the host-range genes, providing for dominant selection of the recombinant viruses. Here, we describe a new method for the construction of rVACVs using the cowpox CP77 protein and unmodified VACV as the starting material. Our selection system will expand the range of tools available for positive selection of rVACV during vector construction, and it is substantially more high-fidelity than approaches based on selection for drug resistance.

  8. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach.

    PubMed

    Peng, Jiajie; Zhang, Xuanshuo; Hui, Weiwei; Lu, Junya; Li, Qianqian; Liu, Shuhui; Shang, Xuequn

    2018-03-19

    Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity resulting from the incomplete GO topology structure and gene annotations. We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting the significant term pairs to reduce noise. Based on the EC number (Enzyme Commission)-based groups of yeast and Arabidopsis, evaluation tests show that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity. Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved after effectively incorporating the global gene-to-gene interactions in the co-functional network, especially for species whose gene annotations in GO are far from complete.
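
    The random walk with restart (RWR) named in this abstract can be sketched generically as follows. This is the standard RWR iteration on a small adjacency matrix, not the NETSIM2 implementation; the function name and the restart probability of 0.5 are assumptions chosen for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at `seed`.

    adj: square adjacency matrix (list of lists, non-negative weights).
    Iterates p <- (1 - restart) * W p + restart * e, where W is the
    column-normalised adjacency matrix and e is the indicator of the seed.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    e = [0.0] * n
    e[seed] = 1.0
    p = e[:]
    for _ in range(max_iter):
        nxt = [(1.0 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * e[i] for i in range(n)]
        # Stop once the L1 change between iterations is negligible
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

    On a connected network the resulting vector sums to one and decays with network distance from the seed gene, which is what lets the walk capture global rather than only local structure.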

  9. In silico identification of novel ligands for G-quadruplex in the c-MYC promoter

    NASA Astrophysics Data System (ADS)

    Kang, Hyun-Jin; Park, Hyun-Ju

    2015-04-01

    G-quadruplex DNA formed in NHEIII1 region of oncogene promoter inhibits transcription of the genes. In this study, virtual screening combining pharmacophore-based search and structure-based docking screening was conducted to discover ligands binding to G-quadruplex in promoter region of c-MYC. Several hit ligands showed the selective PCR-arresting effects for oligonucleotide containing c-MYC G-quadruplex forming sequence. Among them, three hits selectively inhibited cell proliferation and decreased c-MYC mRNA level in Ramos cells, where NHEIII1 is included in translocated c-MYC gene for overexpression. Promoter assay using two kinds of constructs with wild-type and mutant sequences showed that interaction of these ligands with the G-quadruplex resulted in turning-off of the reporter gene. In conclusion, combined virtual screening methods were successfully used for discovery of selective c-MYC promoter G-quadruplex binders with anticancer activity.

  10. Identifying prognostic signature in ovarian cancer using DirGenerank

    PubMed Central

    Wang, Jian-Yong; Chen, Ling-Ling; Zhou, Xiong-Hui

    2017-01-01

    Identifying the prognostic genes in cancer is essential not only for the treatment of cancer patients, but also for drug discovery. However, it's still a big challenge to select the prognostic genes that can distinguish the risk of cancer patients across various data sets because of tumor heterogeneity. In this situation, the selected genes whose expression levels are statistically related to prognostic risks may be passengers. In this paper, based on gene expression data and prognostic data of ovarian cancer patients, we used conditional mutual information to construct a gene dependency network in which the nodes (genes) with more out-degrees have more chances to be the modulators of cancer prognosis. After that, we proposed the DirGenerank (Generank in direct network) algorithm, which concerns both the gene dependency network and genes’ correlations to prognostic risks, to identify the gene signature that can predict the prognostic risks of ovarian cancer patients. Using the ovarian cancer data set from TCGA (The Cancer Genome Atlas) as training data set, 40 genes with the highest importance were selected as prognostic signature. Survival analysis of these patients divided by the prognostic signature in testing data set and four independent data sets showed the signature can distinguish the prognostic risks of cancer patients significantly. Enrichment analysis of the signature with curated cancer genes and the drugs selected by CMAP showed the genes in the signature may be drug targets for therapy. In summary, we have proposed a useful pipeline to identify prognostic genes of cancer patients. PMID:28615526

  11. Artificial genetic selection for an efficient translation initiation site for expression of human RACK1 gene in Escherichia coli

    PubMed Central

    Zhelyabovskaya, Olga B.; Berlin, Yuri A.; Birikh, Klara R.

    2004-01-01

    In bacterial expression systems, translation initiation is usually the rate limiting and the least predictable stage of protein synthesis. Efficiency of a translation initiation site can vary dramatically depending on the sequence context. This is why many standard expression vectors provide very poor expression levels of some genes. This notion persuaded us to develop an artificial genetic selection protocol, which allows one to find for a given target gene an individual efficient ribosome binding site from a random pool. In order to create Darwinian pressure necessary for the genetic selection, we designed a system based on translational coupling, in which microorganism survival in the presence of antibiotic depends on expression of the target gene, while putting no special requirements on this gene. Using this system we obtained superproducing constructs for the human protein RACK1 (receptor for activated C kinase). PMID:15034151

  12. Blueberry (Vaccinium corymbosum L.).

    PubMed

    Song, Guo-Qing

    2015-01-01

    Vaccinium consists of approximately 450 species, of which highbush blueberry (Vaccinium corymbosum) is one of the three major Vaccinium fruit crops (i.e., blueberry, cranberry, and lingonberry) domesticated in the twentieth century. In blueberry the adventitious shoot regeneration using leaf explants has been the most desirable regeneration system to date; Agrobacterium tumefaciens-mediated transformation is the major gene delivery method and effective selection has been reported using either the neomycin phosphotransferase II gene (nptII) or the bialaphos resistance (bar) gene as selectable markers. The A. tumefaciens-mediated transformation protocol described in this chapter is based on combining the optimal conditions for efficient plant regeneration, reliable gene delivery, and effective selection. The protocol has led to successful regeneration of transgenic plants from leaf explants of four commercially important highbush blueberry cultivars for multiple purposes, providing a powerful approach to supplement conventional breeding methods for blueberry by introducing genes of interest.

  13. Gene disruption in Trichoderma atroviride via Agrobacterium-mediated transformation.

    PubMed

    Zeilinger, Susanne

    2004-02-01

    A modified Agrobacterium-mediated transformation method for the efficient disruption of two genes encoding signaling compounds of the mycoparasite Trichoderma atroviride is described, using the hph gene of Escherichia coli as selection marker. The transformation vectors contained about 1 kb of 5' and 3' non-coding regions from the tmk1 (encoding a MAP kinase) or tga3 (encoding an alpha-subunit of a heterotrimeric G protein) target loci flanking a selection marker. Transformation of fungal conidia and selection on hygromycin-containing media applying an overlay-based procedure, which overcomes the lack of formation of distinct single colonies by the fungus, led to stable clones for both disruption constructs. Southern and PCR analyses proved gene disruption by single-copy homologous integration with a frequency of approximately 60% for both genes; and the loss of tmk1 and tga3 transcript formation in the disruptants was demonstrated by RT-PCR.

  14. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    PubMed Central

    2010-01-01

    Background: Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Results: We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions: The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
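
    The adjusted Rand index used as the evaluation criterion in this abstract can be computed from two label vectors as sketched below. This is the standard Hubert and Arabie formula written with the Python standard library; the function name is hypothetical and at least two samples are assumed.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same samples."""
    n = len(labels_true)
    if n < 2 or n != len(labels_pred):
        raise ValueError("need two label lists of equal length >= 2")
    # Pairs of samples placed together in both partitions
    together_both = sum(comb(c, 2)
                        for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    # Agreement expected by chance under random labelling
    expected = together_true * together_pred / comb(n, 2)
    max_index = (together_true + together_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (together_both - expected) / (max_index - expected)
```

    The index is 1 when the clustering reproduces the known classes up to relabelling, close to 0 for chance-level agreement, and negative when agreement is worse than chance.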

  15. Fusion–fission experiments in Aphidius: evolutionary split without isolation in response to environmental bimodality

    PubMed Central

    Emelianov, I; Hernandes-Lopez, A; Torrence, M; Watts, N

    2011-01-01

    Studying host-based divergence naturally maintained by a balance between selection and gene flow can provide valuable insights into genetic underpinnings of host adaptation and ecological speciation in parasites. Selection-gene flow balance is often postulated in sympatric host races, but direct experimental evidence is scarce. In this study, we present such evidence obtained in host races of Aphidius ervi, an important hymenopteran agent of biological control of aphids in agriculture, using a novel fusion–fission method of gene flow perturbation. In our study, between-race genetic divergence was obliterated by means of advanced hybridisation, followed by a multi-generation exposure of the resulting genetically uniform hybrid swarm to a two-host environment. This fusion–fission procedure was implemented under two contrasting regimes of between-host gene flow in two replicated experiments involving different racial pairs. Host-based genetic fission in response to environmental bimodality occurred in both experiments in as little as six generations of divergent adaptation despite continuous gene flow. We demonstrate that fission recovery of host-based divergence evolved faster and hybridisation-induced linkage disequilibrium decayed slower under restricted (6.7%) compared with unrestricted gene flow, directly pointing at a balance between gene flow and divergent selection. We also show, in four separate tests, that random drift had no or little role in the observed genetic split. Rates and patterns of fission divergence differed between racial pairs. Comparative linkage analysis of these differences is currently under way to test for the role of genomic architecture of adaptation in ecology-driven divergent evolution. PMID:20924399

  16. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization.

    PubMed

    Cawley, Gavin C; Talbot, Nicola L C

    2006-10-01

    Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
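
    To illustrate how a Laplace prior produces sparse gene weights, the sketch below fits an L1-penalised logistic regression by proximal gradient descent with a fixed regularization parameter. It is neither the SLogReg training procedure nor BLogReg (which integrates the parameter out); the function names, the absence of an intercept, and the values of lam, lr and n_iter are assumptions made for the example.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, amount):
    """Proximal operator of the L1 penalty: shrink towards zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def sparse_logistic_regression(X, y, lam=0.1, lr=0.5, n_iter=500):
    """L1-penalised logistic regression without intercept.

    X: list of feature rows, y: list of 0/1 labels.
    Minimises mean log-loss + lam * sum(|w_j|) by proximal gradient steps.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        residuals = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - yi
                     for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n
                for j in range(d)]
        # Gradient step on the log-loss, then shrinkage for the L1 term
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w
```

    Features whose gradient never exceeds the penalty keep a weight of exactly zero, which is how the fitted model doubles as a gene selector; larger values of lam select fewer genes.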

  17. Characterization of the Avian Trojan Gene Family Reveals Contrasting Evolutionary Constraints

    PubMed Central

    Petrov, Petar; Syrjänen, Riikka; Smith, Jacqueline; Gutowska, Maria Weronika; Uchida, Tatsuya; Vainio, Olli; Burt, David W

    2015-01-01

    “Trojan” is a leukocyte-specific, cell surface protein originally identified in the chicken. Its molecular function has been hypothesized to be related to anti-apoptosis and the proliferation of immune cells. The Trojan gene has been localized onto the Z sex chromosome. The adjacent two genes also show significant homology to Trojan, suggesting the existence of a novel gene/protein family. Here, we characterize this Trojan family, identify homologues in other species and predict evolutionary constraints on these genes. The two Trojan-related proteins in chicken were predicted as a receptor-type tyrosine phosphatase and a transmembrane protein, bearing a cytoplasmic immuno-receptor tyrosine-based activation motif. We identified the Trojan gene family in ten other bird species and found related genes in three reptiles and a fish species. The phylogenetic analysis of the homologues revealed a gradual diversification among the family members. Evolutionary analyses of the avian genes predicted that the extracellular regions of the proteins have been subjected to positive selection. Such selection was possibly a response to evolving interacting partners or to pathogen challenges. We also observed an almost complete lack of intracellular positively selected sites, suggesting a conserved signaling mechanism of the molecules. Therefore, the contrasting patterns of selection likely correlate with the interaction and signaling potential of the molecules. PMID:25803627

  18. Characterization of the avian Trojan gene family reveals contrasting evolutionary constraints.

    PubMed

    Petrov, Petar; Syrjänen, Riikka; Smith, Jacqueline; Gutowska, Maria Weronika; Uchida, Tatsuya; Vainio, Olli; Burt, David W

    2015-01-01

    "Trojan" is a leukocyte-specific, cell surface protein originally identified in the chicken. Its molecular function has been hypothesized to be related to anti-apoptosis and the proliferation of immune cells. The Trojan gene has been localized onto the Z sex chromosome. The adjacent two genes also show significant homology to Trojan, suggesting the existence of a novel gene/protein family. Here, we characterize this Trojan family, identify homologues in other species and predict evolutionary constraints on these genes. The two Trojan-related proteins in chicken were predicted as a receptor-type tyrosine phosphatase and a transmembrane protein, bearing a cytoplasmic immuno-receptor tyrosine-based activation motif. We identified the Trojan gene family in ten other bird species and found related genes in three reptiles and a fish species. The phylogenetic analysis of the homologues revealed a gradual diversification among the family members. Evolutionary analyses of the avian genes predicted that the extracellular regions of the proteins have been subjected to positive selection. Such selection was possibly a response to evolving interacting partners or to pathogen challenges. We also observed an almost complete lack of intracellular positively selected sites, suggesting a conserved signaling mechanism of the molecules. Therefore, the contrasting patterns of selection likely correlate with the interaction and signaling potential of the molecules.

  19. Sequence diversity patterns suggesting balancing selection in partially sex-linked genes of the plant Silene latifolia are not generated by demographic history or gene flow.

    PubMed

    Guirao-Rico, Sara; Sánchez-Gracia, Alejandro; Charlesworth, Deborah

    2017-03-01

    DNA sequence diversity in genes in the partially sex-linked pseudoautosomal region (PAR) of the sex chromosomes of the plant Silene latifolia is higher than expected from within-species diversity of other genes. This could be the footprint of sexually antagonistic (SA) alleles that are maintained by balancing selection in a PAR gene (or genes) and affect polymorphism in linked genome regions. SA selection is predicted to occur during sex chromosome evolution, but it is important to test whether the unexpectedly high sequence polymorphism could be explained without it, purely by the combined effects of partial linkage with the sex-determining region and the population's demographic history, including possible introgression from Silene dioica. To test this, we applied approximate Bayesian computation-based model choice to autosomal sequence diversity data, to find the most plausible scenario for the recent history of S. latifolia and then to estimate the posterior density of the most relevant parameters. We then used these densities to simulate variation to be expected at PAR genes. We conclude that an excess of variants at high frequencies at PAR genes should arise in S. latifolia populations only for genes with strong associations with fully sex-linked genes, which requires closer linkage with the fully sex-linked region than that estimated for the PAR genes where apparent deviations from neutrality were observed. These results support the need to invoke selection to explain the S. latifolia PAR gene diversity, and encourage further work to test the possibility of balancing selection due to sexual antagonism. © 2016 John Wiley & Sons Ltd.

  20. Marker-assisted pyramiding of brown planthopper (Nilaparvata lugens Stål) resistance genes Bph1 and Bph2 on rice chromosome 12.

    PubMed

    Sharma, Prem N; Torii, Akihide; Takumi, Shigeo; Mori, Naoki; Nakamura, Chiharu

    2004-01-01

    Brown planthopper (BPH) (Nilaparvata lugens Stål) is a significant insect pest of rice (Oryza sativa L.). We constructed a gene-pyramided japonica line, in which two BPH resistance genes Bph1 and Bph2 on the long arm of chromosome 12 independently derived from two indica resistance lines were combined through the recombinant selection. The gene-pyramiding was achieved based on the previously constructed high-resolution linkage maps of the two genes. Two co-dominant and four dominant PCR-based markers flanking the loci were used to select for a homozygous recombinant line in a segregating population that was derived from a cross between the parental homozygous single-gene introgression lines. BPH bioassay showed that the resistance level of the pyramided line was equivalent to that of the Bph1-single introgression line, which showed a higher level of resistance than the Bph2-single introgression line. The pyramid line should provide a useful experimental means for studying the fine structure of the chromosomal region covering these two major BPH resistance genes.

  1. An Improved Single-Step Cloning Strategy Simplifies the Agrobacterium tumefaciens-Mediated Transformation (ATMT)-Based Gene-Disruption Method for Verticillium dahliae.

    PubMed

    Wang, Sheng; Xing, Haiying; Hua, Chenlei; Guo, Hui-Shan; Zhang, Jie

    2016-06-01

    The soilborne fungal pathogen Verticillium dahliae infects a broad range of plant species to cause severe diseases. The availability of Verticillium genome sequences has provided opportunities for large-scale investigations of individual gene function in Verticillium strains using Agrobacterium tumefaciens-mediated transformation (ATMT)-based gene-disruption strategies. Traditional ATMT vectors require multiple cloning steps and elaborate characterization procedures to achieve successful gene replacement; thus, these vectors are not suitable for high-throughput ATMT-based gene deletion. Several advancements have been made that either involve simplification of the steps required for gene-deletion vector construction or increase the efficiency of the technique for rapid recombinant characterization. However, an ATMT binary vector that is both simple and efficient is still lacking. Here, we generated a USER-ATMT dual-selection (DS) binary vector, which combines both the advantages of the USER single-step cloning technique and the efficiency of the herpes simplex virus thymidine kinase negative-selection marker. Highly efficient deletion of three different genes in V. dahliae using the USER-ATMT-DS vector enabled verification that this newly-generated vector not only facilitates the cloning process but also simplifies the subsequent identification of fungal homologous recombinants. The results suggest that the USER-ATMT-DS vector is applicable for efficient gene deletion and suitable for large-scale gene deletion in V. dahliae.

  2. A novel dominant selectable system for the selection of transgenic plants under in vitro and greenhouse conditions based on phosphite metabolism.

    PubMed

    López-Arredondo, Damar L; Herrera-Estrella, Luis

    2013-05-01

    Antibiotic and herbicide resistance genes are currently the most frequently used selectable marker genes for plant research and crop development. However, the use of antibiotics and herbicides must be carefully controlled because the degree of susceptibility to these compounds varies widely among plant species and because they can also affect plant regeneration. Therefore, new selectable marker systems that are effective for a broad range of plant species are still needed. Here, we report a simple and inexpensive system based on providing transgenic plant cells the capacity to convert a nonmetabolizable compound (phosphite, Phi) into an essential nutrient for cell growth (phosphate) through the expression of a bacterial gene encoding a phosphite oxidoreductase (PTXD). This system is effective for the selection of Arabidopsis transgenic plants by germinating T0 seeds directly on media supplemented with Phi and to select transgenic tobacco shoots from cocultivated leaf disc explants using nutrient media supplemented with Phi as both a source of phosphorus and selective agent. Because the ptxD/Phi system also allows the establishment of large-scale screening systems under greenhouse conditions completely eliminating false transformation events, it should facilitate the development of novel plant transformation methods. © 2013 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.

  3. A mixture model-based approach to the clustering of microarray expression data.

    PubMed

    McLachlan, G J; Bean, R W; Peel, D

    2002-03-01

    This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/

  4. DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L.

    PubMed

    González-Martínez, Santiago C; Ersoz, Elhan; Brown, Garth R; Wheeler, Nicholas C; Neale, David B

    2006-03-01

    Genetic association studies are rapidly becoming the experimental approach of choice to dissect complex traits, including tolerance to drought stress, which is the most common cause of mortality and yield losses in forest trees. Optimization of association mapping requires knowledge of the patterns of nucleotide diversity and linkage disequilibrium and the selection of suitable polymorphisms for genotyping. Moreover, standard neutrality tests applied to DNA sequence variation data can be used to select candidate genes or amino acid sites that are putatively under selection for association mapping. In this article, we study the pattern of polymorphism of 18 candidate genes for drought-stress response in Pinus taeda L., an important tree crop. Data analyses based on a set of 21 putatively neutral nuclear microsatellites did not show population genetic structure or genomewide departures from neutrality. Candidate genes had moderate average nucleotide diversity at silent sites (pi(sil) = 0.00853), varying 100-fold among single genes. The level of within-gene LD was low, with an average pairwise r² of 0.30, decaying rapidly from approximately 0.50 to approximately 0.20 at 800 bp. No apparent LD among genes was found. A selective sweep may have occurred at the early-response-to-drought-3 (erd3) gene, although population expansion can also explain our results and evidence for selection was not conclusive. One other gene, ccoaomt-1, a methylating enzyme involved in lignification, showed dimorphism (i.e., two highly divergent haplotype lineages at equal frequency), which is commonly associated with the long-term action of balancing selection. Finally, a set of haplotype-tagging SNPs (htSNPs) was selected. Using htSNPs, a reduction of genotyping effort of approximately 30-40%, while sampling most common allelic variants, can be gained in our ongoing association studies for drought tolerance in pine.
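
    The pairwise LD measure quoted in this record can be illustrated with a minimal sketch. It assumes phased haplotypes coded 0/1 at biallelic sites; the function names and the toy data are hypothetical and are not taken from the study.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Squared allelic correlation between two biallelic sites.

    Each argument lists the allele (0 or 1) carried by each haplotype,
    in the same haplotype order. Returns None if either site is monomorphic.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


def mean_pairwise_r2(sites):
    """Average r^2 over all pairs of polymorphic sites, or None if there are none."""
    values = [r_squared(a, b) for a, b in combinations(sites, 2)]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Toy example: four haplotypes, three sites (illustrative data only).
toy_sites = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
toy_mean_r2 = mean_pairwise_r2(toy_sites)
```

    In this toy data the first two sites are perfectly associated and the third is independent of both, so the average over the three pairs is one third.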

  5. Multiple-endpoints gene alteration-based (MEGA) assay: A toxicogenomics approach for water quality assessment of wastewater effluents.

    PubMed

    Fukushima, Toshikazu; Hara-Yamamura, Hiroe; Nakashima, Koji; Tan, Lea Chua; Okabe, Satoshi

    2017-12-01

    Wastewater effluents contain a significant number of toxic contaminants, which, even at low concentrations, display a wide variety of toxic actions. In this study, we developed a multiple-endpoints gene alteration-based (MEGA) assay, a real-time PCR-based transcriptomic analysis, to assess the water quality of wastewater effluents for human health risk assessment and management. Twenty-one genes from the human hepatoblastoma cell line (HepG2), covering the basic health-relevant stress responses such as response to xenobiotics, genotoxicity, and cytotoxicity, were selected and incorporated into the MEGA assay. The genes related to the p53-mediated DNA damage response and cytochrome P450 were selected as markers for genotoxicity and response to xenobiotics, respectively. Additionally, the genes that were dose-dependently regulated by exposure to the wastewater effluents were chosen as markers for cytotoxicity. The alterations in the expression of an individual gene, induced by exposure to the wastewater effluents, were evaluated by real-time PCR and the results were validated by genotoxicity (e.g., comet assay) and cell-based cytotoxicity tests. In summary, the MEGA assay is a real-time PCR-based assay that targets cellular responses to contaminants present in wastewater effluents at the transcriptional level; it is rapid, cost-effective, and high-throughput and can thus complement any chemical analysis for water quality assessment and management. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Genome-Wide Analysis Reveals Selection for Important Traits in Domestic Horse Breeds

    PubMed Central

    Petersen, Jessica L.; Mickelson, James R.; Rendahl, Aaron K.; Valberg, Stephanie J.; Andersson, Lisa S.; Axelsson, Jeanette; Bailey, Ernie; Bannasch, Danika; Binns, Matthew M.; Borges, Alexandre S.; Brama, Pieter; da Câmara Machado, Artur; Capomaccio, Stefano; Cappelli, Katia; Cothran, E. Gus; Distl, Ottmar; Fox-Clipsham, Laura; Graves, Kathryn T.; Guérin, Gérard; Haase, Bianca; Hasegawa, Telhisa; Hemmann, Karin; Hill, Emmeline W.; Leeb, Tosso; Lindgren, Gabriella; Lohi, Hannes; Lopes, Maria Susana; McGivney, Beatrice A.; Mikko, Sofia; Orr, Nicholas; Penedo, M. Cecilia T.; Piercy, Richard J.; Raekallio, Marja; Rieder, Stefan; Røed, Knut H.; Swinburne, June; Tozaki, Teruaki; Vaudin, Mark; Wade, Claire M.; McCue, Molly E.

    2013-01-01

    Intense selective pressures applied over short evolutionary time have resulted in homogeneity within, but substantial variation among, horse breeds. Utilizing this population structure, 744 individuals from 33 breeds, and a 54,000 SNP genotyping array, breed-specific targets of selection were identified using an FST-based statistic calculated in 500-kb windows across the genome. A 5.5-Mb region of ECA18, in which the myostatin (MSTN) gene was centered, contained the highest signature of selection in both the Paint and Quarter Horse. Gene sequencing and histological analysis of gluteal muscle biopsies showed a promoter variant and intronic SNP of MSTN were each significantly associated with higher Type 2B and lower Type 1 muscle fiber proportions in the Quarter Horse, demonstrating a functional consequence of selection at this locus. Signatures of selection on ECA23 in all gaited breeds in the sample led to the identification of a shared, 186-kb haplotype including two doublesex related mab transcription factor genes (DMRT2 and 3). The recent identification of a DMRT3 mutation within this haplotype, which appears necessary for the ability to perform alternative gaits, provides further evidence for selection at this locus. Finally, putative loci for the determination of size were identified in the draft breeds and the Miniature horse on ECA11, as well as when signatures of selection surrounding candidate genes at other loci were examined. This work provides further evidence of the importance of MSTN in racing breeds, provides strong evidence for selection upon gait and size, and illustrates the potential for population-based techniques to find genomic regions driving important phenotypes in the modern horse. PMID:23349635
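
    The abstract does not give the exact form of its FST-based statistic, so the following is only a rough sketch of the general idea of summarizing FST in fixed genomic windows. It assumes two populations weighted equally, per-SNP allele frequencies as input, and a heterozygosity-based FST combined within each window as a ratio of sums; the function name and input layout are hypothetical, and this is not the authors' statistic.

```python
from collections import defaultdict

WINDOW_BP = 500_000  # window size taken from the abstract (500 kb)


def windowed_fst(snps, window_bp=WINDOW_BP):
    """Heterozygosity-based FST per window for two equally weighted populations.

    snps: iterable of (position_bp, freq_pop1, freq_pop2) on one chromosome.
    Returns {window_start_bp: fst}; windows with no polymorphic SNP are omitted.
    """
    numerator = defaultdict(float)    # sum of (H_total - H_within) per window
    denominator = defaultdict(float)  # sum of H_total per window
    for pos, p1, p2 in snps:
        start = (pos // window_bp) * window_bp
        p_bar = (p1 + p2) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        numerator[start] += h_total - h_within
        denominator[start] += h_total
    return {
        start: numerator[start] / denominator[start]
        for start in sorted(denominator)
        if denominator[start] > 0
    }


# Toy input: two fixed differences in the first window, one shared SNP in the second.
toy_snps = [(100, 1.0, 0.0), (250_000, 0.0, 1.0), (600_000, 0.5, 0.5)]
toy_fst = windowed_fst(toy_snps)
```

    Summing numerators and denominators before dividing keeps low-diversity SNPs from dominating a window, which is one common design choice; other estimators would give different values.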

  7. Selection signatures in four lignin genes from switchgrass populations divergently selected for in vitro dry matter digestibility

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Shiyu; Kaeppler, Shawn M.; Vogel, Kenneth P.

    Switchgrass is undergoing development as a dedicated cellulosic bioenergy crop. Fermentation of lignocellulosic biomass to ethanol in a bioenergy system or to volatile fatty acids in a livestock production system is strongly and negatively influenced by lignification of cell walls. This study detects specific loci that exhibit selection signatures across switchgrass breeding populations that differ in in vitro dry matter digestibility (IVDMD), ethanol yield, and lignin concentration. Allele frequency changes in candidate genes were used to detect loci under selection. Out of the 183 polymorphisms identified in the four candidate genes, twenty-five loci in the intron regions and four loci in coding regions were found to display a selection signature. All loci in the coding regions are synonymous substitutions. Selection in both directions was observed on polymorphisms that appeared to be under selection. Genetic diversity and linkage disequilibrium within the candidate genes were low. The recurrent divergent selection caused excessive moderate allele frequencies in the cycle 3 reduced lignin population as compared to the base population. As a result, this study provides valuable insight on genetic changes occurring in short-term selection in the polyploid populations, and discovered potential markers for breeding switchgrass with improved biomass quality.

  8. Selection signatures in four lignin genes from switchgrass populations divergently selected for in vitro dry matter digestibility

    DOE PAGES

    Chen, Shiyu; Kaeppler, Shawn M.; Vogel, Kenneth P.; ...

    2016-11-28

    Switchgrass is undergoing development as a dedicated cellulosic bioenergy crop. Fermentation of lignocellulosic biomass to ethanol in a bioenergy system or to volatile fatty acids in a livestock production system is strongly and negatively influenced by lignification of cell walls. This study detects specific loci that exhibit selection signatures across switchgrass breeding populations that differ in in vitro dry matter digestibility (IVDMD), ethanol yield, and lignin concentration. Allele frequency changes in candidate genes were used to detect loci under selection. Out of the 183 polymorphisms identified in the four candidate genes, twenty-five loci in the intron regions and four loci in coding regions were found to display a selection signature. All loci in the coding regions are synonymous substitutions. Selection in both directions was observed on polymorphisms that appeared to be under selection. Genetic diversity and linkage disequilibrium within the candidate genes were low. The recurrent divergent selection caused excessive moderate allele frequencies in the cycle 3 reduced lignin population as compared to the base population. As a result, this study provides valuable insight on genetic changes occurring in short-term selection in the polyploid populations, and discovered potential markers for breeding switchgrass with improved biomass quality.

  9. Genome-Wide Variation Patterns Uncover the Origin and Selection in Cultivated Ginseng (Panax ginseng Meyer)

    PubMed Central

    Li, Ming-Rui; Shi, Feng-Xue; Li, Ya-Ling; Jiang, Peng; Jiao, Lili

    2017-01-01

    Chinese ginseng (Panax ginseng Meyer) is a medicinally important herb and plays crucial roles in traditional Chinese medicine. Pharmacological analyses identified diverse bioactive components from Chinese ginseng. However, basic biological attributes including domestication and selection of the ginseng plant remain under-investigated. Here, we presented a genome-wide view of the domestication and selection of cultivated ginseng based on the whole genome data. A total of 8,660 protein-coding genes were selected for genome-wide scanning of the 30 wild and cultivated ginseng accessions. In complement, the 45s rDNA, chloroplast and mitochondrial genomes were included to perform phylogenetic and population genetic analyses. The observed spatial genetic structure between northern cultivated ginseng (NCG) and southern cultivated ginseng (SCG) accessions suggested multiple independent origins of cultivated ginseng. Genome-wide scanning further demonstrated that NCG and SCG have undergone distinct selection pressures during the domestication process, with more genes identified in the NCG (97 genes) than in the SCG group (5 genes). Functional analyses revealed that these genes are involved in diverse pathways, including DNA methylation, lignin biosynthesis, and cell differentiation. These findings suggested that the SCG and NCG groups have distinct demographic histories. Candidate genes identified are useful for future molecular breeding of cultivated ginseng. PMID:28922794

  10. Population genomics of the honey bee reveals strong signatures of positive selection on worker traits.

    PubMed

    Harpur, Brock A; Kent, Clement F; Molodtsova, Daria; Lebon, Jonathan M D; Alqarni, Abdulaziz S; Owayss, Ayman A; Zayed, Amro

    2014-02-18

    Most theories used to explain the evolution of eusociality rest upon two key assumptions: mutations affecting the phenotype of sterile workers evolve by positive selection if the resulting traits benefit fertile kin, and that worker traits provide the primary mechanism allowing social insects to adapt to their environment. Despite the common view that positive selection drives phenotypic evolution of workers, we know very little about the prevalence of positive selection acting on the genomes of eusocial insects. We mapped the footprints of positive selection in Apis mellifera through analysis of 40 individual genomes, allowing us to identify thousands of genes and regulatory sequences with signatures of adaptive evolution over multiple timescales. We found Apoidea- and Apis-specific genes to be enriched for signatures of positive selection, indicating that novel genes play a disproportionately large role in adaptive evolution of eusocial insects. Worker-biased proteins have higher signatures of adaptive evolution relative to queen-biased proteins, supporting the view that worker traits are key to adaptation. We also found genes regulating worker division of labor to be enriched for signs of positive selection. Finally, genes associated with worker behavior based on analysis of brain gene expression were highly enriched for adaptive protein and cis-regulatory evolution. Our study highlights the significant contribution of worker phenotypes to adaptive evolution in social insects, and provides a wealth of knowledge on the loci that influence fitness in honey bees.

  11. Population genomics of the honey bee reveals strong signatures of positive selection on worker traits

    PubMed Central

    Harpur, Brock A.; Kent, Clement F.; Molodtsova, Daria; Lebon, Jonathan M. D.; Alqarni, Abdulaziz S.; Owayss, Ayman A.; Zayed, Amro

    2014-01-01

    Most theories used to explain the evolution of eusociality rest upon two key assumptions: mutations affecting the phenotype of sterile workers evolve by positive selection if the resulting traits benefit fertile kin, and that worker traits provide the primary mechanism allowing social insects to adapt to their environment. Despite the common view that positive selection drives phenotypic evolution of workers, we know very little about the prevalence of positive selection acting on the genomes of eusocial insects. We mapped the footprints of positive selection in Apis mellifera through analysis of 40 individual genomes, allowing us to identify thousands of genes and regulatory sequences with signatures of adaptive evolution over multiple timescales. We found Apoidea- and Apis-specific genes to be enriched for signatures of positive selection, indicating that novel genes play a disproportionately large role in adaptive evolution of eusocial insects. Worker-biased proteins have higher signatures of adaptive evolution relative to queen-biased proteins, supporting the view that worker traits are key to adaptation. We also found genes regulating worker division of labor to be enriched for signs of positive selection. Finally, genes associated with worker behavior based on analysis of brain gene expression were highly enriched for adaptive protein and cis-regulatory evolution. Our study highlights the significant contribution of worker phenotypes to adaptive evolution in social insects, and provides a wealth of knowledge on the loci that influence fitness in honey bees. PMID:24488971

  12. flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection

    PubMed Central

    Stanley, Craig E.; Kulathinal, Rob J.

    2016-01-01

    With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster’s breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1–1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of biologists to creatively use such genomic big data resources in an integrative manner. PMID:27226167

  13. flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection.

    PubMed

    Stanley, Craig E; Kulathinal, Rob J

    2016-08-09

    With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster's breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1-1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of biologists to creatively use such genomic big data resources in an integrative manner. Copyright © 2016 Stanley and Kulathinal.
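
    As a reading aid for the divergence statistics listed in this record, here is a minimal sketch of the dN/dS ratio (often written omega), assuming dN and dS have already been estimated elsewhere. The function names and labels are hypothetical and do not describe the flyDIVaS interface; the codon-based tests mentioned in the abstract are model-based and are not reproduced by this simple ratio.

```python
def dn_ds_ratio(dn, ds):
    """Return omega = dN/dS, or None when dS is zero and the ratio is undefined."""
    if dn < 0 or ds < 0:
        raise ValueError("dN and dS must be non-negative")
    if ds == 0:
        return None
    return dn / ds


def rough_label(dn, ds):
    """Label a gene by where omega falls relative to 1 (a heuristic, not a test)."""
    omega = dn_ds_ratio(dn, ds)
    if omega is None:
        return "undefined"
    if omega > 1:
        return "omega > 1"   # more nonsynonymous than synonymous change per site
    if omega < 1:
        return "omega < 1"   # pattern usually read as selective constraint
    return "omega = 1"       # pattern expected under neutrality
```

    A gene-wide omega above 1 is uncommon because most sites in most proteins are constrained, which is one reason site- or branch-specific codon models are typically used when testing for positive selection.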

  14. Marker-assisted combination of major genes for pathogen resistance in potato.

    PubMed

    Gebhardt, C; Bellin, D; Henselewski, H; Lehmann, W; Schwarzfischer, J; Valkonen, J P T

    2006-05-01

    Closely linked PCR-based markers facilitate the tracing and combining of resistance factors that have been introgressed previously into cultivated potato from different sources. Crosses were performed to combine the Ry(adg) gene for extreme resistance to Potato virus Y (PVY) with the Gro1 gene for resistance to the root cyst nematode Globodera rostochiensis and the Rx1 gene for extreme resistance to Potato virus X (PVX), or with resistance to potato wart (Synchytrium endobioticum). Marker-assisted selection (MAS) using four PCR-based diagnostic assays was applied to 110 F1 hybrids resulting from four 2x by 4x cross-combinations. Thirty tetraploid plants having the appropriate marker combinations were selected and tested for presence of the corresponding resistance traits. All plants tested showed the expected resistant phenotype. Unexpectedly, the plants segregated for additional resistance to pathotypes 1, 2 and 6 of S. endobioticum, which was subsequently shown to be inherited from the PVY resistant parents of the crosses. The selected plants can be used as sources of multiple resistance traits in pedigree breeding and are available from a potato germplasm bank.

  15. GeneRIF indexing: sentence selection based on machine learning.

    PubMed

    Jimeno-Yepes, Antonio J; Sticco, J Caitlin; Mork, James G; Aronson, Alan R

    2013-05-31

    A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE® citations and the sentences describing a novel function. We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine if a sentence should be selected for GeneRIF indexing. Features are derived from the sentences or using mechanisms to augment the information provided by them: assigning a discourse label using a previously trained model, for example. We show that machine learning approaches with specific feature combinations achieve results close to one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it. The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species.

  16. Nitrate-induced genes in tomato roots. Array analysis reveals novel genes that may play a role in nitrogen nutrition.

    PubMed

    Wang, Y H; Garvin, D F; Kochian, L V

    2001-09-01

    A subtractive tomato (Lycopersicon esculentum) root cDNA library enriched in genes up-regulated by changes in plant mineral status was screened with labeled mRNA from roots of both nitrate-induced and mineral nutrient-deficient (-nitrogen [N], -phosphorus, -potassium [K], -sulfur, -magnesium, -calcium, -iron, -zinc, and -copper) tomato plants. A subset of cDNAs was selected from this library based on mineral nutrient-related changes in expression. Additional cDNAs were selected from a second mineral-deficient tomato root library based on sequence homology to known genes. These selection processes yielded a set of 1,280 mineral nutrition-related cDNAs that were arrayed on nylon membranes for further analysis. These high-density arrays were hybridized with mRNA from tomato plants exposed to nitrate at different time points after N was withheld for 48 h, for plants that were grown on nitrate/ammonium for 5 weeks prior to the withholding of N. One hundred-fifteen genes were found to be up-regulated by nitrate resupply. Among these genes were several previously identified as nitrate responsive, including nitrate transporters, nitrate and nitrite reductase, and metabolic enzymes such as transaldolase, transketolase, malate dehydrogenase, asparagine synthetase, and histidine decarboxylase. We also identified 14 novel nitrate-inducible genes, including: (a) water channels, (b) root phosphate and K(+) transporters, (c) genes potentially involved in transcriptional regulation, (d) stress response genes, and (e) ribosomal protein genes. In addition, both families of nitrate transporters were also found to be inducible by phosphate, K, and iron deficiencies. The identification of these novel nitrate-inducible genes is providing avenues of research that will yield new insights into the molecular basis of plant N nutrition, as well as possible networking between the regulation of N, phosphorus, and K nutrition.

  17. Evolutionary Origins of Cancer Driver Genes and Implications for Cancer Prognosis

    PubMed Central

    Chu, Xin-Yi; Zhou, Xiong-Hui; Cui, Ze-Jia; Zhang, Hong-Yu

    2017-01-01

    The cancer atavistic theory suggests that carcinogenesis is a reverse evolution process. It is thus of great interest to explore the evolutionary origins of cancer driver genes and the relevant mechanisms underlying the carcinogenesis. Moreover, the evolutionary features of cancer driver genes could be helpful in selecting cancer biomarkers from high-throughput data. In this study, through analyzing the cancer endogenous molecular networks, we revealed that the subnetwork originating from eukaryota could control the unlimited proliferation of cancer cells, and the subnetwork originating from eumetazoa could recapitulate the other hallmarks of cancer. In addition, investigations based on multiple datasets revealed that cancer driver genes were enriched in genes originating from eukaryota, opisthokonta, and eumetazoa. These results have important implications for enhancing the robustness of cancer prognosis models through selecting the gene signatures by the gene age information. PMID:28708071

  18. Evolutionary Origins of Cancer Driver Genes and Implications for Cancer Prognosis.

    PubMed

    Chu, Xin-Yi; Jiang, Ling-Han; Zhou, Xiong-Hui; Cui, Ze-Jia; Zhang, Hong-Yu

    2017-07-14

    The cancer atavistic theory suggests that carcinogenesis is a reverse evolution process. It is thus of great interest to explore the evolutionary origins of cancer driver genes and the relevant mechanisms underlying the carcinogenesis. Moreover, the evolutionary features of cancer driver genes could be helpful in selecting cancer biomarkers from high-throughput data. In this study, through analyzing the cancer endogenous molecular networks, we revealed that the subnetwork originating from eukaryota could control the unlimited proliferation of cancer cells, and the subnetwork originating from eumetazoa could recapitulate the other hallmarks of cancer. In addition, investigations based on multiple datasets revealed that cancer driver genes were enriched in genes originating from eukaryota, opisthokonta, and eumetazoa. These results have important implications for enhancing the robustness of cancer prognosis models through selecting the gene signatures by the gene age information.

  19. An Adaptive Genetic Association Test Using Double Kernel Machines.

    PubMed

    Zhan, Xiang; Epstein, Michael P; Ghosh, Debashis

    2015-10-01

    Recently, gene set-based approaches have become very popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Since most genes are not differentially expressed, existing pathway tests considering all genes within a pathway suffer from considerable noise and power loss. Moreover, for a differentially expressed pathway, it is of interest to select important genes that drive the effect of the pathway. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway as well as test for the overall genetic pathway effect. This DKM procedure first uses the garrote kernel machines (GKM) test for the purposes of subset selection and then the least squares kernel machine (LSKM) test for testing the effect of the subset of genes. An appealing feature of the kernel machine framework is that it can provide a flexible and unified method for multi-dimensional modeling of the genetic pathway effect allowing for both parametric and nonparametric components. This DKM approach is illustrated with application to simulated data as well as to data from a neuroimaging genetics study.

  20. DISSECTING THE GENETICS OF HUMAN HIGH MYOPIA: A MOLECULAR BIOLOGIC APPROACH

    PubMed Central

    Young, Terri L

    2004-01-01

    Purpose: Despite the plethora of experimental myopia animal studies that demonstrate biochemical factor changes in various eye tissues, and limited human studies utilizing pharmacologic agents to thwart axial elongation, we have little knowledge of the basic physiology that drives myopic development. Identifying the implicated genes for myopia susceptibility will provide a fundamental molecular understanding of how myopia occurs and may lead to directed physiologic (ie, pharmacologic, gene therapy) interventions. The purpose of this proposal is to describe the results of positional candidate gene screening of selected genes within the autosomal dominant high-grade myopia-2 locus (MYP2) on chromosome 18p11.31. Methods: A physical map of a contracted MYP2 interval was compiled, and gene expression studies in ocular tissues using complementary DNA library screens, microarray matches, and reverse-transcription techniques aided in prioritizing gene selection for screening. The TGIF, EMLIN-2, MLCB, and CLUL1 genes were screened in DNA samples from unrelated controls and in high-myopia affected and unaffected family members from the original seven MYP2 pedigrees. All candidate genes were screened by direct base pair sequence analysis. Results: Consistent segregation of a gene sequence alteration (polymorphism) with myopia was not demonstrated in any of the seven families. Novel single nucleotide polymorphisms were found. Conclusion: The positional candidate genes TGIF, EMLIN-2, MLCB, and CLUL1 are not associated with MYP2-linked high-grade myopia. Base change polymorphisms discovered with base sequence screening of these genes were submitted to an Internet database. Other genes that also map within the interval are currently undergoing mutation screening. PMID:15747770

  1. An interpretive review of selective sweep studies in Bos taurus cattle populations: identification of unique and shared selection signals across breeds

    PubMed Central

    Gutiérrez-Gil, Beatriz; Arranz, Juan J.; Wiener, Pamela

    2015-01-01

    This review compiles the results of 21 genomic studies of European Bos taurus breeds and thus provides a general picture of the selection signatures in taurine cattle identified by genome-wide selection-mapping scans. By performing a comprehensive summary of the results reported in the literature, we compiled a list of 1049 selection sweeps described across 37 cattle breeds (17 beef breeds, 14 dairy breeds, and 6 dual-purpose breeds), and four different beef-vs.-dairy comparisons, which we subsequently grouped into core selective sweep (CSS) regions, defined as consecutive signals within 1 Mb of each other. We defined a total of 409 CSSs across the 29 bovine autosomes, 232 (57%) of which were associated with a single-breed (Single-breed CSSs), 134 CSSs (33%) were associated with a limited number of breeds (Two-to-Four-breed CSSs) and 39 CSSs (9%) were associated with five or more breeds (Multi-breed CSSs). For each CSS, we performed a candidate gene survey that identified 291 genes within the CSS intervals (from the total list of 5183 BioMart-extracted genes) linked to dairy and meat production, stature, and coat color traits. A complementary functional enrichment analysis of the CSS positional candidates highlighted other genes related to pathways underlying behavior, immune response, and reproductive traits. The Single-breed CSSs revealed an over-representation of genes related to dairy and beef production, this was further supported by over-representation of production-related pathway terms in these regions based on a functional enrichment analysis. Overall, this review provides a comparative map of the selection sweeps reported in European cattle breeds and presents for the first time a characterization of the selection sweeps that are found in individual breeds. Based on their uniqueness, these breed-specific signals could be considered as “divergence signals,” which may be useful in characterizing and protecting livestock genetic diversity. PMID:26029239
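    The grouping rule described above (consecutive signals within 1 Mb of each other form one core selective sweep region) can be sketched as a simple interval merge. This is only an illustration: the abstract does not say how the distance between signals was measured, so the sketch assumes edge-to-edge distance in base pairs, and the function name, tuple layout, and coordinates are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, start_bp, end_bp)
Signal = Tuple[str, int, int]


def merge_into_css(signals: List[Signal], max_gap: int = 1_000_000) -> List[Signal]:
    """Group selection signals into regions.

    Assumption: two signals on the same chromosome belong to the same
    region when the gap between the current region's end and the next
    signal's start is at most max_gap base pairs.
    """
    regions: List[Signal] = []
    # Sorting by (chromosome, start, end) puts neighbouring signals side by side.
    for chrom, start, end in sorted(signals):
        if regions and regions[-1][0] == chrom and start - regions[-1][2] <= max_gap:
            prev_chrom, prev_start, prev_end = regions[-1]
            # Extend the open region; max() handles nested signals.
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions


# Made-up coordinates, for illustration only.
example = [
    ("1", 900_000, 950_000),
    ("1", 100, 200),
    ("1", 3_000_000, 3_100_000),
    ("2", 50, 60),
]
css = merge_into_css(example)
```

    With these made-up inputs, the first two signals on chromosome 1 fall into one region, while the third signal and the chromosome 2 signal each form their own.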

  2. Selective Constraints on Coding Sequences of Nervous System Genes Are a Major Determinant of Duplicate Gene Retention in Vertebrates

    PubMed Central

    Roux, Julien; Liu, Jialin; Robinson-Rechavi, Marc

    2017-01-01

    The evolutionary history of vertebrates is marked by three ancient whole-genome duplications: two successive rounds in the ancestor of vertebrates, and a third one specific to teleost fishes. Biased loss of most duplicates enriched the genome for specific genes, such as slow evolving genes, but this selective retention process is not well understood. To understand what drives the long-term preservation of duplicate genes, we characterized duplicated genes in terms of their expression patterns. We used a new method of expression enrichment analysis, TopAnat, applied to in situ hybridization data from thousands of genes from zebrafish and mouse. We showed that the presence of expression in the nervous system is a good predictor of a higher rate of retention of duplicate genes after whole-genome duplication. Further analyses suggest that purifying selection against the toxic effects of misfolded or misinteracting proteins, which is particularly strong in nonrenewing neural tissues, likely constrains the evolution of coding sequences of nervous system genes, leading indirectly to the preservation of duplicate genes after whole-genome duplication. Whole-genome duplications thus greatly contributed to the expansion of the toolkit of genes available for the evolution of profound novelties of the nervous system at the base of the vertebrate radiation. PMID:28981708

  3. Evaluation of Gene-Based Family-Based Methods to Detect Novel Genes Associated With Familial Late Onset Alzheimer Disease

    PubMed Central

    Fernández, Maria V.; Budde, John; Del-Aguila, Jorge L.; Ibañez, Laura; Deming, Yuetiva; Harari, Oscar; Norton, Joanne; Morris, John C.; Goate, Alison M.; Cruchaga, Carlos

    2018-01-01

    Gene-based tests to study the combined effect of rare variants on a particular phenotype have been widely developed for case-control studies, but their evolution and adaptation for family-based studies, especially studies of complex incomplete families, has been slower. In this study, we have performed a practical examination of all the latest gene-based methods available for family-based study designs using both simulated and real datasets. We examined the performance of several collapsing, variance-component, and transmission disequilibrium tests across eight different software packages and 22 models utilizing a cohort of 285 families (N = 1,235) with late-onset Alzheimer disease (LOAD). After a thorough examination of each of these tests, we propose a methodological approach to identify, with high confidence, genes associated with the tested phenotype and we provide recommendations to select the best software and model for family-based gene-based analyses. Additionally, in our dataset, we identified PTK2B, a GWAS candidate gene for sporadic AD, along with six novel genes (CHRD, CLCN2, HDLBP, CPAMD8, NLRP9, and MAS1L) as candidate genes for familial LOAD. PMID:29670507

  4. Evaluation of Gene-Based Family-Based Methods to Detect Novel Genes Associated With Familial Late Onset Alzheimer Disease.

    PubMed

    Fernández, Maria V; Budde, John; Del-Aguila, Jorge L; Ibañez, Laura; Deming, Yuetiva; Harari, Oscar; Norton, Joanne; Morris, John C; Goate, Alison M; Cruchaga, Carlos

    2018-01-01

    Gene-based tests to study the combined effect of rare variants on a particular phenotype have been widely developed for case-control studies, but their evolution and adaptation for family-based studies, especially studies of complex incomplete families, has been slower. In this study, we have performed a practical examination of all the latest gene-based methods available for family-based study designs using both simulated and real datasets. We examined the performance of several collapsing, variance-component, and transmission disequilibrium tests across eight different software packages and 22 models utilizing a cohort of 285 families (N = 1,235) with late-onset Alzheimer disease (LOAD). After a thorough examination of each of these tests, we propose a methodological approach to identify, with high confidence, genes associated with the tested phenotype and we provide recommendations to select the best software and model for family-based gene-based analyses. Additionally, in our dataset, we identified PTK2B, a GWAS candidate gene for sporadic AD, along with six novel genes (CHRD, CLCN2, HDLBP, CPAMD8, NLRP9, and MAS1L) as candidate genes for familial LOAD.

  5. Prioritizing Genes Related to Nicotine Addiction Via a Multi-source-Based Approach.

    PubMed

    Liu, Xinhua; Liu, Meng; Li, Xia; Zhang, Lihua; Fan, Rui; Wang, Ju

    2015-08-01

    Nicotine has a broad impact on both the central and peripheral nervous systems. Over the past decades, an increasing number of genes potentially involved in nicotine addiction have been identified by different technical approaches. However, the molecular mechanisms underlying nicotine addiction remain largely unknown. In this situation, prioritizing the candidate genes for further investigation is becoming increasingly important. In this study, we presented a multi-source-based gene prioritization approach for nicotine addiction by utilizing the vast amounts of information generated from nicotine addiction studies during the past years. In this approach, we first collected and curated genes from studies in four categories, i.e., genetic association analysis, genetic linkage analysis, high-throughput gene/protein expression analysis, and literature search of single gene/protein-based studies. Based on these resources, the genes were scored and a weight value was determined for each category. Finally, the genes were ranked by their combined scores, and 220 genes were selected as the prioritized nicotine addiction-related genes. Evaluation suggested the prioritized genes were promising targets for further analysis and replication study.
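    The scoring step described above (a score per gene within each evidence category, a weight per category, and a ranking by combined score) might look roughly like the sketch below. The abstract does not give the actual scoring or weighting scheme, so a plain weighted sum is assumed here, and the function name, category keys, gene labels, and numbers are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_genes(
    evidence: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank genes by a weighted sum of per-category scores.

    Assumption: the combined score is sum(weight[category] * score);
    the published method may combine the categories differently.
    """
    combined: Dict[str, float] = {}
    for category, scores in evidence.items():
        weight = weights[category]
        for gene, score in scores.items():
            combined[gene] = combined.get(gene, 0.0) + weight * score
    # Highest combined score first; ties are broken alphabetically.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Made-up inputs, for illustration only.
evidence = {
    "association": {"GENE_A": 1.0, "GENE_B": 2.0},
    "linkage": {"GENE_A": 1.0, "GENE_C": 1.0},
}
weights = {"association": 2.0, "linkage": 1.0}
ranking = rank_genes(evidence, weights)
top_genes = [gene for gene, _ in ranking[:2]]
```

    Keeping the first N entries of the ranking corresponds to the final selection step; the study kept 220 genes, while the toy example keeps two.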

  6. Analysis of base and codon usage by rubella virus.

    PubMed

    Zhou, Yumei; Chen, Xianfeng; Ushijima, Hiroshi; Frey, Teryl K

    2012-05-01

    Rubella virus (RUBV), a small, plus-strand RNA virus that is an important human pathogen, has the unique feature that the GC content of its genome (70%) is the highest (by 20%) among RNA viruses. To determine the effect of this GC content on genomic evolution, base and codon usage were analyzed across viruses from eight diverse genotypes of RUBV. Despite differences in frequency of codon use, the favored codons in the RUBV genome matched those in the human genome for 18 of the 20 amino acids, indicating adaptation to the host. Although usage patterns were conserved in corresponding genes in the diverse genotypes, within-genome comparison revealed that both base and codon usages varied regionally, particularly in the hypervariable region (HVR) of the P150 replicase gene. While directional mutation pressure was predominant in determining base and codon usage within most of the genome (with the strongest tendency being towards C's at third codon positions), natural selection was predominant in the HVR region. The GC content of this region was the highest in the genome (>80%), and it was not clear if selection at the nucleotide level accompanied selection at the amino acid level. Dinucleotide frequency analysis of the RUBV genome revealed that TpA usage was lower than expected, similar to mammalian genes; however, CpG usage was not suppressed, and TpG usage was not enhanced, as is the case in mammalian genes.
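    The base-usage quantities mentioned above (overall GC content and the preference for particular bases at third codon positions) can be computed with a few lines of code. The sketch below is a generic illustration, not the authors' pipeline: the function names are hypothetical, the coding sequence is assumed to be in frame starting at its first base, and the short sequences are made up.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions.

    Assumption: cds is an in-frame coding sequence, so the third codon
    positions are the bases at indices 2, 5, 8, ...
    """
    return gc_content(cds[2::3])


# Made-up sequences, for illustration only.
overall = gc_content("ATGC")
third_positions = gc3_content("ATGGCC")  # codons ATG, GCC; third bases G and C
```

    Comparing these fractions between regions of a genome, for example a hypervariable region against the rest, is one simple way to look for the regional variation the abstract describes.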

  7. Analysis of the GRNs Inference by Using Tsallis Entropy and a Feature Selection Approach

    NASA Astrophysics Data System (ADS)

    Lopes, Fabrício M.; de Oliveira, Evaldo A.; Cesar, Roberto M.

    An important problem in the bioinformatics field is to understand how genes are regulated and interact through gene networks. This knowledge can be helpful for many applications, such as disease treatment design and drug development. For this reason, it is very important to uncover the functional relationship among genes and then to construct the gene regulatory network (GRN) from temporal expression data. However, this task usually involves data with a large number of variables and a small number of observations. There is therefore a strong motivation to use pattern recognition and dimensionality reduction approaches. In particular, feature selection is especially important in order to select the most important predictor genes that can explain some phenomena associated with the target genes. This work presents a first study of the sensitivity of entropy methods to the entropy functional form, applied to the problem of topology recovery of GRNs. The generalized entropy proposed by Tsallis is used to study this sensitivity. The inference process is based on a feature selection approach, which is applied to simulated temporal expression data generated by an artificial gene network (AGN) model. The inferred GRNs are validated in terms of global network measures. Some interesting conclusions can be drawn from the experimental results, as reported for the first time in the present paper.
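    For reference, the generalized entropy of Tsallis for a discrete distribution p with entropic index q is S_q = (1 - sum_i p_i^q) / (q - 1), which reduces to the Shannon entropy (in nats) as q approaches 1. The sketch below implements this standard definition only. How the paper uses the entropy as a feature selection criterion is not described in the abstract and is not reproduced here; the function name is hypothetical.

```python
import math
from typing import Sequence


def tsallis_entropy(probs: Sequence[float], q: float) -> float:
    """Tsallis entropy S_q = (1 - sum(p_i ** q)) / (q - 1).

    For q == 1 the Shannon entropy in nats is returned, which is the
    limit of S_q as q approaches 1. Zero-probability outcomes are
    skipped, following the usual convention.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


uniform = [0.5, 0.5]
s_two = tsallis_entropy(uniform, 2.0)      # (1 - 0.5) / 1
s_shannon = tsallis_entropy(uniform, 1.0)  # ln 2
```

    Varying q changes how strongly rare and frequent outcomes contribute to the entropy, which is the kind of dependence on functional form that the study examines.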

  8. Exploring Wound-Healing Genomic Machinery with a Network-Based Approach

    PubMed Central

    Vitali, Francesca; Marini, Simone; Balli, Martina; Grosemans, Hanne; Sampaolesi, Maurilio; Lussier, Yves A.; Cusella De Angelis, Maria Gabriella; Bellazzi, Riccardo

    2017-01-01

    The molecular mechanisms underlying tissue regeneration and wound healing are still poorly understood despite their importance. In this paper we develop a bioinformatics approach, combining biology and network theory to drive experiments for better understanding the genetic underpinnings of wound healing mechanisms and for selecting potential drug targets. We start by selecting literature-relevant genes in murine wound healing, and inferring from them a Protein-Protein Interaction (PPI) network. Then, we analyze the network to rank wound healing-related genes according to their topological properties. Lastly, we perform a procedure for in-silico simulation of a treatment action in a biological pathway. The findings obtained by applying the developed pipeline, including gene expression analysis, confirms how a network-based bioinformatics method is able to prioritize candidate genes for in vitro analysis, thus speeding up the understanding of molecular mechanisms and supporting the discovery of potential drug targets. PMID:28635674

  9. The nature of selection on the major histocompatibility complex.

    PubMed

    Apanius, V; Penn, D; Slev, P R; Ruff, L R; Potts, W K

    1997-01-01

    Only natural selection can account for the extreme genetic diversity of genes of the major histocompatibility complex (MHC). Although the structure and function of classic MHC genes is well understood at the molecular and cellular levels, there is controversy about how MHC diversity is selectively maintained. The diversifying selection can be driven by pathogen interactions and inbreeding avoidance mechanisms. Pathogen-driven selection can maintain MHC polymorphism based on heterozygote advantage or frequency-dependent selection due to pathogen evasion of MHC-dependent immune recognition. Empirical evidence demonstrates that specific MHC haplotypes are resistant to certain infectious agents, while susceptible to others. These data are consistent with both heterozygote advantage and frequency-dependent models. Additional research is needed to discriminate between these mechanisms. Infectious agents can precipitate autoimmunity and can potentially contribute to MHC diversity through molecular mimicry and by favoring immunodominance. MHC-dependent abortion and mate choice, based on olfaction, can also maintain MHC diversity and probably functions both to avoid genome-wide inbreeding and produce MHC-heterozygous offspring with increased immune responsiveness. Although this diverse set of hypotheses are often treated as competing alternatives, we believe that they all fit into a coherent, internally consistent thesis. It is likely that at least in some species, all of these mechanisms operate, leading to the extreme diversification found in MHC genes.

  10. A Granular Self-Organizing Map for Clustering and Gene Selection in Microarray Data.

    PubMed

    Ray, Shubhra Sankar; Ganivada, Avatharam; Pal, Sankar K

    2016-09-01

    A new granular self-organizing map (GSOM) is developed by integrating the concept of a fuzzy rough set with the SOM. While training the GSOM, the weights of a winning neuron and the neighborhood neurons are updated through a modified learning procedure. The neighborhood is newly defined using the fuzzy rough sets. The clusters (granules) evolved by the GSOM are presented to a decision table as its decision classes. Based on the decision table, a method of gene selection is developed. The effectiveness of the GSOM is shown in both clustering samples and developing an unsupervised fuzzy rough feature selection (UFRFS) method for gene selection in microarray data. While the superior results of the GSOM, as compared with the related clustering methods, are provided in terms of β-index, DB-index, Dunn-index, and fuzzy rough entropy, the genes selected by the UFRFS are not only better in terms of classification accuracy and a feature evaluation index, but also statistically more significant than the related unsupervised methods. The C-codes of the GSOM and UFRFS are available online at http://avatharamg.webs.com/software-code.

  11. An alternative agriculture system is defined by a distinct expression profile of select gene transcripts and proteins

    PubMed Central

    Kumar, Vinod; Mills, Douglas J.; Anderson, James D.; Mattoo, Autar K.

    2004-01-01

    Conventional agriculture has relied heavily on chemical inputs that have negatively impacted the environment and increased production costs. Transition to agricultural sustainability is a major challenge and requires that alternative agricultural practices are scientifically analyzed to provide a sufficiently informative knowledge base in favor of alternative farming practices. We show a molecular basis for delayed leaf senescence and tolerance to diseases in tomato plants cultivated in a legume (hairy vetch) mulch-based alternative agricultural system. In the hairy vetch-cultivated plants, expression of specific and select classes of genes is up-regulated compared to those grown on black polyethylene mulch. These include N-responsive genes such as NiR, GS1, rbcL, rbcS, and G6PD; chaperone genes such as hsp70 and BiP; defense genes such as chitinase and osmotin; a cytokinin-responsive gene CKR; and gibberellic acid 20 oxidase. We present a model of how their protein products likely complement one another in a field scenario to effect efficient utilization and mobilization of C and N, promote defense against disease, and enhance longevity. PMID:15249656

  12. Genome-Wide Analyses Reveal Genes Subject to Positive Selection in Pasteurella multocida

    PubMed Central

    Cao, Peili; Guo, Dongchun; Liu, Jiasen; Jiang, Qian; Xu, Zhuofei; Qu, Liandong

    2017-01-01

    Pasteurella multocida, a Gram-negative opportunistic pathogen, has led to a broad range of diseases in mammals and birds, including fowl cholera in poultry, pneumonia and atrophic rhinitis in swine and rabbit, hemorrhagic septicemia in cattle, and bite infections in humans. In order to better interpret the genetic diversity and adaptation evolution of this pathogen, seven genomes of P. multocida strains isolated from fowls, rabbit and pigs were determined by using high-throughput sequencing approach. Together with publicly available P. multocida genomes, evolutionary features were systematically analyzed in this study. Clustering of 70,565 protein-coding genes showed that the pangenome of 33 P. multocida strains was composed of 1,602 core genes, 1,364 dispensable genes, and 1,070 strain-specific genes. Of these, we identified a full spectrum of genes related to virulence factors and revealed genetic diversity of these potential virulence markers across P. multocida strains, e.g., bcbAB, fcbC, lipA, bexDCA, ctrCD, lgtA, lgtC, lic2A involved in biogenesis of surface polysaccharides, hsf encoding autotransporter adhesin, and fhaB encoding filamentous haemagglutinin. Furthermore, based on genome-wide positive selection scanning, a total of 35 genes were subject to strong selection pressure. Extensive analyses of protein subcellular location indicated that membrane-associated genes were highly abundant among all positively selected genes. The detected amino acid sites undergoing adaptive selection were preferably located in extracellular space, perhaps associated with bacterial evasion of host immune responses. Our findings shed more light on conservation and distribution of virulence-associated genes across P. multocida strains. Meanwhile, this study provides a genetic context for future researches on the mechanism of adaptive evolution in P. multocida. PMID:28611758

  13. Does antifouling paint select for antibiotic resistance?

    PubMed

    Flach, Carl-Fredrik; Pal, Chandan; Svensson, Carl Johan; Kristiansson, Erik; Östman, Marcus; Bengtsson-Palme, Johan; Tysklind, Mats; Larsson, D G Joakim

    2017-07-15

    There is concern that heavy metals and biocides contribute to the development of antibiotic resistance via co-selection. Most antifouling paints contain high amounts of such substances, which risks turning painted ship hulls into highly mobile refuges and breeding grounds for antibiotic-resistant bacteria. The objectives of this study were to start investigating if heavy-metal based antifouling paints can pose a risk for co-selection of antibiotic-resistant bacteria and, if so, identify the underlying genetic basis. Plastic panels with one side painted with copper and zinc-containing antifouling paint were submerged in a Swedish marina and biofilms from both sides of the panels were harvested after 2.5-4 weeks. DNA was isolated from the biofilms and subjected to metagenomic sequencing. Biofilm bacteria were cultured on marine agar supplemented with tetracycline, gentamicin, copper sulfate or zinc sulfate. Biofilm communities from painted surfaces displayed lower taxonomic diversity and enrichment of Gammaproteobacteria. Bacteria from these communities showed increased resistance to both heavy metals and tetracycline but not to gentamicin. Significantly higher abundance of metal and biocide resistance genes was observed, whereas mobile antibiotic resistance genes were not enriched in these communities. In contrast, we found an enrichment of chromosomal RND efflux system genes, including such with documented ability to confer decreased susceptibility to both antibiotics and biocides/heavy metals. This was paralleled by increased abundances of integron-associated integrase and ISCR transposase genes. The results show that the heavy metal-based antifouling paint exerts a strong selection pressure on marine bacterial communities and can co-select for certain antibiotic-resistant bacteria, likely by favoring species and strains carrying genes that provide cross-resistance. Although this does not indicate an immediate risk for promotion of mobile antibiotic resistance, the clear increase of genes involved in mobilizing DNA provides a foundation for increased opportunities for gene transfer in such communities, which might also involve yet unknown resistance mechanisms.

  14. Inferring gene dependency network specific to phenotypic alteration based on gene expression data and clinical information of breast cancer.

    PubMed

    Zhou, Xionghui; Liu, Juan

    2014-01-01

    Although many methods have been proposed to reconstruct gene regulatory networks, most of them, when applied to sample-based data, cannot reveal the gene regulatory relations underlying the phenotypic change (e.g. normal versus cancer). In this paper, we adopt phenotype as a variable when constructing the gene regulatory network, while previous studies either neglected it or only used it to select the differentially expressed genes as the inputs to construct the gene regulatory network. To be specific, we integrate phenotype information with gene expression data to identify the gene dependency pairs by using the method of conditional mutual information. A gene dependency pair (A,B) means that the influence of gene A on the phenotype depends on gene B. All identified gene dependency pairs constitute a directed network underlying the phenotype, namely the gene dependency network. In this way, we have constructed a gene dependency network of breast cancer from gene expression data along with two different phenotype states (metastasis and non-metastasis). Moreover, we have found the network to be scale-free, indicating that its hub genes with high out-degrees may play critical roles in the network. After functional investigation, these hub genes are found to be biologically significant and specifically related to breast cancer, which suggests that our gene dependency network is meaningful. The validity has also been justified by literature investigation. From the network, we have selected 43 discriminative hubs as a signature to build the classification model for distinguishing the distant metastasis risks of breast cancer patients, and the result outperforms those classification models with published signatures. In conclusion, we have proposed a promising way to construct the gene regulatory network by using sample-based data, which has been shown to be effective and accurate in uncovering the hidden mechanism of the biological process and identifying the gene signature for phenotypic change.
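    Conditional mutual information, the quantity this record relies on, measures how much two variables share once a third is known: I(X;Y|Z) = sum over (x,y,z) of p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ]. The sketch below is a plug-in estimate for discrete data only. The abstract does not say how expression values were discretized, which variable plays which role, or how dependency pairs were thresholded, so those steps are left out and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def conditional_mutual_information(
    x: Sequence[Hashable],
    y: Sequence[Hashable],
    z: Sequence[Hashable],
) -> float:
    """Plug-in estimate of I(X;Y|Z) in nats from paired discrete samples.

    Assumption: x, y and z are equal-length sequences of discrete
    labels (for example binned expression levels and a phenotype label).
    """
    n = len(x)
    if n == 0 or len(y) != n or len(z) != n:
        raise ValueError("x, y and z must be non-empty and of equal length")
    count_xyz = Counter(zip(x, y, z))
    count_xz = Counter(zip(x, z))
    count_yz = Counter(zip(y, z))
    count_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), c in count_xyz.items():
        # p(z) p(x,y,z) / (p(x,z) p(y,z)) written with raw counts;
        # the factors of n cancel.
        ratio = (count_z[zi] * c) / (count_xz[(xi, zi)] * count_yz[(yi, zi)])
        cmi += (c / n) * math.log(ratio)
    return cmi


# Made-up labels: X and Y are identical and Z is constant, so
# I(X;Y|Z) equals the entropy of X, which is ln 2 here.
identical = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0])
```

    A value near zero indicates that, given the conditioning variable, the other two carry no information about each other; larger values indicate a dependency.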

  15. An Evolution-Based Screen for Genetic Differentiation between Anopheles Sister Taxa Enriches for Detection of Functional Immune Factors

    PubMed Central

    Takashima, Eizo; Williams, Marni; Eiglmeier, Karin; Pain, Adrien; Guelbeogo, Wamdaogo M.; Gneme, Awa; Brito-Fravallo, Emma; Holm, Inge; Lavazec, Catherine; Sagnon, N’Fale; Baxter, Richard H.; Riehle, Michelle M.; Vernick, Kenneth D.

    2015-01-01

    Nucleotide variation patterns across species are shaped by the processes of natural selection, including exposure to environmental pathogens. We examined patterns of genetic variation in two sister species, Anopheles gambiae and Anopheles coluzzii, both efficient natural vectors of human malaria in West Africa. We used the differentiation signature displayed by a known coordinate selective sweep of immune genes APL1 and TEP1 in A. coluzzii to design a population genetic screen trained on the sweep, classified a panel of 26 potential immune genes for concordance with the signature, and functionally tested their immune phenotypes. The screen results were strongly predictive for genes with protective immune phenotypes: genes meeting the screen criteria were significantly more likely to display a functional phenotype against malaria infection than genes not meeting the criteria (p = 0.0005). Thus, an evolution-based screen can efficiently prioritize candidate genes for labor-intensive downstream functional testing, and safely allow the elimination of genes not meeting the screen criteria. The suite of immune genes with characteristics similar to the APL1-TEP1 selective sweep appears to be more widespread in the A. coluzzii genome than previously recognized. The immune gene differentiation may be a consequence of adaptation of A. coluzzii to new pathogens encountered in its niche expansion during the separation from A. gambiae, although the role, if any, of natural selection by Plasmodium is unknown. Application of the screen allowed identification of new functional immune factors, and assignment of new functions to known factors. We describe biochemical binding interactions between immune proteins that underlie functional activity for malaria infection, which highlights the interplay between pathogen specificity and the structure of immune complexes.
We also find that most malaria-protective immune factors display phenotypes for either human or rodent malaria, with broad specificity a rarity. PMID:26633695

  16. A Genetic Algorithm Based on Sexual Selection for the Multidimensional 0/1 Knapsack Problems

    NASA Astrophysics Data System (ADS)

    Varnamkhasti, Mohammad Jalali; Lee, Lai Soon

    In this study, a new technique is presented for choosing mate chromosomes during sexual selection in a genetic algorithm. The population is divided into groups of males and females. During sexual selection, the female chromosome is selected by tournament selection, while the male chromosome is selected based on its Hamming distance from the selected female chromosome, its fitness value, or its active genes. Computational experiments are conducted on the proposed technique, and the results are compared with those of selection mechanisms commonly used in the literature for solving multidimensional 0/1 knapsack problems.
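
The mate-selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: chromosomes are bit tuples, fitness is a user-supplied function, and the male is picked by the largest Hamming distance from the chosen female, which is only one of the three criteria the abstract mentions. All function names are hypothetical.

```python
import random


def hamming_distance(a, b):
    """Number of positions at which two equal-length chromosomes differ."""
    if len(a) != len(b):
        raise ValueError("chromosomes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def tournament_select(candidates, fitness, k, rng):
    """Draw k distinct candidates at random and return the fittest one."""
    contenders = rng.sample(candidates, k)
    return max(contenders, key=fitness)


def select_male_by_distance(female, males):
    """Assumed rule: choose the male most dissimilar to the chosen female."""
    return max(males, key=lambda male: hamming_distance(male, female))


def select_parents(females, males, fitness, k, rng):
    """Pick one (female, male) pair for crossover."""
    female = tournament_select(females, fitness, k, rng)
    male = select_male_by_distance(female, males)
    return female, male


# Hypothetical toy population; fitness is simply the number of 1-bits.
rng = random.Random(0)
females = [(0, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0)]
males = [(1, 1, 1, 1), (0, 0, 0, 1), (1, 0, 1, 0)]
mother, father = select_parents(females, males, fitness=sum, k=3, rng=rng)
```

With the tournament size equal to the number of females, the fittest female always wins, which makes the toy run deterministic; a smaller k would keep more selection noise.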

  17. Genetic signatures of natural selection in a model invasive ascidian

    NASA Astrophysics Data System (ADS)

    Lin, Yaping; Chen, Yiyong; Yi, Changho; Fong, Jonathan J.; Kim, Won; Rius, Marc; Zhan, Aibin

    2017-03-01

    Invasive species represent promising models to study species’ responses to rapidly changing environments. Although local adaptation frequently occurs during contemporary range expansion, the associated genetic signatures at both population and genomic levels remain largely unknown. Here, we use genome-wide gene-associated microsatellites to investigate genetic signatures of natural selection in a model invasive ascidian, Ciona robusta. Population genetic analyses of 150 individuals sampled in Korea, New Zealand, South Africa and Spain showed significant genetic differentiation among populations. Based on outlier tests, we found a high incidence of signatures of directional selection at 19 loci. Hitchhiking mapping analyses identified 12 directional selective sweep regions, and all selective sweep windows on chromosomes were narrow (~8.9 kb). Further analyses identified 132 candidate genes under selection. When we compared our genetic data and six crucial environmental variables, 16 putatively selected loci showed significant correlation with these environmental variables. This suggests that the local environmental conditions have left significant signatures of selection at both population and genomic levels. Finally, we identified “plastic” genomic regions and genes that are promising regions to investigate evolutionary responses to rapid environmental change in C. robusta.

  18. Construction and applications of exon-trapping gene-targeting vectors with a novel strategy for negative selection.

    PubMed

    Saito, Shinta; Ura, Kiyoe; Kodama, Miho; Adachi, Noritaka

    2015-06-30

    Targeted gene modification by homologous recombination provides a powerful tool for studying gene function in cells and animals. In higher eukaryotes, non-homologous integration of targeting vectors occurs several orders of magnitude more frequently than does targeted integration, making the gene-targeting technology highly inefficient. For this reason, negative-selection strategies have been employed to reduce the number of drug-resistant clones associated with non-homologous vector integration, particularly when artificial nucleases to introduce a DNA break at the target site are unavailable or undesirable. As such, an exon-trap strategy using a promoterless drug-resistance marker gene provides an effective way to counterselect non-homologous integrants. However, constructing exon-trapping targeting vectors has been a time-consuming and complicated process. By virtue of highly efficient att-mediated recombination, we successfully developed a simple and rapid method to construct plasmid-based vectors that allow for exon-trapping gene targeting. These exon-trap vectors were useful in obtaining correctly targeted clones in mouse embryonic stem cells and human HT1080 cells. Most importantly, with the use of a conditionally cytotoxic gene, we further developed a novel strategy for negative selection, thereby enhancing the efficiency of counterselection for non-homologous integration of exon-trap vectors. Our methods will greatly facilitate exon-trapping gene-targeting technologies in mammalian cells, particularly when combined with the novel negative selection strategy.

  19. Genetic variation predicting cisplatin cytotoxicity associated with overall survival in lung cancer patients receiving platinum-based chemotherapy

    PubMed Central

    Tan, Xiang-Lin; Moyer, Ann M.; Fridley, Brooke L.; Schaid, Daniel J.; Niu, Nifang; Batzler, Anthony J.; Jenkins, Gregory D.; Abo, Ryan P.; Li, Liang; Cunningham, Julie M.; Sun, Zhifu; Yang, Ping; Wang, Liewei

    2011-01-01

    Purpose: Inherited variability in the prognosis of lung cancer patients treated with platinum-based chemotherapy has been widely investigated. However, the overall contribution of genetic variation to platinum response is not well established. To identify novel candidate SNPs/genes, we performed a genome-wide association study (GWAS) for cisplatin cytotoxicity using lymphoblastoid cell lines (LCLs), followed by an association study of selected SNPs from the GWAS with overall survival (OS) in lung cancer patients. Experimental Design: GWAS for cisplatin were performed with 283 ethnically diverse LCLs. 168 top SNPs were genotyped in 222 small cell and 961 non-small cell lung cancer (SCLC, NSCLC) patients treated with platinum-based therapy. Association of the SNPs with OS was determined using the Cox regression model. Selected candidate genes were functionally validated by siRNA knockdown in human lung cancer cells. Results: Among 157 successfully genotyped SNPs, 9 and 10 SNPs were top SNPs associated with OS for patients with NSCLC and SCLC, respectively, although they were not significant after adjusting for multiple testing. Fifteen genes, including 7 located within 200 kb upstream or downstream of the four top SNPs and 8 genes for which expression was correlated with three SNPs in LCLs, were selected for siRNA screening. Knockdown of DAPK3 and METTL6, for which expression levels were correlated with the rs11169748 and rs2440915 SNPs, significantly decreased cisplatin sensitivity in lung cancer cells. Conclusions: This series of clinical and complementary laboratory-based functional studies identified several candidate genes/SNPs that might help predict treatment outcomes for platinum-based therapy of lung cancer. PMID:21775533

  20. Genetic quality and sexual selection: an integrated framework for good genes and compatible genes.

    PubMed

    Neff, Bryan D; Pitcher, Trevor E

    2005-01-01

    Why are females so choosy when it comes to mating? This question has puzzled and marveled evolutionary and behavioral ecologists for decades. In mating systems in which males provide direct benefits to the female or her offspring, such as food or shelter, the answer seems straightforward--females should prefer to mate with males that are able to provide more resources. The answer is less clear in other mating systems in which males provide no resources (other than sperm) to females. Theoretical models that account for the evolution of mate choice in such nonresource-based mating systems require that females obtain a genetic benefit through increased offspring fitness from their choice. Empirical studies of nonresource-based mating systems that are characterized by strong female choice for males with elaborate sexual traits (like the large tail of peacocks) suggest that additive genetic benefits can explain only a small percentage of the variation in fitness. Other research on genetic benefits has examined nonadditive effects as another source of genetic variation in fitness and a potential benefit to female mate choice. In this paper, we review the sexual selection literature on genetic quality to address five objectives. First, we attempt to provide an integrated framework for discussing genetic quality. We propose that the term 'good gene' be used exclusively to refer to additive genetic variation in fitness, 'compatible gene' be used to refer to nonadditive genetic variation in fitness, and 'genetic quality' be defined as the sum of the two effects. Second, we review empirical approaches used to calculate the effect size of genetic quality and discuss these approaches in the context of measuring benefits from good genes, compatible genes and both types of genes. 
Third, we discuss biological mechanisms for acquiring and promoting offspring genetic quality and categorize these into three stages during breeding: (i) precopulatory (mate choice); (ii) postcopulatory, prefertilization (sperm utilization); and (iii) postcopulatory, postfertilization (differential investment). Fourth, we present a verbal model of the effect of good genes sexual selection and compatible genes sexual selection on population genetic variation in fitness, and discuss the potential trade-offs that might exist between mate choice for good genes and mate choice for compatible genes. Fifth, we discuss some future directions for research on genetic quality and sexual selection.

  1. Gene Expression (mRNA) Markers for Differentiating between Malignant and Benign Follicular Thyroid Tumours

    PubMed Central

    Wojtas, Bartosz; Pfeifer, Aleksandra; Oczko-Wojciechowska, Malgorzata; Krajewska, Jolanta; Czarniecka, Agnieszka; Kukulska, Aleksandra; Eszlinger, Markus; Musholt, Thomas; Stokowy, Tomasz; Swierniak, Michal; Stobiecka, Ewa; Chmielik, Ewa; Rusinek, Dagmara; Tyszkiewicz, Tomasz; Halczok, Monika; Hauptmann, Steffen; Lange, Dariusz; Jarzab, Michal; Paschke, Ralf; Jarzab, Barbara

    2017-01-01

    Distinguishing between follicular thyroid cancer (FTC) and follicular thyroid adenoma (FTA) constitutes a long-standing diagnostic problem resulting in equivocal histopathological diagnoses. There is therefore a need for additional molecular markers. To identify molecular differences between FTC and FTA, we analyzed the gene expression microarray data of 52 follicular neoplasms. We also performed a meta-analysis involving 14 studies employing high throughput methods (365 follicular neoplasms analyzed). Based on these two analyses, we selected 18 genes differentially expressed between FTA and FTC. We validated them by quantitative real-time polymerase chain reaction (qRT-PCR) in an independent set of 71 follicular neoplasms from formaldehyde-fixed paraffin embedded (FFPE) tissue material. We confirmed differential expression for 7 genes (CPQ, PLVAP, TFF3, ACVRL1, ZFYVE21, FAM189A2, and CLEC3B). Finally, we created a classifier that distinguished between FTC and FTA with an accuracy of 78%, sensitivity of 76%, and specificity of 80%, based on the expression of 4 genes (CPQ, PLVAP, TFF3, ACVRL1). In our study, we have demonstrated that meta-analysis is a valuable method for selecting possible molecular markers. Based on our results, we conclude that there might exist a plausible limit of gene classifier accuracy of approximately 80%, when follicular tumors are discriminated based on formalin-fixed postoperative material. PMID:28574441
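
The three performance figures quoted above follow from a two-class confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical, chosen only because they reproduce the reported percentages, and are not the study's actual data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: malignant cases (here FTC) called correctly/incorrectly.
    tn/fp: benign cases (here FTA) called correctly/incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be represented")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,  # true positive rate
        "specificity": tn / negatives,  # true negative rate
    }


# Hypothetical counts for illustration only (25 cases per class).
metrics = classification_metrics(tp=19, fn=6, tn=20, fp=5)
```

Because accuracy is a weighted average of sensitivity and specificity, a balanced set with 76% sensitivity and 80% specificity gives 78% accuracy; with unequal class sizes the same two rates would give a different overall accuracy.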

  2. Gene Expression (mRNA) Markers for Differentiating between Malignant and Benign Follicular Thyroid Tumours.

    PubMed

    Wojtas, Bartosz; Pfeifer, Aleksandra; Oczko-Wojciechowska, Malgorzata; Krajewska, Jolanta; Czarniecka, Agnieszka; Kukulska, Aleksandra; Eszlinger, Markus; Musholt, Thomas; Stokowy, Tomasz; Swierniak, Michal; Stobiecka, Ewa; Chmielik, Ewa; Rusinek, Dagmara; Tyszkiewicz, Tomasz; Halczok, Monika; Hauptmann, Steffen; Lange, Dariusz; Jarzab, Michal; Paschke, Ralf; Jarzab, Barbara

    2017-06-02

    Distinguishing between follicular thyroid cancer (FTC) and follicular thyroid adenoma (FTA) constitutes a long-standing diagnostic problem resulting in equivocal histopathological diagnoses. There is therefore a need for additional molecular markers. To identify molecular differences between FTC and FTA, we analyzed the gene expression microarray data of 52 follicular neoplasms. We also performed a meta-analysis involving 14 studies employing high throughput methods (365 follicular neoplasms analyzed). Based on these two analyses, we selected 18 genes differentially expressed between FTA and FTC. We validated them by quantitative real-time polymerase chain reaction (qRT-PCR) in an independent set of 71 follicular neoplasms from formaldehyde-fixed paraffin embedded (FFPE) tissue material. We confirmed differential expression for 7 genes (CPQ, PLVAP, TFF3, ACVRL1, ZFYVE21, FAM189A2, and CLEC3B). Finally, we created a classifier that distinguished between FTC and FTA with an accuracy of 78%, sensitivity of 76%, and specificity of 80%, based on the expression of 4 genes (CPQ, PLVAP, TFF3, ACVRL1). In our study, we have demonstrated that meta-analysis is a valuable method for selecting possible molecular markers. Based on our results, we conclude that there might exist a plausible limit of gene classifier accuracy of approximately 80%, when follicular tumors are discriminated based on formalin-fixed postoperative material.

  3. Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

    NASA Astrophysics Data System (ADS)

    Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

    In this study, bioinformatic analysis of the genome sequence of Enterobacter aerogenes was carried out to identify the gene encoding gelatinase. E. aerogenes was isolated from hot spring water as a bacterium with gelatinase activity specific to porcine and fish gelatin. This bacterium offers the possibility of producing enzymes specific to the gelatin of each of these two species. The E. aerogenes genome was partially sequenced, giving a total sequence size of 5.0 megabase pairs (Mbp). The pre-processing pipeline yielded 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads (78.58% high quality). Genome assembly produced 120 contigs, with 67.5% of contigs over 1 kilobase pair (kbp), an N50 contig length of 124856 bp and a GC content of 55.17%. About 4705 protein-coding genes were identified by protein prediction analysis. The two candidate genes selected had the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases. They were NODE_9_length_26866_cov_148.013245_12, containing a 1029 base pair (bp) sequence encoding 342 amino acids, and NODE_24_length_155103_cov_177.082458_62, containing a 717 bp sequence encoding 238 amino acids. Thus, two pairs of primers (forward and reverse) were designed based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
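
The N50 contig length reported above is a standard assembly statistic: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch follows; the function name and the example lengths are hypothetical, not the contigs from this assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the cumulative length
        # to at least half of the total assembly size.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (bp): total 20, so the halfway point is 10.
example_n50 = n50([2, 3, 4, 5, 6])
```

Here the two longest contigs (6 and 5) together cover 11 of 20 bases, so the N50 is 5.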

  4. A framework linkage map of perennial ryegrass based on SSR markers

    Treesearch

    G.P. Gill; P.L. Wilcox; D.J. Whittaker; R.A. Winz; P. Bickerstaff; Craig E. Echt; J. Kent; M.O. Humphreys; K.M. Elborough; R.C. Gardner

    2006-01-01

    A moderate-density linkage map for Lolium perenne L. has been constructed based on 376 simple sequence repeat (SSR) markers. Approximately one third (124) of the SSR markers were developed from GeneThresher libraries that preferentially select genomic DNA clones from the gene-rich unmethylated portion of the genome. The remaining SSR marker loci...

  5. Paper-based ion concentration polarization device for selective preconcentration of muc1 and lamp-2 genes

    NASA Astrophysics Data System (ADS)

    Son, Seok Young; Lee, Hyomin; Kim, Sung Jae

    2017-12-01

    Recently, novel biomolecule separation and detection methods based on ion concentration polarization (ICP) phenomena have been extensively researched because of their high amplification ratio and high-speed accumulation. Despite these advances, the fabrication of conventional ICP devices still involves complicated and time-consuming tasks. As an alternative platform, paper has recently been used for the same ICP operations. In this work, we demonstrated the selective preconcentration of a muc1 gene fragment, a human breast cancer marker, and a lamp-2 gene fragment, the cause of Danon disease, in paper-based ICP devices. As a result, these two DNA fragments were successfully concentrated up to 60-fold at different locations in a single paper channel. The device would be a promising platform for point-of-care devices owing to its economical fabrication, the easy extraction of the concentrated sample, and easy disposability.

  6. Targeted capture and resequencing of 1040 genes reveal environmentally driven functional variation in grey wolves.

    PubMed

    Schweizer, Rena M; Robinson, Jacqueline; Harrigan, Ryan; Silva, Pedro; Galverni, Marco; Musiani, Marco; Green, Richard E; Novembre, John; Wayne, Robert K

    2016-01-01

    In an era of ever-increasing amounts of whole-genome sequence data for individuals and populations, the utility of traditional single nucleotide polymorphisms (SNPs) array-based genome scans is uncertain. We previously performed a SNP array-based genome scan to identify candidate genes under selection in six distinct grey wolf (Canis lupus) ecotypes. Using this information, we designed a targeted capture array for 1040 genes, including all exons and flanking regions, as well as 5000 1-kb nongenic neutral regions, and resequenced these regions in 107 wolves. Selection tests revealed striking patterns of variation within candidate genes relative to noncandidate regions and identified potentially functional variants related to local adaptation. We found 27% and 47% of candidate genes from the previous SNP array study had functional changes that were outliers in sweed and bayenv analyses, respectively. This result verifies the use of genomewide SNP surveys to tag genes that contain functional variants between populations. We highlight nonsynonymous variants in APOB, LIPG and USH2A that occur in functional domains of these proteins, and that demonstrate high correlation with precipitation seasonality and vegetation. We find Arctic and High Arctic wolf ecotypes have higher numbers of genes under selection, which highlight their conservation value and heightened threat due to climate change. This study demonstrates that combining genomewide genotyping arrays with large-scale resequencing and environmental data provides a powerful approach to discern candidate functional variants in natural populations. © 2015 John Wiley & Sons Ltd.

  7. Phylogenetic Analysis of Shewanella Strains by DNA Relatedness Derived from Whole Genome Microarray DNA-DNA Hybridization and Comparison with Other Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Liyou; Yi, T. Y.; Van Nostrand, Joy

    Phylogenetic analyses were done for the Shewanella strains isolated from the Baltic Sea (38 strains), the US DOE Hanford Uranium bioremediation site [Hanford Reach of the Columbia River (HRCR), 11 strains], Pacific Ocean and Hawaiian sediments (8 strains), and strains from other resources (16 strains) with three outgroup strains, Rhodopseudomonas palustris, Clostridium cellulolyticum, and Thermoanaerobacter ethanolicus X514, using DNA relatedness derived from WCGA-based DNA-DNA hybridizations, sequence similarities of the 16S rRNA gene and gyrB gene, and sequence similarities of 6 loci of the Shewanella genome selected from a shared gene list of the Shewanella strains with whole genomes sequenced, based on their average nucleotide identity (ANI). The phylogenetic trees based on 16S rRNA and gyrB gene sequences, and on DNA relatedness derived from WCGA hybridizations of the tested Shewanella strains, share exactly the same sub-clusters with very few exceptions, in which the strains were basically grouped by species. However, the phylogenetic analysis based on DNA relatedness derived from WCGA hybridizations dramatically increased the differentiation resolution at the species and strain level within the Shewanella genus. When the tree based on DNA relatedness derived from WCGA hybridizations was compared to the tree based on the combined sequences of the selected functional genes (6 loci), we found that the resolutions of both methods are similar, but the clustering of the tree based on DNA relatedness derived from WCGA hybridizations was clearer. These results indicate that WCGA-based DNA-DNA hybridization is an ideal alternative to conventional DNA-DNA hybridization methods and that it is superior to the phylogenetic methods based on sequence similarities of single genes. Detailed analysis is being performed for the re-classification of the strains examined.

  8. Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification

    PubMed Central

    Huang, Lingkang; Zhang, Hao Helen; Zeng, Zhao-Bang; Bushel, Pierre R.

    2013-01-01

    Background: Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. Results: The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. Conclusions: High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention. Availability: The source MATLAB code is available from http://math.arizona.edu/~hzhang/software.html. PMID:23966761
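
To show why soft-thresholding penalties lead to gene selection, the sketch below applies the generic soft-thresholding operator S(w, lam) = sign(w) * max(|w| - lam, 0) to a vector of classifier weights. This is the textbook operator only; the specific penalties and optimization used in the paper are not given in the abstract, and the weights and threshold here are hypothetical.

```python
def soft_threshold(weight, lam):
    """Shrink a weight toward zero by lam; weights within [-lam, lam] become 0."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if weight > lam:
        return weight - lam
    if weight < -lam:
        return weight + lam
    return 0.0


def selected_features(weights, lam):
    """Indices of features whose shrunken weight is non-zero."""
    return [i for i, w in enumerate(weights) if soft_threshold(w, lam) != 0.0]


# Hypothetical per-gene weights from a linear classifier.
gene_weights = [2.5, -0.3, 0.8, -2.0]
shrunken = [soft_threshold(w, 1.0) for w in gene_weights]
kept = selected_features(gene_weights, 1.0)
```

Genes with small weights are set exactly to zero and drop out of the classification rule, while the remaining weights are shrunk by the same amount; a larger threshold yields a sparser gene set.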

  9. Genome-wide identification and evolution of the PIN-FORMED (PIN) gene family in Glycine max.

    PubMed

    Liu, Yuan; Wei, Haichao

    2017-07-01

    Soybean (Glycine max) is one of the most important crop plants. Wild and cultivated soybean varieties have significant differences worth further investigation, such as plant morphology, seed size, and seed coat development; these characters may be related to auxin biology. The PIN gene family encodes essential transport proteins in cell-to-cell auxin transport, but little research on soybean PIN genes (GmPIN genes) has been done, especially with respect to the evolution and differences between wild and cultivated soybean. In this study, we retrieved 23 GmPIN genes from the latest updated G. max genome database; six GmPIN protein sequences were changed compared with the previous database. Based on the Plant Genome Duplication Database, 18 GmPIN genes have been involved in segment duplication. Three pairs of GmPIN genes arose after the second soybean genome duplication, and six occurred after the first genome duplication. The duplicated GmPIN genes retained similar expression patterns. All the duplicated GmPIN genes experienced purifying selection (Ka/Ks < 1) to prevent accumulation of non-synonymous mutations and thus remained more similar. In addition, we also focused on the artificial selection of the soybean PIN genes. Five artificially selected GmPIN genes were identified by comparing the genome sequence of 17 wild and 14 cultivated soybean varieties. Our research provides useful and comprehensive basic information for understanding GmPIN genes.
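
The Ka/Ks criterion mentioned above compares the non-synonymous substitution rate (Ka) with the synonymous rate (Ks): a ratio below 1 is conventionally read as purifying selection, a ratio above 1 as positive selection, and a ratio near 1 as neutral evolution. The sketch below encodes that reading; the tolerance band around 1 and the example rates are assumptions for illustration, not values from the study.

```python
def classify_selection(ka, ks, tolerance=0.05):
    """Label a gene pair by its Ka/Ks ratio.

    tolerance is a hypothetical band around 1 treated as 'neutral';
    in practice a statistical test would replace this fixed cut-off.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        return "purifying"
    if ratio > 1 + tolerance:
        return "positive"
    return "neutral"


# Hypothetical substitution rates for a duplicated gene pair.
label = classify_selection(ka=0.02, ks=0.10)
```

A pair with Ka = 0.02 and Ks = 0.10 has a ratio of 0.2, so non-synonymous changes are being removed much faster than synonymous ones accumulate.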

  10. A general framework for optimization of probes for gene expression microarray and its application to the fungus Podospora anserina.

    PubMed

    Bidard, Frédérique; Imbeaud, Sandrine; Reymond, Nancie; Lespinet, Olivier; Silar, Philippe; Clavé, Corinne; Delacroix, Hervé; Berteaux-Lecellier, Véronique; Debuchy, Robert

    2010-06-18

    The development of new microarray technologies makes custom long oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on probe design quality and selection. Probe design strategy should cope with the limited accuracy of de novo gene prediction programs, and annotation up-dating. We present a novel in silico procedure which addresses these issues and includes experimental screening, as an empirical approach is the best strategy to identify optimal probes in the in silico outcome. We used four criteria for in silico probe selection: cross-hybridization, hairpin stability, probe location relative to coding sequence end and intron position. This latter criterion is critical when exon-intron gene structure predictions for intron-rich genes are inaccurate. For each coding sequence (CDS), we selected a sub-set of four probes. These probes were included in a test microarray, which was used to evaluate the hybridization behavior of each probe. The best probe for each CDS was selected according to three experimental criteria: signal-to-noise ratio, signal reproducibility, and representative signal intensities. This procedure was applied for the development of a gene expression Agilent platform for the filamentous fungus Podospora anserina and the selection of a single 60-mer probe for each of the 10,556 P. anserina CDS. A reliable gene expression microarray version based on the Agilent 44K platform was developed with four spot replicates of each probe to increase statistical significance of analysis.

  11. Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

    PubMed

    Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar

    2016-04-01

    Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. These datasets often contain a large number of expression values, but only a fraction of them correspond to genes that are significantly expressed. The precise identification of genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data. Copyright © 2016 Elsevier Inc. All rights reserved.
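
The way a K-nearest-neighbor classifier can be split into map and reduce phases can be sketched in plain Python. This is not the authors' Hadoop implementation: it assumes each partition of the training data is a list of (features, label) pairs, that the map step emits the k closest candidates from its partition, and that the reduce step merges those candidates and takes a majority vote. All names and data are hypothetical.

```python
import heapq
from collections import Counter


def map_partition(partition, query, k):
    """Map step: k nearest (squared distance, label) pairs in one partition."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(features, query)), label)
        for features, label in partition
    )
    return heapq.nsmallest(k, scored)


def reduce_neighbors(local_results, k):
    """Reduce step: merge local candidates, keep the global k, majority vote."""
    merged = heapq.nsmallest(k, (pair for result in local_results for pair in result))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def mr_knn_classify(partitions, query, k):
    """Run the map step per partition (sequentially here), then reduce."""
    return reduce_neighbors([map_partition(p, query, k) for p in partitions], k)


# Hypothetical two-gene expression profiles split over two partitions.
partitions = [
    [((0.0, 0.0), "ALL"), ((0.1, 0.2), "ALL")],
    [((5.0, 5.0), "AML"), ((5.1, 4.9), "AML"), ((0.2, 0.1), "ALL")],
]
prediction = mr_knn_classify(partitions, query=(5.0, 5.0), k=3)
```

Because each partition only forwards its own k best candidates, the reduce step sees at most k times the number of partitions, which is what keeps the merge cheap when the training set is large.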

  12. Identification of KIF3A as a Novel Candidate Gene for Childhood Asthma Using RNA Expression and Population Allelic Frequencies Differences

    PubMed Central

    Butsch Kovacic, Melinda; Biagini Myers, Jocelyn M.; Wang, Ning; Martin, Lisa J.; Lindsey, Mark; Ericksen, Mark B.; He, Hua; Patterson, Tia L.; Baye, Tesfaye M.; Torgerson, Dara; Roth, Lindsey A.; Gupta, Jayanta; Sivaprasad, Umasundari; Gibson, Aaron M.; Tsoras, Anna M.; Hu, Donglei; Eng, Celeste; Chapela, Rocío; Rodríguez-Santana, José R.; Rodríguez-Cintrón, William; Avila, Pedro C.; Beckman, Kenneth; Seibold, Max A.; Gignoux, Chris; Musaad, Salma M.; Chen, Weiguo; Burchard, Esteban González; Khurana Hershey, Gurjit K.

    2011-01-01

    Background: Asthma is a chronic inflammatory disease with a strong genetic predisposition. A major challenge for candidate gene association studies in asthma is the selection of biologically relevant genes. Methodology/Principal Findings: Using epithelial RNA expression arrays, HapMap allele frequency variation, and the literature, we identified six possible candidate susceptibility genes for childhood asthma including ADCY2, DNAH5, KIF3A, PDE4B, PLAU, SPRR2B. To evaluate these genes, we compared the genotypes of 194 predominantly tagging SNPs in 790 asthmatic, allergic and non-allergic children. We found that SNPs in all six genes were nominally associated with asthma (p<0.05) in our discovery cohort and in three independent cohorts at either the SNP or gene level (p<0.05). Further, we determined that our selection approach was superior to random selection of genes either differentially expressed in asthmatics compared to controls (p = 0.0049) or selected based on the literature alone (p = 0.0049), substantiating the validity of our gene selection approach. Importantly, we observed that 7 of 9 SNPs in the KIF3A gene more than doubled the odds of asthma (OR = 2.3, p<0.0001) and increased the odds of allergic disease (OR = 1.8, p<0.008). Our data indicate that KIF3A rs7737031 (T-allele) has an asthma population attributable risk of 18.5%. The association between KIF3A rs7737031 and asthma was validated in 3 independent populations, further substantiating the validity of our gene selection approach. Conclusions/Significance: Our study demonstrates that KIF3A, a member of the kinesin superfamily of microtubule associated motors that are important in the transport of protein complexes within cilia, is a novel candidate gene for childhood asthma. Polymorphisms in KIF3A may in part be responsible for poor mucus and/or allergen clearance from the airways.
    Furthermore, our study provides a promising framework for the identification and evaluation of novel candidate susceptibility genes. PMID:21912604

  13. Genome-Wide Variation Patterns Uncover the Origin and Selection in Cultivated Ginseng (Panax ginseng Meyer).

    PubMed

    Li, Ming-Rui; Shi, Feng-Xue; Li, Ya-Ling; Jiang, Peng; Jiao, Lili; Liu, Bao; Li, Lin-Feng

    2017-09-01

    Chinese ginseng (Panax ginseng Meyer) is a medicinally important herb and plays crucial roles in traditional Chinese medicine. Pharmacological analyses identified diverse bioactive components from Chinese ginseng. However, basic biological attributes including domestication and selection of the ginseng plant remain under-investigated. Here, we present a genome-wide view of the domestication and selection of cultivated ginseng based on whole-genome data. A total of 8,660 protein-coding genes were selected for genome-wide scanning of the 30 wild and cultivated ginseng accessions. In addition, the 45S rDNA, chloroplast and mitochondrial genomes were included to perform phylogenetic and population genetic analyses. The observed spatial genetic structure between northern cultivated ginseng (NCG) and southern cultivated ginseng (SCG) accessions suggested multiple independent origins of cultivated ginseng. Genome-wide scanning further demonstrated that NCG and SCG have undergone distinct selection pressures during the domestication process, with more genes identified in the NCG (97 genes) than in the SCG group (5 genes). Functional analyses revealed that these genes are involved in diverse pathways, including DNA methylation, lignin biosynthesis, and cell differentiation. These findings suggested that the SCG and NCG groups have distinct demographic histories. Candidate genes identified are useful for future molecular breeding of cultivated ginseng. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  14. The cld mutation: narrowing the critical chromosomal region and selecting candidate genes.

    PubMed

    Péterfy, Miklós; Mao, Hui Z; Doolittle, Mark H

    2006-10-01

    Combined lipase deficiency (cld) is a recessive, lethal mutation specific to the tw73 haplotype on mouse Chromosome 17. While the cld mutation results in lipase proteins that are inactive, aggregated, and retained in the endoplasmic reticulum (ER), it maps separately from the lipase structural genes. We have narrowed the gene critical region by about 50% using the tw18 haplotype for deletion mapping and a recombinant chromosome used originally to map cld with respect to the phenotypic marker tf. The region now extends from 22 to 25.6 Mbp on the wild-type chromosome, currently containing 149 genes and 50 expressed sequence tags (ESTs). To identify the affected gene, we have selected candidates based on their known role in associated biological processes, cellular components, and molecular functions that best fit with the predicted function of the cld gene. A secondary approach was based on differences in mRNA levels between mutant (cld/cld) and unaffected (+/cld) cells. Using both approaches, we have identified seven functional candidates with an ER localization and/or an involvement in protein maturation and folding that could explain the lipase deficiency, and six expression candidates that exhibit large differences in mRNA levels between mutant and unaffected cells. Significantly, two genes were found to be candidates with regard to both function and expression, thus emerging as the strongest candidates for cld. We discuss the implications of our mapping results and our selection of candidates with respect to other genes, deletions, and mutations occurring in the cld critical region.

  15. ReliefSeq: A Gene-Wise Adaptive-K Nearest-Neighbor Feature Selection Tool for Finding Gene-Gene Interactions and Main Effects in mRNA-Seq Gene Expression Data

    PubMed Central

    McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.

    2013-01-01

    Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. 
    ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software Availability: http://insilico.utulsa.edu/ReliefSeq.php. PMID:24339943
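
    The nearest-neighbor scoring idea behind Relief-F can be sketched as follows. This is a minimal illustration of a basic Relief-style score with a fixed k and two classes, not the ReliefSeq implementation: it does not adapt k per gene, and the function name relief_scores, the normalised Manhattan distance, and the toy data are all assumptions made for this example.

```python
def relief_scores(X, y, k=1):
    """Relief-style importance scores for numeric features.

    X is a list of samples (each a list of floats), y the class labels.
    A feature gains score when it differs between a sample and its
    nearest other-class neighbours (misses) and loses score when it
    differs between a sample and its nearest same-class neighbours (hits).
    """
    n, m = len(X), len(X[0])

    # Per-feature range, used to scale differences into [0, 1].
    ranges = []
    for j in range(m):
        column = [row[j] for row in X]
        ranges.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(m))

    scores = [0.0] * m
    for i in range(n):
        # All other samples, nearest first.
        others = sorted((distance(X[i], X[t]), t) for t in range(n) if t != i)
        hits = [t for _, t in others if y[t] == y[i]][:k]
        misses = [t for _, t in others if y[t] != y[i]][:k]
        for j in range(m):
            for t in hits:
                scores[j] -= diff(j, X[i], X[t]) / (n * k)
            for t in misses:
                scores[j] += diff(j, X[i], X[t]) / (n * k)
    return scores


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.4]]
y = [0, 0, 0, 1, 1, 1]
scores = relief_scores(X, y, k=1)
print(scores)
```

    On the toy data the class-separating feature receives a positive score and the noise feature a negative one. The gene-wise adaptive-k variant described in the abstract would, in addition, choose k separately for each gene; that step is not shown here.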

  16. ROKU: a novel method for identification of tissue-specific genes.

    PubMed

    Kadota, Koji; Ye, Jiazhen; Nakai, Yuji; Terada, Tohru; Shimizu, Kentaro

    2006-06-12

    One of the important goals of microarray research is the identification of genes whose expression is considerably higher or lower in some tissues than in others. We would like to have ways of identifying such tissue-specific genes. We describe a method, ROKU, which selects tissue-specific patterns from gene expression data for many tissues and thousands of genes. ROKU ranks genes according to their overall tissue specificity using Shannon entropy and detects tissues specific to each gene, if any exist, using an outlier detection method. We evaluated the capacity for the detection of various specific expression patterns using synthetic and real data. We observed that ROKU was superior to a conventional entropy-based method in its ability to rank genes according to overall tissue specificity and to detect genes whose expression patterns are specific only to objective tissues. ROKU is useful for the detection of various tissue-specific expression patterns. The framework is also directly applicable to the selection of diagnostic markers for molecular classification of multiple classes.
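
    The entropy-based ranking mentioned above can be illustrated with a short sketch. It covers only the plain Shannon entropy of a gene's expression profile across tissues; it is not the ROKU method itself, which also includes an outlier detection step that is omitted here. The function names and the three toy profiles are hypothetical.

```python
import math


def shannon_entropy(expression):
    """Shannon entropy (in bits) of one gene's expression across tissues.

    Values are assumed to be non-negative. Low entropy means expression
    is concentrated in few tissues; the maximum, log2(number of tissues),
    means expression is uniform.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression values must sum to a positive number")
    entropy = 0.0
    for value in expression:
        p = value / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def rank_by_specificity(profiles):
    """Gene names ordered from most to least tissue-specific."""
    return sorted(profiles, key=lambda gene: shannon_entropy(profiles[gene]))


# Hypothetical profiles over four tissues.
profiles = {
    "uniform_gene": [10.0, 10.0, 10.0, 10.0],
    "one_tissue_gene": [0.0, 0.0, 40.0, 0.0],
    "two_tissue_gene": [20.0, 0.0, 20.0, 0.0],
}
print(rank_by_specificity(profiles))
```

    Entropy alone ranks genes but does not say which tissues are the specific ones; in the abstract that role is played by the separate outlier detection step.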

  17. Annotating ebony on the fly.

    PubMed

    Kohn, Michael H; Wittkopp, Patricia J

    2007-07-01

    The distinctive black phenotype of ebony mutants has made it one of the most widely used phenotypic markers in Drosophila genetics. Without doubt, ebony showcases the fruits of the fly community's labours to annotate gene function. As of this writing, FlyBase lists 142 references, 1277 fly stocks, 15 phenotypes and 44 alleles. In addition to its namesake pigmentation phenotype, ebony mutants affect other traits, including phototaxis and courtship. With phenotypic consequences of ebony variants readily apparent in the laboratory, does natural selection also see them in the wild? In this issue of Molecular Ecology, Pool & Aquadro investigate this question and find signs of natural selection on the ebony gene that appear to have resulted from selection for darker pigmentation at higher elevations in sub-Saharan populations of Drosophila melanogaster. Such findings from population genomic analysis of wild-derived strains should be included in gene annotations to provide a more holistic view of a gene's function. The evolutionary annotation of ebony added by Pool & Aquadro substantiates that pigmentation can be adaptive and implicates elevation as an important selective factor. This is important progress because the selective factors seem to differ between populations and species. In addition, the study raises issues to consider when extrapolating from selection at the molecular level to selection at the phenotypic level.

  18. Selection of genetically modified hematopoietic cells in vitro and in vivo using alkylating agent lysomustine.

    PubMed

    Rozov, F N; Grinenko, T S; Levit, G L; Krasnov, V P; Belyavsky, A V

    2010-09-15

    Efficient gene transfer into hematopoietic stem cells is vital for the success of gene therapy of hematopoietic and immune system disorders. An in vivo selection system based on a mutant form of the O(6)-methylguanine-DNA-methyltransferase gene (MGMTm) is considered one of the more promising strategies for expansion of hematopoietic cells transduced with viral vectors. Here we demonstrate that MGMTm-expressing cells can be efficiently selected using lysomustine, a nitrosourea derivative of lysine. K562 and murine bone marrow cells expressing MGMTm are protected from the cytotoxic action of lysomustine in vitro. We also show in a murine model that MGMTm-transduced hematopoietic cells can be expanded in vivo on transplantation into sublethally irradiated recipients followed by lysomustine treatment. These results indicate that lysomustine can be used as a potent novel chemoselection drug applicable for gene therapy of hematopoietic and immune system disorders. 2010 Elsevier Inc. All rights reserved.

  19. Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops.

    PubMed

    Yabe, Shiori; Yamasaki, Masanori; Ebana, Kaworu; Hayashi, Takeshi; Iwata, Hiroyoshi

    2016-01-01

    Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are used commonly in the genetic improvement of autogamous crops. These methods, however, produce few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it relies on single-plant evaluation for selection, which is inaccurate. Genomic selection (GS), which can predict genetic potential of individuals based on their marker genotype, might have high reliability of single-plant evaluation and might be effective in recurrent selection. To evaluate the efficiency of recurrent selection with GS, we conducted simulations using real marker genotype data of rice cultivars. Additionally, we introduced the concept of an "island model" inspired by evolutionary algorithms that might be useful to maintain genetic variation through the breeding process. We conducted GS simulations using real marker genotype data of rice cultivars to evaluate the efficiency of recurrent selection and the island model in an autogamous species. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from admixture of multiple bi-parental crosses showed larger genetic gains than a population derived from a single bi-parental cross across all cycles, suggesting the importance of genetic variation in an initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that the island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. The island-model GS will become a new breeding method that enhances the potential of genomic selection in autogamous crops, especially bringing long-term improvement.

  20. Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops

    PubMed Central

    Yabe, Shiori; Yamasaki, Masanori; Ebana, Kaworu; Hayashi, Takeshi; Iwata, Hiroyoshi

    2016-01-01

    Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are used commonly in the genetic improvement of autogamous crops. These methods, however, produce few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it relies on single-plant evaluation for selection, which is inaccurate. Genomic selection (GS), which can predict genetic potential of individuals based on their marker genotype, might have high reliability of single-plant evaluation and might be effective in recurrent selection. To evaluate the efficiency of recurrent selection with GS, we conducted simulations using real marker genotype data of rice cultivars. Additionally, we introduced the concept of an “island model” inspired by evolutionary algorithms that might be useful to maintain genetic variation through the breeding process. We conducted GS simulations using real marker genotype data of rice cultivars to evaluate the efficiency of recurrent selection and the island model in an autogamous species. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from admixture of multiple bi-parental crosses showed larger genetic gains than a population derived from a single bi-parental cross across all cycles, suggesting the importance of genetic variation in an initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that the island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. The island-model GS will become a new breeding method that enhances the potential of genomic selection in autogamous crops, especially bringing long-term improvement. PMID:27115872

  1. The complexity of selection at the major primate β-defensin locus

    PubMed Central

    Semple, Colin AM; Maxwell, Alison; Gautier, Philippe; Kilanowski, Fiona M; Eastwood, Hayden; Barran, Perdita E; Dorin, Julia R

    2005-01-01

    Background We have examined the evolution of the genes at the major human β-defensin locus and the orthologous loci in a range of other primates and mouse. For the first time these data allow us to examine selective episodes in the more recent evolutionary history of this locus as well as the ancient past. We have used a combination of maximum likelihood based tests and a maximum parsimony based sliding window approach to give a detailed view of the varying modes of selection operating at this locus. Results We provide evidence for strong positive selection soon after the duplication of these genes within an ancestral mammalian genome. Consequently variable selective pressures have acted on β-defensin genes in different evolutionary lineages, with episodes both of negative, and more rarely positive selection, during the divergence of primates. Positive selection appears to have been more common in the rodent lineage, accompanying the birth of novel, rodent-specific β-defensin genes. These observations allow a fuller understanding of the evolution of mammalian innate immunity. In both the rodent and primate lineages, sites in the second exon have been subject to positive selection and by implication are important in functional diversity. A small number of sites in the mature human peptides were found to have undergone repeated episodes of selection in different primate lineages. Particular sites were consistently implicated by multiple methods at positions throughout the mature peptides. These sites are clustered at positions predicted to be important for the specificity of the antimicrobial or chemoattractant properties of β-defensins. Surprisingly, sites within the prepropeptide region were also implicated as being subject to significant positive selection, suggesting previously unappreciated functional significance for this region. 
    Conclusions Identification of these putatively functional sites has important implications for our understanding of β-defensin function and for novel antibiotic design. PMID:15904491

  2. Evolution of the viral hemorrhagic septicemia virus: divergence, selection and origin.

    PubMed

    He, Mei; Yan, Xue-Chun; Liang, Yang; Sun, Xiao-Wen; Teng, Chun-Bo

    2014-08-01

    Viral hemorrhagic septicemia virus (VHSV) is an economically significant rhabdovirus that affects an increasing number of freshwater and marine fish species. Extensive studies have been conducted on the molecular epizootiology, genetic diversity, and phylogeny of VHSV. However, there are discrepancies between the reported estimates of the nucleotide substitution rate for the G gene and the divergence times for the genotypes. Herein, Bayesian coalescent analyses were conducted on the time-stamped entire coding sequences of the six VHSV genes. Rate estimates based on the G gene indicated that the marine genotypes/subtypes might not all evolve slower than their major European freshwater counterpart. Age calculations on the six genes revealed that the first bifurcation event of the analyzed isolates might have taken place within the last 300 years, which was much younger than previously thought. Selection analyses suggested that two codons of the G gene might be positively selected. Surveys of codon usage bias showed that the P, M and NV genes exhibited genotype-specific variations. Furthermore, we proposed that VHSV originated from the Pacific Northwest of North America. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. Identification of hub subnetwork based on topological features of genes in breast cancer

    PubMed Central

    ZHUANG, DA-YONG; JIANG, LI; HE, QING-QING; ZHOU, PENG; YUE, TAO

    2015-01-01

    The aim of this study was to provide functional insight into the identification of hub subnetworks by aggregating the behavior of genes connected in a protein-protein interaction (PPI) network. We applied a protein network-based approach to identify subnetworks which may provide new insight into the functions of pathways involved in breast cancer rather than individual genes. Five groups of breast cancer data were downloaded and analyzed from the Gene Expression Omnibus (GEO) database of high-throughput gene expression data to identify gene signatures using the genome-wide global significance (GWGS) method. A PPI network was constructed using Cytoscape and clusters that focused on highly connected nodes were obtained using the molecular complex detection (MCODE) clustering algorithm. Pathway analysis was performed to assess the functional relevance of selected gene signatures based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Topological centrality was used to characterize the biological importance of gene signatures, pathways and clusters. The results revealed that cluster 1, together with the cell cycle and oocyte meiosis pathways, formed significant subnetworks in the analysis of degree and other centralities, and that most hub nodes were distributed within them. The most important hub nodes, with top-ranked centrality, also overlapped with the genes common to the intersection of these three subnetworks, which was viewed as a hub subnetwork that is more reproducible than individual critical genes selected without network information. This hub subnetwork was attributed to a single biological process that is essential in cell growth and death. This increased the accuracy of identifying gene interactions that took place within the same functional process and was potentially useful for the development of biomarkers and networks for breast cancer. PMID:25573623
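
    Degree centrality, the simplest of the topological measures named above, can be sketched in a few lines. This is an illustrative example only: the study built its network in Cytoscape, whereas the function name degree_centrality and the placeholder node labels A to E below are assumptions and do not correspond to real proteins or interactions.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalised degree centrality for an undirected interaction network.

    edges is an iterable of (node, node) pairs. Each node's score is its
    number of distinct neighbours divided by (number of nodes - 1).
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


# Placeholder network (hypothetical labels, not real proteins).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
centrality = degree_centrality(edges)

# Nodes ordered from most to least connected; the top entries are hub candidates.
hubs = sorted(centrality, key=lambda node: (-centrality[node], node))
print(hubs[0], centrality[hubs[0]])
```

    Degree counts only direct neighbours. The abstract also refers to other centralities, which weigh a node's position in the wider network and are not covered by this sketch.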

  4. Distinct Trajectories of Massive Recent Gene Gains and Losses in Populations of a Microbial Eukaryotic Pathogen

    PubMed Central

    Hartmann, Fanny E.; Croll, Daniel

    2017-01-01

    Differences in gene content are a significant source of variability within species and have an impact on phenotypic traits. However, little is known about the mechanisms responsible for the most recent gene gains and losses. We screened the genomes of 123 worldwide isolates of the major pathogen of wheat Zymoseptoria tritici for robust evidence of gene copy number variation. Based on orthology relationships in three closely related fungi, we identified 599 gene gains and 1,024 gene losses that have not yet reached fixation within the focal species. Our analyses of gene gains and losses segregating in populations showed that gene copy number variation arose preferentially in subtelomeres and in proximity to transposable elements. Recently lost genes were enriched in virulence factors and secondary metabolite gene clusters. In contrast, recently gained genes encoded mostly secreted proteins lacking a conserved domain. We analyzed the frequency spectrum at loci segregating a gene presence–absence polymorphism in four worldwide populations. Recent gene losses showed a significant excess in low-frequency variants compared with genome-wide single nucleotide polymorphism, which is indicative of strong negative selection against gene losses. Recent gene gains were either under weak negative selection or neutral. We found evidence for strong divergent selection among populations at individual loci segregating a gene presence–absence polymorphism. Hence, gene gains and losses likely contributed to local adaptation. Our study shows that microbial eukaryotes harbor extensive copy number variation within populations and that functional differences among recently gained and lost genes led to distinct evolutionary trajectories. PMID:28981698

  5. Validation and Implementation of Marker-Assisted Selection (MAS) for PVY Resistance (Ryadg gene) in a Tetraploid Potato Breeding Program

    USDA-ARS's Scientific Manuscript database

    The gene Ryadg from S. tuberosum ssp. andigena provides extreme resistance to PVY. This gene has been mapped to chromosome XI and linked PCR-based DNA markers have been identified. Advanced tetraploid russeted potato clones developed by the U.S. Pacific Northwest Potato Breeding Program with Ryadg P...

  6. Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars.

    PubMed

    Cavanagh, Colin R; Chao, Shiaoman; Wang, Shichen; Huang, Bevan Emma; Stephen, Stuart; Kiani, Seifollah; Forrest, Kerrie; Saintenac, Cyrille; Brown-Guedira, Gina L; Akhunova, Alina; See, Deven; Bai, Guihua; Pumphrey, Michael; Tomar, Luxmi; Wong, Debbie; Kong, Stephan; Reynolds, Matthew; da Silva, Marta Lopez; Bockelman, Harold; Talbert, Luther; Anderson, James A; Dreisigacker, Susanne; Baenziger, Stephen; Carter, Arron; Korzun, Viktor; Morrell, Peter Laurent; Dubcovsky, Jorge; Morell, Matthew K; Sorrells, Mark E; Hayden, Matthew J; Akhunov, Eduard

    2013-05-14

    Domesticated crops experience strong human-mediated selection aimed at developing high-yielding varieties adapted to local conditions. To detect regions of the wheat genome subject to selection during improvement, we developed a high-throughput array to interrogate 9,000 gene-associated single-nucleotide polymorphisms (SNP) in a worldwide sample of 2,994 accessions of hexaploid wheat including landraces and modern cultivars. Using a SNP-based diversity map we characterized the impact of crop improvement on genomic and geographic patterns of genetic diversity. We found evidence of a small population bottleneck and extensive use of ancestral variation often traceable to founders of cultivars from diverse geographic regions. Analyzing genetic differentiation among populations and the extent of haplotype sharing, we identified allelic variants subjected to selection during improvement. Selective sweeps were found around genes involved in the regulation of flowering time and phenology. An introgression of a wild relative-derived gene conferring resistance to a fungal pathogen was detected by haplotype-based analysis. Comparing selective sweeps identified in different populations, we show that selection likely acts on distinct targets or multiple functionally equivalent alleles in different portions of the geographic range of wheat. The majority of the selected alleles were present at low frequency in local populations, suggesting either weak selection pressure or temporal variation in the targets of directional selection during breeding probably associated with changing agricultural practices or environmental conditions. The developed SNP chip and map of genetic variation provide a resource for advancing wheat breeding and supporting future population genomic and genome-wide association studies in wheat.

  7. Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars

    PubMed Central

    Cavanagh, Colin R.; Chao, Shiaoman; Wang, Shichen; Huang, Bevan Emma; Stephen, Stuart; Kiani, Seifollah; Forrest, Kerrie; Saintenac, Cyrille; Brown-Guedira, Gina L.; Akhunova, Alina; See, Deven; Bai, Guihua; Pumphrey, Michael; Tomar, Luxmi; Wong, Debbie; Kong, Stephan; Reynolds, Matthew; da Silva, Marta Lopez; Bockelman, Harold; Talbert, Luther; Anderson, James A.; Dreisigacker, Susanne; Baenziger, Stephen; Carter, Arron; Korzun, Viktor; Morrell, Peter Laurent; Dubcovsky, Jorge; Morell, Matthew K.; Sorrells, Mark E.; Hayden, Matthew J.; Akhunov, Eduard

    2013-01-01

    Domesticated crops experience strong human-mediated selection aimed at developing high-yielding varieties adapted to local conditions. To detect regions of the wheat genome subject to selection during improvement, we developed a high-throughput array to interrogate 9,000 gene-associated single-nucleotide polymorphisms (SNP) in a worldwide sample of 2,994 accessions of hexaploid wheat including landraces and modern cultivars. Using a SNP-based diversity map we characterized the impact of crop improvement on genomic and geographic patterns of genetic diversity. We found evidence of a small population bottleneck and extensive use of ancestral variation often traceable to founders of cultivars from diverse geographic regions. Analyzing genetic differentiation among populations and the extent of haplotype sharing, we identified allelic variants subjected to selection during improvement. Selective sweeps were found around genes involved in the regulation of flowering time and phenology. An introgression of a wild relative-derived gene conferring resistance to a fungal pathogen was detected by haplotype-based analysis. Comparing selective sweeps identified in different populations, we show that selection likely acts on distinct targets or multiple functionally equivalent alleles in different portions of the geographic range of wheat. The majority of the selected alleles were present at low frequency in local populations, suggesting either weak selection pressure or temporal variation in the targets of directional selection during breeding probably associated with changing agricultural practices or environmental conditions. The developed SNP chip and map of genetic variation provide a resource for advancing wheat breeding and supporting future population genomic and genome-wide association studies in wheat. PMID:23630259
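
    Genetic differentiation among populations at a single SNP is often summarised with a fixation index. The abstract does not say which statistic the authors used, so the following is only a generic sketch of the heterozygosity-based F_ST = (H_T - H_S) / H_T for one biallelic SNP, with equal weighting of populations; the function name fst and the example allele frequencies are hypothetical.

```python
def fst(allele_freqs):
    """Heterozygosity-based F_ST for one biallelic SNP.

    allele_freqs holds the frequency of the same allele in each
    population; populations are weighted equally.
    """
    if not allele_freqs:
        raise ValueError("at least one population is required")
    if any(p < 0.0 or p > 1.0 for p in allele_freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    count = len(allele_freqs)
    p_bar = sum(allele_freqs) / count
    # Expected heterozygosity of the pooled population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic overall, so there is no differentiation to measure
    # Mean expected heterozygosity within populations.
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / count
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
print(fst([0.5, 0.5]))  # identical frequencies
print(fst([0.2, 0.8]))  # moderately differentiated
print(fst([0.0, 1.0]))  # fixed for different alleles
```

    Values near 0 indicate similar allele frequencies across populations, and values near 1 indicate populations nearly fixed for different alleles. SNPs with unusually high values are the kind of signal that differentiation-based scans flag as candidate targets of selection.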

  8. Neutral mutation as the source of genetic variation in life history traits.

    PubMed

    Brcić-Kostić, Krunoslav

    2005-08-01

    The mechanism underlying the maintenance of adaptive genetic variation is a long-standing question in evolutionary genetics. There are two concepts (mutation-selection balance and balancing selection) which are based on the phenotypic differences between alleles. Mutation-selection balance and balancing selection cannot properly explain the process of gene substitution, i.e. the molecular evolution of quantitative trait loci affecting fitness. I assume that such loci have non-essential functions (small effects on fitness), and that they have the potential to evolve into new functions and acquire new adaptations. Here I show that a high amount of neutral polymorphism at these loci can exist in real populations. Consistent with this, I propose a hypothesis for the maintenance of genetic variation in life history traits which can be efficient for the fixation of alleles with very small selective advantage. The hypothesis is based on neutral polymorphism at quantitative trait loci and both neutral and adaptive gene substitutions. The model of neutral-adaptive conversion (NAC) assumes that neutral alleles are not neutral indefinitely, and that in specific and very rare situations phenotypic (relative fitness) differences between them can appear. In this paper I focus on NAC due to phenotypic plasticity of neutral alleles. The important evolutionary consequence of NAC could be the increased adaptive potential of a population. Loci responsible for adaptation should be fast evolving genes with minimally discernible phenotypic effects, and the recent discovery of genes with such characteristics implicates them as suitable candidates for loci involved in adaptation.

  9. Selective Constraints on Coding Sequences of Nervous System Genes Are a Major Determinant of Duplicate Gene Retention in Vertebrates.

    PubMed

    Roux, Julien; Liu, Jialin; Robinson-Rechavi, Marc

    2017-11-01

    The evolutionary history of vertebrates is marked by three ancient whole-genome duplications: two successive rounds in the ancestor of vertebrates, and a third one specific to teleost fishes. Biased loss of most duplicates enriched the genome for specific genes, such as slow evolving genes, but this selective retention process is not well understood. To understand what drives the long-term preservation of duplicate genes, we characterized duplicated genes in terms of their expression patterns. We used a new method of expression enrichment analysis, TopAnat, applied to in situ hybridization data from thousands of genes from zebrafish and mouse. We showed that the presence of expression in the nervous system is a good predictor of a higher rate of retention of duplicate genes after whole-genome duplication. Further analyses suggest that purifying selection against the toxic effects of misfolded or misinteracting proteins, which is particularly strong in nonrenewing neural tissues, likely constrains the evolution of coding sequences of nervous system genes, leading indirectly to the preservation of duplicate genes after whole-genome duplication. Whole-genome duplications thus greatly contributed to the expansion of the toolkit of genes available for the evolution of profound novelties of the nervous system at the base of the vertebrate radiation. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  10. A high efficiency gene disruption strategy using a positive-negative split selection marker and electroporation for Fusarium oxysporum.

    PubMed

    Liang, Liqin; Li, Jianqiang; Cheng, Lin; Ling, Jian; Luo, Zhongqin; Bai, Miao; Xie, Bingyan

    2014-11-01

    The Fusarium oxysporum species complex consists of fungal pathogens that cause serious vascular wilt disease on more than 100 cultivated species throughout the world. Gene function analysis is rapidly becoming more and more important as the whole-genome sequences of various F. oxysporum strains are being completed. Gene-disruption techniques are a common molecular tool for studying gene function, yet are often a limiting step in gene function identification. In this study we have developed a F. oxysporum high-efficiency gene-disruption strategy based on split-marker homologous recombination cassettes with dual selection and electroporation transformation. The method was efficiently used to delete three RNA-dependent RNA polymerase (RdRP) genes. The gene-disruption cassettes of three genes can be constructed simultaneously within a short time using this technique. The optimal conditions for electroporation are 10 μF capacitance, 300 Ω resistance, and 4 kV/cm field strength, with 1 μg of DNA (gene-disruption cassettes). Under these optimal conditions, we were able to obtain 95 transformants per μg DNA. After positive-negative selection, the transformants were efficiently screened by PCR; screening efficiency averaged 85%: 90% (RdRP1), 85% (RdRP2) and 77% (RdRP3). This gene-disruption strategy should pave the way for high-throughput genetic analysis in F. oxysporum. Copyright © 2014 Elsevier GmbH. All rights reserved.

  11. Positive Selection at the Polyhomeotic Locus Led to Decreased Thermosensitivity of Gene Expression in Temperate Drosophila melanogaster

    PubMed Central

    Voigt, Susanne; Laurent, Stefan; Litovchenko, Maria; Stephan, Wolfgang

    2015-01-01

    Drosophila melanogaster as a cosmopolitan species has successfully adapted to a wide range of different environments. Variation in temperature is one important environmental factor that influences the distribution of species in nature. In particular for insects, which are mostly ectotherms, ambient temperature plays a major role in their ability to colonize new habitats. Chromatin-based gene regulation is known to be sensitive to temperature. Ambient temperature leads to changes in the activation of genes regulated in this manner. One such regulatory system is the Polycomb group (PcG) whose target genes are more expressed at lower temperatures than at higher ones. Therefore, a greater range in ambient temperature in temperate environments may lead to greater variability (plasticity) in the expression of these genes. This might have detrimental effects, such that positive selection acts to lower the degree of the expression plasticity. We provide evidence for this process in a genomic region that harbors two PcG-regulated genes, polyhomeotic proximal (ph-p) and CG3835. We found a signature of positive selection in this gene region in European populations of D. melanogaster and investigated the region by means of reporter gene assays. The target of selection is located in the intergenic fragment between the two genes. It overlaps with the promoters of both genes and an experimentally validated Polycomb response element (PRE). This fragment harbors five sequence variants that are highly differentiated between European and African populations. The African alleles confer a temperature-induced plasticity in gene expression, which is typical for PcG-mediated gene regulation, whereas thermosensitivity is reduced for the European alleles. PMID:25855066

  12. Application of machine learning on brain cancer multiclass classification

    NASA Astrophysics Data System (ADS)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is by first transforming it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: extremely many features (genes) and only a small number of samples. The application of machine learning on microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, which is improved to solve multiclass classification and is called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features on each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the method of the classifier to reduce computational complexity. While ordinary SVM finds a single optimum hyperplane, the main objective of Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected from the multiple multiclass SVM-RFE feature selection method. Furthermore, the per class results show that this method could classify data of normal and MD class with 100% accuracy.
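
    The abstract says the multiclass problem is first transformed into several binary problems but does not say which decomposition is used. The sketch below assumes a one-vs-rest scheme purely for illustration; the function name and the "other" class label are hypothetical (only "normal" and "MD" appear in the abstract).

```python
from typing import Dict, List


def one_vs_rest_labels(labels: List[str]) -> Dict[str, List[int]]:
    """Build one binary label vector per class: +1 for that class, -1 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


# Hypothetical sample labels; "other" is a placeholder class, not from the abstract.
labels = ["normal", "MD", "other", "MD", "normal"]
binary_problems = one_vs_rest_labels(labels)

for cls, targets in binary_problems.items():
    print(cls, targets)
```

    Each binary label vector could then be passed to its own binary classifier, with the per-class decisions combined afterwards.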

  13. Biallelic insertion of a transcriptional terminator via the CRISPR/Cas9 system efficiently silences expression of protein-coding and non-coding RNA genes.

    PubMed

    Liu, Yangyang; Han, Xiao; Yuan, Junting; Geng, Tuoyu; Chen, Shihao; Hu, Xuming; Cui, Isabelle H; Cui, Hengmi

    2017-04-07

    The type II bacterial CRISPR/Cas9 system is a simple, convenient, and powerful tool for targeted gene editing. Here, we describe a CRISPR/Cas9-based approach for inserting a poly(A) transcriptional terminator into both alleles of a targeted gene to silence protein-coding and non-protein-coding genes, which often play key roles in gene regulation but are difficult to silence via insertion or deletion of short DNA fragments. The integration of 225 bp of bovine growth hormone poly(A) signals into either the first intron or the first exon or behind the promoter of target genes caused efficient termination of expression of PPP1R12C, NSUN2 (protein-coding genes), and MALAT1 (non-protein-coding gene). Both NeoR and PuroR were used as markers in the selection of clonal cell lines with biallelic integration of a poly(A) signal. Genotyping analysis indicated that the cell lines displayed the desired biallelic silencing after a brief selection period. These combined results indicate that this CRISPR/Cas9-based approach offers an easy, convenient, and efficient novel technique for gene silencing in cell lines, especially for those in which gene integration is difficult because of a low efficiency of homology-directed repair. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  14. Soybean (Glycine max) expansin gene superfamily origins: segmental and tandem duplication events followed by divergent selection among subfamilies

    PubMed Central

    2014-01-01

    Background: Expansins are plant cell wall loosening proteins that are involved in cell enlargement and a variety of other developmental processes. The expansin superfamily contains four subfamilies; namely, α-expansin (EXPA), β-expansin (EXPB), expansin-like A (EXLA), and expansin-like B (EXLB). Although the genome sequencing of soybeans is complete, our knowledge about the pattern of expansion and evolutionary history of soybean expansin genes remains limited. Results: A total of 75 expansin genes were identified in the soybean genome, and grouped into four subfamilies based on their phylogenetic relationships. Structural analysis revealed that the expansin genes are conserved in each subfamily, but are divergent among subfamilies. Furthermore, in soybean and Arabidopsis, the expansin gene family has been mainly expanded through tandem and segmental duplications; however, in rice, segmental duplication appears to be the dominant process that generates this superfamily. The transcriptome atlas revealed notable differential expression in either transcript abundance or expression patterns under normal growth conditions. This finding was consistent with the differential distribution of the cis-elements in the promoter region, and indicated wide functional divergence in this superfamily. Moreover, some critical amino acids that contribute to functional divergence and positive selection were detected. Finally, site model and branch-site model analysis of positive selection indicated that the soybean expansin gene superfamily is under strong positive selection, and that divergent selection constraints might have influenced the evolution of the four subfamilies. Conclusion: This study demonstrated that the soybean expansin gene superfamily has expanded through tandem and segmental duplication. Differential expression indicated wide functional divergence in this superfamily. Furthermore, positive selection analysis revealed that divergent selection constraints might have influenced the evolution of the four subfamilies. In conclusion, the results of this study contribute novel detailed information about the molecular evolution of the expansin gene superfamily in soybean. PMID:24720629

  15. Adaptation and evolution of deep-sea scale worms (Annelida: Polynoidae): insights from transcriptome comparison with a shallow-water species

    NASA Astrophysics Data System (ADS)

    Zhang, Yanjie; Sun, Jin; Chen, Chong; Watanabe, Hiromi K.; Feng, Dong; Zhang, Yu; Chiu, Jill M. Y.; Qian, Pei-Yuan; Qiu, Jian-Wen

    2017-04-01

    Polynoid scale worms (Polynoidae, Annelida) invaded deep-sea chemosynthesis-based ecosystems approximately 60 million years ago, but little is known about their genetic adaptation to the extreme deep-sea environment. In this study, we reported the first two transcriptomes of deep-sea polynoids (Branchipolynoe pettiboneae, Lepidonotopodium sp.) and compared them with the transcriptome of a shallow-water polynoid (Harmothoe imbricata). We determined codon and amino acid usage, positive selected genes, highly expressed genes and putative duplicated genes. Transcriptome assembly produced 98,806 to 225,709 contigs in the three species. There were more positively charged amino acids (i.e., histidine and arginine) and less negatively charged amino acids (i.e., aspartic acid and glutamic acid) in the deep-sea species. There were 120 genes showing clear evidence of positive selection. Among the 10% most highly expressed genes, there were more hemoglobin genes with high expression levels in both deep-sea species. The duplicated genes related to DNA recombination and metabolism, and gene expression were only enriched in deep-sea species. Deep-sea scale worms adopted two strategies of adaptation to hypoxia in the chemosynthesis-based habitats (i.e., rapid evolution of tetra-domain hemoglobin in Branchipolynoe or high expression of single-domain hemoglobin in Lepidonotopodium sp.).

  16. Adaptation and evolution of deep-sea scale worms (Annelida: Polynoidae): insights from transcriptome comparison with a shallow-water species

    PubMed Central

    Zhang, Yanjie; Sun, Jin; Chen, Chong; Watanabe, Hiromi K.; Feng, Dong; Zhang, Yu; Chiu, Jill M.Y.; Qian, Pei-Yuan; Qiu, Jian-Wen

    2017-01-01

    Polynoid scale worms (Polynoidae, Annelida) invaded deep-sea chemosynthesis-based ecosystems approximately 60 million years ago, but little is known about their genetic adaptation to the extreme deep-sea environment. In this study, we reported the first two transcriptomes of deep-sea polynoids (Branchipolynoe pettiboneae, Lepidonotopodium sp.) and compared them with the transcriptome of a shallow-water polynoid (Harmothoe imbricata). We determined codon and amino acid usage, positive selected genes, highly expressed genes and putative duplicated genes. Transcriptome assembly produced 98,806 to 225,709 contigs in the three species. There were more positively charged amino acids (i.e., histidine and arginine) and less negatively charged amino acids (i.e., aspartic acid and glutamic acid) in the deep-sea species. There were 120 genes showing clear evidence of positive selection. Among the 10% most highly expressed genes, there were more hemoglobin genes with high expression levels in both deep-sea species. The duplicated genes related to DNA recombination and metabolism, and gene expression were only enriched in deep-sea species. Deep-sea scale worms adopted two strategies of adaptation to hypoxia in the chemosynthesis-based habitats (i.e., rapid evolution of tetra-domain hemoglobin in Branchipolynoe or high expression of single-domain hemoglobin in Lepidonotopodium sp.). PMID:28397791

  17. Multimarker analysis suggests the involvement of BDNF signaling and microRNA biosynthesis in suicidal behavior.

    PubMed

    Pulay, Attila J; Réthelyi, János M

    2016-09-01

    Despite moderate heritability estimates, the genetics of suicidal behavior remains unclear; genome-wide association and candidate gene studies focusing on single nucleotide associations have reported inconsistent findings. Our study explored biologically informed, multimarker candidate gene associations with suicidal behavior in mood disorders. We analyzed the GAIN Whole Genome Association Study of Bipolar Disorder version 3 (n = 999, suicidal n = 358) and the GAIN Major Depression: Stage 1 Genomewide Association in Population-Based Samples (n = 1,753, suicidal n = 245) datasets. Suicidal behavior was defined as severe suicidal ideation or attempt. Candidate genes were selected based on literature search (Geneset1, n = 35), gene expression data of microRNA genes (Geneset2, n = 68) and their target genes (Geneset3, n = 11,259). Quality control and dosage analyses were carried out with PLINK. Gene-based associations of Geneset1 were analyzed with KGG. Polygenic profile scores of suicidal behavior were computed in the major depression dataset both with PRSice and LDpred and validated in the bipolar disorder data. Several nominally significant gene-based associations were detected, but only DICER1 was associated with suicidal behavior in both samples, while only the associations of NTRK2 in the depression sample reached family-wise and experiment-wise significance. Polygenic profile scores negatively predicted suicidal behavior in the bipolar sample for only Geneset2, with the strongest prediction by PRSice at Pt < 0.03 (Nagelkerke R² = 0.01, P < 0.007). Gene-based association results confirmed the potential involvement of the BDNF-NTRK2-CREB pathway in the pathogenesis of suicide and the cross-disorder association of DICER1. Polygenic risk prediction of the selected miRNA genes indicates that the miRNA system may play a mediating role, but with considerable pleiotropy. © 2016 Wiley Periodicals, Inc.
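
    To make the idea of a polygenic profile score with a p-value threshold (Pt) concrete, here is a minimal sketch. It does not reproduce PRSice or LDpred; it only shows the general weighted-sum idea, and every SNP name, effect size, and p-value below is made up for illustration.

```python
from typing import Dict, Tuple

# Per-SNP (effect size, discovery p-value). All values are hypothetical.
Weights = Dict[str, Tuple[float, float]]


def polygenic_score(dosages: Dict[str, float], weights: Weights,
                    p_threshold: float) -> float:
    """Sum effect size times allele dosage over SNPs whose p-value is below the threshold."""
    total = 0.0
    for snp, (beta, p_value) in weights.items():
        if p_value < p_threshold and snp in dosages:
            total += beta * dosages[snp]
    return total


weights = {"snp_a": (0.20, 0.01), "snp_b": (-0.10, 0.02), "snp_c": (0.50, 0.40)}
person = {"snp_a": 2.0, "snp_b": 1.0, "snp_c": 1.0}  # allele dosages in [0, 2]

# With a threshold of 0.03, snp_c is excluded from the score.
score = polygenic_score(person, weights, p_threshold=0.03)
print(round(score, 3))
```

    Lowering or raising the threshold changes which SNPs contribute, which is why such scores are usually reported together with the threshold at which they were computed.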

  18. SFM: A novel sequence-based fusion method for disease genes identification and prioritization.

    PubMed

    Yousef, Abdulaziz; Moghadam Charkari, Nasrollah

    2015-10-21

    The identification of disease genes from the human genome is of great importance for improving the diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. However, these methods mostly differ in the prior knowledge used to construct the feature vector for each instance (gene), the ways of selecting negative data (non-disease genes) where there is no investigational approach to find them, and the classification methods used to make the final decision. In this work, a novel sequence-based fusion method (SFM) is proposed to identify disease genes. Unlike existing methods, instead of using noisy and incomplete prior knowledge, the amino acid sequences of the proteins, which are universal data, are used to represent the genes (proteins) as four different feature vectors. To select more likely negative data from candidate genes, the intersection set of four negative sets, which are generated using a distance approach, is considered. Then, a decision tree (C4.5) is applied as a fusion method to combine the results of four independent state-of-the-art predictors based on the support vector machine (SVM) algorithm, and to make the final decision. The experimental results of the proposed method have been evaluated by some standard measures. The results indicate precision, recall and F-measure of 82.6%, 85.6% and 84%, respectively. These results confirm the efficiency and validity of the proposed method. Copyright © 2015 Elsevier Ltd. All rights reserved.
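
    The three reported measures can be checked against each other. The sketch below assumes that "F-measure" means the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f_measure(82.6, 85.6)
print(round(f1, 1))  # about 84.1, in line with the reported value of 84
```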

  19. Evolution Is an Experiment: Assessing Parallelism in Crop Domestication and Experimental Evolution: (Nei Lecture, SMBE 2014, Puerto Rico).

    PubMed

    Gaut, Brandon S

    2015-07-01

    In this commentary, I make inferences about the level of repeatability and constraint in the evolutionary process, based on two sets of replicated experiments. The first experiment is crop domestication, which has been replicated across many different species. I focus on results of whole-genome scans for genes selected during domestication and ask whether genes are, in fact, selected in parallel across different domestication events. If genes are selected in parallel, it implies that the number of genetic solutions to the challenge of domestication is constrained. However, I find no evidence for parallel selection events either between species (maize vs. rice) or within species (two domestication events within beans). These results suggest that there are few constraints on genetic adaptation, but conclusions must be tempered by several complicating factors, particularly the lack of explicit design standards for selection screens. The second experiment involves the evolution of Escherichia coli to thermal stress. Unlike domestication, this highly replicated experiment detected a limited set of genes that appear prone to modification during adaptation to thermal stress. However, the number of potentially beneficial mutations within these genes is large, such that adaptation is constrained at the genic level but much less so at the nucleotide level. Based on these two experiments, I make the general conclusion that evolution is remarkably flexible, despite the presence of epistatic interactions that constrain evolutionary trajectories. I also posit that evolution is so rapid that we should establish a Speciation Prize, to be awarded to the first researcher who demonstrates speciation with a sexual organism in the laboratory. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. Literature-based discovery of diabetes- and ROS-related targets

    PubMed Central

    2010-01-01

    Background: Reactive oxygen species (ROS) are known mediators of cellular damage in multiple diseases including diabetic complications. Despite its importance, no comprehensive database is currently available for the genes associated with ROS. Methods: We present ROS- and diabetes-related targets (genes/proteins) collected from the biomedical literature through a text mining technology. A web-based literature mining tool, SciMiner, was applied to 1,154 biomedical papers indexed with diabetes and ROS by PubMed to identify relevant targets. Over-represented targets in the ROS-diabetes literature were obtained through comparisons against randomly selected literature. The expression levels of nine genes, selected from the top ranked ROS-diabetes set, were measured in the dorsal root ganglia (DRG) of diabetic and non-diabetic DBA/2J mice in order to evaluate the biological relevance of literature-derived targets in the pathogenesis of diabetic neuropathy. Results: SciMiner identified 1,026 ROS- and diabetes-related targets from the 1,154 biomedical papers (http://jdrf.neurology.med.umich.edu/ROSDiabetes/). Fifty-three targets were significantly over-represented in the ROS-diabetes literature compared to randomly selected literature. These over-represented targets included well-known members of the oxidative stress response including catalase, the NADPH oxidase family, and the superoxide dismutase family of proteins. Eight of the nine selected genes exhibited significant differential expression between diabetic and non-diabetic mice. For six genes, the direction of expression change in diabetes paralleled enhanced oxidative stress in the DRG. Conclusions: Literature mining compiled ROS-diabetes related targets from the biomedical literature and led us to evaluate the biological relevance of selected targets in the pathogenesis of diabetic neuropathy. PMID:20979611

  1. Creating genetic resistance to HIV.

    PubMed

    Burnett, John C; Zaia, John A; Rossi, John J

    2012-10-01

    HIV/AIDS remains a chronic and incurable disease, in spite of the notable successes of combination antiretroviral therapy. Gene therapy offers the prospect of creating genetic resistance to HIV that supplants the need for antiviral drugs. In sight of this goal, a variety of anti-HIV genes have reached clinical testing, including gene-editing enzymes, protein-based inhibitors, and RNA-based therapeutics. Combinations of therapeutic genes against viral and host targets are designed to improve the overall antiviral potency and reduce the likelihood of viral resistance. In cell-based therapies, therapeutic genes are expressed in gene modified T lymphocytes or in hematopoietic stem cells that generate an HIV-resistant immune system. Such strategies must promote the selective proliferation of the transplanted cells and the prolonged expression of therapeutic genes. This review focuses on the current advances and limitations in genetic therapies against HIV, including the status of several recent and ongoing clinical studies. Copyright © 2012 Elsevier Ltd. All rights reserved.

  2. An Adaptive Genetic Association Test Using Double Kernel Machines

    PubMed Central

    Zhan, Xiang; Epstein, Michael P.; Ghosh, Debashis

    2014-01-01

    Recently, gene set-based approaches have become very popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Since most genes are not differentially expressed, existing pathway tests considering all genes within a pathway suffer from considerable noise and power loss. Moreover, for a differentially expressed pathway, it is of interest to select important genes that drive the effect of the pathway. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway as well as test for the overall genetic pathway effect. This DKM procedure first uses the garrote kernel machines (GKM) test for the purposes of subset selection and then the least squares kernel machine (LSKM) test for testing the effect of the subset of genes. An appealing feature of the kernel machine framework is that it can provide a flexible and unified method for multi-dimensional modeling of the genetic pathway effect allowing for both parametric and nonparametric components. This DKM approach is illustrated with application to simulated data as well as to data from a neuroimaging genetics study. PMID:26640602

  3. Purifying Selection on Exonic Splice Enhancers in Intronless Genes

    PubMed Central

    Savisaar, Rosina; Hurst, Laurence D.

    2016-01-01

    Exonic splice enhancers (ESEs) are short nucleotide motifs, enriched near exon ends, that enhance the recognition of the splice site and thus promote splicing. Are intronless genes under selection to avoid these motifs so as not to attract the splicing machinery to an mRNA that should not be spliced, thereby preventing the production of an aberrant transcript? Consistent with this possibility, we find that ESEs in putative recent retrocopies are at a higher density and evolving faster than those in other intronless genes, suggesting that they are being lost. Moreover, intronless genes are less dense in putative ESEs than intron-containing ones. However, this latter difference is likely due to the skewed base composition of intronless sequences, a skew that is in line with the general GC richness of few exon genes. Indeed, after controlling for such biases, we find that both intronless and intron-containing genes are denser in ESEs than expected by chance. Importantly, nucleotide-controlled analysis of evolutionary rates at synonymous sites in ESEs indicates that the ESEs in intronless genes are under purifying selection in both human and mouse. We conclude that on the loss of introns, some but not all, ESE motifs are lost, the remainder having functions beyond a role in splice promotion. These results have implications for the design of intronless transgenes and for understanding the causes of selection on synonymous sites. PMID:26802218

  4. Engineering of ribozyme-based aminoglycoside switches of gene expression by in vivo genetic selection in Saccharomyces cerevisiae.

    PubMed

    Klauser, Benedikt; Rehm, Charlotte; Summerer, Daniel; Hartig, Jörg S

    2015-01-01

    Synthetic RNA-based switches are a growing class of genetic controllers applied in synthetic biology to engineer cellular functions. In this chapter, we detail a protocol for the selection of posttranscriptional controllers of gene expression in yeast using the Schistosoma mansoni hammerhead ribozyme as a central catalytic unit. Incorporation of a small molecule-sensing aptamer domain into the ribozyme renders its activity ligand-dependent. Aptazymes display numerous advantages over conventional protein-based transcriptional controllers, namely, the use of little genomic space for encryption, their modular architecture allowing for easy reprogramming to new inputs, the physical linkage to the message to be controlled, and the ability to function without protein cofactors. Herein, we describe the method to select ribozyme-based switches of gene expression in Saccharomyces cerevisiae that we successfully implemented to engineer neomycin- and theophylline-responsive switches. We also highlight how to adapt the protocol to screen for switches responsive to other ligands. Reprogramming of the sensor unit and incorporation into any RNA of interest enables the fulfillment of a variety of regulatory functions. However, proper functioning of the aptazyme is largely dependent on optimal connection between the aptamer and the catalytic core. We obtained functional switches from a pool of variants carrying randomized connection sequences by an in vivo selection in MaV203 yeast cells that allows screening of a large sequence space of up to 1×10^9 variants. The protocol given explains how to construct aptazyme libraries, carry out the in vivo selection and characterize novel ON- and OFF-switches. © 2015 Elsevier Inc. All rights reserved.

  5. Adjusting for background mutation frequency biases improves the identification of cancer driver genes.

    PubMed

    Evans, Perry; Avey, Stefan; Kong, Yong; Krauthammer, Michael

    2013-09-01

    A common goal of tumor sequencing projects is finding genes whose mutations are selected for during tumor development. This is accomplished by choosing genes that have more non-synonymous mutations than expected from an estimated background mutation frequency. While this background frequency is unknown, it can be estimated using both the observed synonymous mutation frequency and the non-synonymous to synonymous mutation ratio. The synonymous mutation frequency can be determined across all genes or in a gene-specific manner. This choice introduces an interesting trade-off. A gene-specific frequency adjusts for an underlying mutation bias, but is difficult to estimate given missing synonymous mutation counts. Using a genome-wide synonymous frequency is more robust, but is less suited for adjusting biases. Studying four evaluation criteria for identifying genes with high non-synonymous mutation burden (reflecting preferential selection of expressed genes, genes with mutations in conserved bases, genes with many protein interactions, and genes that show loss of heterozygosity), we find that the gene-specific synonymous frequency is superior in the gene expression and protein interaction tests. In conclusion, the use of the gene-specific synonymous mutation frequency is well suited for assessing a gene's non-synonymous mutation burden.
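
    The burden comparison described above can be sketched as follows. The expected non-synonymous count is taken as synonymous frequency times the non-synonymous to synonymous ratio times the number of sites; the Poisson upper-tail test, the function names, and all numbers are assumptions made for illustration and are not stated in the abstract.

```python
import math


def expected_nonsyn(syn_freq: float, ns_to_s_ratio: float, n_sites: int) -> float:
    """Expected non-synonymous mutation count under the background model."""
    return syn_freq * ns_to_s_ratio * n_sites


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected); small values suggest excess burden."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(observed))
    return max(0.0, 1.0 - below)


# Hypothetical gene: 1000 sites, synonymous frequency 0.001 per site, NS/S ratio 3.
expected = expected_nonsyn(syn_freq=1e-3, ns_to_s_ratio=3.0, n_sites=1000)
p_value = poisson_upper_tail(observed=10, expected=expected)
print(expected, p_value)
```

    Here `syn_freq` could be either the genome-wide or the gene-specific synonymous frequency, which is the trade-off the abstract discusses.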

  6. Fast-tracking determination of homozygous transgenic lines and transgene stacking using a reliable quantitative real-time PCR assay.

    PubMed

    Wang, Xianghong; Jiang, Daiming; Yang, Daichang

    2015-01-01

    The selection of homozygous lines is a crucial step in the characterization of newly generated transgenic plants. This is particularly time- and labor-consuming when transgenic stacking is required. Here, we report a fast and accurate method based on quantitative real-time PCR with a rice gene RBE4 as a reference gene for selection of homozygous lines when using multiple transgenic stacking in rice. This method can be used to determine the stacking of up to three transgenes within four generations. Selection accuracy reached 100% for a single locus and 92.3% for two loci. This method confers distinct advantages over current transgenic research methodologies, as it is more accurate, rapid, and reliable. Therefore, this protocol could be used to efficiently select homozygous plants and to expedite time- and labor-consuming processes normally required for multiple transgene stacking. This protocol was standardized for determination of multiple gene stacking in molecular breeding via marker-assisted selection.
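
    The abstract does not spell out how qPCR readings are turned into a zygosity call. One common approach, assumed here and not confirmed as the authors' method, is the 2^(-ΔΔCt) calculation against a reference gene (RBE4 in the abstract) and a known hemizygous calibrator plant. The Ct values and the 1.5 cutoff below are hypothetical, and the calculation assumes roughly 100% amplification efficiency.

```python
def relative_copy_number(ct_transgene: float, ct_reference: float,
                         ct_transgene_cal: float, ct_reference_cal: float) -> float:
    """Transgene dose relative to a calibrator, via 2^(-ddCt)."""
    delta_sample = ct_transgene - ct_reference            # sample, normalized to reference gene
    delta_calibrator = ct_transgene_cal - ct_reference_cal  # hemizygous calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


def call_zygosity(ratio: float, cutoff: float = 1.5) -> str:
    """Call homozygous when the dose is near twice the hemizygous calibrator (cutoff is arbitrary)."""
    return "homozygous" if ratio >= cutoff else "hemizygous"


# Hypothetical Ct values: the sample's transgene crosses threshold one cycle earlier.
ratio = relative_copy_number(24.0, 22.0, 25.0, 22.0)
print(ratio, call_zygosity(ratio))
```

    For stacked transgenes, the same calculation would be repeated per transgene, each against its own calibrator.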

  7. Self-assembled pentablock copolymers for selective and sustained gene delivery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Bingqi

    2011-05-15

    The poly(diethylaminoethyl methacrylate) (PDEAEM) - Pluronic F127 - PDEAEM pentablock copolymer (PB) gene delivery vector system has been found to possess an inherent selectivity in transfecting cancer cells over non-cancer cells in vitro, without attaching any targeting ligands. In order to understand the mechanism of this selective transfection, three possible intracellular barriers to transfection were investigated in both cancer and non-cancer cells. We concluded that escape from the endocytic pathway served as the primary intracellular barrier for PB-mediated transfection. Most likely, PB vectors were entrapped and rendered non-functional in acidic lysosomes of non-cancer cells, but survived in less acidic lysosomes of cancer cells. The work highlights the importance of identifying intracellular barriers for different gene delivery systems and provides a new paradigm for designing targeting vectors based on intracellular differences between cell types, rather than through the use of targeting ligands. The PB vector was further developed to simultaneously deliver anticancer drugs and genes, which showed a synergistic effect demonstrated by significantly enhanced gene expression in vitro. Due to the thermosensitive gelation behavior, the PB vector packaging both drug and gene was also investigated for its in vitro sustained release properties by using polyethylene glycol diacrylate as a barrier gel to mimic the tumor matrix in vivo. Overall, this work resulted in the development of a gene delivery vector for sustained and selective gene delivery to tumor cells for cancer therapy.

  8. A data science approach to candidate gene selection of pain regarded as a process of learning and neural plasticity.

    PubMed

    Ultsch, Alfred; Kringel, Dario; Kalso, Eija; Mogil, Jeffrey S; Lötsch, Jörn

    2016-12-01

    The increasing availability of "big data" enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 535 genes identified empirically as relevant to pain with the knowledge about the functions of thousands of genes. Starting from an accepted description of chronic pain as displaying systemic features described by the terms "learning" and "neuronal plasticity," a functional genomics analysis proposed that among the functions of the 535 "pain genes," the biological processes "learning or memory" (P = 8.6 × 10) and "nervous system development" (P = 2.4 × 10) are statistically significantly overrepresented as compared with the annotations to these processes expected by chance. After establishing that the hypothesized biological processes were among important functional genomics features of pain, a subset of n = 34 pain genes were found to be annotated with both Gene Ontology terms. Published empirical evidence supporting their involvement in chronic pain was identified for almost all these genes, including 1 gene identified in March 2016 as being involved in pain. By contrast, such evidence was virtually absent in a randomly selected set of 34 other human genes. Hence, the present computational functional genomics-based method can be used for candidate gene selection, providing an alternative to established methods.

  9. Hypoxia adaptations in the grey wolf (Canis lupus chanco) from Qinghai-Tibet Plateau.

    PubMed

    Zhang, Wenping; Fan, Zhenxin; Han, Eunjung; Hou, Rong; Zhang, Liang; Galaverni, Marco; Huang, Jie; Liu, Hong; Silva, Pedro; Li, Peng; Pollinger, John P; Du, Lianming; Zhang, XiuyYue; Yue, Bisong; Wayne, Robert K; Zhang, Zhihe

    2014-07-01

    The Tibetan grey wolf (Canis lupus chanco) occupies habitats on the Qinghai-Tibet Plateau, a high altitude (>3000 m) environment where low oxygen tension exerts unique selection pressure on individuals to adapt to hypoxic conditions. To identify genes involved in hypoxia adaptation, we generated complete genome sequences of nine Chinese wolves from high and low altitude populations at an average coverage of 25×. We found that, beginning about 55,000 years ago, the highland Tibetan grey wolf suffered a more substantial population decline than lowland wolves. Positively selected hypoxia-related genes in highland wolves are enriched in the HIF signaling pathway (P = 1.57E-6), ATP binding (P = 5.62E-5), and response to an oxygen-containing compound (P≤5.30E-4). Of these positively selected hypoxia-related genes, three genes (EPAS1, ANGPT1, and RYR2) had at least one specific fixed non-synonymous SNP in highland wolves based on the nine genome data. Our re-sequencing studies on a large panel of individuals showed a frequency difference greater than 58% between highland and lowland wolves for these specific fixed non-synonymous SNPs and a high degree of LD surrounding the three genes, which implies strong selection. Past studies have shown that EPAS1 and ANGPT1 are important in the response to hypoxic stress, and RYR2 is involved in heart function. These three genes also exhibited significant signals of natural selection in high altitude human populations, which suggests similar evolutionary constraints on natural selection in wolves and humans of the Qinghai-Tibet Plateau.

  10. Plant phosphomannose isomerase as a selectable marker for rice transformation

    PubMed Central

    Hu, Lei; Li, Hao; Qin, Ruiying; Xu, Rongfang; Li, Juan; Li, Li; Wei, Pengcheng; Yang, Jianbo

    2016-01-01

    The E. coli phosphomannose isomerase (EcPMI) gene is widely used as a selectable marker gene (SMG) in mannose (Man) selection-based plant transformation. Although some plant species exhibit significant PMI activity and active PMIs were even identified in Man-sensitive plants, whether plant PMIs can be used as SMGs remains unclear. In this study, we isolated four novel PMI genes from Chlorella variabilis and Oryza sativa. Their isoenzymatic activities were examined in vitro and compared with that of EcPMI. The active plant PMIs were separately constructed into binary vectors as SMGs and then transformed into rice via Agrobacterium. In both Indica and Japonica subspecies, our results indicated that the plant PMIs could select and produce transgenic plants in a pattern similar to that of EcPMI. The transgenic plants exhibited an accumulation of plant PMI transcripts and enhancement of the in vivo PMI activity. Furthermore, a gene of interest was successfully transformed into rice using the plant PMIs as SMGs. Thus, novel SMGs for Man selection were isolated from plants, and our analysis suggested that PMIs encoding active enzymes might be common in plants and could potentially be used as appropriate genetic elements in cisgenesis engineering. PMID:27174847

  11. Gene copy number evolution during tetraploid cotton radiation.

    PubMed

    Rong, J; Feltus, F A; Liu, L; Lin, L; Paterson, A H

    2010-11-01

    After polyploid formation, retention or loss of duplicated genes is not random. Genes with some functional domains are convergently restored to 'singleton' state after many independent genome duplications, and have been referred to as 'duplication-resistant' (DR) genes. To further explore the timeframe for their restoration to the singleton state, 27 cotton homologs of genes found to be 'DR' in Arabidopsis were selected based on diagnostic Pfam domains. Their copy numbers were studied using southern hybridization and sequence analysis in five tetraploid species and their ancestral A and D genome diploids. DR genes had significantly lower copy number than gene families hybridizing to randomly selected cotton ESTs. Three DR genes showed complete loss of D genome-derived homoeologs in some or all tetraploid species. Prior analysis has shown gene loss in polyploid cotton to be rare, and herein only one randomly selected gene showed loss of a homoeolog in only one of the five tetraploid species (Gossypium mustelinum). BAC sequencing confirmed two cases of gene loss in tetraploid cotton. Divergence among 5' sequences of DR genes amplified from G. arboreum, G. raimondii, and Gossypioides kirkii was correlated with gene copy number. These results show that genes containing Pfam domains associated with duplication resistance in Arabidopsis have also been preferentially restored to low copy number after a more recent polyploidization event in cotton. In tetraploid cotton, genes from the progenitor D genome seem to experience more gene copy number divergence than genes from the A genome. Together with D subgenome-biased alterations in gene expression, perhaps gene loss may contribute to the relatively larger portion of quantitative trait variation attributable to D than A subgenome chromosomes of tetraploid cotton.

  12. Explicit Building Block Multiobjective Evolutionary Computation: Methods and Applications

    DTIC Science & Technology

    2005-06-16

    which is introduced in 1990 by Richard Dawkins in his book ”The Selfish Gene .” [34] 356 E.5.7 Pareto Envelop-based Selection Algorithm I and II...IGC Intelligent Gene Collector . . . . . . . . . . . . . . . . . 59 OED Orthogonal Experimental Design . . . . . . . . . . . . . 59 MED Main Effect...complete one experiment 74 `′ The string length hold within the computer (can be longer than number of genes

  13. Saturation of an intra-gene pool linkage map: toward unified consensus linkage map in common bean

    USDA-ARS's Scientific Manuscript database

    Map-based cloning to find genes of interest and marker assisted selection (MAS) requires good genetic maps with high reproducible markers. In this study, we saturated the linkage map of the intra-gene pool population of common bean DOR364×BAT477 (DB) by evaluating 2,706 molecular markers in includin...

  14. A general framework for optimization of probes for gene expression microarray and its application to the fungus Podospora anserina

    PubMed Central

    2010-01-01

    Background The development of new microarray technologies makes custom long oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on probe design quality and selection. Probe design strategy should cope with the limited accuracy of de novo gene prediction programs, and annotation up-dating. We present a novel in silico procedure which addresses these issues and includes experimental screening, as an empirical approach is the best strategy to identify optimal probes in the in silico outcome. Findings We used four criteria for in silico probe selection: cross-hybridization, hairpin stability, probe location relative to coding sequence end and intron position. This latter criterion is critical when exon-intron gene structure predictions for intron-rich genes are inaccurate. For each coding sequence (CDS), we selected a sub-set of four probes. These probes were included in a test microarray, which was used to evaluate the hybridization behavior of each probe. The best probe for each CDS was selected according to three experimental criteria: signal-to-noise ratio, signal reproducibility, and representative signal intensities. This procedure was applied for the development of a gene expression Agilent platform for the filamentous fungus Podospora anserina and the selection of a single 60-mer probe for each of the 10,556 P. anserina CDS. Conclusions A reliable gene expression microarray version based on the Agilent 44K platform was developed with four spot replicates of each probe to increase statistical significance of analysis. PMID:20565839

  15. Genes Regulated by Vitamin D in Bone Cells Are Positively Selected in East Asians

    PubMed Central

    Chen, Yuan; Xue, Yali; Luiselli, Donata; Tyler-Smith, Chris; Pagani, Luca; Ayub, Qasim

    2015-01-01

    Vitamin D and folate are activated and degraded by sunlight, respectively, and the physiological processes they control are likely to have been targets of selection as humans expanded from Africa into Eurasia. We investigated signals of positive selection in gene sets involved in the metabolism, regulation and action of these two vitamins in worldwide populations sequenced by Phase I of the 1000 Genomes Project. Comparing allele frequency-spectrum-based summary statistics between these gene sets and matched control genes, we observed a selection signal specific to East Asians for a gene set associated with vitamin D action in bones. The selection signal was mainly driven by three genes CXXC finger protein 1 (CXXC1), low density lipoprotein receptor-related protein 5 (LRP5) and runt-related transcription factor 2 (RUNX2). Examination of population differentiation and haplotypes allowed us to identify several candidate causal regulatory variants in each gene. Four of these candidate variants (one each in CXXC1 and RUNX2 and two in LRP5) had a >70% derived allele frequency in East Asians, but were present at lower (20–60%) frequency in Europeans as well, suggesting that the adaptation might have been part of a common response to climatic and dietary changes as humans expanded out of Africa, with implications for their role in vitamin D-dependent bone mineralization and osteoporosis insurgence. We also observed haplotype sharing between East Asians, Finns and an extinct archaic human (Denisovan) sample at the CXXC1 locus, which is best explained by incomplete lineage sorting. PMID:26719974

  16. Genetic signatures of natural selection in a model invasive ascidian

    PubMed Central

    Lin, Yaping; Chen, Yiyong; Yi, Changho; Fong, Jonathan J.; Kim, Won; Rius, Marc; Zhan, Aibin

    2017-01-01

    Invasive species represent promising models to study species’ responses to rapidly changing environments. Although local adaptation frequently occurs during contemporary range expansion, the associated genetic signatures at both population and genomic levels remain largely unknown. Here, we use genome-wide gene-associated microsatellites to investigate genetic signatures of natural selection in a model invasive ascidian, Ciona robusta. Population genetic analyses of 150 individuals sampled in Korea, New Zealand, South Africa and Spain showed significant genetic differentiation among populations. Based on outlier tests, we found a high incidence of signatures of directional selection at 19 loci. Hitchhiking mapping analyses identified 12 directional selective sweep regions, and all selective sweep windows on chromosomes were narrow (~8.9 kb). Further analyses identified 132 candidate genes under selection. When we compared our genetic data and six crucial environmental variables, 16 putatively selected loci showed significant correlation with these environmental variables. This suggests that the local environmental conditions have left significant signatures of selection at both population and genomic levels. Finally, we identified “plastic” genomic regions and genes that are promising regions to investigate evolutionary responses to rapid environmental change in C. robusta. PMID:28266616

  17. Gene flow from domesticated species to wild relatives: migration load in a model of multivariate selection.

    PubMed

    Tufto, Jarle

    2010-01-01

    Domesticated species frequently spread their genes into populations of wild relatives through interbreeding. The domestication process often involves artificial selection for economically desirable traits. This can lead to an indirect response in unknown correlated traits and a reduction in fitness of domesticated individuals in the wild. Previous models for the effect of gene flow from domesticated species to wild relatives have assumed that evolution occurs in one dimension. Here, I develop a quantitative genetic model for the balance between migration and multivariate stabilizing selection. Different forms of correlational selection consistent with a given observed ratio between average fitness of domesticated and wild individuals offset the phenotypic means at migration-selection balance away from predictions based on simpler one-dimensional models. For almost all parameter values, correlational selection leads to a reduction in the migration load. For ridge selection, this reduction arises because the distance by which the immigrants deviate from the local optimum is in effect reduced. For realistic parameter values, however, the effect of correlational selection on the load is small, suggesting that simpler one-dimensional models may still be adequate in terms of predicting mean population fitness and viability.

  18. Cancer diagnosis marker extraction for soft tissue sarcomas based on gene expression profiling data by using projective adaptive resonance theory (PART) filtering method

    PubMed Central

    Takahashi, Hiro; Nemoto, Takeshi; Yoshida, Teruhiko; Honda, Hiroyuki; Hasegawa, Tadashi

    2006-01-01

    Background Recent advances in genome technologies have provided an excellent opportunity to determine the complete biological characteristics of neoplastic tissues, resulting in improved diagnosis and selection of treatment. To accomplish this objective, it is important to establish a sophisticated algorithm that can deal with large quantities of data such as gene expression profiles obtained by DNA microarray analysis. Results Previously, we developed the projective adaptive resonance theory (PART) filtering method as a gene filtering method. This is one of the clustering methods that can select specific genes for each subtype. In this study, we applied the PART filtering method to analyze microarray data that were obtained from soft tissue sarcoma (STS) patients for the extraction of subtype-specific genes. The performance of the filtering method was evaluated by comparison with other widely used methods, such as signal-to-noise, significance analysis of microarrays, and nearest shrunken centroids. In addition, various combinations of filtering and modeling methods were used to extract essential subtype-specific genes. The combination of the PART filtering method and boosting – the PART-BFCS method – showed the highest accuracy. Seven genes among the 15 genes that are frequently selected by this method – MIF, CYFIP2, HSPCB, TIMP3, LDHA, ABR, and RGS3 – are known prognostic marker genes for other tumors. These genes are candidate marker genes for the diagnosis of STS. Correlation analysis was performed to extract marker genes that were not selected by PART-BFCS. Sixteen genes among those extracted are also known prognostic marker genes for other tumors, and they could be candidate marker genes for the diagnosis of STS. Conclusion A two-step procedure consisting of PART-BFCS followed by correlation analysis was proposed. The results suggest that novel diagnostic and therapeutic targets for STS can be extracted by a procedure that includes the PART filtering method. PMID:16948864
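
    The signal-to-noise filter named above as a comparison method can be sketched as follows. This is a minimal illustration, assuming the commonly used two-class form (difference of class means divided by the sum of class standard deviations); the abstract does not state which variant was used, and the function names and toy data are hypothetical. It is not the PART filtering method itself.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Two-class signal-to-noise score for one gene (assumed form):
    (mean_a - mean_b) / (sd_a + sd_b), using sample standard deviations."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def rank_genes(expression, labels):
    """Rank genes by |signal-to-noise|, largest first.

    expression: dict mapping gene name -> list of expression values
    labels:     list of 0/1 class labels, one per sample (same order)
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


# Hypothetical toy data: g1 separates the two classes, g2 does not.
toy_expression = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [1, 2, 3, 1, 2, 3],
}
toy_labels = [0, 0, 0, 1, 1, 1]
ranking = rank_genes(toy_expression, toy_labels)
```

    A gene whose values are constant within both classes would give a zero denominator; a real filter would need to guard against that case.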

  19. Identification and validation of single nucleotide polymorphisms in growth- and maturation-related candidate genes in sole (Solea solea L.).

    PubMed

    Diopere, Eveline; Hellemans, Bart; Volckaert, Filip A M; Maes, Gregory E

    2013-03-01

    Genomic methodologies applied in evolutionary and fisheries research have been of great benefit to understand the marine ecosystem and the management of natural resources. Although single nucleotide polymorphisms (SNPs) are attractive for the study of local adaptation, spatial stock management and traceability, and investigating the effects of fisheries-induced selection, they have rarely been exploited in non-model organisms. This is partly due to difficulties in finding and validating SNPs in species with limited or no genomic resources. Complementary to random genome-scan approaches, a targeted candidate gene approach has the potential to unveil pre-selected functional diversity and provides more in-depth information on the action of selection at specific genes. For example, genes can be under selective pressure due to climate change and sustained periods of heavy fishing pressure. In this study, we applied a candidate gene approach in sole (Solea solea L.), an important member of the demersal ecosystem. As a consumption flatfish it is heavily exploited and has experienced associated life-history changes over the last 60 years. To discover novel genetic polymorphisms in or around genes linked to important life history traits in sole, we screened a total of 76 candidate genes related to growth and maturation using a targeted resequencing approach. We identified in total 86 putative SNPs in 22 genes and validated 29 SNPs using a multiplex single-base extension genotyping assay. We found 22 informative SNPs, of which two represent non-synonymous mutations, potentially of functional relevance. These novel markers should be rapidly and broadly applicable in analyses of natural sole populations, as a measure of the evolutionary signature of overfishing and for initiatives on marker assisted selection. Copyright © 2012 Elsevier B.V. All rights reserved.

  20. Role and regulation of autophagy in heat stress responses of tomato plants

    PubMed Central

    Zhou, Jie; Wang, Jian; Yu, Jing-Quan; Chen, Zhixiang

    2014-01-01

    As sessile organisms, plants are constantly exposed to a wide spectrum of stress conditions such as high temperature, which causes protein misfolding. Misfolded proteins are highly toxic and must be efficiently removed to reduce cellular proteotoxic stress if restoration of native conformations is unsuccessful. Although selective autophagy is known to function in protein quality control by targeting degradation of misfolded and potentially toxic proteins, its role and regulation in heat stress responses have not been analyzed in crop plants. In the present study, we found that heat stress induced expression of autophagy-related (ATG) genes and accumulation of autophagosomes in tomato plants. Virus-induced gene silencing (VIGS) of tomato ATG5 and ATG7 genes resulted in increased sensitivity of tomato plants to heat stress based on both increased development of heat stress symptoms and compromised photosynthetic parameters of heat-stressed leaf tissues. Silencing of tomato homologs for the selective autophagy receptor NBR1, which targets ubiquitinated protein aggregates, also compromised tomato heat tolerance. To better understand the regulation of heat-induced autophagy, we found that silencing of tomato ATG5, ATG7, or NBR1 compromised heat-induced expression of not only the targeted genes but also other autophagy-related genes. Furthermore, we identified two tomato genes encoding proteins highly homologous to Arabidopsis WRKY33 transcription factor, which has been previously shown to interact physically with an autophagy protein. Silencing of tomato WRKY33 genes compromised tomato heat tolerance and reduced heat-induced ATG gene expression and autophagosome accumulation. Based on these results, we propose that heat-induced autophagy in tomato is subject to cooperative regulation by both WRKY33 and ATG proteins and plays a critical role in tomato heat tolerance, most likely through selective removal of heat-induced protein aggregates. PMID:24817875

  1. Assessment of genetic diversity among four orchids based on ddRAD sequencing data for conservation purposes.

    PubMed

    Roy, Subhas Chandra; Moitra, Kaushik; De Sarker, Dilip

    2017-01-01

    Genetic diversity was assessed in the four orchid species using NGS-based ddRAD sequencing data. The assembled nucleotide sequences (fastq) were deposited in the SRA archive of NCBI Database with accession numbers (SRP063543 for Dendrobium, SRP065790 for Geodorum, SRP072201 for Cymbidium and SRP072378 for Rhynchostylis). Total base pair read was 1.1 Mbp in case of Dendrobium sp., 553.3 Kbp for Geodorum sp., 1.6 Gbp for Cymbidium, and 1.4 Gbp for Rhynchostylis. Average GC% was 43.9 in Geodorum, 43.7% in Dendrobium, 41.2% in Cymbidium and 42.3% in Rhynchostylis. Four partial gene sequences were used in DnaSP5 program for nucleotide diversity and phylogenetic relationship determination (Ycf2 gene of Dendrobium, matK gene of Geodorum, psbD gene of Cymbidium and Ycf2 gene of Rhynchostylis). Nucleotide diversity (per site) Pi (π) was 0.10560 in Dendrobium, 0.03586 in Geodorum, 0.01364 in Cymbidium and 0.011344 in Rhynchostylis. Neutrality test statistics were negative in all four orchid species (Tajima's D value -2.17959 in Dendrobium, -2.01655 in Geodorum, -2.12362 in Rhynchostylis and -1.54222 in Cymbidium), indicating purifying selection. Results for these gene sequences (matK, Ycf2 and psbD) indicate that they did not evolve neutrally and that selection might have played a role in the evolution of these genes in these four groups of orchids. Phylogenetic relationship was analyzed by reconstructing dendrogram based on the matK, psbD and Ycf2 gene sequences using maximum likelihood method in MEGA6 program.
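
    The two statistics reported above, per-site nucleotide diversity (π) and Tajima's D, can be illustrated with a short sketch. It follows the standard textbook definitions (mean pairwise differences and Tajima's 1989 normalisation) and assumes equal-length, gap-free aligned sequences; it is not the DnaSP5 implementation, and the toy alignments are hypothetical.

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Per-site nucleotide diversity (pi)."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def segregating_sites(seqs):
    """Number of alignment columns with more than one distinct base."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D; returns 0.0 when there are no segregating sites."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return 0.0
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment in which every variant is a singleton,
# i.e. an excess of rare variants, which pushes D below zero.
rare_variants = ["TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAA"]
d_rare = tajimas_d(rare_variants)
```

    A negative D reflects an excess of low-frequency variants relative to the neutral expectation; this pattern is compatible with purifying selection but also with recent population expansion, so the sign alone does not distinguish the two.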

  2. The Role of Small RNA-Based Epigenetic Silencing for Purifying Selection on Transposable Elements in Capsella grandiflora

    PubMed Central

    Horvath, Robert

    2017-01-01

    To avoid negative effects of transposable element (TE) proliferation, plants epigenetically silence TEs using a number of mechanisms, including RNA-directed DNA methylation. These epigenetic modifications can extend outside the boundaries of TE insertions and lead to silencing of nearby genes, resulting in a trade-off between TE silencing and interference with nearby gene regulation. Therefore, purifying selection is expected to remove silenced TE insertions near genes more efficiently and prevent their accumulation within a population. To explore how effects of TE silencing on gene regulation shape purifying selection on TEs, we analyzed whole genome sequencing data from 166 individuals of a large population of the outcrossing species Capsella grandiflora. We found that most TEs are rare, and in chromosome arms, silenced TEs are exposed to stronger purifying selection than those that are not silenced by 24-nucleotide small RNAs, especially with increasing proximity to genes. An age-of-allele test of neutrality on a subset of TEs supports our inference of purifying selection on silenced TEs, suggesting that our results are robust to varying transposition rates. Our results provide new insights into the processes affecting the accumulation of TEs in an outcrossing species and support the view that epigenetic silencing of TEs results in a trade-off between preventing TE proliferation and interference with nearby gene regulation. We also suggest that in the centromeric and pericentromeric regions, the negative aspects of epigenetic TE silencing are missing. PMID:29036316

  3. Use of Bayesian Networks to Probabilistically Model and Improve the Likelihood of Validation of Microarray Findings by RT-PCR

    PubMed Central

    English, Sangeeta B.; Shih, Shou-Ching; Ramoni, Marco F.; Smith, Lois E.; Butte, Atul J.

    2014-01-01

    Though genome-wide technologies, such as microarrays, are widely used, data from these methods are considered noisy; there is still varied success in downstream biological validation. We report a method that increases the likelihood of successfully validating microarray findings using real time RT-PCR, including genes at low expression levels and with small differences. We use a Bayesian network to identify the most relevant sources of noise based on the successes and failures in validation for an initial set of selected genes, and then improve our subsequent selection of genes for validation based on eliminating these sources of noise. The network displays the significant sources of noise in an experiment, and scores the likelihood of validation for every gene. We show how the method can significantly increase validation success rates. In conclusion, in this study, we have successfully added a new automated step to determine the contributory sources of noise that determine successful or unsuccessful downstream biological validation. PMID:18790084

  4. Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex.

    PubMed

    Pavlidis, Paul; Qin, Jie; Arango, Victoria; Mann, John J; Sibille, Etienne

    2004-06-01

    One of the challenges in the analysis of gene expression data is placing the results in the context of other data available about genes and their relationships to each other. Here, we approach this problem in the study of gene expression changes associated with age in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), is based on statistically evaluating the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and the results of ORA depended strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for the analysis of complex expression data sets and emphasize the advantage of considering all available genomic information rather than sets of genes that pass a predetermined "threshold of significance."
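
    The overrepresentation analysis (ORA) described above can be sketched as a one-sided hypergeometric test on the overlap between a gene ontology class and the selected gene list. The abstract does not specify the exact test used, so the hypergeometric form is an assumption here, and the function name and example counts are hypothetical.

```python
from math import comb


def ora_p_value(n_universe, n_in_class, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes assayed
    n_in_class: genes annotated to the gene ontology class
    n_selected: genes passing the selection threshold
    n_overlap:  selected genes that are also in the class
    """
    upper = min(n_in_class, n_selected)
    favourable = sum(
        comb(n_in_class, k) * comb(n_universe - n_in_class, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_universe, n_selected)


# Hypothetical counts: 1000 genes assayed, 40 in the class,
# 50 selected by the threshold, 8 of those in the class.
p_example = ora_p_value(1000, 40, 50, 8)
```

    Because `n_selected` and `n_overlap` both change when the selection threshold moves, the resulting p-value can shift substantially, which is the threshold dependence the abstract reports for ORA. Functional class scoring avoids this step by using the scores of all genes in the class.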

  5. Distinct Trajectories of Massive Recent Gene Gains and Losses in Populations of a Microbial Eukaryotic Pathogen.

    PubMed

    Hartmann, Fanny E; Croll, Daniel

    2017-11-01

    Differences in gene content are a significant source of variability within species and have an impact on phenotypic traits. However, little is known about the mechanisms responsible for the most recent gene gains and losses. We screened the genomes of 123 worldwide isolates of the major pathogen of wheat Zymoseptoria tritici for robust evidence of gene copy number variation. Based on orthology relationships in three closely related fungi, we identified 599 gene gains and 1,024 gene losses that have not yet reached fixation within the focal species. Our analyses of gene gains and losses segregating in populations showed that gene copy number variation arose preferentially in subtelomeres and in proximity to transposable elements. Recently lost genes were enriched in virulence factors and secondary metabolite gene clusters. In contrast, recently gained genes encoded mostly secreted proteins lacking a conserved domain. We analyzed the frequency spectrum at loci segregating a gene presence-absence polymorphism in four worldwide populations. Recent gene losses showed a significant excess in low-frequency variants compared with genome-wide single nucleotide polymorphisms, which is indicative of strong negative selection against gene losses. Recent gene gains were either under weak negative selection or neutral. We found evidence for strong divergent selection among populations at individual loci segregating a gene presence-absence polymorphism. Hence, gene gains and losses likely contributed to local adaptation. Our study shows that microbial eukaryotes harbor extensive copy number variation within populations and that functional differences among recently gained and lost genes led to distinct evolutionary trajectories. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. Improving the efficiency of CHO cell line generation using glutamine synthetase gene knockout cells.

    PubMed

    Fan, Lianchun; Kadura, Ibrahim; Krebs, Lara E; Hatfield, Christopher C; Shaw, Margaret M; Frye, Christopher C

    2012-04-01

    Although Chinese hamster ovary (CHO) cells, with their unique characteristics, have become a major workhorse for the manufacture of therapeutic recombinant proteins, one of the major challenges in CHO cell line generation (CLG) is how to efficiently identify those rare, high-producing clones among a large population of low- and non-productive clones. It is not unusual that several hundred individual clones need to be screened for the identification of a commercial clonal cell line with acceptable productivity and growth profile making the cell line appropriate for commercial application. This inefficiency makes the process of CLG both time consuming and laborious. Currently, there are two main CHO expression systems, dihydrofolate reductase (DHFR)-based methotrexate (MTX) selection and glutamine synthetase (GS)-based methionine sulfoximine (MSX) selection, that have been in wide industrial use. Since selection of recombinant cell lines in the GS-CHO system is based on the balance between the expression of the GS gene introduced by the expression plasmid and the addition of the GS inhibitor, L-MSX, the expression of GS from the endogenous GS gene in parental CHOK1SV cells will likely interfere with the selection process. To study endogenous GS expression's potential impact on selection efficiency, GS-knockout CHOK1SV cell lines were generated using the zinc finger nuclease (ZFN) technology designed to specifically target the endogenous CHO GS gene. The high efficiency (∼2%) of bi-allelic modification on the CHO GS gene supports the unique advantages of the ZFN technology, especially in CHO cells. GS enzyme function disruption was confirmed by the observation of glutamine-dependent growth of all GS-knockout cell lines. Full evaluation of the GS-knockout cell lines in a standard industrial cell culture process was performed. Bulk culture productivity improved two- to three-fold through the use of GS-knockout cells as parent cells. The selection stringency was significantly increased, as indicated by the large reduction of non-producing and low-producing cells after 25 µM L-MSX selection, and resulted in a six-fold efficiency improvement in identifying similar numbers of high-productive cell lines for a given recombinant monoclonal antibody. The potential impact of GS-knockout cells on recombinant protein quality is also discussed. Copyright © 2011 Wiley Periodicals, Inc.

  7. Establishment of a Cre recombinase based mutagenesis protocol for markerless gene deletion in Streptococcus suis.

    PubMed

    Koczula, A; Willenborg, J; Bertram, R; Takamatsu, D; Valentin-Weigand, P; Goethe, R

    2014-12-01

    The lack of knowledge about pathogenicity mechanisms of Streptococcus (S.) suis is, at least partially, attributed to limited methods for its genetic manipulation. Here, we established a Cre-lox based recombination system for markerless gene deletions in S. suis serotype 2 with high selective pressure and without undesired side effects. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Role of the p55-gamma subunit of PI3K in ALK-induced cell migration: RNAi-based selection of cell migration regulators.

    PubMed

    Seo, Minchul; Kim, Jong-Heon; Suk, Kyoungho

    2017-05-04

    Recently, unbiased functional genetic selection identified novel cell migration-regulating genes. This RNAi-based functional selection was performed using 63,996 pooled lentiviral shRNAs targeting 21,332 mouse genes. After five rounds of selection using cells with accelerated or impaired migration, shRNAs were retrieved and identified by half-hairpin barcode sequencing using cells with the selected phenotypes. This selection process led to the identification of 29 novel cell migration regulators. One of these candidates, anaplastic lymphoma kinase (ALK), was further investigated. Subsequent studies revealed that ALK promoted cell migration through the PI3K-AKT pathway via the p55γ regulatory subunit of PI3K, rather than the more commonly used p85 subunit. Western blot and immunohistochemistry studies using mouse brain tissues revealed similar temporal expression patterns of ALK, phospho-p55γ, and phospho-AKT during different stages of development. These data support an important role for the p55γ subunit of PI3K in ALK-induced cell migration during brain development.

  9. Selection and evaluation of novel reference genes for quantitative reverse transcription PCR (qRT-PCR) based on genome and transcriptome data in Brassica napus L.

    PubMed

    Yang, Hongli; Liu, Jing; Huang, Shunmou; Guo, Tingting; Deng, Linbin; Hua, Wei

    2014-03-15

    Selection of reference genes in Brassica napus, a tetraploid (4×) species, is a very difficult task without information on genome and transcriptome. To date, only a few traditional reference genes, which show significant expression differences under different conditions, are used in B. napus. In the present study, based on genome and transcriptome data of the rapeseed Zhongshuang-11 cultivar, 14 candidate reference genes were screened for investigation in different tissues, cultivars, and treated conditions of B. napus. These genes were as follows: ELF5, ENTH, F-BOX7, F-BOX2, FYPP1, GDI1, GYF, MCP2d, OTP80, PPR, SPOC, Unknown1, Unknown2 and UBA. Of these, 12 genes (all except GYF and FYPP1) were found to perform better than the traditional reference genes ACTIN7 and GAPDH. To further validate the accuracy of the newly developed reference genes in normalization, expression levels of BnCAT1 (B. napus catalase 1) in different rapeseed tissues and seedlings under stress conditions were normalized by the three most stable reference genes, PPR, GDI1, and ENTH, and little difference existed in the normalization results. To the best of our knowledge, this is the first time B. napus reference genes have been provided with the help of complete genome and transcriptome information. The new reference genes provided in this study are more accurate than previously reported reference genes in quantifying expression levels of B. napus genes. Crown Copyright © 2014. Published by Elsevier B.V. All rights reserved.

  10. TMEM88, CCL14 and CLEC3B as prognostic biomarkers for prognosis and palindromia of human hepatocellular carcinoma.

    PubMed

    Zhang, Xin; Wan, Jin-Xiang; Ke, Zun-Ping; Wang, Feng; Chai, Hai-Xia; Liu, Jia-Qiang

    2017-07-01

    Hepatocellular carcinoma is one of the most mortal and prevalent cancers with increasing incidence worldwide. Elucidating genetic driver genes for prognosis and palindromia of hepatocellular carcinoma helps managing clinical decisions for patients. In this study, the high-throughput RNA sequencing data on platform IlluminaHiSeq of hepatocellular carcinoma were downloaded from The Cancer Genome Atlas with 330 primary hepatocellular carcinoma patient samples. Stable key genes with differential expressions were identified with which Kaplan-Meier survival analysis was performed using Cox proportional hazards test in R language. Driver genes influencing the prognosis of this disease were determined using clustering analysis. Functional analysis of driver genes was performed by literature search and Gene Set Enrichment Analysis. Finally, the selected driver genes were verified using external dataset GSE40873. A total of 5781 stable key genes were identified, including 156 genes definitely related to prognoses of hepatocellular carcinoma. Based on the significant key genes, samples were grouped into five clusters which were further integrated into high- and low-risk classes based on clinical features. TMEM88, CCL14, and CLEC3B were selected as driver genes which clustered high-/low-risk patients successfully (generally, p = 0.0005124445). Finally, survival analysis of the high-/low-risk samples from external database illustrated significant difference with p value 0.0198. In conclusion, TMEM88, CCL14, and CLEC3B genes were stable and available in predicting the survival and palindromia time of hepatocellular carcinoma. These genes could function as potential prognostic genes contributing to improve patients' outcomes and survival.

  11. CRISPR/Cas9-mediated efficient genome editing via blastospore-based transformation in entomopathogenic fungus Beauveria bassiana.

    PubMed

    Chen, Jingjing; Lai, Yiling; Wang, Lili; Zhai, Suzhen; Zou, Gen; Zhou, Zhihua; Cui, Chunlai; Wang, Sibao

    2017-04-03

    Beauveria bassiana is an environmentally friendly alternative to chemical insecticides against various agricultural insect pests and vectors of human diseases. However, its application has been limited due to slow kill and sensitivity to abiotic stresses. Understanding of the molecular pathogenesis and physiological characteristics would facilitate improvement of the fungal performance. Loss-of-function mutagenesis is the most powerful tool to characterize gene functions, but it is hampered by the low rate of homologous recombination and the limited availability of selectable markers. Here, by combining the use of uridine auxotrophy as recipient and donor DNAs harboring auxotrophic complementation gene ura5 as a selectable marker with the blastospore-based transformation system, we established a highly efficient, low false-positive background and cost-effective CRISPR/Cas9-mediated gene editing system in B. bassiana. This system has been demonstrated as a simple and powerful tool for targeted gene knock-out and/or knock-in in B. bassiana in a single gene disruption. We further demonstrated that our system allows simultaneous disruption of multiple genes via homology-directed repair in a single transformation. This technology will allow us to study functionally redundant genes and holds significant potential to greatly accelerate functional genomics studies of B. bassiana.

  12. CRISPR/Cas9-mediated efficient genome editing via blastospore-based transformation in entomopathogenic fungus Beauveria bassiana

    PubMed Central

    Chen, Jingjing; Lai, Yiling; Wang, Lili; Zhai, Suzhen; Zou, Gen; Zhou, Zhihua; Cui, Chunlai; Wang, Sibao

    2017-01-01

    Beauveria bassiana is an environmentally friendly alternative to chemical insecticides against various agricultural insect pests and vectors of human diseases. However, its application has been limited due to slow kill and sensitivity to abiotic stresses. Understanding of the molecular pathogenesis and physiological characteristics would facilitate improvement of the fungal performance. Loss-of-function mutagenesis is the most powerful tool to characterize gene functions, but it is hampered by the low rate of homologous recombination and the limited availability of selectable markers. Here, by combining the use of uridine auxotrophy as recipient and donor DNAs harboring auxotrophic complementation gene ura5 as a selectable marker with the blastospore-based transformation system, we established a highly efficient, low false-positive background and cost-effective CRISPR/Cas9-mediated gene editing system in B. bassiana. This system has been demonstrated as a simple and powerful tool for targeted gene knock-out and/or knock-in in B. bassiana in a single gene disruption. We further demonstrated that our system allows simultaneous disruption of multiple genes via homology-directed repair in a single transformation. This technology will allow us to study functionally redundant genes and holds significant potential to greatly accelerate functional genomics studies of B. bassiana. PMID:28368054

  13. Automatic segmentation and supervised learning-based selection of nuclei in cancer tissue images.

    PubMed

    Nandy, Kaustav; Gudla, Prabhakar R; Amundsen, Ryan; Meaburn, Karen J; Misteli, Tom; Lockett, Stephen J

    2012-09-01

    Analysis of preferential localization of certain genes within the cell nuclei is emerging as a new technique for the diagnosis of breast cancer. Quantitation requires accurate segmentation of 100-200 cell nuclei in each tissue section to draw a statistically significant result. Thus, for large-scale analysis, manual processing is too time consuming and subjective. Fortuitously, acquired images generally contain many more nuclei than are needed for analysis. Therefore, we developed an integrated workflow that selects, following automatic segmentation, a subpopulation of accurately delineated nuclei for positioning of fluorescence in situ hybridization-labeled genes of interest. Segmentation was performed by a multistage watershed-based algorithm and screening by an artificial neural network-based pattern recognition engine. The performance of the workflow was quantified in terms of the fraction of automatically selected nuclei that were visually confirmed as well segmented and by the boundary accuracy of the well-segmented nuclei relative to a 2D dynamic programming-based reference segmentation method. Application of the method was demonstrated for discriminating normal and cancerous breast tissue sections based on the differential positioning of the HES5 gene. Automatic results agreed with manual analysis in 11 out of 14 cancers, all four normal cases, and all five noncancerous breast disease cases, thus showing the accuracy and robustness of the proposed approach. Published 2012 Wiley Periodicals, Inc.

  14. FOXP2 Targets Show Evidence of Positive Selection in European Populations

    PubMed Central

    Ayub, Qasim; Yngvadottir, Bryndis; Chen, Yuan; Xue, Yali; Hu, Min; Vernes, Sonja C.; Fisher, Simon E.; Tyler-Smith, Chris

    2013-01-01

    Forkhead box P2 (FOXP2) is a highly conserved transcription factor that has been implicated in human speech and language disorders and plays important roles in the plasticity of the developing brain. The pattern of nucleotide polymorphisms in FOXP2 in modern populations suggests that it has been the target of positive (Darwinian) selection during recent human evolution. In our study, we searched for evidence of selection that might have followed FOXP2 adaptations in modern humans. We examined whether or not putative FOXP2 targets identified by chromatin-immunoprecipitation genomic screening show evidence of positive selection. We developed an algorithm that, for any given gene list, systematically generates matched lists of control genes from the Ensembl database, collates summary statistics for three frequency-spectrum-based neutrality tests from the low-coverage resequencing data of the 1000 Genomes Project, and determines whether these statistics are significantly different between the given gene targets and the set of controls. Overall, there was strong evidence of selection of FOXP2 targets in Europeans, but not in the Han Chinese, Japanese, or Yoruba populations. Significant outliers included several genes linked to cellular movement, reproduction, development, and immune cell trafficking, and 13 of these constituted a significant network associated with cardiac arteriopathy. Strong signals of selection were observed for CNTNAP2 and RBFOX1, key neurally expressed genes that have been consistently identified as direct FOXP2 targets in multiple studies and that have themselves been associated with neurodevelopmental disorders involving language dysfunction. PMID:23602712

  15. RNA sequencing to study gene expression and SNP variations associated with growth in zebrafish fed a plant protein-based diet.

    PubMed

    Ulloa, Pilar E; Rincón, Gonzalo; Islas-Trejo, Alma; Araneda, Cristian; Iturra, Patricia; Neira, Roberto; Medrano, Juan F

    2015-06-01

    The objectives of this study were to measure gene expression in zebrafish and then identify SNP to be used as potential markers in a growth association study. We developed an approach where muscle samples collected from low- and high-growth fish were analyzed using RNA-Sequencing (RNA-seq), and SNP were chosen from the genes that were differentially expressed between the low and high groups. A population of 24 families was fed a plant protein-based diet from the larval to adult stages. From a total of 440 males, 5 % of the fish from both tails of the weight gain distribution were selected. Total RNA was extracted from individual muscle of 8 low-growth and 8 high-growth fish. Two pooled RNA-Seq libraries were prepared for each phenotype using 4 fish per library. Libraries were sequenced using the Illumina GAII Sequencer and analyzed using the CLCBio genomic workbench software. One hundred and twenty-four genes were differentially expressed between phenotypes (p value < 0.05 and FDR < 0.2). From these genes, 164 SNP were selected and genotyped in 240 fish samples. Marker-trait analysis revealed 5 SNP associated with growth in key genes (Nars, Lmod2b, Cuzd1, Acta1b, and Plac8l1). These genes are good candidates for further growth studies in fish and to consider for identification of potential SNPs associated with different growth rates in response to a plant protein-based diet.

  16. Reference Genes for Accurate Transcript Normalization in Citrus Genotypes under Different Experimental Conditions

    PubMed Central

    Mafra, Valéria; Kubo, Karen S.; Alves-Ferreira, Marcio; Ribeiro-Alves, Marcelo; Stuart, Rodrigo M.; Boava, Leonardo P.; Rodrigues, Carolina M.; Machado, Marcos A.

    2012-01-01

    Real-time reverse transcription PCR (RT-qPCR) has emerged as an accurate and widely used technique for expression profiling of selected genes. However, obtaining reliable measurements depends on the selection of appropriate reference genes for gene expression normalization. The aim of this work was to assess the expression stability of 15 candidate genes to determine which set of reference genes is best suited for transcript normalization in citrus in different tissues and organs and leaves challenged with five pathogens (Alternaria alternata, Phytophthora parasitica, Xylella fastidiosa and Candidatus Liberibacter asiaticus). We tested traditional genes used for transcript normalization in citrus and orthologs of Arabidopsis thaliana genes described as superior reference genes based on transcriptome data. geNorm and NormFinder algorithms were used to find the best reference genes to normalize all samples and conditions tested. Additionally, each biotic stress was individually analyzed by geNorm. In general, FBOX (encoding a member of the F-box family) and GAPC2 (GAPDH) was the most stable candidate gene set assessed under the different conditions and subsets tested, while CYP (cyclophilin), TUB (tubulin) and CtP (cathepsin) were the least stably expressed genes found. Validation of the best suitable reference genes for normalizing the expression level of the WRKY70 transcription factor in leaves infected with Candidatus Liberibacter asiaticus showed that arbitrary use of reference genes without previous testing could lead to misinterpretation of data. Our results revealed FBOX, SAND (a SAND family protein), GAPC2 and UPL7 (ubiquitin protein ligase 7) to be superior reference genes, and we recommend their use in studies of gene expression in citrus species and relatives. This work constitutes the first systematic analysis for the selection of superior reference genes for transcript normalization in different citrus organs and under biotic stress. PMID:22347455
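
    The abstract above ranks candidate reference genes with geNorm. As a rough illustration only, the sketch below computes a stability measure M in the way the geNorm approach is commonly described: for each gene, the average standard deviation of its log2 expression ratios against every other candidate gene, with lower M meaning more stable expression. The function name `genorm_m` and the toy input are hypothetical, and the input is assumed to hold positive relative quantities on a linear scale; this is not the authors' code or the geNorm software itself.

```python
import numpy as np

def genorm_m(expr):
    """Return a geNorm-style stability value M for each gene.

    expr: array-like of shape (n_samples, n_genes) holding positive
    relative expression quantities (assumed linear scale).
    """
    log_expr = np.log2(np.asarray(expr, dtype=float))
    n_genes = log_expr.shape[1]
    m = np.empty(n_genes)
    for j in range(n_genes):
        # log2 ratio of gene j against every gene, sample by sample
        ratios = log_expr[:, [j]] - log_expr
        # pairwise variation: standard deviation of each ratio across samples
        v = ratios.std(axis=0, ddof=1)
        # the self-comparison contributes 0, so average over the other genes
        m[j] = v.sum() / (n_genes - 1)
    return m

# Toy data: genes 0 and 1 keep a constant ratio, gene 2 fluctuates.
toy = [[1, 2, 4],
       [2, 4, 1],
       [4, 8, 4]]
stability = genorm_m(toy)
ranking = np.argsort(stability)  # most stable genes first
```

    In this toy input the third gene receives the largest M, so it would be the first candidate to discard in a stepwise exclusion.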

  17. A Neutrality Test for Detecting Selection on DNA Methylation Using Single Methylation Polymorphism Frequency Spectrum

    PubMed Central

    Wang, Jun; Fan, Chuanzhu

    2015-01-01

    Inheritable epigenetic mutations (epimutations) can contribute to transmittable phenotypic variation. Thus, epimutations can be subject to natural selection and impact the fitness and evolution of organisms. Based on the framework of the modified Tajima’s D test for DNA mutations, we developed a neutrality test with the statistic “Dm” to detect selection forces on DNA methylation mutations using single methylation polymorphisms. With computer simulation and empirical data analysis, we compared the Dm test with the original and modified Tajima’s D tests and demonstrated that the Dm test is suitable for detecting selection on epimutations and outperforms original/modified Tajima’s D tests. Due to the higher resetting rate of epimutations, the interpretation of Dm on epimutations and Tajima’s D test on DNA mutations could be different in inferring natural selection. Analyses using simulated and empirical genome-wide polymorphism data suggested that genes under genetic and epigenetic selections behaved differently. We applied the Dm test to recently originated Arabidopsis and human genes, and showed that newly evolved genes contain higher level of rare epialleles, suggesting that epimutation may play a role in origination and evolution of genes and genomes. Overall, we demonstrate the utility of the Dm test to detect whether the loci are under selection regarding DNA methylation. Our analytical metrics and methodology could contribute to our understanding of evolutionary processes of genes and genomes in the field of epigenetics. The Perl script for the “Dm” test is available at http://fanlab.wayne.edu/ (last accessed December 18, 2014). PMID:25539727
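
    For readers unfamiliar with the statistic that the Dm test builds on, the sketch below computes the classical Tajima's D from a 0/1 matrix of haplotypes. It is a minimal illustration under stated assumptions: the function names `tajimas_d` and `summarize` are hypothetical, sites are assumed biallelic, and the code implements the original Tajima (1989) formula, not the modified test or the Dm statistic described in the abstract (the authors' Perl script is the reference for those).

```python
import math

def tajimas_d(n, seg_sites, pi):
    """Classical Tajima's D from sample size n, number of segregating
    sites, and mean pairwise differences pi. Returns NaN when undefined."""
    if n < 4 or seg_sites == 0:
        return float("nan")  # the variance term vanishes for n < 4
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return float("nan")
    # difference between the two estimators of theta, scaled by its std. dev.
    return (pi - seg_sites / a1) / math.sqrt(variance)

def summarize(haplotypes):
    """Count segregating sites and mean pairwise differences from a list
    of equal-length 0/1 haplotypes (1 = variant allele)."""
    n = len(haplotypes)
    seg_sites, pi = 0, 0.0
    for column in zip(*haplotypes):
        k = sum(column)
        if 0 < k < n:
            seg_sites += 1
            pi += 2.0 * k * (n - k) / (n * (n - 1))
    return n, seg_sites, pi

# Toy sample in which every variant is a singleton (an excess of rare alleles).
sample = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
d_value = tajimas_d(*summarize(sample))
```

    An excess of rare variants, as in the toy sample, pushes D below zero; the abstract notes that the interpretation can differ for epimutations because of their higher resetting rate.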

  18. Systems Biology-Based Identification of Mycobacterium tuberculosis Persistence Genes in Mouse Lungs

    PubMed Central

    Dutta, Noton K.; Bandyopadhyay, Nirmalya; Veeramani, Balaji; Lamichhane, Gyanu; Karakousis, Petros C.; Bader, Joel S.

    2014-01-01

    ABSTRACT Identifying Mycobacterium tuberculosis persistence genes is important for developing novel drugs to shorten the duration of tuberculosis (TB) treatment. We developed computational algorithms that predict M. tuberculosis genes required for long-term survival in mouse lungs. As the input, we used high-throughput M. tuberculosis mutant library screen data, mycobacterial global transcriptional profiles in mice and macrophages, and functional interaction networks. We selected 57 unique, genetically defined mutants (18 previously tested and 39 untested) to assess the predictive power of this approach in the murine model of TB infection. We observed a 6-fold enrichment in the predicted set of M. tuberculosis genes required for persistence in mouse lungs relative to randomly selected mutant pools. Our results also allowed us to reclassify several genes as required for M. tuberculosis persistence in vivo. Finally, the new results implicated additional high-priority candidate genes for testing. Experimental validation of computational predictions demonstrates the power of this systems biology approach for elucidating M. tuberculosis persistence genes. PMID:24549847

  19. Analysis of host response to bacterial infection using error model based gene expression microarray experiments

    PubMed Central

    Stekel, Dov J.; Sarti, Donatella; Trevino, Victor; Zhang, Lihong; Salmon, Mike; Buckley, Chris D.; Stevens, Mark; Pallen, Mark J.; Penn, Charles; Falciani, Francesco

    2005-01-01

    A key step in the analysis of microarray data is the selection of genes that are differentially expressed. Ideally, such experiments should be properly replicated in order to infer both technical and biological variability, and the data should be subjected to rigorous hypothesis tests to identify the differentially expressed genes. However, in microarray experiments involving the analysis of very large numbers of biological samples, replication is not always practical. Therefore, there is a need for a method to select differentially expressed genes in a rational way from insufficiently replicated data. In this paper, we describe a simple method that uses bootstrapping to generate an error model from a replicated pilot study that can be used to identify differentially expressed genes in subsequent large-scale studies on the same platform, but in which there may be no replicated arrays. The method builds a stratified error model that includes array-to-array variability, feature-to-feature variability and the dependence of error on signal intensity. We apply this model to the characterization of the host response in a model of bacterial infection of human intestinal epithelial cells. We demonstrate the effectiveness of error model based microarray experiments and propose this as a general strategy for a microarray-based screening of large collections of biological samples. PMID:15800204

  20. Effect of Aggregation Operators on Network-Based Disease Gene Prioritization: A Case Study on Blood Disorders.

    PubMed

    Grewal, Nivit; Singh, Shailendra; Chand, Trilok

    2017-01-01

    Owing to the innate noise in the biological data sources, a single source or a single measure does not suffice for effective disease gene prioritization. So, the integration of multiple data sources or aggregation of multiple measures is the need of the hour. The aggregation operators combine multiple related data values into a single value such that the combined value has the effect of all the individual values. In this paper, an attempt has been made to apply fuzzy aggregation to network-based disease gene prioritization and to investigate its effect under noise conditions. This study has been conducted for a set of 15 blood disorders by fusing four different network measures, computed from the protein interaction network, using a selected set of aggregation operators and ranking the genes on the basis of the aggregated value. The aggregation operator-based rankings have been compared with the "Random walk with restart" gene prioritization method. The impact of noise has also been investigated by adding varying proportions of noise to the seed set. The results reveal that for all the selected blood disorders, the Mean of Maximal operator has relatively outperformed the other aggregation operators for noisy as well as non-noisy data.

  1. Antibiotic Combinations That Enable One-Step, Targeted Mutagenesis of Chromosomal Genes.

    PubMed

    Lee, Wonsik; Do, Truc; Zhang, Ge; Kahne, Daniel; Meredith, Timothy C; Walker, Suzanne

    2018-06-08

    Targeted modification of bacterial chromosomes is necessary to understand new drug targets, investigate virulence factors, elucidate cell physiology, and validate results of -omics-based approaches. For some bacteria, reverse genetics remains a major bottleneck to progress in research. Here, we describe a compound-centric strategy that combines new negative selection markers with known positive selection markers to achieve simple, efficient one-step genome engineering of bacterial chromosomes. The method was inspired by the observation that certain nonessential metabolic pathways contain essential late steps, suggesting that antibiotics targeting a late step can be used to select for the absence of genes that control flux into the pathway. Guided by this hypothesis, we have identified antibiotic/counterselectable markers to accelerate reverse engineering of two increasingly antibiotic-resistant pathogens, Staphylococcus aureus and Acinetobacter baumannii. For S. aureus, we used wall teichoic acid biosynthesis inhibitors to select for the absence of tarO and for A. baumannii, we used colistin to select for the absence of lpxC. We have obtained desired gene deletions, gene fusions, and promoter swaps in a single plating step with perfect efficiency. Our method can also be adapted to generate markerless deletions of genes using FLP recombinase. The tools described here will accelerate research on two important pathogens, and the concept we outline can be readily adapted to any organism for which a suitable target pathway can be identified.

  2. Machine Learning–Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis

    PubMed Central

    Ma, Chuang; Xin, Mingming; Feldmann, Kenneth A.; Wang, Xiangfeng

    2014-01-01

    Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning–based differential network analysis (mlDNA) and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. The mlDNA first used a ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive “noninformative” genes prior to network construction, through learning the patterns of 32 expression characteristics of known stress-related genes. The retained “informative” genes were subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing–based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress–related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes. PMID:24520154

  3. Feature weight estimation for gene selection: a local hyperlinear learning approach

    PubMed Central

    2014-01-01

    Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes buried in high-dimensional irrelevant noise. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF-based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability with respect to various classification algorithms. PMID:24625071

  4. GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature.

    PubMed

    Ye, Ning; Yin, Hengfu; Liu, Jingjing; Dai, Xiaogang; Yin, Tongming

    2015-01-01

    The huge amount of gene expression data generated by microarray and next-generation sequencing technologies presents challenges for exploiting their biological meaning. When searching for coexpressed genes, the data mining process is largely affected by the selection of algorithms. Thus, it is highly desirable to provide multiple algorithm options in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit, which is written in MATLAB and supports a variety of gene expression data files. This analytical toolkit provides four models, including the mean, the regression, the delegate, and the ensemble models, to identify the coexpressed genes, and enables the users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes. Subsequently, the utility of this analytical toolkit is demonstrated by analyzing two sets of real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that allows for choosing multiple algorithms for analyzing the gene expression signatures.

  5. Ion channel gene expression predicts survival in glioma patients

    PubMed Central

    Wang, Rong; Gurguis, Christopher I.; Gu, Wanjun; Ko, Eun A; Lim, Inja; Bang, Hyoweon; Zhou, Tong; Ko, Jae-Hong

    2015-01-01

    Ion channels are important regulators in cell proliferation, migration, and apoptosis. The malfunction and/or aberrant expression of ion channels may disrupt these important biological processes and influence cancer progression. In this study, we investigate the expression pattern of ion channel genes in glioma. We designate 18 ion channel genes that are differentially expressed in high-grade glioma as a prognostic molecular signature. This ion channel gene expression based signature predicts glioma outcome in three independent validation cohorts. Interestingly, 16 of these 18 genes were down-regulated in high-grade glioma. This signature is independent of traditional clinical, molecular, and histological factors. Resampling tests indicate that the prognostic power of the signature outperforms random gene sets selected from human genome in all the validation cohorts. More importantly, this signature performs better than the random gene signatures selected from glioma-associated genes in two out of three validation datasets. This study implicates ion channels in brain cancer, thus expanding on knowledge of their roles in other cancers. Individualized profiling of ion channel gene expression serves as a superior and independent prognostic tool for glioma patients. PMID:26235283

  6. ROKU: a novel method for identification of tissue-specific genes

    PubMed Central

    Kadota, Koji; Ye, Jiazhen; Nakai, Yuji; Terada, Tohru; Shimizu, Kentaro

    2006-01-01

    Background One of the important goals of microarray research is the identification of genes whose expression is considerably higher or lower in some tissues than in others. We would like to have ways of identifying such tissue-specific genes. Results We describe a method, ROKU, which selects tissue-specific patterns from gene expression data for many tissues and thousands of genes. ROKU ranks genes according to their overall tissue specificity using Shannon entropy and detects tissues specific to each gene if any exist using an outlier detection method. We evaluated the capacity for the detection of various specific expression patterns using synthetic and real data. We observed that ROKU was superior to a conventional entropy-based method in its ability to rank genes according to overall tissue specificity and to detect genes whose expression pattern are specific only to objective tissues. Conclusion ROKU is useful for the detection of various tissue-specific expression patterns. The framework is also directly applicable to the selection of diagnostic markers for molecular classification of multiple classes. PMID:16764735
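    The entropy-based ranking mentioned in the abstract above can be illustrated with a minimal sketch. It assumes non-negative expression values and covers only a plain Shannon entropy ranking; it does not reproduce ROKU's own data processing or its outlier detection step, and the function names and example profiles are hypothetical.

```python
import math


def shannon_entropy(expression):
    """Shannon entropy (in bits) of one gene's expression across tissues.

    Values are normalized to relative expression p_t = x_t / sum(x).
    Low entropy means expression is concentrated in few tissues.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression values must sum to a positive number")
    entropy = 0.0
    for value in expression:
        if value > 0:  # terms with p = 0 contribute nothing
            p = value / total
            entropy -= p * math.log2(p)
    return entropy


def rank_by_specificity(profiles):
    """Return gene names ordered from lowest to highest entropy."""
    return sorted(profiles, key=lambda gene: shannon_entropy(profiles[gene]))


# Hypothetical expression profiles over four tissues.
profiles = {
    "housekeeping": [10.0, 10.0, 10.0, 10.0],
    "specific": [0.0, 0.0, 40.0, 0.0],
    "intermediate": [5.0, 5.0, 25.0, 5.0],
}
ranking = rank_by_specificity(profiles)
```

    In this toy input the gene expressed in a single tissue has entropy 0 and ranks first, while the uniformly expressed gene reaches the maximum of log2(4) = 2 bits and ranks last.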

  7. Genetic improvement of Escherichia coli for ethanol production: Chromosomal integration of Zymomonas mobilis genes encoding pyruvate decarboxylase and alcohol dehydrogenase II

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ohta, Kazuyoshi; Beall, D.S.; Mejia, J.P.

    1991-04-01

    Zymomonas mobilis genes for pyruvate decarboxylase (pdc) and alcohol dehydrogenase II (adhB) were integrated into the Escherichia coli chromosome within or near the pyruvate formate-lyase gene (pfl). Integration improved the stability of the Z. mobilis genes in E. coli, but further selection was required to increase expression. Spontaneous mutants were selected for resistance to high levels of chloramphenicol that also expressed high levels of the Z. mobilis genes. Analogous mutants were selected for increased expression of alcohol dehydrogenase on aldehyde indicator plates. These mutants were functionally equivalent to the previous plasmid-based strains for the fermentation of xylose and glucose to ethanol. Ethanol concentrations of 54.4 and 41.6 g/liter were obtained from 10% glucose and 8% xylose, respectively. The efficiency of conversion exceeded theoretical limits (0.51 g of ethanol/g of sugar) on the basis of added sugars because of the additional production of ethanol from the catabolism of complex nutrients. Further mutations were introduced to inactivate succinate production (frd) and to block homologous recombination (recA).

  8. Evidence for a Common Toolbox Based on Necrotrophy in a Fungal Lineage Spanning Necrotrophs, Biotrophs, Endophytes, Host Generalists and Specialists

    PubMed Central

    Andrew, Marion; Barua, Reeta; Short, Steven M.; Kohn, Linda M.

    2012-01-01

    The Sclerotiniaceae (Ascomycotina, Leotiomycetes) is a relatively recently evolved lineage of necrotrophic host generalists, and necrotrophic or biotrophic host specialists, some latent or symptomless. We hypothesized that they inherited a basic toolbox of genes for plant symbiosis from their common ancestor. Maintenance and evolutionary diversification of symbiosis could require selection on toolbox genes or on timing and magnitude of gene expression. The genes studied were chosen because their products have been previously investigated as pathogenicity factors in the Sclerotiniaceae. They encode proteins associated with cell wall degradation: acid protease 1 (acp1), aspartyl protease (asps), and polygalacturonases (pg1, pg3, pg5, pg6), and the oxalic acid (OA) pathway: a zinc finger transcription factor (pac1), and oxaloacetate acetylhydrolase (oah), catalyst in OA production, essential for full symptom production in Sclerotinia sclerotiorum. Site-specific likelihood analyses provided evidence for purifying selection in all 8 pathogenicity-related genes. Consistent with an evolutionary arms race model, positive selection was detected in 5 of 8 genes. Only generalists produced large, proliferating disease lesions on excised Arabidopsis thaliana leaves and oxalic acid by 72 hours in vitro. In planta expression of oah was 10–300 times greater among the necrotrophic host generalists than necrotrophic and biotrophic host specialists; pac1 was not differentially expressed. Ability to amplify 6/8 pathogenicity related genes and produce oxalic acid in all genera are consistent with the common toolbox hypothesis for this gene sample. That our data did not distinguish biotrophs from necrotrophs is consistent with 1) a common toolbox based on necrotrophy and 2) the most conservative interpretation of the 3-locus housekeeping gene phylogeny – a baseline of necrotrophy from which forms of biotrophy emerged at least twice. 
    Early oah overexpression likely expands the host range of necrotrophic generalists in the Sclerotiniaceae, while specialists and biotrophs deploy oah, or other as-yet-unknown toolbox genes, differently. PMID:22253834

  9. Augmenting Microarray Data with Literature-Based Knowledge to Enhance Gene Regulatory Network Inference

    PubMed Central

    Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C.

    2014-01-01

    Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductile carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. 
    This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology. PMID:24921649

  10. Augmenting microarray data with literature-based knowledge to enhance gene regulatory network inference.

    PubMed

    Chen, Guocai; Cairelli, Michael J; Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C

    2014-06-01

    Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductile carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. 
    This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.

  11. Evolution of DUF1313 family members across plant species and their association with maize photoperiod sensitivity.

    PubMed

    Li, Jia; Hu, Erliang; Chen, Xueying; Xu, Jie; Lan, Hai; Li, Chuan; Hu, Yaodong; Lu, Yanli

    2016-05-01

    Proteins of the DUF1313 family contain a highly conserved domain and are only found in plants; they play important roles in most plant functions. In this study, 269 DUF1313 genes from 81 photoautotrophic species were identified; they were classified into three major types based on the amino acid substitutions in the conserved region: IARV, I(S/T/F)(K/R)V, and IRRV. A phylogenetic tree constructed from 51 DUF1313 genes from graminoids revealed three clades: A, B1, and B2. Clade B1 was found to have undergone episodic positive selection after a gene duplication event and included four amino acid sites under positive selection. The association between DUF1313 family members and traits investigated in maize indicated that three of four genes (GRMZM2G025646, GRMZM5G877647, GRMZM2G359322, and GRMZM2G382774) were associated with target traits such as days to silking, days to tasselling, and plant height. The nucleotide diversity of the most primitive and highly conserved DUF1313 gene, ELF4-like4, was the highest in Tripsacum and the lowest in maize. Tajima's D and Fu and Li's D tests revealed that significant purifying selection had occurred in the coding sequence region of this DUF1313 gene in teosinte and maize. No significant signal was detected in the 5'-untranslated region of this gene in any of the three species (maize, teosinte, and Tripsacum) or in any gene region of Tripsacum. Phylogenetic analyses revealed that the 103 accessions of maize, teosinte, and Tripsacum can be grouped into four clades based on ELF4-like4 gene sequence similarity. Thus, this gene can be used to determine the relationships between maize and its relatives, and the DUF1313 family members and alleles identified in this study might be valuable genetic resources for molecular marker-assisted breeding in maize. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. Development of a qPCR Strategy to Select Bean Genes Involved in Plant Defense Response and Regulated by the Trichoderma velutinum - Rhizoctonia solani Interaction.

    PubMed

    Mayo, Sara; Cominelli, Eleonora; Sparvoli, Francesca; González-López, Oscar; Rodríguez-González, Alvaro; Gutiérrez, Santiago; Casquero, Pedro A

    2016-01-01

    Bean production is affected by a wide diversity of fungal pathogens, among which Rhizoctonia solani is one of the most important. A strategy to control bean infectious diseases, mainly those caused by fungi, is based on the use of biocontrol agents (BCAs) that can reduce the negative effects of plant pathogens and can also promote positive responses in the plant. Trichoderma is a fungal genus that is able to induce the expression of genes involved in plant defense response and also to promote plant growth, root development and nutrient uptake. In this article, a strategy that combines in silico analysis and real-time PCR to detect additional bean defense-related genes regulated by the presence of Trichoderma velutinum and/or R. solani has been applied. Based on this strategy, from the 48 bean genes initially analyzed, 14 were selected, and only WRKY33, CH5b and hGS showed an up-regulatory response in the presence of T. velutinum. The other genes were either not affected (OSM34) or down-regulated by the presence of this fungus. R. solani infection resulted in a down-regulation of most of the genes analyzed, except PR1, OSM34 and CNGC2, which were not affected, and the presence of both T. velutinum and R. solani up-regulated hGS and down-regulated all the other genes analyzed, except CH5b, which was not significantly affected. In conclusion, the strategy described in the present work has been shown to be effective in detecting genes involved in plant defense that respond to the presence of a BCA, to a pathogen, or to both. The selected genes show significant homology with previously described plant defense genes and they are expressed in bean leaves of plants treated with T. velutinum and/or infected with R. solani.

  13. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    PubMed

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes, based on some criteria of similarity, is an important task. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its ability to detect co-regulated sets compared with a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Dynamic association rules for gene expression data analysis.

    PubMed

    Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung

    2015-10-14

    The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association, based on gene expression profiles, has been used to determine whether the induction/repression of genes corresponds to phenotypic variations including cell regulations, clinical diagnoses and drug development. Statistical analyses of microarray data have been developed to address the gene selection problem. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one to efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical way, based on constructing a one-sided confidence interval and hypothesis testing, to determine if an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of Leukemia patients, the Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of Leukemia patients was conducted. We developed a statistical way, based on the concept of confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in one single step. The DAR algorithm was then developed for gene expression data analysis. Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. In the paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concept of confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.
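    The notions of support, confidence and a one-sided confidence bound used in the abstract above can be illustrated with a small sketch. This is not the DAR algorithm itself: the normal-approximation lower bound, the comparison against the baseline frequency of the phenotype, the default z value and all names and data below are assumptions made for illustration only.

```python
import math


def rule_stats(transactions, antecedent, consequent, z=1.645):
    """Support and confidence of the rule antecedent -> consequent.

    Also returns a one-sided lower confidence bound on the rule's
    confidence (normal approximation, z = 1.645 for roughly 95%) and
    compares it with the baseline frequency of the consequent.
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent in t)
    n_b = sum(1 for t in transactions if consequent in t)
    n_ab = sum(1 for t in transactions if antecedent in t and consequent in t)

    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    half_width = z * math.sqrt(confidence * (1 - confidence) / n_a) if n_a else 0.0
    lower_bound = confidence - half_width
    baseline = n_b / n
    return {
        "support": support,
        "confidence": confidence,
        "lower_bound": lower_bound,
        "baseline": baseline,
        # Illustrative decision rule: the rule counts as meaningful when even
        # the lower bound on its confidence exceeds the baseline frequency.
        "meaningful": lower_bound > baseline,
    }


# Hypothetical samples: each set lists the items observed in one sample.
samples = (
    [{"GENE_UP", "disease"}] * 9
    + [{"GENE_UP"}]
    + [{"disease"}]
    + [set()] * 9
)
stats = rule_stats(samples, "GENE_UP", "disease")
```

    With these 20 toy samples the rule has support 0.45 and confidence 0.9, and its lower bound stays above the 0.5 baseline frequency of the phenotype.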

  15. High polymorphism in MHC-DRB genes in golden snub-nosed monkeys reveals balancing selection in small, isolated populations.

    PubMed

    Zhang, Pei; Huang, Kang; Zhang, Bingyi; Dunn, Derek W; Chen, Dan; Li, Fan; Qi, Xiaoguang; Guo, Songtao; Li, Baoguo

    2018-03-13

    Maintaining variation in immune genes, such as those of the major histocompatibility complex (MHC), is important for individuals in small, isolated populations to resist pathogens and parasites. The golden snub-nosed monkey (Rhinopithecus roxellana), an endangered primate endemic to China, has experienced a rapid reduction in numbers and severe population fragmentation over recent years. For this study, we measured the DRB diversity among 122 monkeys from three populations in the Qinling Mountains, and estimated the relative importance of different agents of selection in maintaining variation of DRB genes. We identified a total of 19 DRB sequences, in which five alleles were novel. We found high DRB variation in R. roxellana and three branches of evidence suggesting that balancing selection has contributed to maintaining MHC polymorphism over the long term in this species: i) different patterns of both genetic diversity and population differentiation were detected at MHC and neutral markers; ii) an excess of non-synonymous substitutions compared to synonymous substitutions at antigen binding sites, and maximum-likelihood-based random-site models, showed significant positive selection; and iii) phylogenetic analyses revealed a pattern of trans-species evolution for DRB genes. High levels of DRB diversity in these R. roxellana populations may reflect strong selection pressure in this species. Patterns of genetic diversity and population differentiation, positive selection, as well as trans-species evolution, suggest that pathogen-mediated balancing selection has contributed to maintaining MHC polymorphism in R. roxellana over the long term. This study furthers our understanding of the role pathogen-mediated balancing selection has in maintaining variation in MHC genes in small and fragmented populations of free-ranging vertebrates.

  16. Identification of Id4 as a regulator of BRCA1 expression by using a ribozyme-library-based inverse genomics approach

    PubMed Central

    Beger, Carmela; Pierce, Leigh N.; Krüger, Martin; Marcusson, Eric G.; Robbins, Joan M.; Welcsh, Piri; Welch, Peter J.; Welte, Karl; King, Mary-Claire; Barber, Jack R.; Wong-Staal, Flossie

    2001-01-01

    Expression of the breast and ovarian cancer susceptibility gene BRCA1 is down-regulated in sporadic breast and ovarian cancer cases. Therefore, the identification of genes involved in the regulation of BRCA1 expression might lead to new insights into the pathogenesis and treatment of these tumors. In the present study, an “inverse genomics” approach based on a randomized ribozyme gene library was applied to identify cellular genes regulating BRCA1 expression. A ribozyme gene library with randomized target recognition sequences was introduced into human ovarian cancer-derived cells stably expressing a selectable marker [enhanced green fluorescence protein (EGFP)] under the control of the BRCA1 promoter. Cells in which BRCA1 expression was upregulated by particular ribozymes were selected through their concomitant increase in EGFP expression. The cellular target gene of one ribozyme was identified to be the dominant negative transcriptional regulator Id4. Modulation of Id4 expression resulted in inversely regulated expression of BRCA1. In addition, increase in Id4 expression was associated with the ability of cells to exhibit anchorage-independent growth, demonstrating the biological relevance of this gene. Our data suggest that Id4 is a crucial gene regulating BRCA1 expression and might therefore be important for the BRCA1 regulatory pathway involved in the pathogenesis of sporadic breast and ovarian cancer. PMID:11136250

  17. Genomic Selection in Plant Breeding: Methods, Models, and Perspectives.

    PubMed

    Crossa, José; Pérez-Rodríguez, Paulino; Cuevas, Jaime; Montesinos-López, Osval; Jarquín, Diego; de Los Campos, Gustavo; Burgueño, Juan; González-Camacho, Juan M; Pérez-Elizalde, Sergio; Beyene, Yoseph; Dreisigacker, Susanne; Singh, Ravi; Zhang, Xuecai; Gowda, Manje; Roorkiwal, Manish; Rutkoski, Jessica; Varshney, Rajeev K

    2017-11-01

    Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Detection of genomic signatures of recent selection in commercial broiler chickens.

    PubMed

    Fu, Weixuan; Lee, William R; Abasht, Behnam

    2016-08-26

    Identification of the genomic signatures of recent selection may help uncover causal polymorphisms controlling traits relevant to recent decades of selective breeding in livestock. In this study, we aimed at detecting signatures of recent selection in commercial broiler chickens using genotype information from single nucleotide polymorphisms (SNPs). A total of 565 chickens from five commercial purebred lines, including three broiler sire (male) lines and two broiler dam (female) lines, were genotyped using the 60K SNP Illumina iSelect chicken array. To detect genomic signatures of recent selection, we applied two methods based on population comparison, cross-population extended haplotype homozygosity (XP-EHH) and cross-population composite likelihood ratio (XP-CLR), and further analyzed the results to find genomic regions under recent selection in multiple purebred lines. A total of 321 candidate selection regions spanning approximately 1.45 % of the chicken genome in each line were detected by consensus of results of both XP-EHH and XP-CLR methods. To minimize false discovery due to genetic drift, only 42 of the candidate selection regions that were shared by 2 or more purebred lines were considered as high-confidence selection regions in the study. Of these 42 regions, 20 were 50 kb or less while 4 regions were larger than 0.5 Mb. In total, 91 genes could be found in the 42 regions, among which 19 regions contained only 1 or 2 genes, and 9 regions were located at gene deserts. Our results provide a genome-wide scan of recent selection signatures in five purebred lines of commercial broiler chickens. We found several candidate genes for recent selection in multiple lines, such as SOX6 (Sex Determining Region Y-Box 6) and cTR (Thyroid hormone receptor beta). These genes may have been under recent selection due to their essential roles in growth, development and reproduction in chickens. 
    Furthermore, our results suggest that in some candidate regions, the same or opposite alleles have been under recent selection in multiple lines. Most of the candidate genes in the selection regions are novel, and as such they should be of great interest for future research into the genetic architecture of traits relevant to modern broiler breeding.

  19. A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization.

    PubMed

    Vafaee Sharbaf, Fatemeh; Mosafer, Sara; Moattar, Mohammad Hossein

    2016-06-01

    This paper proposes an approach for gene selection in microarray data. The proposed approach consists of a primary filter approach using Fisher criterion which reduces the initial genes and hence the search space and time complexity. Then, a wrapper approach which is based on cellular learning automata (CLA) optimized with ant colony method (ACO) is used to find the set of features which improve the classification accuracy. CLA is applied due to its capability to learn and model complicated relationships. The selected features from the last phase are evaluated using ROC curve and the most effective while smallest feature subset is determined. The classifiers which are evaluated in the proposed framework are K-nearest neighbor; support vector machine and naïve Bayes. The proposed approach is evaluated on 4 microarray datasets. The evaluations confirm that the proposed approach can find the smallest subset of genes while approaching the maximum accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
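    The filter stage described in the abstract above can be sketched as follows. The sketch uses the common two-class Fisher score, (mean difference squared) divided by (sum of the class variances), which may differ in detail from the criterion used in the paper; the wrapper stage (CLA with ACO) is not shown, and the function names and data are hypothetical.

```python
from statistics import mean, variance


def fisher_score(class_a, class_b):
    """Two-class Fisher score: squared mean difference over summed variances."""
    diff = mean(class_a) - mean(class_b)
    denom = variance(class_a) + variance(class_b)
    if denom == 0:
        # Degenerate case: no within-class spread at all.
        return float("inf") if diff != 0 else 0.0
    return diff * diff / denom


def top_genes(expression, labels, k):
    """Keep the k genes with the highest Fisher score.

    expression maps gene name -> list of values (one per sample);
    labels holds the class (0 or 1) of each sample, in the same order.
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = fisher_score(a, b)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: three genes measured in six samples.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "geneA": [1, 2, 3, 7, 8, 9],  # well separated between classes
    "geneB": [4, 5, 6, 5, 4, 6],  # identical class means
    "geneC": [1, 3, 5, 4, 6, 8],  # overlapping classes
}
selected = top_genes(expression, labels, k=2)
```

    Reducing the gene list this way before a wrapper search is what shrinks the search space; the choice of k is a tuning parameter that the sketch leaves to the user.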

  20. Survivin Selectively Modulates Genes Deregulated in Human Leukemia Stem Cells

    PubMed Central

    Fukuda, Seiji; Abe, Mariko; Onishi, Chie; Taketani, Takeshi; Purevsuren, Jamiyan; Yamaguchi, Seiji; Conway, Edward M.; Pelus, Louis M.

    2011-01-01

    ITD-Flt3 mutations are detected in leukemia stem cells (LSCs) in acute myeloid leukemia (AML) patients. While antagonizing Survivin normalizes ITD-Flt3-induced acute leukemia, it also impairs hematopoietic stem cell (HSC) function, indicating that identification of differences in signaling pathways downstream of Survivin between LSC and HSC is crucial to developing selective Survivin-based therapeutic strategies for AML. Using a Survivin-deletion model, we identified 1,096 genes regulated by Survivin in ITD-Flt3-transformed c-kit+, Sca-1+, and lineageneg (KSL) cells, of which 137 are deregulated in human LSC. Of the 137, 124 genes were regulated by Survivin exclusively in ITD-Flt3+ KSL cells but not in normal CD34neg KSL cells. Survivin-regulated genes in LSC connect through a network associated with the epidermal growth factor receptor signaling pathway and fall into various functional categories independent of effects on apoptosis. Pathways downstream of Survivin in LSC that are distinct from HSC can be potentially targeted for selective anti-LSC therapy. PMID:21253548

  1. Missing-value estimation using linear and non-linear regression with Bayesian gene selection.

    PubMed

    Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R

    2003-11-22

    Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
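    The linear estimation rule with QR-based parameter estimation mentioned in the abstract above can be sketched as follows. The sketch assumes the predictor genes have already been chosen (the Bayesian selection step is not shown) and that NumPy is available; the function name and the toy data are hypothetical.

```python
import numpy as np


def estimate_missing(predictors_observed, target_observed, predictors_missing):
    """Estimate a missing expression value by least-squares regression.

    predictors_observed: (conditions x genes) values of the selected genes
        in the conditions where the target gene was measured.
    target_observed: target gene values in those same conditions.
    predictors_missing: selected-gene values in the condition where the
        target gene value is missing.
    """
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(len(predictors_observed)), predictors_observed])
    # Solve the least-squares problem through the QR factorization X = QR.
    Q, R = np.linalg.qr(X)
    beta = np.linalg.solve(R, Q.T @ target_observed)
    x_new = np.concatenate([[1.0], predictors_missing])
    return float(x_new @ beta)


# Toy data generated from target = 1 + 2*g1 - g2 (no noise).
observed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 2.0]])
target = np.array([3.0, 0.0, 2.0, 4.0, 5.0])
estimate = estimate_missing(observed, target, np.array([2.0, 3.0]))
```

    Because the toy target is an exact linear function of the two predictors, the estimate recovers 1 + 2*2 - 3 = 2 up to floating-point error; real expression data would of course leave a residual.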

  2. Selection of Phototransduction Genes in Homo sapiens.

    PubMed

    Christopher, Mark; Scheetz, Todd E; Mullins, Robert F; Abràmoff, Michael D

    2013-08-13

    We investigated the evidence of recent positive selection in the human phototransduction system at single nucleotide polymorphism (SNP) and gene level. SNP genotyping data from the International HapMap Project for European, Eastern Asian, and African populations was used to discover differences in haplotype length and allele frequency between these populations. Numeric selection metrics were computed for each SNP and aggregated into gene-level metrics to measure evidence of recent positive selection. The level of recent positive selection in phototransduction genes was evaluated and compared to a set of genes shown previously to be under recent selection, and a set of highly conserved genes as positive and negative controls, respectively. Six of 20 phototransduction genes evaluated had gene-level selection metrics above the 90th percentile: RGS9, GNB1, RHO, PDE6G, GNAT1, and SLC24A1. The selection signal across these genes was found to be of similar magnitude to the positive control genes and much greater than the negative control genes. There is evidence for selective pressure in the genes involved in retinal phototransduction, and traces of this selective pressure can be demonstrated using SNP-level and gene-level metrics of allelic variation. We hypothesize that the selective pressure on these genes was related to their role in low light vision and retinal adaptation to ambient light changes. Uncovering the underlying genetics of evolutionary adaptations in phototransduction not only allows greater understanding of vision and visual diseases, but also the development of patient-specific diagnostic and intervention strategies.

  3. Dataset of the human homologues and orthologues of lipid-metabolic genes identified as DAF-16 targets their roles in lipid and energy metabolism.

    PubMed

    Fan, Lavender Yuen-Nam; Saavedra-García, Paula; Lam, Eric Wing-Fai

    2017-04-01

    The data presented in this article are related to the review article entitled 'Unravelling the role of fatty acid metabolism in cancer through the FOXO3-FOXM1 axis' (Saavedra-Garcia et al., 2017) [24]. Here, we have matched the DAF-16/FOXO3 downstream genes with their respective human orthologues and reviewed the roles of these targeted genes in FA metabolism. The genes listed in this article were selected from literature reviews based on their functions in mammalian FA metabolism. The Caenorhabditis elegans orthologues of these genes were obtained from WormBase, the online biological database of C. elegans. This dataset has not been uploaded to a public repository yet.

  4. Ribozyme-based aminoglycoside switches of gene expression engineered by genetic selection in S. cerevisiae.

    PubMed

    Klauser, Benedikt; Atanasov, Janina; Siewert, Lena K; Hartig, Jörg S

    2015-05-15

    Systems for conditional gene expression are powerful tools in basic research as well as in biotechnology. For future applications, it is of great importance to engineer orthogonal genetic switches that function reliably in diverse contexts. RNA-based switches have the advantage that effector molecules interact immediately with regulatory modules inserted into the target RNAs, removing the need for the transcription factors that usually mediate genetic control. Artificial riboswitches are characterized by their simplicity and small size accompanied by a high degree of modularity. We have recently reported a series of hammerhead ribozyme-based artificial riboswitches that allow for post-transcriptional regulation of gene expression via switching mRNA, tRNA, or rRNA functions. A more widespread application has so far been hampered by moderate switching performance and the limited set of effector molecules available. Here, we report the re-engineering of hammerhead ribozymes in order to respond efficiently to aminoglycoside antibiotics. We first established an in vivo selection protocol in Saccharomyces cerevisiae that enabled us to search large sequence spaces for optimized switches. We then envisioned and characterized a novel strategy of attaching the aptamer to the ribozyme catalytic core, increasing the design options for rendering the ribozyme ligand-dependent. These innovations enabled the development of neomycin-dependent RNA modules that switch gene expression up to 25-fold. The presented aminoglycoside-responsive riboswitches are among the best-performing RNA-based genetic regulators reported so far. The developed in vivo selection protocol should allow for sampling of large sequence spaces for engineering of further optimized riboswitches.

  5. Network-Based Integration of GWAS and Gene Expression Identifies a HOX-Centric Network Associated with Serous Ovarian Cancer Risk.

    PubMed

    Kar, Siddhartha P; Tyrer, Jonathan P; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K H; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V; Bean, Yukie T; Beckmann, Matthias W; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S; Cramer, Daniel; Cunningham, Julie M; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F; Edwards, Robert P; Ekici, Arif B; Fasching, Peter A; Fridley, Brooke L; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A T; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K; Hosono, Satoyo; Iversen, Edwin S; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K; Kelemen, Linda E; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D; Lee, Alice W; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R; McNeish, Iain A; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B; Narod, Steven A; Nedergaard, Lotte; Ness, Roberta B; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M; Permuth-Wey, Jennifer; Phelan, Catherine M; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H; Rudolph, Anja; Runnebaum, Ingo B; Rzepecka, Iwona K; Salvesen, Helga B; Schildkraut, Joellen M; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C; Sucheston-Campbell, Lara E; Tangen, Ingvild L; Teo, Soo-Hwang; Terry, Kathryn L; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S; van Altena, Anne M; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wicklund, Kristine G; Wilkens, Lynne R; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A; Monteiro, Alvaro N A; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P

    2015-10-01

    Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by coexpression may also be enriched for additional EOC risk associations. We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly coexpressed with each selected TF gene in the unified microarray dataset of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this dataset were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P < 0.05 and FDR < 0.05). These results were replicated (P < 0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Network analysis integrating large, context-specific datasets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. ©2015 American Association for Cancer Research.
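
    The abstract describes using mutual information to find genes strongly coexpressed with each transcription factor gene, but it does not say which estimator was used. The following is a minimal sketch under the assumption of simple equal-width binning; the function name, the bin count, and the input layout are illustrative and are not taken from the study.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=3):
    """Estimate mutual information (in bits) between two expression vectors.

    Assumption: values are discretized into `bins` equal-width bins.
    This is one of several possible estimators, chosen for brevity.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")

    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant vectors fall into one bin
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))

    mi = 0.0
    for (i, j), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[i] / n) * (py[j] / n)))
    return mi
```

    In a coexpression network of this kind, each gene's profile would be compared against the profile of a selected TF gene across tumors, and genes scoring above some cutoff would be linked to that TF. The cutoff itself is not given in the abstract.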

  6. Network-based integration of GWAS and gene expression identifies a HOX-centric network associated with serous ovarian cancer risk

    PubMed Central

    Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.

    2015-01-01

    Background Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509

  7. Variations in the Intragene Methylation Profiles Hallmark Induced Pluripotency

    PubMed Central

    Druzhkov, Pavel; Zolotykh, Nikolay; Meyerov, Iosif; Alsaedi, Ahmed; Shutova, Maria; Ivanchenko, Mikhail; Zaikin, Alexey

    2015-01-01

    We demonstrate the potential of differentiating embryonic and induced pluripotent stem cells with regularized linear and decision tree machine learning classification algorithms, based on a number of intragene methylation measures. The resulting average classification accuracy was found to be above 95%, which exceeds earlier results. We propose a constructive and transparent method of feature selection based on classifier accuracy. Enrichment analysis reveals a statistically meaningful presence of stemness group and cancer discriminating genes among the best-classifying selected features. These findings motivate further research on the functional consequences of these differences in methylation patterns. The presented approach can be broadly used to discriminate cells of different phenotypes or in different states by their methylation profiles, identify groups of genes constituting multifeature classifiers, and assess enrichment of these groups by the sets of genes with a functionality of interest. PMID:26618180

  8. Phylogenetic relatedness determined between antibiotic resistance and 16S rRNA genes in actinobacteria.

    PubMed

    Sagova-Mareckova, Marketa; Ulanova, Dana; Sanderova, Petra; Omelka, Marek; Kamenik, Zdenek; Olsovska, Jana; Kopecky, Jan

    2015-04-01

    Distribution and evolutionary history of resistance genes in environmental actinobacteria provide information on intensity of antibiosis and evolution of specific secondary metabolic pathways at a given site. To this day, actinobacteria producing biologically active compounds were isolated mostly from soil but only a limited range of soil environments were commonly sampled. Consequently, soil remains an unexplored environment in search for novel producers and related evolutionary questions. Ninety actinobacteria strains isolated at contrasting soil sites were characterized phylogenetically by 16S rRNA gene, for presence of erm and ABC transporter resistance genes and antibiotic production. An analogous analysis was performed in silico with 246 and 31 strains from Integrated Microbial Genomes (JGI_IMG) database selected by the presence of ABC transporter genes and erm genes, respectively. In the isolates, distances of erm gene sequences were significantly correlated to phylogenetic distances based on 16S rRNA genes, while ABC transporter gene distances were not. The phylogenetic distance of isolates was significantly correlated to soil pH and organic matter content of isolation sites. In the analysis of JGI_IMG datasets the correlation between phylogeny of resistance genes and the strain phylogeny based on 16S rRNA genes or five housekeeping genes was observed for both the erm genes and ABC transporter genes in both actinobacteria and streptomycetes. However, in the analysis of sequences from genomes where both resistance genes occurred together the correlation was observed for both ABC transporter and erm genes in actinobacteria but in streptomycetes only in the erm gene. The type of erm resistance gene sequences was influenced by linkage to 16S rRNA gene sequences and site characteristics. The phylogeny of ABC transporter gene was correlated to 16S rRNA genes mainly above the genus level. The results support the concept of new specific secondary metabolite scaffolds occurring more likely in taxonomically distant producers but suggest that the antibiotic selection of gene pools is also influenced by site conditions.

  9. 16S rRNA gene-based phylogenetic microarray for simultaneous identification of members of the genus Burkholderia.

    PubMed

    Schönmann, Susan; Loy, Alexander; Wimmersberger, Céline; Sobek, Jens; Aquino, Catharine; Vandamme, Peter; Frey, Beat; Rehrauer, Hubert; Eberl, Leo

    2009-04-01

    For cultivation-independent and highly parallel analysis of members of the genus Burkholderia, an oligonucleotide microarray (phylochip) consisting of 131 hierarchically nested 16S rRNA gene-targeted oligonucleotide probes was developed. A novel primer pair was designed for selective amplification of a 1.3 kb 16S rRNA gene fragment of Burkholderia species prior to microarray analysis. The diagnostic performance of the microarray for identification and differentiation of Burkholderia species was tested with 44 reference strains of the genera Burkholderia, Pandoraea, Ralstonia and Limnobacter. Hybridization patterns based on presence/absence of probe signals were interpreted semi-automatically using the novel likelihood-based strategy of the web-tool Phylo-Detect. Eighty-eight per cent of the reference strains were correctly identified at the species level. The evaluated microarray was applied to investigate shifts in the Burkholderia community structure in acidic forest soil upon addition of cadmium, a condition that selected for Burkholderia species. The microarray results were in agreement with those obtained from phylogenetic analysis of Burkholderia 16S rRNA gene sequences recovered from the same cadmium-contaminated soil, demonstrating the value of the Burkholderia phylochip for determinative and environmental studies.

  10. Harnessing pain heterogeneity and RNA transcriptome to identify blood–based pain biomarkers: a novel correlational study design and bioinformatics approach in a graded chronic constriction injury model

    PubMed Central

    Grace, Peter M.; Hurley, Daniel; Barratt, Daniel T.; Tsykin, Anna; Watkins, Linda R.; Rolan, Paul E.; Hutchinson, Mark R.

    2017-01-01

    A quantitative, peripherally accessible biomarker for neuropathic pain has great potential to improve clinical outcomes. Based on the premise that peripheral and central immunity contribute to neuropathic pain mechanisms, we hypothesized that biomarkers could be identified from the whole blood of adult male rats, by integrating graded chronic constriction injury (CCI), ipsilateral lumbar dorsal quadrant (iLDQ) and whole blood transcriptomes, and pathway analysis with pain behavior. Correlational bioinformatics identified a range of putative biomarker genes for allodynia intensity, many encoding for proteins with a recognized role in immune/nociceptive mechanisms. A selection of these genes was validated in a separate replication study. Pathway analysis of the iLDQ transcriptome identified Fcγ and Fcε signaling pathways, among others. This study is the first to employ the whole blood transcriptome to identify pain biomarker panels. The novel correlational bioinformatics, developed here, selected such putative biomarkers based on a correlation with pain behavior and formation of signaling pathways with iLDQ genes. Future studies may demonstrate the predictive ability of these biomarker genes across other models and additional variables. PMID:22697386

  11. Pollen-specific, but not sperm-specific, genes show stronger purifying selection and higher rates of positive selection than sporophytic genes in Capsella grandiflora.

    PubMed

    Arunkumar, Ramesh; Josephs, Emily B; Williamson, Robert J; Wright, Stephen I

    2013-11-01

    Selection on the gametophyte can be a major force shaping plant genomes as 7-11% of genes are expressed only in that phase and 60% of genes are expressed in both the gametophytic and sporophytic phases. The efficacy of selection on gametophytic tissues is likely to be influenced by sexual selection acting on male and female functions of hermaphroditic plants. Moreover, the haploid nature of the gametophytic phase allows selection to be efficient in removing recessive deleterious mutations and fixing recessive beneficial mutations. To assess the importance of gametophytic selection, we compared the strength of purifying selection and extent of positive selection on gametophyte- and sporophyte-specific genes in the highly outcrossing plant Capsella grandiflora. We found that pollen-exclusive genes had a larger fraction of sites under strong purifying selection, a greater proportion of adaptive substitutions, and faster protein evolution compared with seedling-exclusive genes. In contrast, sperm cell-exclusive genes had a smaller fraction of sites under strong purifying selection, a lower proportion of adaptive substitutions, and slower protein evolution compared with seedling-exclusive genes. Observations of strong selection acting on pollen-expressed genes are likely explained by sexual selection resulting from pollen competition aided by the haploid nature of that tissue. The relaxation of selection in sperm might be due to the reduced influence of intrasexual competition, but reduced gene expression may also be playing an important role.

  12. SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate

    USGS Publications Warehouse

    Roffler, Gretchen H.; Amish, Stephen J.; Smith, Seth; Cosart, Ted F.; Kardos, Marty; Schwartz, Michael K.; Luikart, Gordon

    2016-01-01

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR-based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease-regulating functions (e.g. Ovar-DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene-targeted SNP discovery and subsequent SNP chip genotyping using low-quality samples in a nonmodel species.

  13. A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

    PubMed

    Chen, Zhenyu; Li, Jianping; Wei, Liwei

    2007-10-01

    Recently, gene expression profiling using microarray techniques has been shown to be a promising tool for improving the diagnosis and treatment of cancer. Gene expression data contain a high level of noise and an overwhelming number of genes relative to the number of available samples, which poses a great challenge for machine learning and statistical techniques. Support vector machines (SVM) have been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to give the user a transparent decision process. Explaining the computed solutions and presenting the extracted knowledge is a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple-parameter learning problem, and a shrinkage approach, 1-norm based linear programming, is proposed to obtain the sparse parameters and the corresponding selected features. We propose a novel rule extraction approach using the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of rules and reduce the computational complexity. Two public gene expression datasets, a leukemia dataset and a colon tumor dataset, are used to demonstrate the performance of this approach. Using a small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both datasets. Moreover, very simple rules with linguistic labels are extracted. The rule sets have high diagnostic power because of their good classification performance.

  14. Selecting and validating reference genes for quantitative real-time PCR in Plutella xylostella (L.).

    PubMed

    You, Yanchun; Xie, Miao; Vasseur, Liette; You, Minsheng

    2018-05-01

    Gene expression analysis provides important clues regarding gene functions, and quantitative real-time PCR (qRT-PCR) is a widely used method in gene expression studies. Reference genes are essential for normalizing and accurately assessing gene expression. In the present study, 16 candidate reference genes (ACTB, CyPA, EF1-α, GAPDH, HSP90, NDPk, RPL13a, RPL18, RPL19, RPL32, RPL4, RPL8, RPS13, RPS4, α-TUB, and β-TUB) from Plutella xylostella were selected to evaluate gene expression stability across different experimental conditions using five statistical algorithms (geNorm, NormFinder, Delta Ct, BestKeeper, and RefFinder). The results suggest that different reference genes or combinations of reference genes are suitable for normalization in gene expression studies of P. xylostella according to the different developmental stages, strains, tissues, and insecticide treatments. Based on the given experimental sets, the most stable reference genes were RPS4 across different developmental stages, RPL8 across different strains and tissues, and EF1-α across different insecticide treatments. A comprehensive and systematic assessment of potential reference genes for gene expression normalization is essential for post-genomic functional research in P. xylostella, a notorious pest with worldwide distribution and a high capacity to adapt and develop resistance to insecticides.
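
    Of the five algorithms named above, geNorm ranks candidate reference genes by a stability measure M: the average, over all other candidates, of the standard deviation of the log2 expression ratio between the two genes across samples. A lower M indicates more stable expression. The sketch below illustrates that measure only; the function name and input layout are assumptions, and it does not reproduce the study's data or the stepwise exclusion performed by the geNorm software.

```python
import math
from statistics import stdev


def genorm_m(expr):
    """Compute a geNorm-style stability value M for each candidate gene.

    expr: dict mapping gene name -> list of positive relative expression
    quantities, one per sample, in the same sample order for every gene.
    Requires at least two genes and at least two samples.
    """
    genes = list(expr)
    m_values = {}
    for gene in genes:
        pairwise_sd = []
        for other in genes:
            if other == gene:
                continue
            # Log2 ratio of the two genes in each sample.
            log_ratios = [math.log2(a / b)
                          for a, b in zip(expr[gene], expr[other])]
            # Pairwise variation: spread of that ratio across samples.
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values
```

    Two genes whose expression rises and falls together across samples keep a constant ratio and so contribute zero pairwise variation to each other, which is why co-regulated candidates can look deceptively stable under this measure.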

  15. Gene-Based Genome-Wide Association Analysis in European and Asian Populations Identified Novel Genes for Rheumatoid Arthritis.

    PubMed

    Zhu, Hong; Xia, Wei; Mo, Xing-Bo; Lin, Xiang; Qiu, Ying-Hua; Yi, Neng-Jun; Zhang, Yong-Hong; Deng, Fei-Yan; Lei, Shu-Feng

    2016-01-01

    Rheumatoid arthritis (RA) is a complex autoimmune disease. Using a gene-based association research strategy, the present study aims to detect unknown susceptibility to RA and to address the ethnic differences in genetic susceptibility to RA between European and Asian populations. Gene-based association analyses were performed with KGG 2.5 by using publicly available large RA datasets (14,361 RA cases and 43,923 controls of European subjects, 4,873 RA cases and 17,642 controls of Asian subjects). For the newly identified RA-associated genes, gene set enrichment analyses and protein-protein interaction analyses were carried out with DAVID and STRING version 10.0, respectively. Differential expression verification was conducted using 4 GEO datasets. The expression levels of three selected 'highly verified' genes were measured by ELISA among our in-house RA cases and controls. A total of 221 RA-associated genes were newly identified by gene-based association study, including 71 'overlapped', 76 'European-specific' and 74 'Asian-specific' genes. Among them, 105 genes had significant differential expression between RA patients and healthy controls in at least one dataset, especially 20 genes, including 11 'overlapped' (ABCF1, FLOT1, HLA-F, IER3, TUBB, ZKSCAN4, BTN3A3, HSP90AB1, CUTA, BRD2, HLA-DMA), 5 'European-specific' (PHTF1, RPS18, BAK1, TNFRSF14, SUOX) and 4 'Asian-specific' (RNASET2, HFE, BTN2A2, MAPK13) genes, whose differential expression was significant in at least three datasets. The protein expression of two selected genes, FLOT1 (P value = 1.70E-02) and HLA-DMA (P value = 4.70E-02), in plasma was significantly different in our in-house samples. Our study identified 221 novel RA-associated genes and especially highlighted the importance of 20 candidate genes for RA. The results addressed ethnic genetic background differences for RA susceptibility between European and Asian populations and detected a long list of overlapped or ethnic-specific RA genes. The study not only greatly increases our understanding of genetic susceptibility to RA, but also provides important insights into the ethno-genetic homogeneity and heterogeneity of RA in both ethnicities.

  16. A simple and reliable multi-gene transformation method for switchgrass.

    PubMed

    Ogawa, Yoichi; Shirakawa, Makoto; Koumoto, Yasuko; Honda, Masaho; Asami, Yuki; Kondo, Yasuhiro; Hara-Nishimura, Ikuko

    2014-07-01

    A simple and reliable Agrobacterium-mediated transformation method was developed for switchgrass. Using this method, many transgenic plants carrying multiple genes-of-interest could be produced without untransformed escapes. Switchgrass (Panicum virgatum L.) is a promising biomass crop for bioenergy. To obtain transgenic switchgrass plants carrying a multi-gene trait in a simple manner, an Agrobacterium-mediated transformation method was established by constructing a Gateway-based binary vector, optimizing transformation conditions and developing a novel selection method. A MultiRound Gateway-compatible destination binary vector carrying the bar selectable marker gene, pHKGB110, was constructed to introduce multiple genes of interest in a single transformation. Two reporter gene expression cassettes, GUSPlus and gfp, were constructed independently on two entry vectors and then introduced into a single T-DNA region of pHKGB110 via sequential LR reactions. Agrobacterium tumefaciens EHA101 carrying the resultant binary vector pHKGB112 and caryopsis-derived compact embryogenic calli were used for transformation experiments. Prolonged cocultivation for 7 days followed by cultivation on media containing meropenem improved transformation efficiency without overgrowth of Agrobacterium, which was, however, not inhibited by cefotaxime or Timentin. In addition, untransformed escape shoots were completely eliminated during the rooting stage by directly dipping the putatively transformed shoots into a solution of the herbicide Basta for a few seconds, designated as the 'herbicide dipping method'. It was also demonstrated that more than 90% of the bar-positive transformants carried both reporters delivered from pHKGB112. This simple and reliable transformation method, which incorporates a new selection technique and the use of a MultiRound Gateway-based binary vector, would be suitable for producing a large number of transgenic lines carrying multiple genes.

  17. Classification of intramural metastases and lymph node metastases of esophageal cancer from gene expression based on boosting and projective adaptive resonance theory.

    PubMed

    Takahashi, Hiro; Aoyagi, Kazuhiko; Nakanishi, Yukihiro; Sasaki, Hiroki; Yoshida, Teruhiko; Honda, Hiroyuki

    2006-07-01

    Esophageal cancer is a well-known cancer with poorer prognosis than other cancers. An optimal and individualized treatment protocol based on accurate diagnosis is urgently needed to improve the treatment of cancer patients. For this purpose, it is important to develop a sophisticated algorithm that can manage a large amount of data, such as gene expression data from DNA microarrays, for optimal and individualized diagnosis. Marker gene selection is essential in the analysis of gene expression data. We have already developed a method, denoted PART-BFCS, that combines projective adaptive resonance theory with a boosted fuzzy classifier with the SWEEP operator. This method is superior to other methods, and has four features, namely fast calculation, accurate prediction, reliable prediction, and rule extraction. In this study, we applied this method to analyze microarray data obtained from esophageal cancer patients. A combination method of PART-BFCS and the U-test was also investigated. It was necessary to use a specific type of BFCS, namely, BFCS-1,2, because the esophageal cancer data were very complex. PART-BFCS and PART-BFCS with the U-test models showed higher performance than two conventional methods, namely, k-nearest neighbor (kNN) and weighted voting (WV). Genes including CDK6 could be found by our methods and excellent IF-THEN rules could be extracted. The genes selected in this study have a high potential as new diagnostic markers for esophageal cancer. These results indicate that the new methods can be used in marker gene selection for the diagnosis of cancer patients.
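
    The abstract does not describe how the U-test was combined with PART-BFCS. One plausible reading, assumed here, is that it served as a per-gene filter that scores how well a gene's expression separates two patient groups. The sketch below computes the Mann-Whitney U statistic by direct pair counting and derives a simple separation score from it; both function names are illustrative.

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic for group_a versus group_b.

    Counts the pairs (a, b) in which a exceeds b; ties count as one half.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def separation_score(group_a, group_b):
    """Scale U to [0, 1]: 0 means full overlap, 1 means complete separation.

    Assumption: this scaling is an illustrative convenience, not a
    published statistic or the scoring used in the study.
    """
    n_pairs = len(group_a) * len(group_b)
    if n_pairs == 0:
        raise ValueError("both groups must be non-empty")
    u = mann_whitney_u(group_a, group_b)
    return abs(u - n_pairs / 2) / (n_pairs / 2)
```

    Pair counting costs time proportional to the product of the two group sizes, which is acceptable for the small sample sizes typical of microarray studies; a rank-based computation would scale better for large groups.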

  18. Genetic and Transcriptomic Bases of Intestinal Epithelial Barrier Dysfunction in Inflammatory Bowel Disease.

    PubMed

    Vancamelbeke, Maaike; Vanuytsel, Tim; Farré, Ricard; Verstockt, Sare; Ferrante, Marc; Van Assche, Gert; Rutgeerts, Paul; Schuit, Frans; Vermeire, Séverine; Arijs, Ingrid; Cleynen, Isabelle

    2017-10-01

    Intestinal barrier defects are common in patients with inflammatory bowel disease (IBD). To identify which components could underlie these changes, we performed an in-depth analysis of epithelial barrier genes in IBD. A set of 128 intestinal barrier genes was selected. Polygenic risk scores were generated based on selected barrier gene variants that were associated with Crohn's disease (CD) or ulcerative colitis (UC) in our study. Gene expression was analyzed using microarray and quantitative reverse transcription polymerase chain reaction. Influence of barrier gene variants on expression was studied by cis-expression quantitative trait loci mapping and comparing patients with low- and high-risk scores. Barrier risk scores were significantly higher in patients with IBD than controls. At single-gene level, the associated barrier single-nucleotide polymorphisms were most significantly enriched in PTGER4 for CD and HNF4A for UC. As a group, the regulating proteins were most enriched for CD and UC. Expression analysis showed that many epithelial barrier genes were significantly dysregulated in active CD and UC, with overrepresentation of mucus layer genes. In uninflamed CD ileum and IBD colon, most barrier gene levels restored to normal, except for MUC1 and MUC4 that remained persistently increased compared with controls. Expression levels did not depend on cis-regulatory variants nor combined genetic risk. We found genetic and transcriptomic dysregulations of key epithelial barrier genes and components in IBD. Of these, we believe that mucus genes, in particular MUC1 and MUC4, play an essential role in the pathogenesis of IBD and could represent interesting targets for treatment.
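
    A polygenic risk score of the kind described above is commonly computed as a weighted sum of risk-allele counts. The sketch below assumes log odds ratios as weights; the abstract does not state the weighting that was used, and the SNP identifiers, odds ratios and the function name `risk_score` are hypothetical placeholders for illustration only.

```python
import math

def risk_score(allele_counts, odds_ratios):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP).

    Weights are log odds ratios, which is an assumption of this sketch.
    """
    return sum(count * math.log(odds_ratios[snp])
               for snp, count in allele_counts.items())

# Hypothetical barrier-gene variants and effect sizes (not from the study).
odds_ratios = {"snpA": 1.2, "snpB": 1.5}
patient = {"snpA": 2, "snpB": 1}   # risk-allele counts for one individual
control = {"snpA": 0, "snpB": 0}

patient_score = risk_score(patient, odds_ratios)
control_score = risk_score(control, odds_ratios)
```

    Under these assumptions, an individual carrying no risk alleles scores zero and each additional risk allele adds the log odds ratio of its variant.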

  19. Evolution of Synonymous Codon Usage in Neurospora tetrasperma and Neurospora discreta

    PubMed Central

    Whittle, C. A.; Sun, Y.; Johannesson, H.

    2011-01-01

    Neurospora comprises a primary model system for the study of fungal genetics and biology. In spite of this, little is known about genome evolution in Neurospora. For example, the evolution of synonymous codon usage is largely unknown in this genus. In the present investigation, we conducted a comprehensive analysis of synonymous codon usage and its relationship to gene expression and gene length (GL) in Neurospora tetrasperma and Neurospora discreta. For our analysis, we examined codon usage among 2,079 genes per organism and assessed gene expression using large-scale expressed sequenced tag (EST) data sets (279,323 and 453,559 ESTs for N. tetrasperma and N. discreta, respectively). Data on relative synonymous codon usage revealed 24 codons (and two putative codons) that are more frequently used in genes with high than with low expression and thus were defined as optimal codons. Although codon-usage bias was highly correlated with gene expression, it was independent of selectively neutral base composition (introns); thus demonstrating that translational selection drives synonymous codon usage in these genomes. We also report that GL (coding sequences [CDS]) was inversely associated with optimal codon usage at each gene expression level, with highly expressed short genes having the greatest frequency of optimal codons. Optimal codon frequency was moderately higher in N. tetrasperma than in N. discreta, which might be due to variation in selective pressures and/or mating systems. PMID:21402862
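
    The relative synonymous codon usage (RSCU) statistic mentioned above can be illustrated with a minimal sketch: the RSCU of a codon is its observed count divided by the mean count over all synonymous codons for the same amino acid. The toy sequence, the function name `rscu` and the three-family codon table are assumptions for illustration (the table is a small subset of the standard genetic code), not the authors' pipeline.

```python
from collections import Counter

# Subset of the standard genetic code: amino acid -> synonymous codons.
SYNONYMOUS = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}

def rscu(cds):
    """Return {codon: RSCU} for every codon family present in `cds`."""
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    result = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                          # amino acid not used in this sequence
        expected = total / len(family)        # count under equal usage
        for codon in family:
            result[codon] = counts[codon] / expected
    return result

toy_cds = "TTCTTCTTTAAGAAGAAGAAAGGT"
values = rscu(toy_cds)
```

    A value above 1 means the codon is used more often than expected under equal usage of its synonyms; comparing such values between highly and lowly expressed genes is one way optimal codons can be defined.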

  20. Genomic Signature of Kin Selection in an Ant with Obligately Sterile Workers

    PubMed Central

    Warner, Michael R.; Mikheyev, Alexander S.

    2017-01-01

    Kin selection is thought to drive the evolution of cooperation and conflict, but the specific genes and genome-wide patterns shaped by kin selection are unknown. We identified thousands of genes associated with the sterile ant worker caste, the archetype of an altruistic phenotype shaped by kin selection, and then used population and comparative genomic approaches to study patterns of molecular evolution at these genes. Consistent with population genetic theoretical predictions, worker-upregulated genes experienced reduced selection compared with genes upregulated in reproductive castes. Worker-upregulated genes included more taxonomically restricted genes, indicating that the worker caste has recruited more novel genes, yet these genes also experienced reduced selection. Our study identifies a putative genomic signature of kin selection and helps to integrate emerging sociogenomic data with longstanding social evolution theory. PMID:28419349

  1. Comparative Transcriptome Analysis of the Pacific Oyster Crassostrea gigas Characterized by Shell Colors: Identification of Genetic Bases Potentially Involved in Pigmentation

    PubMed Central

    Feng, Dandan; Li, Qi; Yu, Hong; Zhao, Xuelin; Kong, Lingfeng

    2015-01-01

    Background Shell color polymorphisms of Mollusca have contributed to the development of evolutionary biology and population genetics, while the genetic bases and molecular mechanisms underlying shell pigmentation are poorly understood. The Pacific oyster (Crassostrea gigas) is one of the most important farmed oysters worldwide. Through successive family selection, four shell color variants (white, golden, black and partially pigmented) of C. gigas have been developed. To elucidate the genetic mechanisms of shell coloration in C. gigas and facilitate the selection of elite oyster lines with desired coloration patterns, differentially expressed genes (DEGs) were identified among the four shell color variants by RNA-seq. Results Digital gene expression generated over fifteen million reads per sample, producing expression data for 28,027 genes. A total of 2,645 DEGs were identified from pair-wise comparisons, of which 432, 91, 43 and 39 genes were specifically up-regulated in white, black, golden and partially pigmented shells of C. gigas, respectively. Three genes, Abca1, Abca3 and Abcb1, which belong to the ATP-binding cassette (ABC) transporter superfamily, were significantly associated with white shell formation. A tyrosinase transcript (CGI_10008737) showed a consistent up-regulation pattern with golden coloration. We propose that the white shell variant of C. gigas could employ “endocytosis” to down-regulate notch level and to prevent shell pigmentation. Conclusion This study discovered some potential shell coloration genes and related molecular mechanisms by RNA-seq, which provides foundational information for further study of shell coloration and can assist selective breeding in C. gigas. PMID:26693729

  2. Signatures of selection in the three-spined stickleback along a small-scale brackish water - freshwater transition zone.

    PubMed

    Konijnendijk, Nellie; Shikano, Takahito; Daneels, Dorien; Volckaert, Filip A M; Raeymaekers, Joost A M

    2015-09-01

    Local adaptation is often obvious when gene flow is impeded, such as observed at large spatial scales and across strong ecological contrasts. However, it becomes less certain at small scales such as between adjacent populations or across weak ecological contrasts, when gene flow is strong. While studies on genomic adaptation tend to focus on the former, less is known about the genomic targets of natural selection in the latter situation. In this study, we investigate genomic adaptation in populations of the three-spined stickleback Gasterosteus aculeatus L. across a small-scale ecological transition with salinities ranging from brackish to fresh. Adaptation to salinity has been repeatedly demonstrated in this species. A genome scan based on 87 microsatellite markers revealed only few signatures of selection, likely owing to the constraints that homogenizing gene flow puts on adaptive divergence. However, the detected loci appear repeatedly as targets of selection in similar studies of genomic adaptation in the three-spined stickleback. We conclude that the signature of genomic selection in the face of strong gene flow is weak, yet detectable. We argue that the range of studies of genomic divergence should be extended to include more systems characterized by limited geographical and ecological isolation, which is often a realistic setting in nature.

  3. Dietary adaptation of FADS genes in Europe varied across time and geography.

    PubMed

    Ye, Kaixiong; Gao, Feng; Wang, David; Bar-Yosef, Ofer; Keinan, Alon

    2017-05-26

    Fatty acid desaturase (FADS) genes encode rate-limiting enzymes for the biosynthesis of omega-6 and omega-3 long-chain polyunsaturated fatty acids (LCPUFAs). This biosynthesis is essential for individuals subsisting on LCPUFA-poor diets (for example, plant-based). Positive selection on FADS genes has been reported in multiple populations, but its cause and pattern in Europeans remain unknown. Here we demonstrate, using ancient and modern DNA, that positive selection acted on the same FADS variants both before and after the advent of farming in Europe, but on opposite (that is, alternative) alleles. Recent selection in farmers also varied geographically, with the strongest signal in southern Europe. These varying selection patterns concur with anthropological evidence of varying diets, and with the association of farming-adaptive alleles with higher FADS1 expression and thus enhanced LCPUFA biosynthesis. Genome-wide association studies reveal that farming-adaptive alleles not only increase LCPUFAs, but also affect other lipid levels and protect against several inflammatory diseases.

  4. Bovine Host Genetic Variation Influences Rumen Microbial Methane Production with Best Selection Criterion for Low Methane Emitting and Efficiently Feed Converting Hosts Based on Metagenomic Gene Abundance

    PubMed Central

    Roehe, Rainer; Dewhurst, Richard J.; Duthie, Carol-Anne; Rooke, John A.; McKain, Nest; Ross, Dave W.; Hyslop, Jimmy J.; Waterhouse, Anthony; Freeman, Tom C.

    2016-01-01

    Methane produced by methanogenic archaea in ruminants contributes significantly to anthropogenic greenhouse gas emissions. The host genetic link controlling microbial methane production is unknown and appropriate genetic selection strategies are not developed. We used sire progeny group differences to estimate the host genetic influence on rumen microbial methane production in a factorial experiment consisting of crossbred breed types and diets. Rumen metagenomic profiling was undertaken to investigate links between microbial genes and methane emissions or feed conversion efficiency. Sire progeny groups differed significantly in their methane emissions measured in respiration chambers. Ranking of the sire progeny groups based on methane emissions or relative archaeal abundance was consistent overall and within diet, suggesting that archaeal abundance in ruminal digesta is under host genetic control and can be used to genetically select animals without measuring methane directly. In the metagenomic analysis of rumen contents, we identified 3970 microbial genes of which 20 and 49 genes were significantly associated with methane emissions and feed conversion efficiency respectively. These explained 81% and 86% of the respective variation and were clustered in distinct functional gene networks. Methanogenesis genes (e.g. mcrA and fmdB) were associated with methane emissions, whilst host-microbiome cross talk genes (e.g. TSTA3 and FucI) were associated with feed conversion efficiency. These results strengthen the idea that the host animal controls its own microbiota to a significant extent and open up the implementation of effective breeding strategies using rumen microbial gene abundance as a predictor for difficult-to-measure traits on a large number of hosts. Generally, the results provide a proof of principle to use the relative abundance of microbial genes in the gastrointestinal tract of different species to predict their influence on traits, e.g. human metabolism, health and behaviour, as well as to understand the genetic link between host and microbiome. PMID:26891056

  5. Bovine Host Genetic Variation Influences Rumen Microbial Methane Production with Best Selection Criterion for Low Methane Emitting and Efficiently Feed Converting Hosts Based on Metagenomic Gene Abundance.

    PubMed

    Roehe, Rainer; Dewhurst, Richard J; Duthie, Carol-Anne; Rooke, John A; McKain, Nest; Ross, Dave W; Hyslop, Jimmy J; Waterhouse, Anthony; Freeman, Tom C; Watson, Mick; Wallace, R John

    2016-02-01

    Methane produced by methanogenic archaea in ruminants contributes significantly to anthropogenic greenhouse gas emissions. The host genetic link controlling microbial methane production is unknown and appropriate genetic selection strategies are not developed. We used sire progeny group differences to estimate the host genetic influence on rumen microbial methane production in a factorial experiment consisting of crossbred breed types and diets. Rumen metagenomic profiling was undertaken to investigate links between microbial genes and methane emissions or feed conversion efficiency. Sire progeny groups differed significantly in their methane emissions measured in respiration chambers. Ranking of the sire progeny groups based on methane emissions or relative archaeal abundance was consistent overall and within diet, suggesting that archaeal abundance in ruminal digesta is under host genetic control and can be used to genetically select animals without measuring methane directly. In the metagenomic analysis of rumen contents, we identified 3970 microbial genes of which 20 and 49 genes were significantly associated with methane emissions and feed conversion efficiency respectively. These explained 81% and 86% of the respective variation and were clustered in distinct functional gene networks. Methanogenesis genes (e.g. mcrA and fmdB) were associated with methane emissions, whilst host-microbiome cross talk genes (e.g. TSTA3 and FucI) were associated with feed conversion efficiency. These results strengthen the idea that the host animal controls its own microbiota to a significant extent and open up the implementation of effective breeding strategies using rumen microbial gene abundance as a predictor for difficult-to-measure traits on a large number of hosts. Generally, the results provide a proof of principle to use the relative abundance of microbial genes in the gastrointestinal tract of different species to predict their influence on traits, e.g. human metabolism, health and behaviour, as well as to understand the genetic link between host and microbiome.

  6. Identification of Differentially Expressed Genes and Pathways for Myofiber Characteristics in Soleus Muscles between Chicken Breeds Differing in Meat Quality.

    PubMed

    Du, Y F; Ding, Q L; Li, Y M; Fang, W R

    2017-04-03

    In the modern chicken industry, fast-growing broilers have undergone strong artificial selection for muscle growth, which has led to remarkable phenotypic variations compared with slow-growing chickens. However, the molecular mechanism underlying these phenotypic differences remains unknown. In this study, a systematic identification of candidate genes and new pathways related to myofiber development and composition in chicken Soleus muscle (SOL) has been made using gene expression profiles of two distinct breeds: Qingyuan partridge (QY), a slow-growing Chinese breed possessing high meat quality and Cobb 500 (CB), a commercial fast-growing broiler line. Agilent cDNA microarray analyses were conducted to determine gene expression profiles of soleus muscle sampled at sexual maturity age of QY (112 d) and CB (42 d). A total of 1318 genes with at least 2-fold differences were identified (P < 0.05, FDR < 0.05, FC ≥ 2) in SOL muscles of QY and CB chickens. Differentially expressed genes (DEGs) related to muscle development, energy metabolism or lipid metabolism processes were examined further in each breed based on Gene Ontology (GO) analysis, and 11 genes involved in these processes were selected for further validation studies by qRT-PCR. In addition, based on KEGG pathway analysis of DEGs in both QY and CB chickens, it was found that in addition to pathways affecting myogenic fibre-type development and differentiation (pathways for Hedgehog & Calcium signaling), energy metabolism (Phosphatidylinositol signaling system, VEGF signaling pathway, Purine metabolism, Pyrimidine metabolism) were also enriched and might form a network with pathways related to muscle metabolism to influence the development of myofibers. This study is the first stage in the understanding of molecular mechanisms underlying variations in poultry meat quality. Large scale analyses are now required to validate the role of the genes identified and ultimately to find molecular markers that can be used for selection or to optimize rearing practices.
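
    The selection thresholds quoted above (FDR < 0.05, FC ≥ 2) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the abstract does not say how the FDR was computed, so the Benjamini-Hochberg adjustment is an assumption, as is treating a 2-fold change symmetrically (up or down). The gene names and values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(genes, fold_changes, pvalues, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Keep genes passing both the FDR and the (symmetric) fold-change cutoff."""
    fdr = benjamini_hochberg(pvalues)
    return [g for g, fc, q in zip(genes, fold_changes, fdr)
            if q < fdr_cutoff and (fc >= fc_cutoff or fc <= 1 / fc_cutoff)]

# Hypothetical expression ratios between two breeds and raw p-values.
genes = ["g1", "g2", "g3", "g4"]
fold_changes = [3.0, 0.4, 1.2, 2.5]
pvalues = [0.001, 0.01, 0.02, 0.6]
adjusted = benjamini_hochberg(pvalues)
degs = select_degs(genes, fold_changes, pvalues)
```

    In this toy input, g3 is dropped for its small fold change and g4 for its adjusted p-value, which shows why both criteria are applied together.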

  7. Engineering Complex Microbial Phenotypes with Continuous Genetic Integration and Plasmid Based Multi-gene Library

    DTIC Science & Technology

    2013-10-09

    have desirable traits. We aim to enlarge the E. coli genome using Lactobacillus plantarum genes to build cells tolerant to EtOH and BT. L. plantarum is...chemicals III. Approach Objective 1 & 1a: Integrated heterologous (L. plantarum) DNA into the E. coli chromosome and selected for insertions that...developed in combination with genes identified from screening L. plantarum libraries. Additionally, we have screened heterologous libraries for

  8. APPLICATION OF CDNA MICROARRAY TECHNOLOGY TO IN VITRO TOXICOLOGY AND THE SELECTION OF GENES FOR A REAL TIME RT-PCR-BASED SCREEN FOR OXIDATIVE STRESS IN HEP-G2 CELLS

    EPA Science Inventory

    Large-scale analysis of gene expression using cDNA microarrays promises the
    rapid detection of the mode of toxicity for drugs and other chemicals. cDNA
    microarrays were used to examine chemically-induced alterations of gene
    expression in HepG2 cells exposed to oxidative ...

  9. A genetic replacement system for selection-based engineering of essential proteins

    PubMed Central

    2012-01-01

    Background Essential genes represent the core of biological functions required for viability. Molecular understanding of essentiality as well as design of synthetic cellular systems includes the engineering of essential proteins. An impediment to this effort is the lack of growth-based selection systems suitable for directed evolution approaches. Results We established a simple strategy for genetic replacement of an essential gene by a (library of) variant(s) during a transformation. The system was validated using three different essential genes and plasmid combinations and it reproducibly shows transformation efficiencies on the order of 10^7 transformants per microgram of DNA without any identifiable false positives. This allowed for reliable recovery of functional variants out of at least a 10^5-fold excess of non-functional variants. This outperformed selection in conventional bleach-out strains by at least two orders of magnitude, where recombination between functional and non-functional variants interfered with reliable recovery even in recA negative strains. Conclusions We propose that this selection system is extremely suitable for evaluating large libraries of engineered essential proteins resulting in the reliable isolation of functional variants in a clean strain background which can readily be used for in vivo applications as well as expression and purification for use in in vitro studies. PMID:22898007

  10. The use of the PMI/mannose selection system to recover transgenic sweet orange plants (Citrus sinensis L. Osbeck).

    PubMed

    Boscariol, R L; Almeida, W A B; Derbyshire, M T V C; Mourão Filho, F A A; Mendes, B M J

    2003-09-01

    A new method for obtaining transgenic sweet orange plants was developed in which positive selection (Positech) based on the Escherichia coli phosphomannose-isomerase (PMI) gene as the selectable marker gene and mannose as the selective agent was used. Epicotyl segments from in vitro-germinated plants of Valencia, Hamlin, Natal and Pera sweet oranges were inoculated with Agrobacterium tumefaciens EHA101-pNOV2116 and subsequently selected on medium supplemented with different concentrations of mannose or with a combination of mannose and sucrose as a carbon source. Genetic transformation was confirmed by PCR and Southern blot. The transgene expression was evaluated using a chlorophenol red assay and isoenzymes. The transformation efficiency rate ranged from 3% to 23.8%, depending on cultivar. This system provides an efficient manner for selecting transgenic sweet orange plants without using antibiotics or herbicides.

  11. Analysis of the cytochrome c oxidase subunit 1 (COX1) gene reveals the unique evolution of the giant panda.

    PubMed

    Hu, Yao-Dong; Pang, Hui-Zhong; Li, De-Sheng; Ling, Shan-Shan; Lan, Dan; Wang, Ye; Zhu, Yun; Li, Di-Yan; Wei, Rong-Ping; Zhang, He-Min; Wang, Cheng-Dong

    2016-11-05

    As the rate-limiting enzyme of the mitochondrial respiratory chain, cytochrome c oxidase (COX) plays a crucial role in biological metabolism. The "living fossil" giant panda (Ailuropoda melanoleuca) is well-known for its special bamboo diet. In an effort to explore functional variation of COX1 in the energy metabolism behind the giant panda's low-energy bamboo diet, we looked at genetic variation of the COX1 gene in giant panda, and tested for its selection effect. In 1545 base pairs of the gene from 15 samples, 9 positions were variable and 1 mutation led to an amino acid sequence change. The COX1 gene produced six haplotypes; nucleotide diversity (pi), haplotype diversity (Hd), and the average number of nucleotide differences (k) were 0.001629±0.001036, 0.8083±0.0694 and 2.517, respectively. Also, the dN/dS ratio is significantly below 1. These results indicated that giant panda had a low population genetic diversity, and an obvious purifying selection of the COX1 gene which reduces synthesis of ATP determines giant panda's low-energy bamboo diet. Phylogenetic trees based on the COX1 gene were constructed to demonstrate that giant panda is the sister group of other Ursidae. Copyright © 2016 Elsevier B.V. All rights reserved.
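
    The three diversity statistics quoted in this abstract are related in a simple way, which the following sketch illustrates on made-up aligned sequences. It assumes the standard definitions: Hd = n/(n-1) * (1 - sum of squared haplotype frequencies), k = mean number of pairwise differences, and pi = k divided by the sequence length. The function name `diversity_stats` and the toy alignment are assumptions for illustration, not the authors' software.

```python
from collections import Counter
from itertools import combinations

def diversity_stats(seqs):
    """Return (Hd, k, pi) for a list of equal-length aligned sequences."""
    n = len(seqs)
    length = len(seqs[0])
    # Haplotype diversity from haplotype frequencies.
    freqs = [count / n for count in Counter(seqs).values()]
    hd = n / (n - 1) * (1 - sum(p * p for p in freqs))
    # Average number of nucleotide differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    # Nucleotide diversity is k expressed per site.
    return hd, k, k / length

toy_alignment = ["ACGTAC", "ACGTAC", "ACGTAT", "ACCTAT"]
hd, k, pi = diversity_stats(toy_alignment)
```

    The reported figures are consistent with the per-site relationship: 2.517 differences over 1545 base pairs is approximately 0.00163.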

  12. Selection of Differential Isolates of Magnaporthe oryzae for Postulation of Blast Resistance Genes.

    PubMed

    Fang, W W; Liu, C C; Zhang, H W; Xu, H; Zhou, S; Fang, K X; Peng, Y L; Zhao, W S

    2018-05-21

    A set of differential isolates of Magnaporthe oryzae is needed for the postulation of blast resistance genes in numerous rice varieties and breeding materials. In this study, the pathotypes of 1,377 M. oryzae isolates from different regions of China were determined by inoculating detached rice leaves of 24 monogenic lines. Among them, 25 isolates were selected as differential isolates based on the following characteristics: they had distinct responses on the monogenic lines, contained the minimum number of avirulence genes, were stable in pathogenicity and conidiation during consecutive culture, had a consistent colony growth rate, and, together, could differentiate combinations of the 24 major blast resistance genes. Seedlings of rice cultivars were inoculated with this differential set of isolates to postulate whether they contain 1 or more than 1 of the 24 blast resistance genes. The results were consistent with those from polymerase chain reaction analysis of target resistance genes. Establishment of a standard set of differential isolates will facilitate breeding for blast resistance and improved management of rice blast disease.

  13. Molecular Genetics of Successful Smoking Cessation: Convergent Genome-Wide Association Study Results

    PubMed Central

    Uhl, George R.; Liu, Qing-Rong; Drgon, Tomas; Johnson, Catherine; Walther, Donna; Rose, Jed E.; David, Sean P.; Niaura, Ray; Lerman, Caryn

    2008-01-01

    Context Smoking remains a major public health problem. Twin studies indicate that the ability to quit smoking is substantially heritable, with genetics that overlap modestly with the genetics of vulnerability to dependence on addictive substances. Objectives To identify replicated genes that facilitate smokers’ abilities to achieve and sustain abstinence from smoking (hereinafter referred to as quit-success genes) found in more than 2 genome-wide association (GWA) studies of successful vs unsuccessful abstainers, and, secondarily, to nominate genes for selective involvement in smoking cessation success with bupropion hydrochloride vs nicotine replacement therapy (NRT). Design The GWA results in subjects from 3 centers, with secondary analyses of NRT vs bupropion responders. Setting Outpatient smoking cessation trial participants from 3 centers. Participants European American smokers who successfully vs unsuccessfully abstain from smoking with biochemical confirmation in a smoking cessation trial using NRT, bupropion, or placebo (N=550). Main Outcome Measures Quit-success genes, reproducibly identified by clustered nominally positive single-nucleotide polymorphisms (SNPs) in more than 2 independent samples with significant P values based on Monte Carlo simulation trials. The NRT-selective genes were nominated by clustered SNPs that display much larger t values for NRT vs placebo comparisons. The bupropion-selective genes were nominated by bupropion-selective results. Results Variants in quit-success genes are likely to alter cell adhesion, enzymatic, transcriptional, structural, and DNA, RNA, and/or protein-handling functions. Quit-success genes are identified by clustered nominally positive SNPs from more than 2 samples and are unlikely to represent chance observations (Monte Carlo P < .0003). These genes display modest overlap with genes identified in GWA studies of dependence on addictive substances and memory. 
Conclusions These results support polygenic genetics for success in abstaining from smoking, overlap with genetics of substance dependence and memory, and nominate gene variants for selective influences on therapeutic responses to bupropion vs NRT. Molecular genetics should help match the types and/or intensity of anti-smoking treatments with the smokers most likely to benefit from them. PMID:18519826

  14. FOXP2 targets show evidence of positive selection in European populations.

    PubMed

    Ayub, Qasim; Yngvadottir, Bryndis; Chen, Yuan; Xue, Yali; Hu, Min; Vernes, Sonja C; Fisher, Simon E; Tyler-Smith, Chris

    2013-05-02

    Forkhead box P2 (FOXP2) is a highly conserved transcription factor that has been implicated in human speech and language disorders and plays important roles in the plasticity of the developing brain. The pattern of nucleotide polymorphisms in FOXP2 in modern populations suggests that it has been the target of positive (Darwinian) selection during recent human evolution. In our study, we searched for evidence of selection that might have followed FOXP2 adaptations in modern humans. We examined whether or not putative FOXP2 targets identified by chromatin-immunoprecipitation genomic screening show evidence of positive selection. We developed an algorithm that, for any given gene list, systematically generates matched lists of control genes from the Ensembl database, collates summary statistics for three frequency-spectrum-based neutrality tests from the low-coverage resequencing data of the 1000 Genomes Project, and determines whether these statistics are significantly different between the given gene targets and the set of controls. Overall, there was strong evidence of selection of FOXP2 targets in Europeans, but not in the Han Chinese, Japanese, or Yoruba populations. Significant outliers included several genes linked to cellular movement, reproduction, development, and immune cell trafficking, and 13 of these constituted a significant network associated with cardiac arteriopathy. Strong signals of selection were observed for CNTNAP2 and RBFOX1, key neurally expressed genes that have been consistently identified as direct FOXP2 targets in multiple studies and that have themselves been associated with neurodevelopmental disorders involving language dysfunction. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
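
    The comparison of a target gene list against matched control lists can be sketched as an empirical test. This is a simplified illustration, not the authors' algorithm: it assumes a pool of already-matched control genes, a single per-gene neutrality statistic for which lower values suggest selection (Tajima's D is used here only as an example, since the abstract does not name the three tests), and a one-sided comparison of means. All values are made up.

```python
import random

def empirical_p(target_stats, control_pool, n_sets=1000, seed=0):
    """One-sided empirical p-value: how often a random control set of the
    same size has a mean statistic at least as low as the target set."""
    rng = random.Random(seed)
    size = len(target_stats)
    target_mean = sum(target_stats) / size
    hits = 0
    for _ in range(n_sets):
        sample = rng.sample(control_pool, size)   # draw without replacement
        if sum(sample) / size <= target_mean:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sets + 1)

# Hypothetical per-gene statistics for targets and matched controls.
target_stats = [-2.0, -1.8, -2.2]
control_pool = [i / 10 for i in range(-10, 11)]   # -1.0 ... 1.0
p_value = empirical_p(target_stats, control_pool)
```

    Because no control set in this toy pool can reach the target mean, the estimate takes its smallest possible value, 1 / (n_sets + 1).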

  15. Novel harmonic regularization approach for variable selection in Cox's proportional hazards model.

    PubMed

    Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan

    2014-01-01

    Variable selection is an important issue in regression and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on the artificial datasets and four real microarray gene expression datasets, such as the diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods.

  16. Application of nanomaterials in the bioanalytical detection of disease-related genes.

    PubMed

    Zhu, Xiaoqian; Li, Jiao; He, Hanping; Huang, Min; Zhang, Xiuhua; Wang, Shengfu

    2015-12-15

    In the diagnosis of genetic diseases and disorders, nanomaterials-based gene detection systems have significant advantages over conventional diagnostic systems in terms of simplicity, sensitivity, specificity, and portability. In this review, we describe the application of nanomaterials for disease-related genes detection in different methods excluding PCR-related method, such as colorimetry, fluorescence-based methods, electrochemistry, microarray methods, surface-enhanced Raman spectroscopy (SERS), quartz crystal microbalance (QCM) methods, and dynamic light scattering (DLS). The most commonly used nanomaterials are gold, silver, carbon and semiconducting nanoparticles. Various nanomaterials-based gene detection methods are introduced, their respective advantages are discussed, and selected examples are provided to illustrate the properties of these nanomaterials and their emerging applications for the detection of specific nucleic acid sequences. Copyright © 2015. Published by Elsevier B.V.

  17. Toward a comprehensive and systematic methylome signature in colorectal cancers.

    PubMed

    Ashktorab, Hassan; Rahi, Hamed; Wansley, Daniel; Varma, Sudhir; Shokrani, Babak; Lee, Edward; Daremipouran, Mohammad; Laiyemo, Adeyinka; Goel, Ajay; Carethers, John M; Brim, Hassan

    2013-08-01

    CpG Island Methylator Phenotype (CIMP) is one of the underlying mechanisms in colorectal cancer (CRC). This study aimed to define a methylome signature in CRC through a methylation microarray analysis and a compilation of promising CIMP markers from the literature. Illumina HumanMethylation27 (IHM27) array data was generated and analyzed based on statistical differences in methylation data (1st approach) or based on overall differences in methylation percentages using lower 95% CI (2nd approach). Pyrosequencing was performed for the validation of nine genes. A meta-analysis was used to identify CIMP and non-CIMP markers that were hypermethylated in CRC but did not yet make it to the CIMP genes' list. Our 1st approach for array data analysis demonstrated the limitations in selecting genes for further validation, highlighting the need for the 2nd bioinformatics approach to adequately select genes with differential aberrant methylation. A more comprehensive list, which included non-CIMP genes, such as APC, EVL, CD109, PTEN, TWIST1, DCC, PTPRD, SFRP1, ICAM5, RASSF1A, EYA4, 30ST2, LAMA1, KCNQ5, ADHEF1, and TFPI2, was established. Array data are useful to categorize and cluster colonic lesions based on their global methylation profiles; however, their usefulness in identifying robust methylation markers is limited and relies on the data analysis method. We have identified 16 non-CIMP-panel genes for which we provide rationale for inclusion in a more comprehensive characterization of CIMP+ CRCs. The identification of a definitive list for methylome specific genes in CRC will contribute to better clinical management of CRC patients.

  18. Molecular Evolution of the Neural Crest Regulatory Network in Ray-Finned Fish

    PubMed Central

    Kratochwil, Claudius F.; Geissler, Laura; Irisarri, Iker; Meyer, Axel

    2015-01-01

    Abstract Gene regulatory networks (GRN) are central to developmental processes. They are composed of transcription factors and signaling molecules orchestrating gene expression modules that tightly regulate the development of organisms. The neural crest (NC) is a multipotent cell population that is considered a key innovation of vertebrates. Its derivatives contribute to shaping the astounding morphological diversity of jaws, teeth, head skeleton, or pigmentation. Here, we study the molecular evolution of the NC GRN by analyzing patterns of molecular divergence for a total of 36 genes in 16 species of bony fishes. Analyses of nonsynonymous to synonymous substitution rate ratios (dN/dS) support patterns of variable selective pressures among genes deployed at different stages of NC development, consistent with the developmental hourglass model. Model-based clustering techniques of sequence features support the notion of extreme conservation of NC-genes across the entire network. Our data show that most genes are under strong purifying selection that is maintained throughout ray-finned fish evolution. Late NC development genes reveal a pattern of increased constraints in more recent lineages. Additionally, seven of the NC-genes showed signs of relaxation of purifying selection in the famously species-rich lineage of cichlid fishes. This suggests that NC genes might have played a role in the adaptive radiation of cichlids by granting flexibility in the development of NC-derived traits—suggesting an important role for NC network architecture during the diversification in vertebrates. PMID:26475317

  19. An efficient procedure for marker-free mutagenesis of S. coelicolor by site-specific recombination for secondary metabolite overproduction.

    PubMed

    Zhang, Bo; Zhang, Lin; Dai, Ruixue; Yu, Meiying; Zhao, Guoping; Ding, Xiaoming

    2013-01-01

    Streptomyces bacteria are known for producing important natural compounds by secondary metabolism, especially antibiotics with novel biological activities. Functional studies of antibiotic-biosynthesizing gene clusters are generally through homologous genomic recombination by gene-targeting vectors. Here, we present a rapid and efficient method for construction of gene-targeting vectors. This approach is based on Streptomyces phage φBT1 integrase-mediated multisite in vitro site-specific recombination. Four 'entry clones' were assembled into a circular plasmid to generate the destination gene-targeting vector by a one-step reaction. The four 'entry clones' contained two clones of the upstream and downstream flanks of the target gene, a selectable marker and an E. coli-Streptomyces shuttle vector. After targeted modification of the genome, the selectable markers were removed by φC31 integrase-mediated in vivo site-specific recombination between pre-placed attB and attP sites. Using this method, part of the calcium-dependent antibiotic (CDA) and actinorhodin (Act) biosynthetic gene clusters were deleted, and the rrdA encoding RrdA, a negative regulator of Red production, was also deleted. The final prodiginine production of the engineered strain was over five times that of the wild-type strain. This straightforward φBT1 and φC31 integrase-based strategy provides an alternative approach for rapid gene-targeting vector construction and marker removal in streptomycetes.

  20. Release of (and lessons learned from mining) a pioneering large toxicogenomics database.

    PubMed

    Sandhu, Komal S; Veeramachaneni, Vamsi; Yao, Xiang; Nie, Alex; Lord, Peter; Amaratunga, Dhammika; McMillian, Michael K; Verheyen, Geert R

    2015-07-01

    We release the Janssen Toxicogenomics database. This rat liver gene-expression database was generated using Codelink microarrays, and has been used over the past years within Janssen to derive signatures for multiple end points and to classify proprietary compounds. The release consists of gene-expression responses to 124 compounds, selected to give a broad coverage of liver-active compounds. A selection of the compounds were also analyzed on Affymetrix microarrays. The release includes results of an in-house reannotation pipeline to Entrez gene annotations, to classify probes into different confidence classes. High confidence unambiguously annotated probes were used to create gene-level data which served as starting point for cross-platform comparisons. Connectivity map-based similarity methods show excellent agreement between Codelink and Affymetrix runs of the same samples. We also compared our dataset with the Japanese Toxicogenomics Project and observed reasonable agreement, especially for compounds with stronger gene signatures. We describe an R-package containing the gene-level data and show how it can be used for expression-based similarity searches. Comparing the same biological samples run on the Affymetrix and the Codelink platform, good correspondence is observed using connectivity mapping approaches. As expected, this correspondence is smaller when the data are compared with an independent dataset such as TG-GATE. We hope that this collection of gene-expression profiles will be incorporated in toxicogenomics pipelines of users.

  1. The base pairing RNA Spot 42 participates in a multi-output feedforward loop to help enact catabolite repression in Escherichia coli

    PubMed Central

    Beisel, Chase L.; Storz, Gisela

    2011-01-01

    SUMMARY Bacteria selectively consume some carbon sources over others through a regulatory mechanism termed catabolite repression. Here, we show that the base pairing RNA Spot 42 plays a broad role in catabolite repression in Escherichia coli by directly repressing genes involved in central and secondary metabolism, redox balancing, and the consumption of diverse non-preferred carbon sources. Many of the genes repressed by Spot 42 are transcriptionally activated by the global regulator CRP. Since CRP represses Spot 42, these regulators participate in a specific regulatory circuit called a multi-output feedforward loop. We found that this loop can reduce leaky expression of target genes in the presence of glucose and can maintain repression of target genes under changing nutrient conditions. Our results suggest that base pairing RNAs in feedforward loops can help shape the steady-state levels and dynamics of gene expression. PMID:21292161

  2. PROSPECT improves cis-acting regulatory element prediction by integrating expression profile data with consensus pattern searches

    PubMed Central

    Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David

    2001-01-01

    Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681
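
    As a rough illustration of the naive consensus pattern search that the abstract uses as its baseline (not of the PEA metric or of PROSPECT itself), the sketch below expands an IUPAC consensus string into a regular expression and reports every match position in an upstream sequence. The function names and the example sequences are assumptions made for this illustration only.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex source string."""
    return "".join(IUPAC[c] for c in consensus.upper())


def find_sites(consensus, upstream):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the scanner report overlapping occurrences.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(upstream.upper())]


if __name__ == "__main__":
    # Hypothetical upstream fragment containing one exact site.
    print(find_sites("TGACTCA", "AATGACTCATT"))
```

    A search like this reports every sequence-level hit, which is why it tends to produce many false positives; the abstract's contribution is to rank such hits using expression data, which this sketch does not attempt.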

  3. Recognition of digital characteristics based new improved genetic algorithm

    NASA Astrophysics Data System (ADS)

    Wang, Meng; Xu, Guoqiang; Lin, Zihao

    2017-08-01

    In the field of digital signal processing, estimating the characteristics of signal modulation parameters is a significant research direction. Based on an in-depth study of a new improved genetic algorithm, the paper determines the set of eigenvalues that capture the differences between digital signal modulations. First, these are taken as the best gene pool; second, the best gene pool is changed during genetic evolution by selecting, overlapping and eliminating each other; finally, a strategy of further enhanced competition and punishment is adopted to further optimize the gene pool and ensure that each generation consists of high-quality genes. The simulation results show that this method has global convergence, stability and faster convergence speed.

  4. Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification.

    PubMed

    Elyasigomari, V; Lee, D A; Screen, H R C; Shaheed, M H

    2017-03-01

    For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, the minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, using the support vector machine as a classifier. The method was applied to four microarray datasets, and the performance was assessed by the leave one out cross-validation method. Comparative performance assessment of the proposed method with other evolutionary algorithms suggested that the proposed algorithm significantly outperforms other methods in selecting a fewer number of genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type. Copyright © 2017. Published by Elsevier Inc.
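
    The first-stage filter mentioned in the abstract can be sketched as a greedy mRMR selection. This is a minimal illustration only: it assumes expression values have already been discretized, uses the difference form of the criterion (relevance minus mean redundancy), and leaves out the COA-HS wrapper and the SVM entirely. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr(features, labels, k):
    """Greedily pick k feature names maximizing relevance minus mean redundancy.

    features: dict mapping gene name -> list of discretized expression values.
    labels:   list of class labels, one per sample.
    """
    relevance = {g: mutual_information(v, labels) for g, v in features.items()}
    selected = []
    remaining = set(features)

    def score(g):
        if not selected:
            return relevance[g]
        redundancy = sum(
            mutual_information(features[g], features[s]) for s in selected
        ) / len(selected)
        return relevance[g] - redundancy

    while remaining and len(selected) < k:
        # Sorting makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: g2 duplicates g1, while g3 carries complementary information.
    toy = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
    print(mrmr(toy, [0, 1, 2, 3], 2))
```

    In the toy data the duplicate gene is skipped in favor of the complementary one, which is the behavior the redundancy term is meant to produce.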

  5. Accelerated Evolution of Developmentally Biased Genes in the Tetraphenic Ant Cardiocondyla obscurior.

    PubMed

    Schrader, Lukas; Helanterä, Heikki; Oettler, Jan

    2017-03-01

    Plastic gene expression underlies phenotypic plasticity and plastically expressed genes evolve under different selection regimes compared with ubiquitously expressed genes. Social insects are well-suited models to elucidate the evolutionary dynamics of plastic genes for their genetically and environmentally induced discrete polymorphisms. Here, we study the evolution of plastically expressed genes in the ant Cardiocondyla obscurior, a species that produces two discrete male morphs in addition to the typical female polymorphism of workers and queens. Based on individual-level gene expression data from 28 early third instar larvae, we test whether the same evolutionary dynamics that pertain to plastically expressed genes in adults also pertain to genes with plastic expression during development. In order to quantify plasticity of gene expression over multiple contrasts, we develop a novel geometric measure. For genes expressed during development, we show that plasticity of expression is positively correlated with evolutionary rates. We furthermore find a strong correlation between expression plasticity and expression variation within morphs, suggesting a close link between active and passive plasticity of gene expression. Our results support the notion of relaxed selection and neutral processes as important drivers in the evolution of adaptive plasticity. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. PanDaTox: A tool for accelerated metabolic engineering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Amitai, Gil; Sorek, Rotem

    2012-07-18

    Metabolic engineering is often facilitated by cloning of genes encoding enzymes from various heterologous organisms into E. coli. Such engineering efforts are frequently hampered by foreign genes that are toxic to the E. coli host. We have developed PanDaTox (www.weizmann.ac.il/pandatox), a web-based resource that provides experimental toxicity information for more than 1.5 million genes from hundreds of different microbial genomes. The toxicity predictions, which were extensively experimentally verified, are based on serial cloning of genes into E. coli as part of the Sanger whole genome shotgun sequencing process. PanDaTox can accelerate metabolic engineering projects by allowing researchers to exclude toxic genes from the engineering plan and verify the clonability of selected genes before the actual metabolic engineering experiments are conducted.

  7. Genome-wide scan for selection signatures in six cattle breeds in South Africa.

    PubMed

    Makina, Sithembile O; Muchadeyi, Farai C; van Marle-Köster, Este; Taylor, Jerry F; Makgahlela, Mahlako L; Maiwashe, Azwihangwisi

    2015-11-26

    The detection of selection signatures in breeds of livestock species can contribute to the identification of regions of the genome that are, or have been, functionally important and, as a consequence, have been targeted by selection. This study used two approaches to detect signatures of selection within and between six cattle breeds in South Africa, including Afrikaner (n = 44), Nguni (n = 54), Drakensberger (n = 47), Bonsmara (n = 44), Angus (n = 31) and Holstein (n = 29). The first approach was based on the detection of genomic regions in which haplotypes have been driven towards complete fixation within breeds. The second approach identified regions of the genome that had very different allele frequencies between populations (FST). Forty-seven candidate genomic regions were identified as harbouring putative signatures of selection using both methods. Twelve of these candidate selected regions were shared among the breeds and ten were validated by previous studies. Thirty-three of these regions were successfully annotated and candidate genes were identified. Among these genes the keratin genes (KRT222, KRT24, KRT25, KRT26, and KRT27) and one heat shock protein gene (HSPB9) on chromosome 19 between 42,896,570 and 42,897,840 bp were detected for the Nguni breed. These genes were previously associated with adaptation to tropical environments in Zebu cattle. In addition, a number of candidate genes associated with the nervous system (WNT5B, FMOD, PRELP, and ATP2B), immune response (CYM, CDC6, and CDK10), production (MTPN, IGFBP4, TGFB1, and AJAP1) and reproductive performance (ADIPOR2, OVOS2, and RBBP8) were also detected as being under selection. The results presented here provide a foundation for detecting mutations that underlie genetic variation of traits that have economic importance for cattle breeds in South Africa.
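
    To make the second approach more concrete, the sketch below computes Wright's FST for a single biallelic SNP from the allele frequencies in two populations. The abstract does not say which FST estimator the study used, so this heterozygosity-based form, the equal weighting of the two populations, and the function name are all assumptions for illustration.

```python
def fst_biallelic(p1, p2):
    """Wright's FST for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Returns 0.0 when the site is monomorphic overall (FST is undefined there).
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)           # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Identical frequencies give no differentiation; opposite fixation gives full.
    print(fst_biallelic(0.5, 0.5), fst_biallelic(1.0, 0.0))
```

    In a genome scan, per-SNP values like these are typically averaged over windows and the most extreme windows are treated as candidate regions; the window size and cutoff would be study-specific choices.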

  8. Simple Monitoring of Gene Targeting Efficiency in Human Somatic Cell Lines Using the PIGA Gene

    PubMed Central

    Karnan, Sivasundaram; Konishi, Yuko; Ota, Akinobu; Takahashi, Miyuki; Damdindorj, Lkhagvasuren; Hosokawa, Yoshitaka; Konishi, Hiroyuki

    2012-01-01

    Gene targeting in most human somatic cell lines has been labor-intensive because of low homologous recombination efficiency. The development of an experimental system that permits a facile evaluation of gene targeting efficiency in human somatic cell lines is the first step towards the improvement of this technology and its application to a broad range of cell lines. In this study, we utilized phosphatidylinositol glycan anchor biosynthesis class A (PIGA), a gene essential for the synthesis of glycosylphosphatidyl inositol (GPI) anchors, as a reporter of gene targeting events in human somatic cell lines. Targeted disruption of PIGA was quantitatively detected with FLAER, a reagent that specifically binds to GPI anchors. Using this PIGA-based reporter system, we successfully detected adeno-associated virus (AAV)-mediated gene targeting events both with and without promoter-trap enrichment of gene-targeted cell population. The PIGA-based reporter system was also capable of reproducing previous findings that an AAV-mediated gene targeting achieves a remarkably higher ratio of homologous versus random integration (H/R ratio) of targeting vectors than a plasmid-mediated gene targeting. The PIGA-based system also detected an approximately 2-fold increase in the H/R ratio achieved by a small negative selection cassette introduced at the end of the AAV-based targeting vector with a promoter-trap system. Thus, our PIGA-based system is useful for monitoring AAV-mediated gene targeting and will assist in improving gene targeting technology in human somatic cell lines. PMID:23056640

  9. Selection Shapes Transcriptional Logic and Regulatory Specialization in Genetic Networks.

    PubMed

    Fogelmark, Karl; Peterson, Carsten; Troein, Carl

    2016-01-01

    Living organisms need to regulate their gene expression in response to environmental signals and internal cues. This is a computational task where genes act as logic gates that connect to form transcriptional networks, which are shaped at all scales by evolution. Large-scale mutations such as gene duplications and deletions add and remove network components, whereas smaller mutations alter the connections between them. Selection determines what mutations are accepted, but its importance for shaping the resulting networks has been debated. To investigate the effects of selection in the shaping of transcriptional networks, we derive transcriptional logic from a combinatorially powerful yet tractable model of the binding between DNA and transcription factors. By evolving the resulting networks based on their ability to function as either a simple decision system or a circadian clock, we obtain information on the regulation and logic rules encoded in functional transcriptional networks. Comparisons are made between networks evolved for different functions, as well as with structurally equivalent but non-functional (neutrally evolved) networks, and predictions are validated against the transcriptional network of E. coli. We find that the logic rules governing gene expression depend on the function performed by the network. Unlike the decision systems, the circadian clocks show strong cooperative binding and negative regulation, which achieves tight temporal control of gene expression. Furthermore, we find that transcription factors act preferentially as either activators or repressors, both when binding multiple sites for a single target gene and globally in the transcriptional networks. This separation into positive and negative regulators requires gene duplications, which highlights the interplay between mutation and selection in shaping the transcriptional networks.

  10. High-efficiency CRISPR/Cas9 multiplex gene editing using the glycine tRNA-processing system-based strategy in maize.

    PubMed

    Qi, Weiwei; Zhu, Tong; Tian, Zhongrui; Li, Chaobin; Zhang, Wei; Song, Rentao

    2016-08-11

    CRISPR/Cas9 genome editing strategy has been applied to a variety of species and the tRNA-processing system has been used to compact multiple gRNAs into one synthetic gene for manipulating multiple genes in rice. We optimized and introduced the multiplex gene editing strategy based on the tRNA-processing system into maize. Maize glycine-tRNA was selected to design multiple tRNA-gRNA units for the simultaneous production of numerous gRNAs under the control of one maize U6 promoter. We designed three gRNAs for simplex editing and three multiple tRNA-gRNA units for multiplex editing. The results indicate that this system not only increased the number of targeted sites but also enhanced mutagenesis efficiency in maize. Additionally, we propose an advanced sequence selection of gRNA spacers for relatively more efficient and accurate chromosomal fragment deletion, which is important for complete abolishment of gene function especially long non-coding RNAs (lncRNAs). Our results also indicated that up to four tRNA-gRNA units in one expression cassette design can still work in maize. The examples reported here demonstrate the utility of the tRNA-processing system-based strategy as an efficient multiplex genome editing tool to enhance maize genetic research and breeding.

  11. The heritage of pathogen pressures and ancient demography in the human innate-immunity CD209/CD209L region.

    PubMed

    Barreiro, Luis B; Patin, Etienne; Neyrolles, Olivier; Cann, Howard M; Gicquel, Brigitte; Quintana-Murci, Lluís

    2005-11-01

    The innate immunity system constitutes the first line of host defense against pathogens. Two closely related innate immunity genes, CD209 and CD209L, are particularly interesting because they directly recognize a plethora of pathogens, including bacteria, viruses, and parasites. Both genes, which result from an ancient duplication, possess a neck region, made up of seven repeats of 23 amino acids each, known to play a major role in the pathogen-binding properties of these proteins. To explore the extent to which pathogens have exerted selective pressures on these innate immunity genes, we resequenced them in a group of samples from sub-Saharan Africa, Europe, and East Asia. Moreover, variation in the number of repeats of the neck region was defined in the entire Human Genome Diversity Panel for both genes. Our results, which are based on diversity levels, neutrality tests, population genetic distances, and neck-region length variation, provide genetic evidence that CD209 has been under a strong selective constraint that prevents accumulation of any amino acid changes, whereas CD209L variability has most likely been shaped by the action of balancing selection in non-African populations. In addition, our data point to the neck region as the functional target of such selective pressures: CD209 presents a constant size in the neck region populationwide, whereas CD209L presents an excess of length variation, particularly in non-African populations. An additional interesting observation came from the coalescent-based CD209 gene tree, whose binary topology and time depth (approximately 2.8 million years ago) are compatible with an ancestral population structure in Africa. Altogether, our study has revealed that even a short segment of the human genome can uncover an extraordinarily complex evolutionary history, including different pathogen pressures on host genes as well as traces of admixture among archaic hominid populations.

  12. Simple Method for Markerless Gene Deletion in Multidrug-Resistant Acinetobacter baumannii

    PubMed Central

    Oh, Man Hwan; Lee, Je Chul; Kim, Jungmin

    2015-01-01

    The traditional markerless gene deletion technique based on overlap extension PCR has been used for generating gene deletions in multidrug-resistant Acinetobacter baumannii. However, the method is time-consuming because it requires restriction digestion of the PCR products in DNA cloning and the construction of new vectors containing a suitable antibiotic resistance cassette for the selection of A. baumannii merodiploids. Moreover, the availability of restriction sites and the selection of recombinant bacteria harboring the desired chimeric plasmid are limited, making the construction of a chimeric plasmid more difficult. We describe a rapid and easy cloning method for markerless gene deletion in A. baumannii, which has no limitation in the availability of restriction sites and allows for easy selection of the clones carrying the desired chimeric plasmid. Notably, it is not necessary to construct new vectors in our method. This method utilizes direct cloning of blunt-end DNA fragments, in which upstream and downstream regions of the target gene are fused with an antibiotic resistance cassette via overlap extension PCR and are inserted into a blunt-end suicide vector developed for blunt-end cloning. Importantly, the antibiotic resistance cassette is placed outside the downstream region in order to enable easy selection of the recombinants carrying the desired plasmid, to eliminate the antibiotic resistance cassette via homologous recombination, and to avoid the necessity of constructing new vectors. This strategy was successfully applied to functional analysis of the genes associated with iron acquisition by A. baumannii ATCC 19606 and to ompA gene deletion in other A. baumannii strains. Consequently, the proposed method is invaluable for markerless gene deletion in multidrug-resistant A. baumannii. PMID:25746991

  13. Pooled-DNA sequencing identifies genomic regions of selection in Nigerian isolates of Plasmodium falciparum.

    PubMed

    Oyebola, Kolapo M; Idowu, Emmanuel T; Olukosi, Yetunde A; Awolola, Taiwo S; Amambua-Ngwa, Alfred

    2017-06-29

    The burden of falciparum malaria is especially high in sub-Saharan Africa. Differences in pressure from host immunity and antimalarial drugs lead to adaptive changes responsible for high level of genetic variations within and between the parasite populations. Population-specific genetic studies to survey for genes under positive or balancing selection resulting from drug pressure or host immunity will allow for refinement of interventions. We performed a pooled sequencing (pool-seq) of the genomes of 100 Plasmodium falciparum isolates from Nigeria. We explored allele-frequency based neutrality test (Tajima's D) and integrated haplotype score (iHS) to identify genes under selection. Fourteen shared iHS regions that had at least 2 SNPs with a score > 2.5 were identified. These regions code for genes that were likely to have been under strong directional selection. Two of these genes were the chloroquine resistance transporter (CRT) on chromosome 7 and the multidrug resistance 1 (MDR1) on chromosome 5. There was a weak signature of selection in the dihydrofolate reductase (DHFR) gene on chromosome 4 and MDR5 genes on chromosome 13, with only 2 and 3 SNPs respectively identified within the iHS window. We observed strong selection pressure attributable to continued chloroquine and sulfadoxine-pyrimethamine use despite their official proscription for the treatment of uncomplicated malaria. There was also a major selective sweep on chromosome 6 which had 32 SNPs within the shared iHS region. Tajima's D of circumsporozoite protein (CSP), erythrocyte-binding antigen (EBA-175), merozoite surface proteins - MSP3 and MSP7, merozoite surface protein duffy binding-like (MSPDBL2) and serine repeat antigen (SERA-5) were 1.38, 1.29, 0.73, 0.84 and 0.21, respectively. We have demonstrated the use of pool-seq to understand genomic patterns of selection and variability in P. falciparum from Nigeria, which bears the highest burden of infections. This investigation identified known genomic signatures of selection from drug pressure and host immunity. This is evidence that P. falciparum populations explore common adaptive strategies that can be targeted for the development of new interventions.
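
    For readers unfamiliar with the neutrality test named in the abstract, the sketch below computes the classical Tajima's D from a set of aligned haplotype sequences. It is an illustration under stated assumptions: the study worked from pooled allele frequencies, which require a different estimator that is not shown here, and the function name and toy alignments are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Classical Tajima's D for equally long aligned sequences.

    Returns None when D is undefined (fewer than 4 sequences, for which the
    variance term vanishes, or no segregating sites).
    """
    n = len(seqs)
    length = len(seqs[0])
    # Number of segregating (polymorphic) sites.
    seg = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if n < 4 or seg == 0:
        return None

    # Mean number of pairwise differences (pi).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))


if __name__ == "__main__":
    # Toy alignment in which every variant is a singleton (excess of rare alleles).
    print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]))
```

    Positive values, such as those reported for the antigen genes in the abstract, indicate an excess of intermediate-frequency variants, which is the pattern expected under balancing selection; negative values indicate an excess of rare variants.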

  14. dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface.

    PubMed

    Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz

    2009-08-25

    Bioinformatics often leverages on recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database that consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble that of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms.

  15. dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface

    PubMed Central

    Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz

    2009-01-01

    Background Bioinformatics often leverages on recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. Results We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database that consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. Conclusion dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble that of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms. PMID:19706156

  16. Genetic speciation of environmental Legionella isolates in Thailand.

    PubMed

    Paveenkittiporn, Wantana; Dejsirilert, Surang; Kalambaheti, Thareerat

    2012-10-01

    Legionella-like organisms were isolated during 2003-2007 from various water resources by culturing on the selective medium Wadowsky-Yee-Okuda agar. The 256 isolates were identified as belonging to the genus Legionella based on detection of a 108 bp PCR product of the 5S rRNA gene, while identification as Legionella pneumophila was confirmed by PCR detection of a specific mip gene region of 168 bp. The 50 isolates identified as non-pneumophila were then subjected to DNA tree analysis based on the mip gene (~650 bp) and rnpB gene products ranging from 304 to 354 bp. A phylogenetic tree was constructed to predict their species relative to the available database. Isolates whose speciation based on those two genes was inconclusive were then investigated using almost full-length 16S rRNA sequences. The isolates were assigned to 16 known Legionella species, and seven novel species were proposed based on their unique 16S rRNA sequences. Copyright © 2012 Elsevier B.V. All rights reserved.

  17. The development of a cisgenic apple plant.

    PubMed

    Vanblaere, Thalia; Szankowski, Iris; Schaart, Jan; Schouten, Henk; Flachowsky, Henryk; Broggini, Giovanni A L; Gessler, Cesare

    2011-07-20

    Cisgenesis represents a step toward a new generation of GM crops. The lack of selectable genes (e.g. antibiotic or herbicide resistance) in the final product and the fact that the inserted gene(s) derive from organisms sexually compatible with the target crop should raise fewer environmental concerns and increase consumer acceptance. Here we report the generation of a cisgenic apple plant by inserting the endogenous apple scab resistance gene HcrVf2 under the control of its own regulatory sequences into the scab susceptible apple cultivar Gala. A previously developed method based on Agrobacterium-mediated transformation combined with a positive and negative selection system and a chemically inducible recombination machinery allowed the generation of apple cv. Gala carrying the scab resistance gene HcrVf2 under its native regulatory sequences and no foreign genes. Three cisgenic lines were chosen for detailed investigation and were shown to carry a single T-DNA insertion and express the target gene HcrVf2. This is the first report of the generation of a true cisgenic plant. Copyright © 2011 Elsevier B.V. All rights reserved.

  18. Evaluation and selection of reliable reference genes for gene expression under abiotic stress in cotton (Gossypium hirsutum L.).

    PubMed

    Wang, Min; Wang, Qinglian; Zhang, Baohong

    2013-11-01

    Reference genes are critical for normalizing the expression level of target genes. The widely used housekeeping genes may change their expression levels in different tissues under different treatment or stress conditions. Therefore, systematic evaluation of the housekeeping genes is required for gene expression analysis. To date, no work has been performed to evaluate the housekeeping genes in cotton under stress treatment. In this study, we chose 10 housekeeping genes to systematically assess their expression levels in two different tissues (leaves and roots) under two different abiotic stresses (salt and drought) at three different concentrations. Our results show that there is no single best reference gene for all tissues under all stress conditions. A reliable reference gene should be selected for the specific condition. For example, under salt stress, UBQ7, GAPDH and EF1A8 are better reference genes in leaves; TUA10, UBQ7, CYP1, GAPDH and EF1A8 were better in roots. Under drought stress, UBQ7, EF1A8, TUA10, and GAPDH showed less variation in expression level in leaves and roots. Thus, it is better to identify reliable reference genes first before performing any gene expression analysis. However, using a combination of housekeeping genes as the reference may provide a new strategy for normalization of gene expression. In this study, we found that a combination of four housekeeping genes worked well as reference genes under all the stress conditions. © 2013.
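
    The abstract above does not say how the four housekeeping genes were combined. A common convention is to normalize against the geometric mean of the reference genes' relative quantities; the sketch below assumes that convention. The gene selection and all numeric values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def normalization_factor(reference_quantities):
    """Geometric mean of the relative quantities of several reference genes."""
    if not reference_quantities or any(q <= 0 for q in reference_quantities):
        raise ValueError("reference quantities must be non-empty and positive")
    log_sum = sum(math.log(q) for q in reference_quantities)
    return math.exp(log_sum / len(reference_quantities))


def normalize_target(target_quantity, reference_quantities):
    """Express a target gene's quantity relative to the combined reference."""
    return target_quantity / normalization_factor(reference_quantities)


# Hypothetical relative quantities for four reference genes in one sample.
# Which four genes the study combined is not stated; these names are
# borrowed from the abstract for illustration only.
refs = {"UBQ7": 2.0, "GAPDH": 8.0, "EF1A8": 4.0, "TUA10": 4.0}
factor = normalization_factor(list(refs.values()))
normalized = normalize_target(12.0, list(refs.values()))
```

    A geometric mean is less sensitive than an arithmetic mean to one reference gene with an unusually high quantity, which is one reason it is often preferred for this purpose.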

  19. Scanning of selection signature provides a glimpse into important economic traits in goats (Capra hircus).

    PubMed

    Guan, Dailu; Luo, Nanjian; Tan, Xiaoshan; Zhao, Zhongquan; Huang, Yongfu; Na, Risu; Zhang, Jiahua; Zhao, Yongju

    2016-10-31

    Goats (Capra hircus) are one of the oldest domesticated livestock species, and have been used for their milk, meat, hair and skins over much of the world. Detection of selection footprints in genomic regions can provide insights into the genetic mechanisms of specific phenotypic traits and better guide animal breeding. The study presented here generated 192.747 G of raw data and identified more than 5.03 million single-nucleotide polymorphisms (SNPs) and 334,151 Indels (insertions and deletions). In addition, we identified 155 and 294 candidate regions harboring 86 and 97 genes based on allele frequency differences in Dazu black goats (DBG) and Inner Mongolia cashmere goats (IMCG), respectively. Population differentiation reflected by Fst values detected 368 putative selective sweep regions including 164 genes. The top 1% regions of both low heterozygosity and high genetic differentiation contained 239 (135 genes) and 176 (106 genes) candidate regions in DBG and IMCG, respectively. These genes were related to reproductive and productive traits, such as "neurohypophyseal hormone activity" and "adipocytokine signaling pathway". These findings may be conducive to molecular breeding and the long-term preservation of the valuable genetic resources for this species.

  20. Balancing selection on immunity genes: review of the current literature and new analysis in Drosophila melanogaster.

    PubMed

    Croze, Myriam; Živković, Daniel; Stephan, Wolfgang; Hutter, Stephan

    2016-08-01

    Balancing selection has been widely assumed to be an important evolutionary force, yet even today little is known about its abundance and its impact on the patterns of genetic diversity. Several studies have shown examples of balancing selection in humans, plants or parasites, and many genes under balancing selection are involved in immunity. It has been proposed that host-parasite coevolution is one of the main forces driving immune genes to evolve under balancing selection. In this paper, we review the literature on balancing selection on immunity genes in several organisms, including Drosophila. Furthermore, we performed a genome scan for balancing selection in an African population of Drosophila melanogaster using coalescent simulations of a demographic model with and without selection. We find very few genes under balancing selection and only one novel candidate gene related to immunity. Finally, we discuss the possible causes of the low number of genes under balancing selection. Copyright © 2016 The Authors. Published by Elsevier GmbH. All rights reserved.

  1. Genes under positive selection in a model plant pathogenic fungus, Botrytis.

    PubMed

    Aguileta, Gabriela; Lengelle, Juliette; Chiapello, Hélène; Giraud, Tatiana; Viaud, Muriel; Fournier, Elisabeth; Rodolphe, François; Marthey, Sylvain; Ducasse, Aurélie; Gendrault, Annie; Poulain, Julie; Wincker, Patrick; Gout, Lilian

    2012-07-01

    The rapid evolution of particular genes is essential for the adaptation of pathogens to new hosts and new environments. Powerful methods have been developed for detecting targets of selection in the genome. Here we used divergence data to compare genes among four closely related fungal pathogens adapted to different hosts to elucidate the functions putatively involved in adaptive processes. For this goal, ESTs were sequenced in the specialist fungal pathogens Botrytis tulipae and Botrytis ficariarum, and compared with genome sequences of Botrytis cinerea and Sclerotinia sclerotiorum, responsible for diseases on over 200 plant species. A maximum likelihood-based analysis of 642 predicted orthologs detected 21 genes showing footprints of positive selection. These results were validated by resequencing nine of these genes in additional Botrytis species, showing they have also been rapidly evolving in other related species. Twenty of the 21 genes had not previously been identified as pathogenicity factors in B. cinerea, but some had functions related to plant-fungus interactions. The putative functions were involved in respiratory and energy metabolism, protein and RNA metabolism, signal transduction or virulence, similarly to what was detected in previous studies using the same approach in other pathogens. Mutants of B. cinerea were generated for four of these genes as a first attempt to elucidate their functions. Copyright © 2012 Elsevier B.V. All rights reserved.

  2. Bacterial evolution through the selective loss of beneficial Genes. Trade-offs in expression involving two loci.

    PubMed Central

    Zinser, Erik R; Schneider, Dominique; Blot, Michel; Kolter, Roberto

    2003-01-01

    The loss of preexisting genes or gene activities during evolution is a major mechanism of ecological specialization. Evolutionary processes that can account for gene loss or inactivation have so far been restricted to one of two mechanisms: direct selection for the loss of gene activities that are disadvantageous under the conditions of selection (i.e., antagonistic pleiotropy) and selection-independent genetic drift of neutral (or nearly neutral) mutations (i.e., mutation accumulation). In this study we demonstrate with an evolved strain of Escherichia coli that a third, distinct mechanism exists by which gene activities can be lost. This selection-dependent mechanism involves the expropriation of one gene's upstream regulatory element by a second gene via a homologous recombination event. Resulting from this genetic exchange is the activation of the second gene and a concomitant inactivation of the first gene. This gene-for-gene expression tradeoff provides a net fitness gain, even if the forfeited activity of the first gene can play a positive role in fitness under the conditions of selection. PMID:12930738

  3. Transcriptome sequencing of Eucalyptus camaldulensis seedlings subjected to water stress reveals functional single nucleotide polymorphisms and genes under selection

    PubMed Central

    2012-01-01

    Background Water stress limits plant survival and production in many parts of the world. Identification of genes and alleles responding to water stress conditions is important in breeding plants better adapted to drought. Currently there are no studies examining the transcriptome wide gene and allelic expression patterns under water stress conditions. We used RNA sequencing (RNA-seq) to identify the candidate genes and alleles and to explore the evolutionary signatures of selection. Results We studied the effect of water stress on gene expression in Eucalyptus camaldulensis seedlings derived from three natural populations. We used reference-guided transcriptome mapping to study gene expression. Several genes showed differential expression between control and stress conditions. Gene ontology (GO) enrichment tests revealed up-regulation of 140 stress-related gene categories and down-regulation of 35 metabolic and cell wall organisation gene categories. More than 190,000 single nucleotide polymorphisms (SNPs) were detected and 2737 of these showed differential allelic expression. Allelic expression of 52% of these variants was correlated with differential gene expression. Signatures of selection patterns were studied by estimating the proportion of nonsynonymous to synonymous substitution rates (Ka/Ks). The average Ka/Ks ratio among the 13,719 genes was 0.39 indicating that most of the genes are under purifying selection. Among the positively selected genes (Ka/Ks > 1.5) apoptosis and cell death categories were enriched. Of the 287 positively selected genes, ninety genes showed differential expression and 27 SNPs from 17 positively selected genes showed differential allelic expression between treatments. Conclusions Correlation of allelic expression of several SNPs with total gene expression indicates that these variants may be the cis-acting variants or in linkage disequilibrium with such variants. 
    Enrichment of apoptosis and cell death gene categories among the positively selected genes reveals the past selection pressures experienced by the populations used in this study. PMID:22853646
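
    The Ka/Ks reasoning in the abstract above can be sketched as a small classifier. The 1.5 cutoff for positive selection is the one the abstract reports; treating ratios below 1 as purifying selection is the standard interpretation. The function name and the label for the intermediate range are assumptions made for this illustration, not part of the study's pipeline.

```python
def classify_selection(ka, ks, positive_threshold=1.5):
    """Return (Ka/Ks, label) for one gene.

    ka: nonsynonymous substitution rate
    ks: synonymous substitution rate (must be > 0)
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio > positive_threshold:
        label = "positive"
    elif ratio < 1.0:
        label = "purifying"
    else:
        # Between 1 and the cutoff: not called either way in this sketch.
        label = "neutral or weakly positive"
    return ratio, label


# The reported genome-wide average of 0.39 falls in the purifying range.
average_ratio, average_label = classify_selection(0.39, 1.0)
```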

  4. Estimating the parameters of background selection and selective sweeps in Drosophila in the presence of gene conversion

    PubMed Central

    Campos, José Luis; Charlesworth, Brian

    2017-01-01

    We used whole-genome resequencing data from a population of Drosophila melanogaster to investigate the causes of the negative correlation between the within-population synonymous nucleotide site diversity (πS) of a gene and its degree of divergence from related species at nonsynonymous nucleotide sites (KA). By using the estimated distributions of mutational effects on fitness at nonsynonymous and UTR sites, we predicted the effects of background selection at sites within a gene on πS and found that these could account for only part of the observed correlation between πS and KA. We developed a model of the effects of selective sweeps that included gene conversion as well as crossing over. We used this model to estimate the average strength of selection on positively selected mutations in coding sequences and in UTRs, as well as the proportions of new mutations that are selectively advantageous. Genes with high levels of selective constraint on nonsynonymous sites were found to have lower strengths of positive selection and lower proportions of advantageous mutations than genes with low levels of constraint. Overall, background selection and selective sweeps within a typical gene reduce its synonymous diversity to ∼75% of its value in the absence of selection, with larger reductions for genes with high KA. Gene conversion has a major effect on the estimates of the parameters of positive selection, such that the estimated strength of selection on favorable mutations is greatly reduced if it is ignored. PMID:28559322

  5. A Risk Stratification Model for Lung Cancer Based on Gene Coexpression Network and Deep Learning

    PubMed Central

    2018-01-01

    Risk stratification model for lung cancer with gene expression profile is of great interest. Instead of previous models based on individual prognostic genes, we aimed to develop a novel system-level risk stratification model for lung adenocarcinoma based on gene coexpression network. Using multiple microarray, gene coexpression network analysis was performed to identify survival-related networks. A deep learning based risk stratification model was constructed with representative genes of these networks. The model was validated in two test sets. Survival analysis was performed using the output of the model to evaluate whether it could predict patients' survival independent of clinicopathological variables. Five networks were significantly associated with patients' survival. Considering prognostic significance and representativeness, genes of the two survival-related networks were selected for input of the model. The output of the model was significantly associated with patients' survival in two test sets and training set (p < 0.00001, p < 0.0001 and p = 0.02 for training and test sets 1 and 2, resp.). In multivariate analyses, the model was associated with patients' prognosis independent of other clinicopathological features. Our study presents a new perspective on incorporating gene coexpression networks into the gene expression signature and clinical application of deep learning in genomic data science for prognosis prediction. PMID:29581968

  6. A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds.

    PubMed

    Yang, Songbai; Li, Xiuling; Li, Kui; Fan, Bin; Tang, Zhonglin

    2014-01-15

    Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes implicated in economic traits between Chinese indigenous and commercial pigs have been poorly understood. We identified footprints of positive selection at the whole genome level, comprising 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show strong evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. In this study, we first identified the genomic regions that showed evidence of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into understanding the genetic variation and domestication in pigs.
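
    The outlier idea described above (compute Fst per SNP, build the empirical distribution, keep the upper tail) can be illustrated with a minimal sketch. It assumes a simple heterozygosity-based Fst with equally weighted populations and a top-fraction cutoff; the estimator and cutoff actually used in the study are not given in the abstract, and the SNP names and allele frequencies below are invented.

```python
def snp_fst(freqs):
    """Fst = (Ht - Hs) / Ht for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                             # monomorphic everywhere
    h_within = sum(2 * p * (1 - p) for p in freqs) / n
    return (h_total - h_within) / h_total


def high_fst_outliers(snp_freqs, top_fraction=0.01):
    """Rank SNPs by Fst and return the top fraction plus all Fst values."""
    fst = {snp: snp_fst(f) for snp, f in snp_freqs.items()}
    ranked = sorted(fst, key=fst.get, reverse=True)
    n_keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:n_keep], fst


# Toy data: three hypothetical SNPs typed in two populations.
toy = {"snp_a": [0.0, 1.0], "snp_b": [0.5, 0.5], "snp_c": [0.4, 0.6]}
outliers, fst_values = high_fst_outliers(toy, top_fraction=0.34)
```

    In practice the candidate genes would then be those overlapping or lying near the outlier SNPs; that annotation step is outside this sketch.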

  7. A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds

    PubMed Central

    2014-01-01

    Background Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes implying for economic traits between Chinese indigenous and commercial pigs have been poorly understood. Results We identified footprints of positive selection at the whole genome level, comprising 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show high evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. Conclusions In this study, we first identified the genomic regions that showed evidences of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into understanding the genetic variation and domestication in pigs. PMID:24422716

  8. Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication.

    PubMed

    Montague, Michael J; Li, Gang; Gandolfi, Barbara; Khan, Razib; Aken, Bronwen L; Searle, Steven M J; Minx, Patrick; Hillier, LaDeana W; Koboldt, Daniel C; Davis, Brian W; Driscoll, Carlos A; Barr, Christina S; Blackistone, Kevin; Quilez, Javier; Lorente-Galdos, Belen; Marques-Bonet, Tomas; Alkan, Can; Thomas, Gregg W C; Hahn, Matthew W; Menotti-Raymond, Marilyn; O'Brien, Stephen J; Wilson, Richard K; Lyons, Leslie A; Murphy, William J; Warren, Wesley C

    2014-12-02

    Little is known about the genetic changes that distinguish domestic cat populations from their wild progenitors. Here we describe a high-quality domestic cat reference genome assembly and comparative inferences made with other cat breeds, wildcats, and other mammals. Based upon these comparisons, we identified positively selected genes enriched for genes involved in lipid metabolism that underpin adaptations to a hypercarnivorous diet. We also found positive selection signals within genes underlying sensory processes, especially those affecting vision and hearing in the carnivore lineage. We observed an evolutionary tradeoff between functional olfactory and vomeronasal receptor gene repertoires in the cat and dog genomes, with an expansion of the feline chemosensory system for detecting pheromones at the expense of odorant detection. Genomic regions harboring signatures of natural selection that distinguish domestic cats from their wild congeners are enriched in neural crest-related genes associated with behavior and reward in mouse models, as predicted by the domestication syndrome hypothesis. Our description of a previously unidentified allele for the gloving pigmentation pattern found in the Birman breed supports the hypothesis that cat breeds experienced strong selection on specific mutations drawn from random bred populations. Collectively, these findings provide insight into how the process of domestication altered the ancestral wildcat genome and build a resource for future disease mapping and phylogenomic studies across all members of the Felidae.

  9. Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication

    PubMed Central

    Li, Gang; Gandolfi, Barbara; Khan, Razib; Aken, Bronwen L.; Searle, Steven M. J.; Minx, Patrick; Hillier, LaDeana W.; Koboldt, Daniel C.; Davis, Brian W.; Driscoll, Carlos A.; Barr, Christina S.; Blackistone, Kevin; Quilez, Javier; Lorente-Galdos, Belen; Marques-Bonet, Tomas; Alkan, Can; Thomas, Gregg W. C.; Hahn, Matthew W.; Menotti-Raymond, Marilyn; O’Brien, Stephen J.; Wilson, Richard K.; Lyons, Leslie A.; Murphy, William J.; Warren, Wesley C.

    2014-01-01

    Little is known about the genetic changes that distinguish domestic cat populations from their wild progenitors. Here we describe a high-quality domestic cat reference genome assembly and comparative inferences made with other cat breeds, wildcats, and other mammals. Based upon these comparisons, we identified positively selected genes enriched for genes involved in lipid metabolism that underpin adaptations to a hypercarnivorous diet. We also found positive selection signals within genes underlying sensory processes, especially those affecting vision and hearing in the carnivore lineage. We observed an evolutionary tradeoff between functional olfactory and vomeronasal receptor gene repertoires in the cat and dog genomes, with an expansion of the feline chemosensory system for detecting pheromones at the expense of odorant detection. Genomic regions harboring signatures of natural selection that distinguish domestic cats from their wild congeners are enriched in neural crest-related genes associated with behavior and reward in mouse models, as predicted by the domestication syndrome hypothesis. Our description of a previously unidentified allele for the gloving pigmentation pattern found in the Birman breed supports the hypothesis that cat breeds experienced strong selection on specific mutations drawn from random bred populations. Collectively, these findings provide insight into how the process of domestication altered the ancestral wildcat genome and build a resource for future disease mapping and phylogenomic studies across all members of the Felidae. PMID:25385592

  10. Differentiation of Xylella fastidiosa Strains via Multilocus Sequence Analysis of Environmentally Mediated Genes (MLSA-E)

    PubMed Central

    Parker, Jennifer K.; Havird, Justin C.

    2012-01-01

    Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of environmentally mediated genes (MLSA-E; genes influenced by environmental factors) to investigate X. fastidiosa relationships and differentiate isolates with low genetic variability. Potential environmentally mediated genes, including host colonization and survival genes related to infection establishment, were identified a priori. The ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dN/dS) was calculated to select genes that may be under increased positive selection compared to previously studied housekeeping genes. Nine genes were sequenced from 54 X. fastidiosa isolates infecting different host plants across the United States. Results of maximum likelihood (ML) and Bayesian phylogenetic (BP) analyses are in agreement with known X. fastidiosa subspecies clades but show novel within-subspecies differentiation, including geographic differentiation, and provide additional information regarding host-based isolate variation and specificity. dN/dS ratios of environmentally mediated genes, though <1 due to high sequence similarity, are significantly greater than housekeeping gene dN/dS ratios and correlate with increased sequence variability. MLSA-E can more precisely resolve relationships between closely related bacterial strains with low genetic variability, such as X. fastidiosa isolates. Discovering the genetic relationships between X. fastidiosa isolates will provide new insights into the epidemiology of populations of X. fastidiosa, allowing improved disease management in economically important crops. PMID:22194287

  11. Differentiation of Xylella fastidiosa strains via multilocus sequence analysis of environmentally mediated genes (MLSA-E).

    PubMed

    Parker, Jennifer K; Havird, Justin C; De La Fuente, Leonardo

    2012-03-01

    Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of environmentally mediated genes (MLSA-E; genes influenced by environmental factors) to investigate X. fastidiosa relationships and differentiate isolates with low genetic variability. Potential environmentally mediated genes, including host colonization and survival genes related to infection establishment, were identified a priori. The ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dN/dS) was calculated to select genes that may be under increased positive selection compared to previously studied housekeeping genes. Nine genes were sequenced from 54 X. fastidiosa isolates infecting different host plants across the United States. Results of maximum likelihood (ML) and Bayesian phylogenetic (BP) analyses are in agreement with known X. fastidiosa subspecies clades but show novel within-subspecies differentiation, including geographic differentiation, and provide additional information regarding host-based isolate variation and specificity. dN/dS ratios of environmentally mediated genes, though <1 due to high sequence similarity, are significantly greater than housekeeping gene dN/dS ratios and correlate with increased sequence variability. MLSA-E can more precisely resolve relationships between closely related bacterial strains with low genetic variability, such as X. fastidiosa isolates. Discovering the genetic relationships between X. fastidiosa isolates will provide new insights into the epidemiology of populations of X. fastidiosa, allowing improved disease management in economically important crops.

  12. "Contrasting patterns of selection at Pinus pinaster Ait. Drought stress candidate genes as revealed by genetic differentiation analyses".

    PubMed

    Eveno, Emmanuelle; Collada, Carmen; Guevara, M Angeles; Léger, Valérie; Soto, Alvaro; Díaz, Luis; Léger, Patrick; González-Martínez, Santiago C; Cervera, M Teresa; Plomion, Christophe; Garnier-Géré, Pauline H

    2008-02-01

    The importance of natural selection for shaping adaptive trait differentiation among natural populations of allogamous tree species has long been recognized. Determining the molecular basis of local adaptation remains largely unresolved, and the respective roles of selection and demography in shaping population structure are actively debated. Using a multilocus scan that aims to detect outliers from simulated neutral expectations, we analyzed patterns of nucleotide diversity and genetic differentiation at 11 polymorphic candidate genes for drought stress tolerance in phenotypically contrasted Pinus pinaster Ait. populations across its geographical range. We compared 3 coalescent-based methods: 2 frequentist-like, including 1 approach specifically developed for biallelic single nucleotide polymorphisms (SNPs) here and 1 Bayesian. Five genes showed outlier patterns that were robust across methods at the haplotype level for 2 of them. Two genes presented higher F(ST) values than expected (PR-AGP4 and erd3), suggesting that they could have been affected by the action of diversifying selection among populations. In contrast, 3 genes presented lower F(ST) values than expected (dhn-1, dhn2, and lp3-1), which could represent signatures of homogenizing selection among populations. A smaller proportion of outliers were detected at the SNP level suggesting the potential functional significance of particular combinations of sites in drought-response candidate genes. The Bayesian method appeared robust to low sample sizes, flexible to assumptions regarding migration rates, and powerful for detecting selection at the haplotype level, but the frequentist-like method adapted to SNPs was more efficient for the identification of outlier SNPs showing low differentiation. Population-specific effects estimated in the Bayesian method also revealed populations with lower immigration rates, which could have led to favorable situations for local adaptation. 
    Outlier patterns are discussed in relation to the different genes' putative involvement in drought tolerance responses, from published results in transcriptomics and association mapping in P. pinaster and other related species. These genes clearly constitute relevant candidates for future association studies in P. pinaster.

  13. Gene selection for the reconstruction of stem cell differentiation trees: a linear programming approach.

    PubMed

    Ghadie, Mohamed A; Japkowicz, Nathalie; Perkins, Theodore J

    2015-08-15

    Stem cell differentiation is largely guided by master transcriptional regulators, but it also depends on the expression of other types of genes, such as cell cycle genes, signaling genes, metabolic genes, trafficking genes, etc. Traditional approaches to understanding gene expression patterns across multiple conditions, such as principal components analysis or K-means clustering, can group cell types based on gene expression, but they do so without knowledge of the differentiation hierarchy. Hierarchical clustering can organize cell types into a tree, but in general this tree is different from the differentiation hierarchy itself. Given the differentiation hierarchy and gene expression data at each node, we construct a weighted Euclidean distance metric such that the minimum spanning tree with respect to that metric is precisely the given differentiation hierarchy. We provide a set of linear constraints that are provably sufficient for the desired construction and a linear programming approach to identify sparse sets of weights, effectively identifying genes that are most relevant for discriminating different parts of the tree. We apply our method to microarray gene expression data describing 38 cell types in the hematopoiesis hierarchy, constructing a weighted Euclidean metric that uses just 175 genes. However, we find that there are many alternative sets of weights that satisfy the linear constraints. Thus, in the style of random-forest training, we also construct metrics based on random subsets of the genes and compare them to the metric of 175 genes. We then report on the selected genes and their biological functions. Our approach offers a new way to identify genes that may have important roles in stem cell differentiation. tperkins@ohri.ca Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
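
    The central check in the abstract above (does the minimum spanning tree under a weighted Euclidean metric reproduce the given differentiation hierarchy?) can be sketched as follows. This is only the verification step, written with Prim's algorithm; it does not implement the linear program that finds the weights. The three cell-type labels, the two-gene expression values and the weight vectors are toy assumptions.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two expression vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def minimum_spanning_tree(expression, weights):
    """Prim's algorithm on the complete graph over cell types.

    expression: dict mapping cell type -> list of gene expression values.
    Returns a set of undirected edges, each a frozenset of two cell types.
    """
    nodes = list(expression)
    in_tree = {nodes[0]}
    edges = set()
    while len(in_tree) < len(nodes):
        # Cheapest edge leaving the current tree.
        _, u, v = min(
            (weighted_distance(expression[u], expression[v], weights), u, v)
            for u in in_tree
            for v in nodes
            if v not in in_tree
        )
        edges.add(frozenset((u, v)))
        in_tree.add(v)
    return edges


# Toy hierarchy HSC -> MPP -> CLP with two genes (all values hypothetical).
expression = {"HSC": [0.0, 0.0], "MPP": [1.0, 3.0], "CLP": [2.0, 0.0]}
hierarchy = {frozenset(("HSC", "MPP")), frozenset(("MPP", "CLP"))}

# Putting all weight on gene 1 recovers the hierarchy; gene 2 alone does not.
recovers = minimum_spanning_tree(expression, [1.0, 0.0]) == hierarchy
fails = minimum_spanning_tree(expression, [0.0, 1.0]) == hierarchy
```

    In this toy case the sparse weight vector `[1.0, 0.0]` selects gene 1 as the gene that discriminates the tree, which mirrors how nonzero weights identify relevant genes in the approach described.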

  14. Genomic Footprints in Selected and Unselected Beef Cattle Breeds in Korea.

    PubMed

    Lim, Dajeong; Strucken, Eva M; Choi, Bong Hwan; Chai, Han Ha; Cho, Yong Min; Jang, Gul Won; Kim, Tae-Hun; Gondro, Cedric; Lee, Seung Hwan

    2016-01-01

    Korean Hanwoo cattle have been subjected to intensive artificial selection over the past four decades to improve meat production traits. Another three cattle varieties very closely related to Hanwoo reside in Korea (Jeju Black and Brindle) and in China (Yanbian). These breeds have not been part of a breeding scheme to improve production traits. Here, we compare the selected Hanwoo against these similar but presumably unselected populations to identify genomic regions that have been under recent selection pressure due to the breeding program. Rsb statistics were used to contrast the genomes of Hanwoo versus a pooled sample of the three unselected populations (UN). We identified 37 significant SNPs (FDR corrected) in the HW/UN comparison, and 21 known protein-coding genes were within 1 Mb of the identified SNPs. These genes were previously reported to affect traits important for meat production (14 genes), reproduction including mammary gland development (3 genes), coat color (2 genes), and genes affecting behavioral traits in a broader sense (2 genes). We subsequently sequenced (Illumina HiSeq 2000 platform) 10 individuals of the brown Hanwoo and the Chinese Yanbian to identify SNPs within the candidate genomic regions. Based on allele frequency differences, haplotype structures, and literature research, we singled out one non-synonymous SNP in the APP gene (APP: c.569C>T, Ala199Val) and predicted the mutational effect on the protein structure. We found that protein-protein interactions might be impaired due to increased exposed hydrophobic surfaces of the mutated protein. The APP gene has also been reported to affect meat tenderness in pigs and obesity in humans. Meat tenderness has been linked to intramuscular fat content, which is one of the main breeding goals for brown Hanwoo, potentially supporting a causal influence of the herein described nsSNP in the APP gene.
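
    The abstract reports "FDR corrected" SNPs without naming the procedure. As an illustration only, the sketch below shows the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate; whether the authors used it is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha.

    Step-up rule: find the largest rank r with p_(r) <= alpha * r / m
    and reject the r smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical per-SNP p-values (not taken from the study).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(pvals)
print(significant)
```

    With these ten values only the two smallest p-values pass, even though five are below 0.05 individually, which is the kind of tightening a multiple-testing correction is meant to provide.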

  15. A Novel Tool for Microbial Genome Editing Using the Restriction-Modification System.

    PubMed

    Bai, Hua; Deng, Aihua; Liu, Shuwen; Cui, Di; Qiu, Qidi; Wang, Laiyou; Yang, Zhao; Wu, Jie; Shang, Xiuling; Zhang, Yun; Wen, Tingyi

    2018-01-19

    Scarless genetic manipulation of genomes is an essential tool for biological research. The restriction-modification (R-M) system is a defense system in bacteria that protects against invading genomes on the basis of its ability to distinguish foreign DNA from self DNA. Here, we designed an R-M system-mediated genome editing (RMGE) technique for scarless genetic manipulation in different microorganisms. For bacteria with Type IV REase, an RMGE technique using the inducible DNA methyltransferase gene, bceSIIM (RMGE-bceSIIM), as the counter-selection cassette was developed to edit the genome of Escherichia coli. For bacteria without Type IV REase, an RMGE technique based on a restriction endonuclease (RMGE-mcrA) was established in Bacillus subtilis. These techniques were successfully used for gene deletion and replacement with nearly 100% counter-selection efficiencies, which were higher and more stable compared to conventional methods. Furthermore, precise point mutation without limiting sites was achieved in E. coli using RMGE-bceSIIM to introduce a single base mutation of A128C into the rpsL gene. In addition, the RMGE-mcrA technique was applied to delete the CAN1 gene in Saccharomyces cerevisiae DAY414 with 100% counter-selection efficiency. The effectiveness of the RMGE technique in E. coli, B. subtilis, and S. cerevisiae suggests the potential universal usefulness of this technique for microbial genome manipulation.

  16. A novel approach for human whole transcriptome analysis based on absolute gene expression of microarray data.

    PubMed

    Bikel, Shirley; Jacobo-Albavera, Leonor; Sánchez-Muñoz, Fausto; Cornejo-Granados, Fernanda; Canizales-Quinteros, Samuel; Soberón, Xavier; Sotelo-Mundo, Rogerio R; Del Río-Navarro, Blanca E; Mendoza-Vargas, Alfredo; Sánchez, Filiberto; Ochoa-Leyva, Adrian

    2017-01-01

    In spite of the emergence of RNA sequencing (RNA-seq), microarrays remain in widespread use for gene expression analysis in the clinic. There are over 767,000 RNA microarrays from human samples in public repositories, which are an invaluable resource for biomedical research and personalized medicine. The absolute gene expression analysis allows the transcriptome profiling of all expressed genes under a specific biological condition without the need of a reference sample. However, the background fluorescence represents a challenge to determine the absolute gene expression in microarrays. Given that the Y chromosome is absent in female subjects, we used it as a new approach for absolute gene expression analysis in which the fluorescence of the Y chromosome genes of female subjects was used as the background fluorescence for all the probes in the microarray. This fluorescence was used to establish an absolute gene expression threshold, allowing the differentiation between expressed and non-expressed genes in microarrays. We extracted the RNA from 16 children leukocyte samples (nine males and seven females, ages 6-10 years). An Affymetrix Gene Chip Human Gene 1.0 ST Array was carried out for each sample and the fluorescence of 124 genes of the Y chromosome was used to calculate the absolute gene expression threshold. After that, several expressed and non-expressed genes according to our absolute gene expression threshold were compared against the expression obtained using real-time quantitative polymerase chain reaction (RT-qPCR). From the 124 genes of the Y chromosome, three genes (DDX3Y, TXLNG2P and EIF1AY) that displayed significant differences between sexes were used to calculate the absolute gene expression threshold. Using this threshold, we selected 13 expressed and non-expressed genes and confirmed their expression level by RT-qPCR. Then, we selected the top 5% most expressed genes and found that several KEGG pathways were significantly enriched. 
Interestingly, these pathways were related to the typical functions of leukocytes, such as antigen processing and presentation and natural killer cell mediated cytotoxicity. We also applied this method to obtain the absolute gene expression threshold in already published microarray data of liver cells, where the top 5% expressed genes showed an enrichment of typical KEGG pathways for liver cells. Our results suggest that the three selected genes of the Y chromosome can be used to calculate an absolute gene expression threshold, allowing a transcriptome profiling of microarray data without the need for an additional reference experiment. Our approach based on the establishment of a threshold for absolute gene expression analysis will allow a new way to analyze thousands of microarrays from public databases. This allows the study of different human diseases without the need for additional samples for relative expression experiments.
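
    The thresholding idea can be sketched as follows. The abstract does not say how the threshold is computed from the background fluorescence, so the rule used here (mean plus k standard deviations of Y-chromosome probe intensities in female samples) is an illustrative assumption, as are the gene names and intensity values.

```python
from statistics import mean, stdev


def background_threshold(female_y_intensities, k=2.0):
    """Threshold = mean + k * SD of background intensities.

    Both the formula and k=2.0 are assumptions for illustration; the
    source only states that female Y-chromosome signal serves as background.
    """
    return mean(female_y_intensities) + k * stdev(female_y_intensities)


def call_expressed(intensities, threshold):
    """Label each gene as expressed (True) if its signal exceeds the threshold."""
    return {gene: value > threshold for gene, value in intensities.items()}


# Hypothetical log2 intensities of Y-chromosome probes in female samples.
female_y = [3.0, 3.2, 2.8, 3.1, 2.9]
threshold = background_threshold(female_y)

# Hypothetical probe intensities from one sample.
sample = {"GENE_A": 8.4, "GENE_B": 3.0, "GENE_C": 3.5}
calls = call_expressed(sample, threshold)
print(round(threshold, 3), calls)
```

    Because the threshold comes from signal that should be pure background, no separate reference sample is needed to decide which genes count as expressed.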

  17. DeepSAGE Based Differential Gene Expression Analysis under Cold and Freeze Stress in Seabuckthorn (Hippophae rhamnoides L.)

    PubMed Central

    Chaudhary, Saurabh; Sharma, Prakash C.

    2015-01-01

    Seabuckthorn (Hippophae rhamnoides L.), an important plant species of the Indian Himalayas, is well known for its immense medicinal and nutritional value. The plant has the ability to sustain growth in harsh environments of extreme temperatures, drought and salinity. We employed DeepSAGE, a tag-based approach, to identify differentially expressed genes under cold and freeze stress in seabuckthorn. In total, 36.2 million raw tags including 13.9 million distinct tags were generated using the Illumina sequencing platform for three leaf tissue libraries: control (CON), cold stress (CS) and freeze stress (FS). After discarding low-quality tags, 35.5 million clean tags including 7 million distinct clean tags were obtained. In all, 11922 differentially expressed genes (DEGs) including 6539 up-regulated and 5383 down-regulated genes were identified in three comparative setups, i.e. CON vs CS, CON vs FS and CS vs FS. Gene ontology and KEGG pathway analyses were performed to assign gene ontology terms to DEGs and ascertain their biological functions. DEGs were mapped back to our existing seabuckthorn transcriptome assembly comprising 88,297 putative unigenes, leading to the identification of 428 cold and freeze stress responsive genes. Expression of 22 randomly selected DEGs was validated using qRT-PCR, which further supported our DeepSAGE results. The present study provided a comprehensive view of the global gene expression profile of seabuckthorn under cold and freeze stresses. The DeepSAGE data could also serve as a valuable resource for further functional genomics studies aimed at the selection of candidate genes for development of abiotic stress tolerant transgenic plants. PMID:25803684

  18. DeepSAGE based differential gene expression analysis under cold and freeze stress in seabuckthorn (Hippophae rhamnoides L.).

    PubMed

    Chaudhary, Saurabh; Sharma, Prakash C

    2015-01-01

    Seabuckthorn (Hippophae rhamnoides L.), an important plant species of the Indian Himalayas, is well known for its immense medicinal and nutritional value. The plant has the ability to sustain growth in harsh environments of extreme temperatures, drought and salinity. We employed DeepSAGE, a tag-based approach, to identify differentially expressed genes under cold and freeze stress in seabuckthorn. In total, 36.2 million raw tags including 13.9 million distinct tags were generated using the Illumina sequencing platform for three leaf tissue libraries: control (CON), cold stress (CS) and freeze stress (FS). After discarding low-quality tags, 35.5 million clean tags including 7 million distinct clean tags were obtained. In all, 11922 differentially expressed genes (DEGs) including 6539 up-regulated and 5383 down-regulated genes were identified in three comparative setups, i.e. CON vs CS, CON vs FS and CS vs FS. Gene ontology and KEGG pathway analyses were performed to assign gene ontology terms to DEGs and ascertain their biological functions. DEGs were mapped back to our existing seabuckthorn transcriptome assembly comprising 88,297 putative unigenes, leading to the identification of 428 cold and freeze stress responsive genes. Expression of 22 randomly selected DEGs was validated using qRT-PCR, which further supported our DeepSAGE results. The present study provided a comprehensive view of the global gene expression profile of seabuckthorn under cold and freeze stresses. The DeepSAGE data could also serve as a valuable resource for further functional genomics studies aimed at the selection of candidate genes for development of abiotic stress tolerant transgenic plants.

  19. Rapid construction of a Bacterial Artificial Chromosomal (BAC) expression vector using designer DNA fragments.

    PubMed

    Chen, Chao; Zhao, Xinqing; Jin, Yingyu; Zhao, Zongbao Kent; Suh, Joo-Won

    2014-11-01

    Bacterial artificial chromosomal (BAC) vectors are increasingly being used in cloning large DNA fragments containing complex biosynthetic pathways to facilitate heterologous production of microbial metabolites for drug development. To express inserted genes using Streptomyces species as the production hosts, an integration expression cassette is required to be inserted into the BAC vector, which includes genetic elements encoding a phage-specific attachment site, an integrase, an origin of transfer, a selection marker and a promoter. Due to the large sizes of DNA inserted into the BAC vectors, it is normally inefficient and time-consuming to assemble these fragments by routine PCR amplifications and restriction-ligations. Here we present a rapid method to insert fragments to construct BAC-based expression vectors. A DNA fragment of about 130 bp was designed, which contains upstream and downstream homologous sequences of both BAC vector and pIB139 plasmid carrying the whole integration expression cassette. In-Fusion cloning was performed using the designer DNA fragment to modify pIB139, followed by λ-RED-mediated recombination to obtain the BAC-based expression vector. We demonstrated the effectiveness of this method by rapid construction of a BAC-based expression vector with an insert of about 120 kb that contains the entire gene cluster for biosynthesis of immunosuppressant FK506. The empty BAC-based expression vector constructed in this study can be conveniently used for construction of BAC libraries using either microbial pure culture or environmental DNA, and the selected BAC clones can be directly used for heterologous expression. Alternatively, if a BAC library has already been constructed using a commercial BAC vector, the selected BAC vectors can be manipulated using the method described here to get the BAC-based expression vectors with desired gene clusters for heterologous expression. 
The rapid construction of a BAC-based expression vector facilitates heterologous expression of large gene clusters for drug discovery. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    PubMed

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses (t-statistics, fold-change, B-statistics, and RankProd) and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, the Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. 
Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.
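
    Two of the univariate criteria named above, t-statistics and fold-change, can be sketched as simple ranking functions. This is a minimal illustration, not the study's pipeline: the Welch form of the t-statistic, the gene names, and the expression values are assumptions, and values are taken to be log2-transformed already.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def log2_fold_change(a, b):
    """Difference of group means, assuming values are already on a log2 scale."""
    return mean(a) - mean(b)


def rank_genes(expr, score):
    """Order genes by decreasing absolute score."""
    return sorted(expr, key=lambda g: abs(score(*expr[g])), reverse=True)


# Hypothetical data: gene -> (treated samples, control samples).
expr = {
    "G1": ([8.0, 8.2, 7.9], [5.0, 5.1, 4.9]),   # moderate change, low noise
    "G2": ([6.0, 6.1, 5.9], [6.0, 5.9, 6.1]),   # no change
    "G3": ([10.0, 6.0, 11.0], [5.0, 6.0, 4.0]), # larger change, high noise
}

by_t = rank_genes(expr, welch_t)
by_fc = rank_genes(expr, log2_fold_change)
print(by_t, by_fc)
```

    The two criteria disagree on the top gene here: fold-change favours the large but noisy G3, while the t-statistic favours the smaller but consistent G1, which is one reason combining or overlapping several univariate rankings can be informative.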

  1. Selecting Question-Specific Genes to Reduce Incongruence in Phylogenomics: A Case Study of Jawed Vertebrate Backbone Phylogeny.

    PubMed

    Chen, Meng-Yun; Liang, Dan; Zhang, Peng

    2015-11-01

    Incongruence between different phylogenomic analyses is the main challenge faced by phylogeneticists in the genomic era. To reduce incongruence, phylogenomic studies normally adopt some data filtering approaches, such as reducing missing data or using slowly evolving genes, to improve the signal quality of data. Here, we assembled a phylogenomic data set of 58 jawed vertebrate taxa and 4682 genes to investigate the backbone phylogeny of jawed vertebrates under both concatenation and coalescent-based frameworks. To evaluate the efficiency of extracting phylogenetic signals among different data filtering methods, we chose six highly intractable internodes within the backbone phylogeny of jawed vertebrates as our test questions. We found that our phylogenomic data set exhibits substantial conflicting signal among genes for these questions. Our analyses showed that non-specific data sets that are generated without bias toward specific questions are not sufficient to produce consistent results when there are several difficult nodes within a phylogeny. Moreover, phylogenetic accuracy based on non-specific data is considerably influenced by the size of data and the choice of tree inference methods. To address such incongruences, we selected genes that resolve a given internode but not the entire phylogeny. Notably, not only can this strategy yield correct relationships for the question, but it also reduces inconsistency associated with data sizes and inference methods. Our study highlights the importance of gene selection in phylogenomic analyses, suggesting that simply using a large amount of data cannot guarantee correct results. Constructing question-specific data sets may be more powerful for resolving problematic nodes. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  2. Browning of human adipocytes requires KLF11 and reprogramming of PPARγ superenhancers.

    PubMed

    Loft, Anne; Forss, Isabel; Siersbæk, Majken Storm; Schmidt, Søren Fisker; Larsen, Ann-Sofie Bøgh; Madsen, Jesper Grud Skat; Pisani, Didier F; Nielsen, Ronni; Aagaard, Mads Malik; Mathison, Angela; Neville, Matt J; Urrutia, Raul; Karpe, Fredrik; Amri, Ez-Zoubir; Mandrup, Susanne

    2015-01-01

    Long-term exposure to peroxisome proliferator-activated receptor γ (PPARγ) agonists such as rosiglitazone induces browning of rodent and human adipocytes; however, the transcriptional mechanisms governing this phenotypic switch in adipocytes are largely unknown. Here we show that rosiglitazone-induced browning of human adipocytes activates a comprehensive gene program that leads to increased mitochondrial oxidative capacity. Once induced, this gene program and oxidative capacity are maintained independently of rosiglitazone, suggesting that additional browning factors are activated. Browning triggers reprogramming of PPARγ binding, leading to the formation of PPARγ "superenhancers" that are selective for brown-in-white (brite) adipocytes. These are highly associated with key brite-selective genes. Based on such an association, we identified an evolutionarily conserved metabolic regulator, Kruppel-like factor 11 (KLF11), as a novel browning transcription factor in human adipocytes that is required for rosiglitazone-induced browning, including the increase in mitochondrial oxidative capacity. KLF11 is directly induced by PPARγ and appears to cooperate with PPARγ in a feed-forward manner to activate and maintain the brite-selective gene program. © 2015 Loft et al.; Published by Cold Spring Harbor Laboratory Press.

  3. Personalized gene silencing therapeutics for Huntington disease.

    PubMed

    Kay, C; Skotte, N H; Southwell, A L; Hayden, M R

    2014-07-01

    Gene silencing offers a novel therapeutic strategy for dominant genetic disorders. In specific diseases, selective silencing of only one copy of a gene may be advantageous over non-selective silencing of both copies. Huntington disease (HD) is an autosomal dominant disorder caused by an expanded CAG trinucleotide repeat in the Huntingtin gene (HTT). Silencing both expanded and normal copies of HTT may be therapeutically beneficial, but preservation of normal HTT expression is preferred. Allele-specific methods can selectively silence the mutant HTT transcript by targeting either the expanded CAG repeat or single nucleotide polymorphisms (SNPs) in linkage disequilibrium with the expansion. Both approaches require personalized treatment strategies based on patient genotypes. We compare the prospect of safe treatment of HD by CAG- and SNP-specific silencing approaches and review HD population genetics used to guide target identification in the patient population. Clinical implementation of allele-specific HTT silencing faces challenges common to personalized genetic medicine, requiring novel solutions from clinical scientists and regulatory authorities. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  4. Two-Way Gene Interaction From Microarray Data Based on Correlation Methods.

    PubMed

    Alavi Majd, Hamid; Talebi, Atefeh; Gilany, Kambiz; Khayyer, Nasibeh

    2016-06-01

    Gene networks have generated a massive explosion in the development of high-throughput techniques for monitoring various aspects of gene activity. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a Gene Co-expression Network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. In the foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman's rank correlation coefficient and Blomqvist's measure, and compared them with Pearson's correlation coefficient. We surveyed six genes of venous thrombosis disease, made a matrix entry representing the score for the corresponding gene pair, and obtained two-way interactions using Pearson's correlation, Spearman's rank correlation, and Blomqvist's coefficient. Finally, these methods were compared with Cytoscape, based on BIND, and Gene Ontology, based on molecular function visual methods; R software version 3.2 and Bioconductor were used to perform these methods. Based on the Pearson and Spearman correlations, the results were the same and were confirmed by Cytoscape and GO visual methods; however, Blomqvist's coefficient was not confirmed by visual methods. Some correlation results did not agree with the visualizations, possibly because of the small amount of data.
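
    The two steps described above (score every gene pair, then connect pairs above a threshold) can be sketched with the three coefficients named in the abstract. The study itself used R and Bioconductor; the Python version below is only an illustration, with hypothetical expression vectors, an arbitrary threshold of 0.8, and the assumption that the absolute value of the score is compared with the threshold. Blomqvist's coefficient is computed here by counting points in concordant versus discordant quadrants around the two medians, ignoring points that fall on a median.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def ranks(v):
    """Average ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def blomqvist(x, y):
    """Blomqvist's beta (medial correlation) from quadrant counts."""
    mx, my = median(x), median(y)
    signs = [(a - mx) * (b - my) for a, b in zip(x, y)]
    same = sum(s > 0 for s in signs)
    opposite = sum(s < 0 for s in signs)
    return (same - opposite) / (same + opposite)


def network(expr, score, threshold):
    """Edges between gene pairs whose absolute score exceeds the threshold."""
    return {
        (g, h)
        for g, h in combinations(sorted(expr), 2)
        if abs(score(expr[g], expr[h])) > threshold
    }


# Hypothetical expression vectors for three genes across six samples.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "C": [5.0, 1.0, 6.0, 2.0, 4.0, 3.0],
}

for score in (pearson, spearman, blomqvist):
    print(score.__name__, network(expr, score, 0.8))
```

    On this toy data all three coefficients connect only genes A and B; with few samples the quadrant-based coefficient can take only a handful of distinct values, which is consistent with the abstract's remark that small data sets may explain disagreements between methods.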

  5. Novel Harmonic Regularization Approach for Variable Selection in Cox's Proportional Hazards Model

    PubMed Central

    Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan

    2014-01-01

    Variable selection is an important issue in regression and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on the artificial datasets and four real microarray gene expression datasets, including the diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods. PMID:25506389

  6. Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs

    PubMed Central

    Freedman, Adam H.; Schweizer, Rena M.; Ortega-Del Vecchyo, Diego; Han, Eunjung; Davis, Brian W.; Gronau, Ilan; Silva, Pedro M.; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R.; Parker, Heidi G.; Lee, Clarence; Tadigotla, Vasisht; Siepel, Adam; Bustamante, Carlos D.; Harkins, Timothy T.; Nelson, Stanley F.; Marques-Bonet, Tomas; Ostrander, Elaine A.; Wayne, Robert K.; Novembre, John

    2016-01-01

    Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, which is supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers. PMID:26943675

  7. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes.

    PubMed

    Brock, Guy N; Shaffer, John R; Blakesley, Richard E; Lotz, Meredith J; Tseng, George C

    2008-01-10

    Gene expression data frequently contain missing values; however, most downstream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information on the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions under which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures x time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrices and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA) performed better on microarray data with lower complexity, while neighbour-based methods (KNN, OLS, LSA, LLS) performed better in data with higher complexity. We also found that the EBS and STS schemes serve as complementary and effective tools for selecting the optimal imputation algorithm.
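
    As an illustration of the neighbour-based family mentioned above, the sketch below shows a basic KNN imputation: a missing entry is filled with the average of that sample's values in the k most similar genes. It is a simplified stand-in rather than any of the evaluated implementations; the distance (root mean squared difference over samples observed in both genes), the unweighted average, k=2, and the toy matrix are all assumptions.

```python
from math import sqrt


def knn_impute(matrix, k=2):
    """Fill None entries of a genes x samples matrix from the k nearest genes.

    The input is left unchanged; a filled copy is returned.
    """
    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                # A neighbour must be a different gene with sample j observed.
                if h == i or other[j] is None:
                    continue
                shared = [
                    (a, b)
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                dist = sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            nearest = sorted(candidates)[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical expression matrix (rows: genes, columns: samples).
matrix = [
    [1.0, 2.0, None, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [9.0, 8.0, 7.0, 6.0],
]
filled = knn_impute(matrix, k=2)
print(filled[0])
```

    The missing value in the first gene is estimated from the two genes with nearly identical profiles and ignores the dissimilar fourth gene, which is the local behaviour that distinguishes neighbour-based methods from global ones such as SVD or BPCA.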

  8. Harnessing pain heterogeneity and RNA transcriptome to identify blood-based pain biomarkers: a novel correlational study design and bioinformatics approach in a graded chronic constriction injury model.

    PubMed

    Grace, Peter M; Hurley, Daniel; Barratt, Daniel T; Tsykin, Anna; Watkins, Linda R; Rolan, Paul E; Hutchinson, Mark R

    2012-09-01

    A quantitative, peripherally accessible biomarker for neuropathic pain has great potential to improve clinical outcomes. Based on the premise that peripheral and central immunity contribute to neuropathic pain mechanisms, we hypothesized that biomarkers could be identified from the whole blood of adult male rats, by integrating graded chronic constriction injury (CCI), ipsilateral lumbar dorsal quadrant (iLDQ) and whole blood transcriptomes, and pathway analysis with pain behavior. Correlational bioinformatics identified a range of putative biomarker genes for allodynia intensity, many encoding for proteins with a recognized role in immune/nociceptive mechanisms. A selection of these genes was validated in a separate replication study. Pathway analysis of the iLDQ transcriptome identified Fcγ and Fcε signaling pathways, among others. This study is the first to employ the whole blood transcriptome to identify pain biomarker panels. The novel correlational bioinformatics, developed here, selected such putative biomarkers based on a correlation with pain behavior and formation of signaling pathways with iLDQ genes. Future studies may demonstrate the predictive ability of these biomarker genes across other models and additional variables. © 2012 The Authors. Journal of Neurochemistry © 2012 International Society for Neurochemistry.

  9. Peptide-Based Technologies to Alter Adenoviral Vector Tropism: Ways and Means for Systemic Treatment of Cancer

    PubMed Central

    Reetz, Julia; Herchenröder, Ottmar; Pützer, Brigitte M.

    2014-01-01

    Due to the fundamental progress in elucidating the molecular mechanisms of human diseases and the arrival of the post-genomic era, increasing numbers of therapeutic genes and cellular targets are available for gene therapy. Meanwhile, the most important challenge is to develop gene delivery vectors with high efficiency through target cell selectivity, in particular under in situ conditions. The most widely used vector system to transduce cells is based on adenovirus (Ad). Recent endeavors in the development of selective Ad vectors that target cells or tissues of interest and spare the alteration of all others have focused on the modification of the virus's broad natural tropism. A popular way of Ad targeting is achieved by directing the vector towards distinct cellular receptors. Redirecting can be accomplished by linking custom-made peptides with specific affinity to cellular surface proteins via genetic integration, chemical coupling or bridging with dual-specific adapter molecules. Ideally, targeted vectors are incapable of entering cells via their native receptors. Such altered vectors offer new opportunities to delineate functional genomics in a natural environment and may enable efficient systemic therapeutic approaches. This review provides a summary of current state-of-the-art techniques to specifically target adenovirus-based gene delivery vectors. PMID:24699364

  10. Gene expression divergence between malaria vector sibling species Anopheles gambiae and An. coluzzii from rural and urban Yaoundé Cameroon

    PubMed Central

    Cassone, Bryan J.; Kamdem, Colince; Cheng, Changde; Tan, John C.; Hahn, Matthew W.; Costantini, Carlo; Besansky, Nora J.

    2014-01-01

    Divergent selection based on aquatic larval ecology is a likely factor in the recent isolation of two broadly sympatric and morphologically identical African mosquito species, the malaria vectors Anopheles gambiae and An. coluzzii. Population-based genome scans have revealed numerous candidate regions of recent positive selection, but have provided few clues as to the genetic mechanisms underlying behavioral and physiological divergence between the two species, phenotypes which themselves remain obscure. To uncover possible genetic mechanisms, we compared global transcriptional profiles of natural and experimental populations using gene-based microarrays. Larvae were sampled as second and fourth instars from natural populations in and around the city of Yaoundé, capital of Cameroon, where the two species segregate along a gradient of urbanization. Functional enrichment analysis of differentially expressed genes revealed that An. coluzzii—the species that breeds in more stable, biotically complex and potentially polluted urban water bodies—over-expresses genes implicated in detoxification and immunity relative to An. gambiae, which breeds in more ephemeral and relatively depauperate pools and puddles in suburbs and rural areas. Moreover, our data suggest that such over-expression by An. coluzzii is not a transient result of induction by xenobiotics in the larval habitat, but an inherent and presumably adaptive response to repeatedly encountered environmental stressors. Finally, we find no significant overlap between the differentially expressed loci and previously identified genomic regions of recent positive selection, suggesting that transcriptome divergence is regulated by trans-acting factors rather than cis-acting elements. PMID:24673723

  11. Gene Composer: database software for protein construct design, codon engineering, and gene synthesis

    PubMed Central

    Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance

    2009-01-01

    Background To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. Results An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. 
    Conclusion We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies. PMID:19383142
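
    The back-translation step described in this abstract can be illustrated with a minimal Python sketch. It is not Gene Composer's protein-to-gene algorithm: the codon usage table is a small made-up subset, and the names `CODON_USAGE`, `back_translate`, `min_usage` and `gc_fraction` are hypothetical. The sketch simply picks, for each residue, the most frequent codon that meets a minimal usage threshold, and reports the G:C fraction of the result.

```python
# Hypothetical, partial codon usage table (fraction of use per amino acid).
# The numbers are illustrative only and do not describe a real expression system.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13,
          "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
    "E": {"GAA": 0.68, "GAG": 0.32},
}


def back_translate(protein, usage=CODON_USAGE, min_usage=0.10):
    """Return a DNA string using the most frequent codon for each residue,
    ignoring codons whose usage falls below min_usage."""
    codons = []
    for residue in protein:
        table = usage[residue]  # KeyError if the toy table lacks this residue
        allowed = {c: f for c, f in table.items() if f >= min_usage}
        if not allowed:
            raise ValueError(f"no codon for {residue!r} meets the threshold")
        codons.append(max(allowed, key=allowed.get))
    return "".join(codons)


def gc_fraction(dna):
    """Fraction of G and C bases in a DNA string."""
    return (dna.count("G") + dna.count("C")) / len(dna)


gene = back_translate("MKLE")
print(gene, round(gc_fraction(gene), 3))
```

    A design tool of the kind described would presumably also balance G:C content and avoid unwanted sequence features by choosing among synonymous codons; this sketch leaves that out.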

  12. Gene composer: database software for protein construct design, codon engineering, and gene synthesis.

    PubMed

    Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance

    2009-04-21

    To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. 
    We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies.

  13. Efficient protection from methotrexate toxicity and selection of transduced human hematopoietic cells following gene transfer of dihydrofolate reductase mutants.

    PubMed

    Meisel, Roland; Bardenheuer, Walter; Strehblow, Claudia; Sorg, Ursula Regina; Elmaagacli, Ahmet; Seeber, Siegfried; Flasshove, Michael; Moritz, Thomas

    2003-12-01

    While retrovirally mediated gene transfer of dihydrofolate reductase mutants (mutDHFR) has convincingly been demonstrated to confer methotrexate (MTX) resistance to murine hematopoietic cells, clinical application of this technology will require high efficacy in human cells. Therefore, we investigated retroviral constructs expressing various point mutants of human DHFR for their ability to confer MTX resistance to human clonogenic progenitor cells (CFU-C) and to allow for in vitro selection of transduced CFU-C. Primary human hematopoietic cells were retrovirally transduced using MMLV- and SFFV/MESV-based vectors expressing DHFR(Ser31), DHFR(Phe22/Ser31), or DHFR(Tyr22/Gly31). MTX resistance of unselected and in vitro-selected CFU-C was determined using MTX-supplemented methylcellulose cultures and gene transfer efficiency was assessed by single-colony PCR analysis. While less than 1% of mock-transduced CFU-C survived the presence of ≥5 x 10^-8 M MTX, MMLV- and SFFV/MESV-based vectors expressing DHFR(Ser31) significantly protected CFU-C from MTX at doses ranging from 2.5 to 30 x 10^-8 M. Vectors expressing DHFR(Phe22/Ser31) or DHFR(Tyr22/Gly31) were even more protective and MTX-resistant CFU-C were observed up to 1 x 10^-5 M MTX. Three-day suspension cultures in the presence of 10-20 x 10^-8 M MTX resulted in significant selection of mutDHFR-transduced CFU-C. The percentage of CFU-C resistant to 10 x 10^-8 M MTX increased fourfold to 20-fold and provirus-containing CFU-C increased from 27% to 79-100%. Gene transfer of DHFR using suitable retroviral backbones and DHFR mutants significantly increases MTX resistance of human CFU-C and allows efficient in vitro selection of transduced cells using a short-term selection procedure.

  14. Analysis of genetic association in Listeria and Diabetes using Hierarchical Clustering and Silhouette Index

    NASA Astrophysics Data System (ADS)

    Pagnuco, Inti A.; Pastore, Juan I.; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L.

    2016-04-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, where significant groups of genes are defined based on some criteria. This task is usually performed by clustering algorithms, where the whole family of genes, or a subset of them, is clustered into meaningful groups based on their expression values in a set of experiments. In this work we used a methodology based on the Silhouette index as a measure of cluster quality for individual gene groups, and a combination of several variants of hierarchical clustering to generate the candidate groups, to obtain sets of co-expressed genes for two real data examples. We analyzed the quality of the best ranked groups, obtained by the algorithm, using an online bioinformatics tool that provides network information for the selected genes. Moreover, to verify the performance of the algorithm, considering the fact that it doesn’t find all possible subsets, we compared its results against a full search, to determine the number of good co-regulated sets not detected.
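
    As a rough illustration of scoring individual gene groups with the Silhouette index, the Python sketch below computes the mean silhouette width of each cluster from a given labelling. It assumes Euclidean distance and at least two clusters, and it takes the cluster labels as input rather than producing them by hierarchical clustering; the function name `silhouette_by_cluster` and the toy profiles are hypothetical, not the authors' implementation.

```python
import math


def silhouette_by_cluster(profiles, labels):
    """Mean silhouette width of each cluster, using Euclidean distance.

    profiles: list of expression vectors (one per gene)
    labels:   cluster id for each vector; at least two clusters are required
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    def mean_dist(i, members):
        # Average distance from gene i to the listed genes, excluding itself.
        others = [j for j in members if j != i]
        return sum(math.dist(profiles[i], profiles[j]) for j in others) / len(others)

    scores = {lab: [] for lab in clusters}
    for i, lab in enumerate(labels):
        if len(clusters[lab]) == 1:
            scores[lab].append(0.0)  # common convention for singleton clusters
            continue
        a = mean_dist(i, clusters[lab])  # cohesion within the gene's own cluster
        b = min(mean_dist(i, members)    # distance to the nearest other cluster
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores[lab].append((b - a) / denom if denom > 0 else 0.0)
    return {lab: sum(s) / len(s) for lab, s in scores.items()}


# Toy expression profiles: two tight, well-separated groups of two genes each.
toy_profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
quality = silhouette_by_cluster(toy_profiles, [0, 0, 1, 1])
print(quality)
```

    Groups whose mean width is close to 1 are compact and well separated; ranking candidate groups by this value is one plausible way to pick the "best ranked groups" the abstract mentions.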

  15. HuMiChip: Development of a Functional Gene Array for the Study of Human Microbiomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tu, Q.; Deng, Ye; Lin, Lu

    Microbiomes play very important roles in terms of nutrition, health and disease by interacting with their hosts. Based on sequence data currently available in public domains, we have developed a functional gene array to monitor both organismal and functional gene profiles of normal microbiota in human and mouse hosts, and such an array is called human and mouse microbiota array, HMM-Chip. First, seed sequences were identified from KEGG databases, and used to construct a seed database (seedDB) containing 136 gene families in 19 metabolic pathways closely related to human and mouse microbiomes. Second, a mother database (motherDB) was constructed with 81 genomes of bacterial strains with 54 from gut and 27 from oral environments, and 16 metagenomes, and used for selection of genes and probe design. Gene prediction was performed by Glimmer3 for bacterial genomes, and by the Metagene program for metagenomes. In total, 228,240 and 801,599 genes were identified for bacterial genomes and metagenomes, respectively. Then the motherDB was searched against the seedDB using the HMMer program, and gene sequences in the motherDB that were highly homologous with seed sequences in the seedDB were used for probe design by the CommOligo software. Different degrees of specific probes, including gene-specific, inclusive and exclusive group-specific probes were selected. All candidate probes were checked against the motherDB and NCBI databases for specificity. Finally, 7,763 probes covering 91.2% (12,601 out of 13,814) HMMer confirmed sequences from 75 bacterial genomes and 16 metagenomes were selected. This developed HMM-Chip is able to detect the diversity and abundance of functional genes, the gene expression of microbial communities, and potentially, the interactions of microorganisms and their hosts.

  16. ARNetMiT R Package: association rules based gene co-expression networks of miRNA targets.

    PubMed

    Özgür Cingiz, M; Biricik, G; Diri, B

    2017-03-31

    miRNAs are key regulators that bind to target genes to suppress their gene expression level. The relations between miRNA-target genes enable users to derive co-expressed genes that may be involved in similar biological processes and functions in cells. We hypothesize that target genes of miRNAs are co-expressed, when they are regulated by multiple miRNAs. With the usage of these co-expressed genes, we can theoretically construct co-expression networks (GCNs) related to 152 diseases. In this study, we introduce ARNetMiT, which utilizes a hash based association rule algorithm in a novel way to infer the GCNs on miRNA-target genes data. We also present an R package of ARNetMiT, which infers and visualizes GCNs of diseases that are selected by users. Our approach treats miRNAs as transactions and target genes as their items. Support and confidence values are used to prune association rules on miRNA-target genes data to construct support based GCNs (sGCNs) along with support and confidence based GCNs (scGCNs). We use overlap analysis and the topological features for the performance analysis of GCNs. We also infer GCNs with popular GNI algorithms for comparison with the GCNs of ARNetMiT. Overlap analysis results show that ARNetMiT outperforms the compared GNI algorithms. We see that using high confidence values in scGCNs increases the ratio of the overlapped gene-gene interactions between the compared methods. According to the evaluation of the topological features of ARNetMiT based GCNs, the degrees of nodes have power-law distribution. The hub genes discovered by ARNetMiT based GCNs are consistent with the literature.
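
    The idea of treating miRNAs as transactions and target genes as items can be sketched in Python as follows. This is not the ARNetMiT R package or its hash based algorithm: it counts gene pairs by brute force, the miRNA and gene names are invented, and the rule that both directed confidences must reach the threshold is an assumption made here for illustration.

```python
from itertools import combinations


def gene_pair_rules(mirna_targets, min_support=0.5, min_confidence=0.0):
    """Return {(gene_a, gene_b): (support, confidence)} for gene pairs that pass
    both thresholds. Each miRNA is one transaction; its targets are the items.

    support    = fraction of miRNAs that target both genes
    confidence = the smaller of conf(a -> b) and conf(b -> a)  (assumption)
    """
    n = len(mirna_targets)
    gene_count, pair_count = {}, {}
    for targets in mirna_targets.values():
        genes = sorted(set(targets))
        for g in genes:
            gene_count[g] = gene_count.get(g, 0) + 1
        for pair in combinations(genes, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    edges = {}
    for (a, b), both in pair_count.items():
        support = both / n
        confidence = min(both / gene_count[a], both / gene_count[b])
        if support >= min_support and confidence >= min_confidence:
            edges[(a, b)] = (support, confidence)
    return edges


# Invented toy data: four miRNAs and their target genes.
toy = {
    "miR-a": ["G1", "G2", "G3"],
    "miR-b": ["G1", "G2"],
    "miR-c": ["G2", "G3"],
    "miR-d": ["G1", "G2", "G4"],
}
sgcn = gene_pair_rules(toy, min_support=0.5)                       # support only
scgcn = gene_pair_rules(toy, min_support=0.5, min_confidence=0.6)  # support + confidence
print(sorted(sgcn), sorted(scgcn))
```

    On the toy data the support-only network keeps two edges, while adding the confidence threshold removes the G2-G3 edge, since only half of the miRNAs that target G2 also target G3.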

  17. A second gene for acyl-(acyl-carrier-protein): glycerol-3-phosphate acyltransferase in squash, Cucurbita moschata cv. Shirogikuza(*), codes for an oleate-selective isozyme: molecular cloning and protein purification studies.

    PubMed

    Nishida, I; Sugiura, M; Enju, A; Nakamura, M

    2000-12-01

    A new isogene for acyl-(acyl-carrier-protein):glycerol-3-phosphate acyltransferase (GPAT; EC 2.3.1.15) in squash has been cloned and the gene product was identified as oleate-selective GPAT. Using PCR primers that could hybridise with exons for a previously cloned squash GPAT, we obtained two PCR products of different size: one coded for a previously cloned squash GPAT corresponding to non-selective isoforms AT2 and AT3, and the other for a new isozyme, probably the oleate-selective isoform AT1. Full-length amino acid sequences of respective isozymes were deduced from the nucleotide sequences of genomic genes and cDNAs, which were cloned by a series of PCR-based methods. Thus, we designated the new gene CmATS1;1 and the other one CmATS1;2. Genome blot analysis revealed that the squash genome contained the two isogenes at non-allelic loci. AT1-active fractions were partially purified, and three polypeptide bands were identified as being AT1 polypeptides, which exhibited relative molecular masses of 39.5-40.5 kDa, pI values of 6.75-7.15, and oleate selectivity over palmitate. Partial amino-terminal sequences obtained from two of these bands verified that the new isogene codes for AT1 polypeptides.

  18. Divergence of RNA polymerase α subunits in angiosperm plastid genomes is mediated by genomic rearrangement.

    PubMed

    Blazier, J Chris; Ruhlman, Tracey A; Weng, Mao-Lun; Rehman, Sumaiyah K; Sabir, Jamal S M; Jansen, Robert K

    2016-04-18

    Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled and analyzed from each of the three angiosperm families. Multiple lines of evidence indicated that the rpoA sequences are likely functional despite retaining as low as 30% nucleotide sequence identity with rpoA genes from outgroups in the same angiosperm order. The ratio of non-synonymous to synonymous substitutions indicated that these genes are under purifying selection, and bioinformatic prediction of conserved domains indicated that functional domains are preserved. One of the lineages (Pelargonium, Geraniaceae) contains species with multiple rpoA-like ORFs that show evidence of ongoing inter-paralog gene conversion. The plastid genomes containing these divergent rpoA genes have experienced extensive structural rearrangement, including large expansions of the inverted repeat. We propose that illegitimate recombination, not positive selection, has driven the divergence of rpoA.

  19. The functional transfer of genes from the mitochondria to the nucleus: the effects of selection, mutation, population size and rate of self-fertilization.

    PubMed

    Brandvain, Yaniv; Wade, Michael J

    2009-08-01

    The transfer of mitochondrial genes to the nucleus is a recurrent and consistent feature of eukaryotic genome evolution. Although many theories have been proposed to explain such transfers, little relevant data exist. The observation that clonal and self-fertilizing plants transfer more mitochondrial genes to their nuclei than do outcrossing plants contradicts predictions of major theories based on nuclear recombination and leaves a gap in our conceptual understanding how the observed pattern of gene transfer could arise. Here, with a series of deterministic and stochastic simulations, we show how epistatic selection and relative mutation rates of mitochondrial and nuclear genes influence mitochondrial-to-nuclear gene transfer. Specifically, we show that when there is a benefit to having a mitochondrial gene present in the nucleus, but absent in the mitochondria, self-fertilization dramatically increases both the rate and the probability of gene transfer. However, absent such a benefit, when mitochondrial mutation rates exceed those of the nucleus, self-fertilization decreases the rate and probability of transfer. This latter effect, however, is much weaker than the former. Our results are relevant to understanding the probabilities of fixation when loci in different genomes interact.

  20. iPcc: a novel feature extraction method for accurate disease class discovery and prediction

    PubMed Central

    Ren, Xianwen; Wang, Yong; Zhang, Xiang-Sun; Jin, Qi

    2013-01-01

    Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, resulting in great improvements on various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method from the feature extraction perspective to further propel gene expression profiling technologies from bench to bedside. We define ‘correlation feature space’ for samples based on the gene expression profiles by iterative employment of Pearson’s correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can greatly highlight the latent patterns underlying noisy gene expression data and thus greatly improve the robustness and accuracy of the algorithms currently available for disease diagnosis and classification based on gene expression profiles. PMID:23761440
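
    One reading of "iterative employment of Pearson's correlation coefficient" is sketched below in Python: each sample is re-described by its correlations with every sample, and the step is repeated on the resulting matrix. This is an interpretation of the abstract only, not the published iPcc procedure; the function names, the fixed iteration count and the toy profiles are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.
    Raises ZeroDivisionError if either sequence has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_features(samples, iterations=2):
    """Map samples into a 'correlation feature space' by repeatedly replacing
    each sample's feature vector with its correlations to all samples."""
    features = [list(s) for s in samples]
    for _ in range(iterations):
        features = [[pearson(u, v) for v in features] for u in features]
    return features


# Toy data: two samples with rising profiles, two with falling profiles.
toy_samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.3, 2.9, 4.2, 4.8],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [4.8, 4.1, 3.2, 1.9, 1.2],
]
features = correlation_features(toy_samples, iterations=2)
for row in features:
    print([round(v, 3) for v in row])
```

    After the iterations, samples from the same toy group have nearly identical feature vectors, which is the kind of sharpening of latent patterns the abstract describes.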

  1. Analyzing gene expression from relative codon usage bias in Yeast genome: a statistical significance and biological relevance.

    PubMed

    Das, Shibsankar; Roymondal, Uttam; Sahoo, Satyabrata

    2009-08-15

    Based on the hypothesis that highly expressed genes are often characterized by strong compositional bias in terms of codon usage, there are a number of measures currently in use that quantify codon usage bias in genes, and hence provide numerical indices to predict the expression levels of genes. With the recent advent of an expression measure based on the score of the relative codon usage bias (RCBS), we have explicitly tested the performance of this numerical measure to predict the gene expression level and illustrate this with an analysis of Yeast genomes. In contradiction with previous studies, we observe a weak correlation between GC content and RCBS, but a selective pressure on the codon preferences in highly expressed genes. The assertion that the expression of a given gene depends on the score of relative codon usage bias (RCBS) is supported by the data. We further observe a strong correlation between RCBS and protein length indicating natural selection in favour of shorter genes to be expressed at higher level. We also attempt a statistical analysis to assess the strength of relative codon bias in genes as a guide to their likely expression level, suggesting a decrease of the informational entropy in the highly expressed genes.

  2. Selection of suitable reference genes from bone cells in large gradient high magnetic field based on GeNorm algorithm.

    PubMed

    Di, Shengmeng; Tian, Zongcheng; Qian, Airong; Gao, Xiang; Yu, Dan; Brandi, Maria Luisa; Shang, Peng

    2011-12-01

    Studies of animals and humans subjected to spaceflight demonstrate that weightlessness negatively affects the mass and mechanical properties of bone tissue. Bone cells could sense and respond to the gravity unloading, and genes sensitive to gravity change were considered to play a critical role in the mechanotransduction of bone cells. To evaluate the fold-change of gene expression, appropriate reference genes should be identified because there is no housekeeping gene having stable expression in all experimental conditions. Consequently, the expression stability of ten candidate housekeeping genes was examined in osteoblast-like MC3T3-E1, osteocyte-like MLO-Y4, and preosteoclast-like FLG29.1 cells under different apparent gravities (μg, 1 g, and 2 g) in the high-intensity gradient magnetic field produced by a superconducting magnet. The results showed that the relative expression of these ten candidate housekeeping genes was different in different bone cells; moreover, the most suitable reference genes of the same cells in altered gravity conditions were also different from those in strong magnetic field. It demonstrated the importance of selecting suitable reference genes in experimental set-ups. Furthermore, it provides an alternative choice to the traditionally accepted housekeeping genes used so far in studies of gravitational biology and magneto biology.
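
    The stability measure at the heart of geNorm can be sketched in a few lines of Python. For each candidate gene, the M value is the average, over all other candidates, of the standard deviation of the log2 expression ratio across samples; the least stable gene (highest M) is removed and the calculation is repeated. The gene names and expression values below are invented, and the sketch omits other parts of the geNorm procedure, such as determining how many reference genes to use.

```python
import math
from statistics import mean, stdev


def genorm_m(expr):
    """geNorm-style stability value M for each gene.

    expr: dict mapping gene name -> list of relative expression quantities,
          one per sample (all positive, same sample order for every gene).
    """
    m = {}
    for j, aj in expr.items():
        pairwise = []
        for k, ak in expr.items():
            if k == j:
                continue
            log_ratios = [math.log2(x / y) for x, y in zip(aj, ak)]
            pairwise.append(stdev(log_ratios))  # pairwise variation V_jk
        m[j] = mean(pairwise)
    return m


def rank_reference_genes(expr):
    """Repeatedly drop the gene with the highest M until two genes remain.
    Returns (two most stable genes, genes dropped in order of removal)."""
    remaining = dict(expr)
    dropped = []
    while len(remaining) > 2:
        m = genorm_m(remaining)
        worst = max(m, key=m.get)
        dropped.append(worst)
        del remaining[worst]
    return sorted(remaining), dropped


# Invented toy data: four candidate genes measured in four samples.
toy_expr = {
    "REF_A": [1.0, 2.0, 4.0, 8.0],
    "REF_B": [2.0, 4.0, 8.0, 16.0],  # constant ratio to REF_A
    "REF_C": [1.0, 2.2, 3.9, 8.5],   # nearly constant ratio to REF_A
    "VAR":   [1.0, 8.0, 2.0, 30.0],  # erratic relative to the others
}
best_pair, dropped = rank_reference_genes(toy_expr)
print(best_pair, dropped)
```

    Because M depends on which candidates and samples are included, the ranking can change between cell types or conditions, which is consistent with the abstract's observation that suitable reference genes differed across cells and treatments.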

  3. Plant native tryptophan synthase beta 1 gene is a non-antibiotic selection marker for plant transformation.

    PubMed

    Hsiao, Paoyuan; Sanjaya; Su, Ruey-Chih; Teixeira da Silva, Jaime A; Chan, Ming-Tsair

    2007-03-01

    Gene transformation is an integral tool for plant genetic engineering. All antibiotic resistant genes currently employed are of bacterial origin and their presence in the field is undesirable. Therefore, we developed a novel and efficient plant native non-antibiotic selection system for the selection of transgenic plants in the model system Arabidopsis. This new system is based on the enhanced expression of Arabidopsis tryptophan synthase beta 1 (AtTSB1) and the use of 5-methyl-tryptophan (5MT, a tryptophan [Trp] analog) and/or CdCl2 as selection agent(s). We successfully integrated an expression cassette containing an AtTSB1 cDNA driven by a cauliflower mosaic virus 35S promoter into Arabidopsis by floral dip transformation. Transgenic plants were efficiently selected on MS medium supplemented with 75 microM 5MT or 300 microM CdCl2 devoid of antibiotics. TSB1 selection was as efficient as the conventional hygromycin selection system. Northern blot analysis of transgenic plants selected by 5MT and CdCl2 revealed increased TSB1 mRNA transcript whereas uneven transcript levels of hygromycin phosphotransferase II (hpt) (control) were observed. Gas chromatography-mass spectrometry revealed 10-15 fold greater free Trp content in AtTSB1 transgenic plants than in wild-type plants grown with or without 5MT or CdCl2. Taken together, the TSB1 system provides a novel selection system distinct from conventional antibiotic selection systems.

  4. Long prereproductive selection and divergence by depth in a Caribbean candelabrum coral

    PubMed Central

    Prada, Carlos; Hellberg, Michael E.

    2013-01-01

    Long-lived corals, the foundation of modern reefs, often follow ecological gradients, so that populations or sister species segregate by habitat. Adaptive divergence maintains sympatric congeners after secondary contact or may even generate species by natural selection in the face of gene flow. Such ecological divergence, initially between alternative phenotypes within populations, may be aided by immigrant inviability, especially when a long period separates larval dispersal and the onset of reproduction, during which selection can sort lineages to match different habitats. Here, we evaluate the strength of one ecological factor (depth) to isolate populations by comparing the genes and morphologies of pairs of depth-segregated populations of the candelabrum coral Eunicea flexuosa across the Caribbean. Eunicea is endemic to the Caribbean and all sister species co-occur. Eunicea flexuosa is widespread both geographically and across reef habitats. Our genetic analysis revealed two depth-segregated lineages. Field survivorship data, combined with estimates of selection coefficients based on transplant experiments, suggest that selection is strong enough to segregate these two lineages. Genetic exchange between the Shallow and Deep lineages occurred either immediately after divergence or the two have diverged with gene flow. Migration occurs asymmetrically from the Shallow to Deep lineage. Limited recruitment to reproductive age, even under weak annual selection advantage, is sufficient to generate habitat segregation because of the cumulative prolonged prereproductive selection. Ecological factors associated with depth can act as filters generating strong barriers to gene flow, altering morphologies, and contributing to the potential for speciation in the sea. PMID:23359716

  5. A genome-wide scan for signatures of differential artificial selection in ten cattle breeds.

    PubMed

    Rothammer, Sophie; Seichter, Doris; Förster, Martin; Medugorac, Ivica

    2013-12-21

    Since the times of domestication, cattle have been continually shaped by the influence of humans. Relatively recent history, including breed formation and the still enduring enormous improvement of economically important traits, is expected to have left distinctive footprints of selection within the genome. The purpose of this study was to map genome-wide selection signatures in ten cattle breeds and thus improve the understanding of the genome response to strong artificial selection and support the identification of the underlying genetic variants of favoured phenotypes. We analysed 47,651 single nucleotide polymorphisms (SNP) using Cross Population Extended Haplotype Homozygosity (XP-EHH). We set the significance thresholds using the maximum XP-EHH values of two essentially artificially unselected breeds and found up to 229 selection signatures per breed. Through a confirmation process we verified selection for three distinct phenotypes typical for one breed (polledness in Galloway, double muscling in Blanc-Bleu Belge and red coat colour in Red Holstein cattle). Moreover, we detected six genes strongly associated with known QTL for beef or dairy traits (TG, ABCG2, DGAT1, GH1, GHR and the Casein Cluster) within selection signatures of at least one breed. A literature search for genes lying in outstanding signatures revealed further promising candidate genes. However, in concordance with previous genome-wide studies, we also detected a substantial number of signatures without any yet known gene content. These results show the power of XP-EHH analyses in cattle to discover promising candidate genes and raise the hope of identifying phenotypically important variants in the near future. The finding of plausible functional candidates in some short signatures supports this hope. 
    For instance, MAP2K6 is the only annotated gene of two signatures detected in Galloway and Gelbvieh cattle and is already known to be associated with carcass weight, back fat thickness and marbling score in Korean beef cattle. Based on the confirmation process and literature search we deduce that XP-EHH is able to uncover numerous artificial selection targets in subpopulations of domesticated animals.

  6. Selection of genes of Mycobacterium tuberculosis upregulated during residence in lungs of infected mice.

    PubMed

    Srivastava, Vikas; Jain, Anamika; Srivastava, Brahm S; Srivastava, Ranjana

    2008-05-01

    As a sequel to a previous report [Srivastava V, Rouanet C, Srivastava R, Ramalingam B, Locht C, Srivastava BS. Macrophage-specific Mycobacterium tuberculosis genes: identification by green fluorescent protein and kanamycin resistance selection. Microbiology 2007;153:659-66], the genes of Mycobacterium tuberculosis upregulated during residence in lungs of infected mice were identified in an in vivo expression system based on kanamycin resistance. A promoter library of M. tuberculosis was constructed in a promoter trap shuttle vector pLL192 containing an artificial bicistronic operon composed of a promoterless green fluorescent protein gene followed by a kanamycin resistance gene. The library was introduced into M. bovis BCG, which was then used to infect mice by the intravenous route. Mice were treated twice daily with a 40 mg/kg dose of kanamycin by the intramuscular route for 21 days. Recombinant BCG recovered from the lungs were used to reinfect mice to enrich clones surviving kanamycin treatment in the lung but sensitive to killing by kanamycin in vitro. After nucleotide sequencing of inserts from these clones, 20 genes belonging to fatty acid metabolism, membrane transport, nitric oxide defence and the PE_PGRS/PPE family were identified. Real-time PCR analysis using RNA isolated from M. tuberculosis grown in vitro and from the lungs confirmed upregulation of genes from 2 to 20-fold in vivo compared to growth in vitro. Several of these select 20 genes were also found upregulated ex vivo in the macrophage-like cell line J774A.1, thus suggesting a correlation in mycobacterial gene expression between ex vivo and in vivo conditions.

  7. Genetic information transfer promotes cooperation in bacteria

    PubMed Central

    Dimitriu, Tatiana; Lotton, Chantal; Bénard-Capelle, Julien; Misevic, Dusan; Brown, Sam P.; Lindner, Ariel B.; Taddei, François

    2014-01-01

    Many bacterial species are social, producing costly secreted “public good” molecules that enhance the growth of neighboring cells. The genes coding for these cooperative traits are often propagated via mobile genetic elements and can be virulence factors from a biomedical perspective. Here, we present an experimental framework that links genetic information exchange and the selection of cooperative traits. Using simulations and experiments based on a synthetic bacterial system to control public good secretion and plasmid conjugation, we demonstrate that horizontal gene transfer can favor cooperation. In a well-mixed environment, horizontal transfer brings a direct infectious advantage to any gene, regardless of its cooperation properties. However, in a structured population transfer selects specifically for cooperation by increasing the assortment among cooperative alleles. Conjugation allows cooperative alleles to overcome rarity thresholds and invade bacterial populations structured purely by stochastic dilution effects. Our results provide an explanation for the prevalence of cooperative genes on mobile elements, and suggest a previously unidentified benefit of horizontal gene transfer for bacteria. PMID:25024219

  8. Highly Expressed Genes within Hippocampal Sector CA1: Implications for the Physiology of Memory.

    PubMed

    Meyer, Michael A

    2014-04-22

    As the CA1 sector has been implicated to play a key role in memory formation, a dedicated search for highly expressed genes within this region was made from an on-line atlas of gene expression within the mouse brain (GENSAT). From a data base of 1013 genes, 16 were identified that had selective localization of gene expression within the CA1 region, and included Angpt2, ARHGEF6, CCK, Cntnap1, DRD3, EMP1, Epha2, Itm2b, Lrrtm2, Mdk, PNMT, Ppm1e, Ppp2r2d, RASGRP1, Slitrk5, and Sstr4. Of the 16 identified, the most selective and intense localization for both adult and post-natal day 7 was noted for ARHGEF6, which is known to be linked to non-syndromic mental retardation, and has also been localized to dendritic spines. Further research on the role played by ARHGEF6 in memory formation is strongly advocated.

  9. Engineering Complex Microbial Phenotypes with Continuous Genetic Integration and Plasmid Based Multi-Gene Library

    DTIC Science & Technology

    2010-01-01

    genes from strains that have desirable traits. Here, we aim to enlarge the E. coli genome using Lactobacillus plantarum genes to build cells tolerant to...EtOH and BT. L. plantarum is an organism with established high tolerance to alcohols and solvents more broadly. Objective 2: Build a stress...heterologous (here: L. plantarum; abbreviated as L. pl) DNA into the E. coli chromosome while selecting for insertions that enhance ethanol tolerance (which

  10. geneGIS: Computational Tools for Spatial Analyses of DNA Profiles with Associated Photo-Identification and Telemetry Records of Marine Mammals

    DTIC Science & Technology

    2012-09-30

    computational tools provide the ability to display, browse, select, filter and summarize spatio-temporal relationships of these individual-based...her research assistant at Esri, Shaun Walbridge, and members of the Marine Mammal Institute (MMI), including Tomas Follet and Debbie Steel. This...Genomics Laboratory, MMI, OSU. 4 As part of the geneGIS initiative, these SPLASH photo-identification records and the geneSPLASH DNA profiles

  11. Network-Based Method for Identifying Co-Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues

    PubMed Central

    Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Cai, Yu-Dong

    2017-01-01

    Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein–protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method. PMID:28974058
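The "searching" procedure described in this abstract can be illustrated with a minimal sketch. It assumes an unweighted, undirected interaction network stored as an adjacency dictionary and returns only one shortest path per gene pair; the network, the gene names and both function names are hypothetical, and the paper's actual network, edge weighting, permutation test and screening rules are not reproduced here.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search for one shortest path in an unweighted graph.

    Returns the path as a list of nodes, or None if target is unreachable.
    """
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in parent:
                parent[neighbour] = node
                if neighbour == target:
                    # Walk back through the parents to rebuild the path.
                    path = [neighbour]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None


def candidate_genes(graph, genes_a, genes_b):
    """Collect interior nodes of shortest paths linking two seed gene sets."""
    seeds = set(genes_a) | set(genes_b)
    found = set()
    for a in genes_a:
        for b in genes_b:
            path = shortest_path(graph, a, b)
            if path:
                found.update(n for n in path[1:-1] if n not in seeds)
    return found


# Hypothetical toy network: B1 is a "bone" seed, N1 and N2 are "nerve" seeds.
toy_network = {
    "B1": ["X"],
    "X": ["B1", "N1", "Y"],
    "N1": ["X"],
    "Y": ["X", "N2"],
    "N2": ["Y"],
}
print(sorted(candidate_genes(toy_network, ["B1"], ["N1", "N2"])))
```

In this toy example the interior nodes X and Y are the candidates; in the method described above, such candidates would then pass through the testing and screening procedures.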

  12. Network-Based Method for Identifying Co-Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues.

    PubMed

    Chen, Lei; Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Huang, Tao; Cai, Yu-Dong

    2017-10-02

    Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein-protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method.

  13. Generation of astaxanthin mutants in Xanthophyllomyces dendrorhous using a double recombination method based on hygromycin resistance.

    PubMed

    Niklitschek, Mauricio; Baeza, Marcelo; Fernández-Lobato, María; Cifuentes, Víctor

    2012-01-01

    Generally two selection markers are required to obtain homozygous mutations in a diploid background, one for each gene copy that is interrupted. This chapter describes a method that allows deletion of both copies of a gene in a diploid organism, a wild-type strain of the yeast Xanthophyllomyces dendrorhous, using hygromycin B resistance as the only selection marker. To accomplish this, in a first step, a heterozygous hygromycin B-resistant strain (carrying the inserted hph gene) is obtained by a single transformation. Next, the heterozygous mutant is grown in media with increasing concentrations of the antibiotic. In this way, strains that have become homozygous for the antibiotic marker (by mitotic recombination) are able to grow at higher concentrations of the antibiotic than the heterozygous strain. The method can potentially be applied to obtain double mutants of other diploid organisms.

  14. Discovery of new candidate genes for rheumatoid arthritis through integration of genetic association data with expression pathway analysis.

    PubMed

    Shchetynsky, Klementy; Diaz-Gallo, Lina-Marcella; Folkersen, Lasse; Hensvold, Aase Haj; Catrina, Anca Irinel; Berg, Louise; Klareskog, Lars; Padyukov, Leonid

    2017-02-02

    Here we integrate verified signals from previous genetic association studies with gene expression and pathway analysis for discovery of new candidate genes and signaling networks, relevant for rheumatoid arthritis (RA). RNA-sequencing-(RNA-seq)-based expression analysis of 377 genes from previously verified RA-associated loci was performed in blood cells from 5 newly diagnosed, non-treated patients with RA, 7 patients with treated RA and 12 healthy controls. Differentially expressed genes sharing a similar expression pattern in treated and untreated RA sub-groups were selected for pathway analysis. A set of "connector" genes derived from pathway analysis was tested for differential expression in the initial discovery cohort and validated in blood cells from 73 patients with RA and in 35 healthy controls. There were 11 qualifying genes selected for pathway analysis and these were grouped into two evidence-based functional networks, containing 29 and 27 additional connector molecules. The expression of genes corresponding to connector molecules was then tested in the initial RNA-seq data. Differences in the expression of ERBB2, TP53 and THOP1 were similar in both treated and non-treated patients with RA and an additional nine genes were differentially expressed in at least one group of patients compared to healthy controls. The ERBB2, TP53 and THOP1 expression profile was successfully replicated in RNA-seq data from peripheral blood mononuclear cells from healthy controls and non-treated patients with RA, in an independent collection of samples. Integration of RNA-seq data with findings from association studies, and consequent pathway analysis, implicate new candidate genes, ERBB2, TP53 and THOP1, in the pathogenesis of RA.

  15. Genetic Variation of Goat Interferon Regulatory Factor 3 Gene and Its Implication in Goat Evolution

    PubMed Central

    Shu, Liping; Zhang, Yesheng; Wang, Yangzi; Sanni, Timothy M.; Imumorin, Ikhide G.; Peters, Sunday O.; Zhang, Jiajin; Dong, Yang; Wang, Wen

    2016-01-01

    The immune systems are fundamentally vital for evolution and survival of species; as such, selection patterns in innate immune loci are of special interest in molecular evolutionary research. The interferon regulatory factor (IRF) gene family control many different aspects of the innate and adaptive immune responses in vertebrates. Among these, IRF3 is known to take active part in very many biological processes. We assembled and evaluated 1356 base pairs of the IRF3 gene coding region in domesticated goats from Africa (Nigeria, Ethiopia and South Africa) and Asia (Iran and China) and the wild goat (Capra aegagrus). Five segregating sites with θ value of 0.0009 for this gene demonstrated a low diversity across the goats’ populations. Fu and Li tests were significantly positive but Tajima’s D test was significantly negative, suggesting its deviation from neutrality. Neighbor joining tree of IRF3 gene in domesticated goats, wild goat and sheep showed that all domesticated goats have a closer relationship than with the wild goat and sheep. Maximum likelihood tree of the gene showed that different domesticated goats share a common ancestor and suggest single origin. Four unique haplotypes were observed across all the sequences, of which, one was particularly common to African goats (MOCH-K14-0425, Poitou and WAD). In assessing the evolution mode of the gene, we found that the codon model dN/dS ratio for all goats was greater than one. Phylogenetic Analysis by Maximum Likelihood (PAML) gave a ω0 (dN/dS) value of 0.067 with LnL value of -6900.3 for the first Model (M1) while ω2 = 1.667 in model M2 with LnL value of -6900.3 with positive selection inferred in 3 codon sites. Mechanistic empirical combination (MEC) model for evaluating adaptive selection pressure on particular codons also confirmed adaptive selection pressure in three codons (207, 358 and 408) in IRF3 gene. Positive diversifying selection inferred with recent evolutionary changes in domesticated goat IRF3 led us to conclude that the gene evolution may have been influenced by domestication processes in goats. PMID:27598391

  16. Genetic Variation of Goat Interferon Regulatory Factor 3 Gene and Its Implication in Goat Evolution.

    PubMed

    Okpeku, Moses; Esmailizadeh, Ali; Adeola, Adeniyi C; Shu, Liping; Zhang, Yesheng; Wang, Yangzi; Sanni, Timothy M; Imumorin, Ikhide G; Peters, Sunday O; Zhang, Jiajin; Dong, Yang; Wang, Wen

    2016-01-01

    The immune systems are fundamentally vital for evolution and survival of species; as such, selection patterns in innate immune loci are of special interest in molecular evolutionary research. The interferon regulatory factor (IRF) gene family control many different aspects of the innate and adaptive immune responses in vertebrates. Among these, IRF3 is known to take active part in very many biological processes. We assembled and evaluated 1356 base pairs of the IRF3 gene coding region in domesticated goats from Africa (Nigeria, Ethiopia and South Africa) and Asia (Iran and China) and the wild goat (Capra aegagrus). Five segregating sites with θ value of 0.0009 for this gene demonstrated a low diversity across the goats' populations. Fu and Li tests were significantly positive but Tajima's D test was significantly negative, suggesting its deviation from neutrality. Neighbor joining tree of IRF3 gene in domesticated goats, wild goat and sheep showed that all domesticated goats have a closer relationship than with the wild goat and sheep. Maximum likelihood tree of the gene showed that different domesticated goats share a common ancestor and suggest single origin. Four unique haplotypes were observed across all the sequences, of which, one was particularly common to African goats (MOCH-K14-0425, Poitou and WAD). In assessing the evolution mode of the gene, we found that the codon model dN/dS ratio for all goats was greater than one. Phylogenetic Analysis by Maximum Likelihood (PAML) gave a ω0 (dN/dS) value of 0.067 with LnL value of -6900.3 for the first Model (M1) while ω2 = 1.667 in model M2 with LnL value of -6900.3 with positive selection inferred in 3 codon sites. Mechanistic empirical combination (MEC) model for evaluating adaptive selection pressure on particular codons also confirmed adaptive selection pressure in three codons (207, 358 and 408) in IRF3 gene. Positive diversifying selection inferred with recent evolutionary changes in domesticated goat IRF3 led us to conclude that the gene evolution may have been influenced by domestication processes in goats.
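For readers unfamiliar with the neutrality statistic mentioned in this abstract, the following is a minimal sketch of Tajima's D using the standard textbook formula. It assumes equal-length, gap-free aligned sequences and at least four of them; the toy sequences are invented for illustration and are not the goat IRF3 data, and the authors' software and settings are not known from the abstract.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned sequences; None if no site is polymorphic."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(len(seqs[0])) if len({q[i] for q in seqs}) > 1)
    if s == 0:
        return None
    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two theta estimators, scaled by its std. dev.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Invented toy alignments: only rare variants vs. balanced variants.
rare_variants = ["AAAA", "AAAT", "AATA", "ATAA"]
balanced_variants = ["AA", "AA", "TT", "TT"]
print(tajimas_d(rare_variants), tajimas_d(balanced_variants))
```

An excess of rare variants pulls D below zero, while variants at intermediate frequency push it above zero, which is why the sign of the statistic is read as a departure from neutrality.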

  17. A DNA microarray for identification of selected Korean birds based on mitochondrial cytochrome c oxidase I gene sequences.

    PubMed

    Chung, In-Hyuk; Yoo, Hye Sook; Eah, Jae-Yong; Yoon, Hyun-Kyu; Jung, Jin-Wook; Hwang, Seung Yong; Kim, Chang-Bae

    2010-10-01

    DNA barcoding with the gene encoding cytochrome c oxidase I (COI) in the mitochondrial genome has been proposed as a standard marker to identify and discover animal species. Some migratory wild birds are suspected of transmitting avian influenza and pose a threat to aircraft safety because of bird strikes. We have previously reported the COI gene sequences of 92 Korean bird species. In the present study, we developed a DNA microarray to identify 17 selected bird species on the basis of nucleotide diversity. We designed and synthesized 19 specific oligonucleotide probes; these probes were arrayed on a silylated glass slide. The length of the probes was 19-24 bps. The COI sequences amplified from the tissues of the selected birds were labeled with a fluorescent probe for microarray hybridization, and unique hybridization patterns were detected for each selected species. These patterns may be considered diagnostic patterns for species identification. This microarray system will provide a sensitive and a high-throughput method for identification of Korean birds.

  18. Selection of housekeeping genes and demonstration of RNAi in cotton leafhopper, Amrasca biguttula biguttula (Ishida)

    PubMed Central

    Gupta, Mridula; Pandher, Suneet; Kaur, Gurmeet; Rathore, Pankaj; Palli, Subba Reddy

    2018-01-01

    Amrasca biguttula biguttula (Ishida) commonly known as cotton leafhopper is a severe pest of cotton and okra. Not much is known on this insect at molecular level due to lack of genomic and transcriptomic data. To prepare for functional genomic studies in this insect, we evaluated 15 common housekeeping genes (Tub, B-Tub, EF alpha, GADPH, UbiCF, RP13, Ubiq, G3PD, VATPase, Actin, 18s, 28s, TATA, ETF, SOD and Cytolytic actin) during different developmental stages and under starvation stress. We selected early (1st and 2nd), late (3rd and 4th) stage nymphs and adults for identification of stable housekeeping genes using geNorm, NormFinder, BestKeeper and RefFinder software. Based on the different algorithms, RP13 and VATPase are identified as the most suitable reference genes for quantification of gene expression by reverse transcriptase quantitative PCR (RT-qPCR). Based on RefFinder which comprehended the results of three algorithms, RP13 in adults, Tubulin (Tub) in late nymphs, 28S in early nymph and UbiCF under starvation stress were identified as the most stable genes. We also developed methods for feeding double-stranded RNA (dsRNA) incorporated in the diet. Feeding dsRNA targeting Snf7, IAP, AQP1, and VATPase caused 56.17–77.12% knockdown of targeted genes compared to control and 16 to 48% mortality of treated insects when compared to control. PMID:29329327

  19. Selection Shapes Transcriptional Logic and Regulatory Specialization in Genetic Networks

    PubMed Central

    Fogelmark, Karl; Peterson, Carsten; Troein, Carl

    2016-01-01

    Background: Living organisms need to regulate their gene expression in response to environmental signals and internal cues. This is a computational task where genes act as logic gates that connect to form transcriptional networks, which are shaped at all scales by evolution. Large-scale mutations such as gene duplications and deletions add and remove network components, whereas smaller mutations alter the connections between them. Selection determines what mutations are accepted, but its importance for shaping the resulting networks has been debated. Methodology: To investigate the effects of selection in the shaping of transcriptional networks, we derive transcriptional logic from a combinatorially powerful yet tractable model of the binding between DNA and transcription factors. By evolving the resulting networks based on their ability to function as either a simple decision system or a circadian clock, we obtain information on the regulation and logic rules encoded in functional transcriptional networks. Comparisons are made between networks evolved for different functions, as well as with structurally equivalent but non-functional (neutrally evolved) networks, and predictions are validated against the transcriptional network of E. coli. Principal Findings: We find that the logic rules governing gene expression depend on the function performed by the network. Unlike the decision systems, the circadian clocks show strong cooperative binding and negative regulation, which achieves tight temporal control of gene expression. Furthermore, we find that transcription factors act preferentially as either activators or repressors, both when binding multiple sites for a single target gene and globally in the transcriptional networks. This separation into positive and negative regulators requires gene duplications, which highlights the interplay between mutation and selection in shaping the transcriptional networks. PMID:26927540

  20. Adaptations to Climate in Candidate Genes for Common Metabolic Disorders

    PubMed Central

    Hancock, Angela M; Witonsky, David B; Gordon, Adam S; Eshel, Gidon; Pritchard, Jonathan K; Coop, Graham; Di Rienzo, Anna

    2008-01-01

    Evolutionary pressures due to variation in climate play an important role in shaping phenotypic variation among and within species and have been shown to influence variation in phenotypes such as body shape and size among humans. Genes involved in energy metabolism are likely to be central to heat and cold tolerance. To test the hypothesis that climate shaped variation in metabolism genes in humans, we used a bioinformatics approach based on network theory to select 82 candidate genes for common metabolic disorders. We genotyped 873 tag SNPs in these genes in 54 worldwide populations (including the 52 in the Human Genome Diversity Project panel) and found correlations with climate variables using rank correlation analysis and a newly developed method termed Bayesian geographic analysis. In addition, we genotyped 210 carefully matched control SNPs to provide an empirical null distribution for spatial patterns of allele frequency due to population history alone. For nearly all climate variables, we found an excess of genic SNPs in the tail of the distributions of the test statistics compared to the control SNPs, implying that metabolic genes as a group show signals of spatially varying selection. Among our strongest signals were several SNPs (e.g., LEPR R109K, FABP2 A54T) that had previously been associated with phenotypes directly related to cold tolerance. Since variation in climate may be correlated with other aspects of environmental variation, it is possible that some of the signals that we detected reflect selective pressures other than climate. Nevertheless, our results are consistent with the idea that climate has been an important selective pressure acting on candidate genes for common metabolic disorders. PMID:18282109
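The rank correlation step and the comparison against control SNPs described in this abstract can be sketched as follows. This is an illustrative sketch only: the function names and all numbers are hypothetical, and the study's Bayesian geographic analysis and its exact test statistics are not reproduced.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def empirical_tail(stat, control_stats):
    """Fraction of control statistics at least as extreme as stat."""
    return sum(abs(c) >= abs(stat) for c in control_stats) / len(control_stats)


# Hypothetical allele frequencies in five populations and a climate variable.
allele_freq = [0.10, 0.25, 0.40, 0.55, 0.80]
winter_temp = [22.0, 15.0, 9.0, 3.0, -5.0]
rho = spearman(allele_freq, winter_temp)
# Hypothetical correlations for matched control SNPs.
print(rho, empirical_tail(rho, [0.1, -0.3, 0.2, 0.6, -0.4]))
```

Ranking the genic statistic within the control distribution, rather than against a theoretical null, is what lets the comparison account for spatial patterns caused by population history alone.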

  1. Genetics of Cerebellar and Neocortical Expansion in Anthropoid Primates: A Comparative Approach

    PubMed Central

    Harrison, Peter W.; Montgomery, Stephen H.

    2017-01-01

    What adaptive changes in brain structure and function underpin the evolution of increased cognitive performance in humans and our close relatives? Identifying the genetic basis of brain evolution has become a major tool in answering this question. Numerous cases of positive selection, altered gene expression or gene duplication have been identified that may contribute to the evolution of the neocortex, which is widely assumed to play a predominant role in cognitive evolution. However, the components of the neocortex co-evolve with other functionally interdependent regions of the brain, most notably in the cerebellum. The cerebellum is linked to a range of cognitive tasks and expanded rapidly during hominoid evolution. Here we present data that suggest that, across anthropoid primates, protein-coding genes with known roles in cerebellum development were just as likely to be targeted by selection as genes linked to cortical development. Indeed, based on currently available gene ontology data, protein-coding genes with known roles in cerebellum development are more likely to have evolved adaptively during hominoid evolution. This is consistent with phenotypic data suggesting an accelerated rate of cerebellar expansion in apes that is beyond that predicted from scaling with the neocortex in other primates. Finally, we present evidence that the strength of selection on specific genes is associated with variation in the volume of either the neocortex or the cerebellum, but not both. This result provides preliminary evidence that co-variation between these brain components during anthropoid evolution may be at least partly regulated by selection on independent loci, a conclusion that is consistent with recent intraspecific genetic analyses and a mosaic model of brain evolution that predicts adaptive evolution of brain structure. PMID:28683440

  2. The WRKY Transcription Factor Genes in Eggplant (Solanum melongena L.) and Turkey Berry (Solanum torvum Sw.)

    PubMed Central

    Yang, Xu; Deng, Cao; Zhang, Yu; Cheng, Yufu; Huo, Qiuyue; Xue, Linbao

    2015-01-01

    WRKY transcription factors, which play critical roles in stress responses, have not been characterized in eggplant or its wild relative, turkey berry. The recent availability of RNA-sequencing data provides the opportunity to examine WRKY genes from a global perspective. We identified 50 and 62 WRKY genes in eggplant (SmelWRKYs) and turkey berry (StorWRKYs), respectively, all of which could be classified into three groups (I–III) based on the WRKY protein structure. The SmelWRKYs and StorWRKYs contain ~76% and ~95% of the number of WRKYs found in other sequenced asterid species, respectively. Positive selection analysis revealed that different selection constraints could have affected the evolution of these groups. Positively-selected sites were found in Groups IIc and III. Branch-specific selection pressure analysis indicated that most WRKY domains from SmelWRKYs and StorWRKYs are conserved and have evolved at low rates since their divergence. Comparison to homologous WRKY genes in Arabidopsis revealed several potential pathogen resistance-related SmelWRKYs and StorWRKYs, providing possible candidate genetic resources for improving stress tolerance in eggplant and probably other Solanaceae plants. To our knowledge, this is the first report of a genome-wide analyses of the SmelWRKYs and StorWRKYs. PMID:25853261

  3. The WRKY transcription factor genes in eggplant (Solanum melongena L.) and Turkey Berry (Solanum torvum Sw.).

    PubMed

    Yang, Xu; Deng, Cao; Zhang, Yu; Cheng, Yufu; Huo, Qiuyue; Xue, Linbao

    2015-04-07

    WRKY transcription factors, which play critical roles in stress responses, have not been characterized in eggplant or its wild relative, turkey berry. The recent availability of RNA-sequencing data provides the opportunity to examine WRKY genes from a global perspective. We identified 50 and 62 WRKY genes in eggplant (SmelWRKYs) and turkey berry (StorWRKYs), respectively, all of which could be classified into three groups (I-III) based on the WRKY protein structure. The SmelWRKYs and StorWRKYs contain ~76% and ~95% of the number of WRKYs found in other sequenced asterid species, respectively. Positive selection analysis revealed that different selection constraints could have affected the evolution of these groups. Positively-selected sites were found in Groups IIc and III. Branch-specific selection pressure analysis indicated that most WRKY domains from SmelWRKYs and StorWRKYs are conserved and have evolved at low rates since their divergence. Comparison to homologous WRKY genes in Arabidopsis revealed several potential pathogen resistance-related SmelWRKYs and StorWRKYs, providing possible candidate genetic resources for improving stress tolerance in eggplant and probably other Solanaceae plants. To our knowledge, this is the first report of a genome-wide analyses of the SmelWRKYs and StorWRKYs.

  4. Testing mate choice and overdominance at MH in natural families of Atlantic salmon Salmo salar.

    PubMed

    Tentelier, C; Barroso-Gomila, O; Lepais, O; Manicki, A; Romero-Garmendia, I; Jugo, B M

    2017-04-01

    This study aimed to test mate choice and selection during early life stages on major histocompatibility (MH) genotype in natural families of Atlantic salmon Salmo salar spawners and juveniles, using nine microsatellites to reconstruct families, one microsatellite linked to an MH class I gene and one minisatellite linked to an MH class II gene. MH-based mate choice was only detected for the class I locus on the first year, with lower expected heterozygosity in the offspring of actually mated pairs than predicted under random mating. The genotype frequencies of MH-linked loci observed in the juveniles were compared with frequencies expected from Mendelian inheritance of parental alleles to detect selection during early life stages. No selection was detected on the locus linked to class I gene. For the locus linked to class II gene, observed heterozygosity was higher than expected in the first year and lower in the second year, suggesting overdominance and underdominance, respectively. Within family, juveniles' body size was linked to heterozygosity at the same locus, with longer heterozygotes in the first year and longer homozygotes in the second year. Selection therefore seems to differ from one locus to the other and from year to year. © 2017 The Fisheries Society of the British Isles.
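The comparison between observed juvenile genotypes and Mendelian expectations can be illustrated with a small sketch. It assumes one diploid locus, known parental genotypes and equal transmission of each parental allele; the allele labels, the family and both function names are hypothetical.

```python
from itertools import product


def expected_offspring_heterozygosity(mother, father):
    """Expected share of heterozygous offspring under Mendelian inheritance.

    Each parent is a pair of alleles; every maternal/paternal allele
    combination is assumed equally likely.
    """
    combos = list(product(mother, father))
    return sum(a != b for a, b in combos) / len(combos)


def observed_heterozygosity(offspring):
    """Share of sampled offspring genotypes that carry two different alleles."""
    return sum(a != b for a, b in offspring) / len(offspring)


# Hypothetical family: both parents heterozygous for alleles "a1" and "a2".
mother, father = ("a1", "a2"), ("a1", "a2")
juveniles = [("a1", "a2"), ("a1", "a2"), ("a2", "a1"), ("a1", "a1")]
print(expected_offspring_heterozygosity(mother, father),
      observed_heterozygosity(juveniles))
```

An observed value above the expectation is the pattern the abstract interprets as suggesting overdominance, and a value below it as suggesting underdominance; a real analysis would also need a significance test across families.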

  5. Computational analysis of sequence selection mechanisms.

    PubMed

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  6. Rank-based estimation in the ℓ1-regularized partly linear model for censored outcomes with application to integrated analyses of clinical predictors and gene expression data.

    PubMed

    Johnson, Brent A

    2009-10-01

    We consider estimation and variable selection in the partial linear model for censored data. The partial linear model for censored data is a direct extension of the accelerated failure time model, the latter of which is a very important alternative model to the proportional hazards model. We extend rank-based lasso-type estimators to a model that may contain nonlinear effects. Variable selection in such partial linear model has direct application to high-dimensional survival analyses that attempt to adjust for clinical predictors. In the microarray setting, previous methods can adjust for other clinical predictors by assuming that clinical and gene expression data enter the model linearly in the same fashion. Here, we select important variables after adjusting for prognostic clinical variables but the clinical effects are assumed nonlinear. Our estimator is based on stratification and can be extended naturally to account for multiple nonlinear effects. We illustrate the utility of our method through simulation studies and application to the Wisconsin prognostic breast cancer data set.

  7. Φ-score: A cell-to-cell phenotypic scoring method for sensitive and selective hit discovery in cell-based assays.

    PubMed

    Guyon, Laurent; Lajaunie, Christian; Fer, Frédéric; Bhajun, Ricky; Sulpice, Eric; Pinna, Guillaume; Campalans, Anna; Radicella, J Pablo; Rouillier, Philippe; Mary, Mélissa; Combe, Stéphanie; Obeid, Patricia; Vert, Jean-Philippe; Gidrol, Xavier

    2015-09-18

    Phenotypic screening monitors phenotypic changes induced by perturbations, including those generated by drugs or RNA interference. Currently-used methods for scoring screen hits have proven to be problematic, particularly when applied to physiologically relevant conditions such as low cell numbers or inefficient transfection. Here, we describe the Φ-score, which is a novel scoring method for the identification of phenotypic modifiers or hits in cell-based screens. Φ-score performance was assessed with simulations, a validation experiment and its application to gene identification in a large-scale RNAi screen. Using robust statistics and a variance model, we demonstrated that the Φ-score showed better sensitivity, selectivity and reproducibility compared to classical approaches. The improved performance of the Φ-score paves the way for cell-based screening of primary cells, which are often difficult to obtain from patients in sufficient numbers. We also describe a dedicated merging procedure to pool scores from small interfering RNAs targeting the same gene so as to provide improved visualization and hit selection.

  8. Φ-score: A cell-to-cell phenotypic scoring method for sensitive and selective hit discovery in cell-based assays

    PubMed Central

    Guyon, Laurent; Lajaunie, Christian; Fer, Frédéric; Bhajun, Ricky; Sulpice, Eric; Pinna, Guillaume; Campalans, Anna; Radicella, J. Pablo; Rouillier, Philippe; Mary, Mélissa; Combe, Stéphanie; Obeid, Patricia; Vert, Jean-Philippe; Gidrol, Xavier

    2015-01-01

    Phenotypic screening monitors phenotypic changes induced by perturbations, including those generated by drugs or RNA interference. Currently-used methods for scoring screen hits have proven to be problematic, particularly when applied to physiologically relevant conditions such as low cell numbers or inefficient transfection. Here, we describe the Φ-score, which is a novel scoring method for the identification of phenotypic modifiers or hits in cell-based screens. Φ-score performance was assessed with simulations, a validation experiment and its application to gene identification in a large-scale RNAi screen. Using robust statistics and a variance model, we demonstrated that the Φ-score showed better sensitivity, selectivity and reproducibility compared to classical approaches. The improved performance of the Φ-score paves the way for cell-based screening of primary cells, which are often difficult to obtain from patients in sufficient numbers. We also describe a dedicated merging procedure to pool scores from small interfering RNAs targeting the same gene so as to provide improved visualization and hit selection. PMID:26382112

  9. Impact of strong selection for the PrP major gene on genetic variability of four French sheep breeds (Open Access publication)

    PubMed Central

    Palhiere, Isabelle; Brochard, Mickaël; Moazami-Goudarzi, Katayoun; Laloë, Denis; Amigues, Yves; Bed'hom, Bertrand; Neuts, Étienne; Leymarie, Cyril; Pantano, Thais; Cribiu, Edmond Paul; Bibé, Bernard; Verrier, Étienne

    2008-01-01

    Effective selection on the PrP gene has been implemented since October 2001 in all French sheep breeds. After four years, the ARR "resistant" allele frequency increased by about 35% in young males. The aim of this study was to evaluate the impact of this strong selection on genetic variability. It is focussed on four French sheep breeds and based on the comparison of two groups of 94 animals within each breed: the first group of animals was born before the selection began, and the second, 3–4 years later. Genetic variability was assessed using genealogical and molecular data (29 microsatellite markers). The expected loss of genetic variability on the PrP gene was confirmed. Moreover, among the five markers located in the PrP region, only the three closest ones were affected. The evolution of the number of alleles, heterozygote deficiency within population, expected heterozygosity and the Reynolds distances agreed with the criteria from pedigree and pointed out that neutral genetic variability was not much affected. This trend depended on breed, i.e. on their initial states (population size, PrP frequencies) and on the selection strategies for improving scrapie resistance while carrying out selection for production traits. PMID:18990357
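
    The diversity measures named in this abstract (number of alleles and expected heterozygosity) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and the toy allele lists are hypothetical, and it assumes the standard definition of expected heterozygosity at one locus, He = 1 - sum(p_i^2), with an optional small-sample correction n/(n - 1).

```python
from collections import Counter


def expected_heterozygosity(alleles, unbiased=True):
    """Expected heterozygosity (gene diversity) at a single locus.

    alleles: flat list of allele labels, two entries per diploid animal
             (hypothetical input layout chosen for this sketch).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    counts = Counter(alleles)
    # Sum of squared allele frequencies = expected homozygosity.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    he = 1.0 - sum_sq
    if unbiased:
        # Small-sample correction on the number of allele copies.
        he *= n / (n - 1)
    return he


def number_of_alleles(alleles):
    """Count of distinct alleles observed at the locus."""
    return len(set(alleles))


# Toy comparison of one marker before and after selection (made-up data).
before = ["a1", "a2", "a3", "a1", "a2", "a3", "a1", "a2"]
after = ["a1", "a1", "a1", "a2", "a1", "a1", "a1", "a2"]
print(number_of_alleles(before), round(expected_heterozygosity(before), 3))
print(number_of_alleles(after), round(expected_heterozygosity(after), 3))
```

    Running the same two functions per marker and per cohort would give the kind of before/after comparison the abstract describes, under the assumptions stated above.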

  10. Evaluation of the effect and profitability of gene-assisted selection in pig breeding system

    PubMed Central

    Li, Ya-lan; Zhang, Qin; Chen, Yao-sheng

    2007-01-01

    Objective: To evaluate the effect and profitability of using the quantitative trait loci (QTL)-linked direct marker (DR marker) in gene-assisted selection (GAS). Methods: Three populations (100, 200, or 300 sows plus 10 boars within each group) with segregating QTL were simulated stochastically. Five economic traits were investigated, including number of born alive (NBA), average daily gain to 100 kg body weight (ADG), feed conversion ratio (FCR), back fat at 100 kg body weight (BF) and intramuscular fat (IMF). Selection was based on the estimated breeding value (EBV) of each trait. The starting frequencies of the QTL’s favorable allele were 0.1, 0.3 and 0.5, respectively. The economic return was calculated by gene flow method. Results: The selection efficiency was higher than 100% when DR markers were used in GAS for 5 traits. The selection efficiency for NBA was the highest, and the lowest was for ADG whose QTL had the lowest variance. The mixed model applied DR markers and obtained higher extra genetic gain and extra economic returns. We also found that the lower the frequency of the favorable allele of the QTL, the higher the extra return obtained. Conclusion: GAS is an effective selection scheme to increase the genetic gain and the economic returns in pig breeding. PMID:17973344

  11. Evaluation of the effect and profitability of gene-assisted selection in pig breeding system.

    PubMed

    Li, Ya-Lan; Zhang, Qin; Chen, Yao-Sheng

    2007-11-01

    To evaluate the effect and profitability of using the quantitative trait loci (QTL)-linked direct marker (DR marker) in gene-assisted selection (GAS). Three populations (100, 200, or 300 sows plus 10 boars within each group) with segregating QTL were simulated stochastically. Five economic traits were investigated, including number of born alive (NBA), average daily gain to 100 kg body weight (ADG), feed conversion ratio (FCR), back fat at 100 kg body weight (BF) and intramuscular fat (IMF). Selection was based on the estimated breeding value (EBV) of each trait. The starting frequencies of the QTL's favorable allele were 0.1, 0.3 and 0.5, respectively. The economic return was calculated by gene flow method. The selection efficiency was higher than 100% when DR markers were used in GAS for 5 traits. The selection efficiency for NBA was the highest, and the lowest was for ADG whose QTL had the lowest variance. The mixed model applied DR markers and obtained higher extra genetic gain and extra economic returns. We also found that the lower the frequency of the favorable allele of the QTL, the higher the extra return obtained. GAS is an effective selection scheme to increase the genetic gain and the economic returns in pig breeding.

  12. Locus-dependent selection in crop-wild hybrids of lettuce under field conditions and its implication for GM crop development

    PubMed Central

    Hooftman, Danny A P; Flavell, Andrew J; Jansen, Hans; den Nijs, Hans C M; Syed, Naeem H; Sørensen, Anker P; Orozco-ter Wengel, Pablo; van de Wiel, Clemens C M

    2011-01-01

    Gene escape from crops has gained much attention in the last two decades, as transgenes introgressing into wild populations could affect the latter's ecological characteristics. However, different genes have different likelihoods of introgression. The mixture of selective forces provided by natural conditions creates an adaptive mosaic of alleles from both parental species. We investigated segregation patterns after hybridization between lettuce (Lactuca sativa) and its wild relative, L. serriola. Three generations of hybrids (S1, BC1, and BC1S1) were grown in habitats mimicking the wild parent's habitat. As control, we harvested S1 seedlings grown under controlled conditions, providing very limited possibility for selection. We used 89 AFLP loci, as well as more recently developed dominant markers, 115 retrotransposon markers (SSAP), and 28 NBS loci linked to resistance genes. For many loci, allele frequencies were biased in plants exposed to natural field conditions, including over-representation of crop alleles for various loci. Furthermore, linkage disequilibrium was locally changed, allegedly by selection caused by the natural field conditions, providing ample opportunity for genetic hitchhiking. Our study indicates that when developing genetically modified crops, a judicious selection of insertion sites, based on knowledge of selective (dis)advantages of the surrounding crop genome under field conditions, could diminish transgene persistence. PMID:25568012

  13. Locus-dependent selection in crop-wild hybrids of lettuce under field conditions and its implication for GM crop development.

    PubMed

    Hooftman, Danny A P; Flavell, Andrew J; Jansen, Hans; den Nijs, Hans C M; Syed, Naeem H; Sørensen, Anker P; Orozco-Ter Wengel, Pablo; van de Wiel, Clemens C M

    2011-09-01

    Gene escape from crops has gained much attention in the last two decades, as transgenes introgressing into wild populations could affect the latter's ecological characteristics. However, different genes have different likelihoods of introgression. The mixture of selective forces provided by natural conditions creates an adaptive mosaic of alleles from both parental species. We investigated segregation patterns after hybridization between lettuce (Lactuca sativa) and its wild relative, L. serriola. Three generations of hybrids (S1, BC1, and BC1S1) were grown in habitats mimicking the wild parent's habitat. As control, we harvested S1 seedlings grown under controlled conditions, providing very limited possibility for selection. We used 89 AFLP loci, as well as more recently developed dominant markers, 115 retrotransposon markers (SSAP), and 28 NBS loci linked to resistance genes. For many loci, allele frequencies were biased in plants exposed to natural field conditions, including over-representation of crop alleles for various loci. Furthermore, linkage disequilibrium was locally changed, allegedly by selection caused by the natural field conditions, providing ample opportunity for genetic hitchhiking. Our study indicates that when developing genetically modified crops, a judicious selection of insertion sites, based on knowledge of selective (dis)advantages of the surrounding crop genome under field conditions, could diminish transgene persistence.

  14. Antibiotic-Free Selection in Biotherapeutics: Now and Forever

    PubMed Central

    Mignon, Charlotte; Sodoyer, Régis; Werle, Bettina

    2015-01-01

    The continuously improving sophistication of molecular engineering techniques gives access to novel classes of bio-therapeutics and new challenges for their production in full respect of the strengthening regulations. Among these biologic agents are DNA based vaccines or gene therapy products and to a lesser extent genetically engineered live vaccines or delivery vehicles. The use of antibiotic-based selection, frequently associated with genetic manipulation of microorganisms, is currently undergoing a profound metamorphosis with the implementation and diversification of alternative selection means. This short review will present examples of alternatives to antibiotic selection and their context of application to highlight their ineluctable invasion of the bio-therapeutic world. PMID:25854922

  15. The major histocompatibility complex and mate choice: inbreeding avoidance and selection of good genes.

    PubMed

    Grob, B; Knapp, L A; Martin, R D; Anzenberger, G

    1998-01-01

    It has been known for decades that MHC genes play a critical role in the cellular immune response, but only recent research has provided a better understanding of how these molecules might affect mate choice. Original studies in inbred mouse strains revealed that mate choice was influenced by MHC dissimilarity. Detection of MHC differences between individuals in these experiments was related to olfactory cues, primarily in urine. Recent studies in humans have shown an analogous picture of MHC-based mating. Taken together, these findings could support either the hypothesis of MHC-based inbreeding avoidance or the hypothesis of MHC-related avoidance of reproductive failure, since studies in mice, humans and pigtailed macaques have shown that parental sharing of certain MHC alleles correlates with frequent spontaneous abortion or prolonged intergestational intervals. Data from many mammalian species clearly demonstrate that reproductive failure occurs as a result of inbreeding. Therefore, MHC similarity might serve as an indicator of genome-wide relatedness. In contrast, increased fitness due to the presence of individual MHC alleles in a pathogenic environment could explain MHC-based selection of currently good genes. Specifically, the physical condition of long-living animals depends on the ability to respond to immunological challenge and an individual's MHC alleles determine the response, since, unlike the T cell receptors, MHC alleles are not somatically recombined. Therefore, sexual selection of condition-dependent traits during mate choice could be used to select successful MHC alleles, thereby providing offspring with a higher relative immunity in their pathogenic environment.

  16. Using RNA-seq data to select reference genes for normalizing gene expression in apple roots.

    PubMed

    Zhou, Zhe; Cong, Peihua; Tian, Yi; Zhu, Yanmin

    2017-01-01

    Gene expression in apple roots in response to various stress conditions is a less-explored research subject. Reliable reference genes for normalizing quantitative gene expression data have not been carefully investigated. In this study, the suitability of a set of 15 apple genes was evaluated for their potential use as reliable reference genes. These genes were selected based on their low variance of gene expression in apple root tissues from a recent RNA-seq data set, and a few previously reported apple reference genes for other tissue types. Four methods, Delta Ct, geNorm, NormFinder and BestKeeper, were used to evaluate their stability in apple root tissues of various genotypes and under different experimental conditions. A small panel of stably expressed genes, MDP0000095375, MDP0000147424, MDP0000233640, MDP0000326399 and MDP0000173025, was recommended for normalizing quantitative gene expression data in apple roots under various abiotic or biotic stresses. When the most stable and least stable reference genes were used for data normalization, significant differences were observed in the expression patterns of two target genes, MdLecRLK5 (MDP0000228426, a gene encoding a lectin receptor like kinase) and MdMAPK3 (MDP0000187103, a gene encoding a mitogen-activated protein kinase). Our data also indicated that for those carefully validated reference genes, a single reference gene is sufficient for reliable normalization of the quantitative gene expression. Depending on the experimental conditions, the most suitable reference genes can be specific to the sample of interest for more reliable RT-qPCR data normalization.
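
    The pre-selection step described here (choosing candidates with low expression variance in RNA-seq data) can be sketched as a ranking by coefficient of variation. This is an illustration only, not the authors' procedure, and it does not reproduce Delta Ct, geNorm, NormFinder or BestKeeper. The function name, the gene labels and the expression values are hypothetical, and the input is assumed to be already-normalized expression values per sample.

```python
import statistics


def rank_by_cv(expression):
    """Rank genes from most to least stable by coefficient of variation.

    expression: dict mapping a gene label to a list of normalized
                expression values, one per sample (hypothetical layout).
    Genes with a non-positive mean are skipped because CV is undefined.
    """
    ranked = []
    for gene, values in expression.items():
        mean = statistics.fmean(values)
        if mean <= 0:
            continue
        # Sample standard deviation divided by the mean.
        cv = statistics.stdev(values) / mean
        ranked.append((gene, cv))
    # Lowest CV first: the most stably expressed candidates.
    ranked.sort(key=lambda item: item[1])
    return ranked


# Made-up expression values for three placeholder genes.
toy = {
    "geneA": [10.0, 10.0, 10.0],
    "geneB": [5.0, 10.0, 15.0],
    "geneC": [0.0, 0.0, 0.0],
}
for gene, cv in rank_by_cv(toy):
    print(gene, round(cv, 3))
```

    Candidates shortlisted this way would still need experimental validation by RT-qPCR, as the abstract describes.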

  17. Using RNA-seq data to select reference genes for normalizing gene expression in apple roots

    PubMed Central

    Zhou, Zhe; Cong, Peihua; Tian, Yi

    2017-01-01

    Gene expression in apple roots in response to various stress conditions is a less-explored research subject. Reliable reference genes for normalizing quantitative gene expression data have not been carefully investigated. In this study, the suitability of a set of 15 apple genes was evaluated for their potential use as reliable reference genes. These genes were selected based on their low variance of gene expression in apple root tissues from a recent RNA-seq data set, and a few previously reported apple reference genes for other tissue types. Four methods, Delta Ct, geNorm, NormFinder and BestKeeper, were used to evaluate their stability in apple root tissues of various genotypes and under different experimental conditions. A small panel of stably expressed genes, MDP0000095375, MDP0000147424, MDP0000233640, MDP0000326399 and MDP0000173025, was recommended for normalizing quantitative gene expression data in apple roots under various abiotic or biotic stresses. When the most stable and least stable reference genes were used for data normalization, significant differences were observed in the expression patterns of two target genes, MdLecRLK5 (MDP0000228426, a gene encoding a lectin receptor like kinase) and MdMAPK3 (MDP0000187103, a gene encoding a mitogen-activated protein kinase). Our data also indicated that for those carefully validated reference genes, a single reference gene is sufficient for reliable normalization of the quantitative gene expression. Depending on the experimental conditions, the most suitable reference genes can be specific to the sample of interest for more reliable RT-qPCR data normalization. PMID:28934340

  18. High-throughput microarray technology in diagnostics of enterobacteria based on genome-wide probe selection and regression analysis.

    PubMed

    Friedrich, Torben; Rahmann, Sven; Weigel, Wilfried; Rabsch, Wolfgang; Fruth, Angelika; Ron, Eliora; Gunzer, Florian; Dandekar, Thomas; Hacker, Jörg; Müller, Tobias; Dobrindt, Ulrich

    2010-10-21

    The Enterobacteriaceae comprise a large number of clinically relevant species with several individual subspecies. Overlapping virulence-associated gene pools and the high overall genome plasticity often interferes with correct enterobacterial strain typing and risk assessment. Array technology offers a fast, reproducible and standardisable means for bacterial typing and thus provides many advantages for bacterial diagnostics, risk assessment and surveillance. The development of highly discriminative broad-range microbial diagnostic microarrays remains a challenge, because of marked genome plasticity of many bacterial pathogens. We developed a DNA microarray for strain typing and detection of major antimicrobial resistance genes of clinically relevant enterobacteria. For this purpose, we applied a global genome-wide probe selection strategy on 32 available complete enterobacterial genomes combined with a regression model for pathogen classification. The discriminative power of the probe set was further tested in silico on 15 additional complete enterobacterial genome sequences. DNA microarrays based on the selected probes were used to type 92 clinical enterobacterial isolates. Phenotypic tests confirmed the array-based typing results and corroborate that the selected probes allowed correct typing and prediction of major antibiotic resistances of clinically relevant Enterobacteriaceae, including the subspecies level, e.g. the reliable distinction of different E. coli pathotypes. Our results demonstrate that the global probe selection approach based on longest common factor statistics as well as the design of a DNA microarray with a restricted set of discriminative probes enables robust discrimination of different enterobacterial variants and represents a proof of concept that can be adopted for diagnostics of a wide range of microbial pathogens. 
    Our approach circumvents misclassifications arising from the application of virulence markers, which are highly affected by horizontal gene transfer. Moreover, a broad range of pathogens have been covered by an efficient probe set size enabling the design of high-throughput diagnostics.

  19. Balancing selection and genetic drift at major histocompatibility complex class II genes in isolated populations of golden snub-nosed monkey (Rhinopithecus roxellana)

    PubMed Central

    2012-01-01

    Background: Small, isolated populations often experience loss of genetic variation due to random genetic drift. Unlike neutral or nearly neutral markers (such as mitochondrial genes or microsatellites), major histocompatibility complex (MHC) genes in these populations may retain high levels of polymorphism due to balancing selection. The relative roles of balancing selection and genetic drift in either small isolated or bottlenecked populations remain controversial. In this study, we examined the mechanisms maintaining polymorphisms of MHC genes in small isolated populations of the endangered golden snub-nosed monkey (Rhinopithecus roxellana) by comparing genetic variation found in MHC and microsatellite loci. There are few studies of this kind conducted on highly endangered primate species. Results: Two MHC genes were sequenced and sixteen microsatellite loci were genotyped from samples representing three isolated populations. We isolated nine DQA1 alleles and sixteen DQB1 alleles and validated expression of the alleles. Lowest genetic variation for both MHC and microsatellites was found in the Shennongjia (SNJ) population. Historical balancing selection was revealed at both the DQA1 and DQB1 loci, as revealed by excess non-synonymous substitutions at antigen binding sites (ABS) and maximum-likelihood-based random-site models. Patterns of microsatellite variation revealed population structure. FST outlier analysis showed that population differentiation at the two MHC loci was similar to the microsatellite loci. Conclusions: MHC genes and microsatellite loci showed the same allelic richness pattern with the lowest genetic variation occurring in SNJ, suggesting that genetic drift played a prominent role in these isolated populations. As MHC genes are subject to selective pressures, the maintenance of genetic variation is of particular interest in small, long-isolated populations. The results of this study may contribute to captive breeding and translocation programs for endangered species. PMID:23083308

  20. A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

    PubMed Central

    Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

    2006-01-01

    The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943

  1. Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature

    PubMed Central

    Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V.; Lawson, Andrew; Zeng, Jia; Johnson, Amber M.; Holla, Vijaykumar; Bailey, Ann M.; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W.

    2015-01-01

    Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org PMID:25858285

  2. Remediation of petroleum hydrocarbon-contaminated sites by DNA diagnosis-based bioslurping technology.

    PubMed

    Kim, Seungjin; Krajmalnik-Brown, Rosa; Kim, Jong-Oh; Chung, Jinwook

    2014-11-01

    The application of effective remediation technologies can benefit from adequate preliminary testing, such as in lab-scale and pilot-scale systems. Bioremediation technologies have demonstrated tremendous potential with regard to cost, but they cannot be used for all contaminated sites due to limitations in biological activity. The purpose of this study was to develop a DNA diagnostic method that reduces the time to select contaminated sites that are good candidates for bioremediation. We applied an oligonucleotide microarray method to detect and monitor genes that lead to aliphatic and aromatic degradation. Further, the bioremediation of a contaminated site, selected based on the results of the genetic diagnostic method, was achieved successfully by applying bioslurping in field tests. This gene-based diagnostic technique is a powerful tool to evaluate the potential for bioremediation in petroleum hydrocarbon contaminated soil.

  3. Examination of Global Methylation and Targeted Imprinted Genes in Prader-Willi Syndrome.

    PubMed

    Manzardo, A M; Butler, M G

    2016-01-01

    Context: Methylation changes observed in Prader-Willi syndrome (PWS) may impact global methylation as well as regional methylation status of imprinted genes on chromosome 15 (in cis) or other imprinted obesity-related genes on other chromosomes (in trans) leading to differential effects on gene expression impacting obesity phenotype unique to PWS. Objective: Characterize the global methylation profiles and methylation status for select imprinted genes associated with obesity phenotype in a well-characterized imprinted, obesity-related syndrome (PWS) relative to a cohort of obese and non-obese individuals. Design: Global methylation was assayed using two methodologies: 1) enriched LINE-1 repeat sequences by EpigenDx and 2) ELISA-based immunoassay method sensitive to genomic 5-methylcytosine by Epigentek. Target gene methylation patterns at selected candidate obesity gene loci were determined using methylation-specific PCR. Setting: Study participants were recruited as part of an ongoing research program on obesity-related genomics and Prader-Willi syndrome. Participants: Individuals with non-syndromic obesity (N=26), leanness (N=26) and PWS (N=39). Results: A detailed characterization of the imprinting status of select target genes within the critical PWS 15q11-q13 genomic region showed enhanced cis but not trans methylation of imprinted genes. No significant differences in global methylation were found between non-syndromic obese, PWS or non-obese controls. Intervention: None. Main outcome measures: Percentage methylation and the methylation index. Conclusion: The methylation abnormality in PWS due to errors of genomic imprinting affects both upstream and downstream effectors in the 15q11-q13 region showing enhanced cis but not trans methylation of imprinted genes. Obesity in our subject cohorts did not appear to impact global methylation levels using the described methodology.

  4. Examination of Global Methylation and Targeted Imprinted Genes in Prader-Willi Syndrome

    PubMed Central

    Manzardo, AM; Butler, MG

    2016-01-01

    Context: Methylation changes observed in Prader-Willi syndrome (PWS) may impact global methylation as well as regional methylation status of imprinted genes on chromosome 15 (in cis) or other imprinted obesity-related genes on other chromosomes (in trans) leading to differential effects on gene expression impacting obesity phenotype unique to PWS. Objective: Characterize the global methylation profiles and methylation status for select imprinted genes associated with obesity phenotype in a well-characterized imprinted, obesity-related syndrome (PWS) relative to a cohort of obese and non-obese individuals. Design: Global methylation was assayed using two methodologies: 1) enriched LINE-1 repeat sequences by EpigenDx and 2) ELISA-based immunoassay method sensitive to genomic 5-methylcytosine by Epigentek. Target gene methylation patterns at selected candidate obesity gene loci were determined using methylation-specific PCR. Setting: Study participants were recruited as part of an ongoing research program on obesity-related genomics and Prader-Willi syndrome. Participants: Individuals with non-syndromic obesity (N=26), leanness (N=26) and PWS (N=39). Results: A detailed characterization of the imprinting status of select target genes within the critical PWS 15q11-q13 genomic region showed enhanced cis but not trans methylation of imprinted genes. No significant differences in global methylation were found between non-syndromic obese, PWS or non-obese controls. Intervention: None. Main outcome measures: Percentage methylation and the methylation index. Conclusion: The methylation abnormality in PWS due to errors of genomic imprinting affects both upstream and downstream effectors in the 15q11-q13 region showing enhanced cis but not trans methylation of imprinted genes. Obesity in our subject cohorts did not appear to impact global methylation levels using the described methodology. PMID:28111641

  5. A potential disruptive technology in vaccine development: gene-based vaccines and their application to infectious diseases.

    PubMed

    Kaslow, David C

    2004-10-01

    Vaccine development requires an amalgamation of disparate disciplines and has unique economic and regulatory drivers. Non-viral gene-based delivery systems, such as formulated plasmid DNA, are new and potentially disruptive technologies capable of providing 'cheaper, simpler, and more convenient-to-use' vaccines. Typically and somewhat ironically, disruptive technologies have poorer product performance, at least in the near-term, compared with the existing conventional technologies. Because successful product development requires that the product's performance must meet or exceed the efficacy threshold for a desired application, the appropriate selection of the initial product applications for a disruptive technology is critical for its successful evolution. In this regard, the near-term successes of gene-based vaccines will likely be for protection against bacterial toxins and acute viral and bacterial infections. Recent breakthroughs, however, herald increasing rather than languishing performance improvements in the efficacy of gene-based vaccines. Whether gene-based vaccines ultimately succeed in eliciting protective immunity in humans to persistent intracellular pathogens, such as HIV, malaria and tuberculosis, for which the conventional vaccine technologies have failed, remains to be determined. A success against any one of the persistent intracellular pathogens would be sufficient proof that gene-based vaccines represent a disruptive technology against which future vaccine technologies will be measured.

  6. Genomic Trajectories to Desiccation Resistance: Convergence and Divergence Among Replicate Selected Drosophila Lines

    PubMed Central

    Griffin, Philippa C.; Hangartner, Sandra B.; Fournier-Level, Alexandre; Hoffmann, Ary A.

    2017-01-01

    Adaptation to environmental stress is critical for long-term species persistence. With climate change and other anthropogenic stressors compounding natural selective pressures, understanding the nature of adaptation is as important as ever in evolutionary biology. In particular, the number of alternative molecular trajectories available for an organism to reach the same adaptive phenotype remains poorly understood. Here, we investigate this issue in a set of replicated Drosophila melanogaster lines selected for increased desiccation resistance, a classical physiological trait that has been closely linked to Drosophila species distributions. We used pooled whole-genome sequencing (Pool-Seq) to compare the genetic basis of their selection responses, using a matching set of replicated control lines for characterizing laboratory (lab-)adaptation, as well as the original base population. The ratio of effective population size to census size was high over the 21 generations of the experiment at 0.52–0.88 for all selected and control lines. While selected SNPs in replicates of the same treatment (desiccation-selection or lab-adaptation) tended to change frequency in the same direction, suggesting some commonality in the selection response, candidate SNP and gene lists often differed among replicates. Three of the five desiccation-selection replicates showed significant overlap at the gene and network level. All five replicates showed enrichment for ovary-expressed genes, suggesting maternal effects on the selected trait. Divergence between pairs of replicate lines for desiccation-candidate SNPs was greater than between pairs of control lines. This difference also far exceeded the divergence between pairs of replicate lines for neutral SNPs. Overall, while there was overlap in the direction of allele frequency changes and the network and functional categories affected by desiccation selection, replicates showed unique responses at all levels, likely reflecting hitchhiking effects, and highlighting the challenges in identifying candidate genes from these types of experiments when traits are likely to be polygenic. PMID:28007884

  7. Description of Drinking Water Bacterial Communities Using 16S rRNA Gene Sequence Analyses

    EPA Science Inventory

    Descriptions of bacterial communities inhabiting water distribution systems (WDS) have mainly been accomplished using culture-based approaches. Due to the inherent selective nature of culture-based approaches, the majority of bacteria inhabiting WDS remain uncharacterized. The go...

  8. A Model-Based Approach for Identifying Signatures of Ancient Balancing Selection in Genetic Data

    PubMed Central

    DeGiorgio, Michael; Lohmueller, Kirk E.; Nielsen, Rasmus

    2014-01-01

    While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates. PMID:25144706

  9. A model-based approach for identifying signatures of ancient balancing selection in genetic data.

    PubMed

    DeGiorgio, Michael; Lohmueller, Kirk E; Nielsen, Rasmus

    2014-08-01

    While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates.

  10. Selection and Validation of Appropriate Reference Genes for qRT-PCR Analysis in Isatis indigotica Fort.

    PubMed Central

    Li, Tao; Wang, Jing; Lu, Miao; Zhang, Tianyi; Qu, Xinyun; Wang, Zhezhi

    2017-01-01

    Due to its sensitivity and specificity, real-time quantitative PCR (qRT-PCR) is a popular technique for investigating gene expression levels in plants. Based on the Minimum Information for Publication of Real-Time Quantitative PCR Experiments (MIQE) guidelines, it is necessary to select and validate putative appropriate reference genes for qRT-PCR normalization. In the current study, three algorithms, geNorm, NormFinder, and BestKeeper, were applied to assess the expression stability of 10 candidate reference genes across five different tissues and three different abiotic stresses in Isatis indigotica Fort. Additionally, the IiYUC6 gene associated with IAA biosynthesis was applied to validate the candidate reference genes. The analysis results of the geNorm, NormFinder, and BestKeeper algorithms indicated certain differences for the different sample sets and different experiment conditions. Considering all of the algorithms, PP2A-4 and TUB4 were recommended as the most stable reference genes for total and different tissue samples, respectively. Moreover, RPL15 and PP2A-4 were considered to be the most suitable reference genes for abiotic stress treatments. The obtained experimental results might contribute to improved accuracy and credibility for the expression levels of target genes by qRT-PCR normalization in I. indigotica. PMID:28702046
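
    The expression-stability ranking mentioned above can be illustrated with a minimal sketch of a geNorm-style M value: for each candidate gene, the average standard deviation of its log2 expression ratios against every other candidate, where a lower M means more stable expression. The gene names and quantities below are hypothetical and are not data from the study, and the function is an illustration of the idea rather than the geNorm software itself.

```python
import math
from statistics import mean, stdev


def genorm_m(expr):
    """Return a geNorm-style stability value M for each gene.

    expr maps a gene name to a list of relative expression
    quantities (linear scale), one value per sample.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))
        # M is the mean pairwise variation against all other candidates.
        m_values[j] = mean(pairwise_sd)
    return m_values


# Hypothetical quantities for three candidate genes in four samples.
expr = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [2.0, 4.0, 8.0, 16.0],
    "geneC": [1.0, 4.0, 2.0, 16.0],
}
m = genorm_m(expr)
ranked = sorted(m, key=m.get)  # most stable candidate first
```

    In this toy data set geneA and geneB keep a constant ratio, so they share the lowest M, while geneC ranks last.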

  11. Genomics studies on musical aptitude, music perception, and practice.

    PubMed

    Järvelä, Irma

    2018-03-23

    When searching for genetic markers inherited together with musical aptitude, genes affecting inner ear development and brain function were identified. The alpha-synuclein gene (SNCA), located in the most significant linkage region of musical aptitude, was overexpressed when listening to and performing music. The GATA-binding protein 2 gene (GATA2) was located in the best associated region of musical aptitude and regulates SNCA in dopaminergic neurons, thus linking DNA- and RNA-based studies of music-related traits together. In addition to SNCA, several other genes were linked to dopamine metabolism. Mutations in SNCA predispose to Lewy-body dementia and cause Parkinson disease in humans and affect song production in songbirds. Several other birdsong genes were found in transcriptome analysis, suggesting a common evolutionary background of sound perception and production in humans and songbirds. Regions of positive selection with musical aptitude contained genes affecting auditory perception, cognitive performance, memory, human language development, and song perception and production of songbirds. The data support the role of the dopaminergic pathway and its link to the reward mechanism as a molecular determinant in positive selection of music. Integration of gene-level data from the literature across multiple species prioritized activity-dependent immediate early genes as candidate genes in musical aptitude and listening to and performing music. © 2018 New York Academy of Sciences.

  12. Prions are affected by evolution at two levels.

    PubMed

    Wickner, Reed B; Kelly, Amy C

    2016-03-01

    Prions, infectious proteins, can transmit diseases or be the basis of heritable traits (or both), mostly based on amyloid forms of the prion protein. A single protein sequence can be the basis for many prion strains/variants, with different biological properties based on different amyloid conformations, each rather stably propagating. Prions are unique in that evolution and selection work at both the level of the chromosomal gene encoding the protein, and on the prion itself selecting prion variants. Here, we summarize what is known about the evolution of prion proteins, both the genes and the prions themselves. We contrast the one known functional prion, [Het-s] of Podospora anserina, with the known disease prions, the yeast prions [PSI+] and [URE3] and the transmissible spongiform encephalopathies of mammals.

  13. Walnut (Juglans).

    PubMed

    Leslie, Charles A; Walawage, Sriema L; Uratsu, Sandra L; McGranahan, Gale; Dandekar, Abhaya M

    2015-01-01

    Walnut species are important nut and timber producers in temperate regions of Europe, Asia, South America, and North America. Trees can be impacted by Phytophthora, crown gall, nematodes, Armillaria, and cherry leaf roll virus; nuts can be severely damaged by codling moth, husk fly, and Xanthomonas blight. The long generation time of walnuts and an absence of identified natural resistance for most of these problems suggest biotechnological approaches to crop improvement. Described here is a somatic embryo-based transformation protocol that has been used to successfully insert horticulturally useful traits into walnut. Selection is based on the combined use of the selectable neomycin phosphotransferase (nptII) gene and the scorable uidA gene. Transformed embryos can be germinated or micropropagated and rooted for plant production. The method described has been used to establish field trials of mature trees.

  14. Lymphocyte signaling: beyond knockouts.

    PubMed

    Saveliev, Alexander; Tybulewicz, Victor L J

    2009-04-01

    The analysis of lymphocyte signaling was greatly enhanced by the advent of gene targeting, which allows the selective inactivation of a single gene. Although this gene 'knockout' approach is often informative, in many cases, the phenotype resulting from gene ablation might not provide a complete picture of the function of the corresponding protein. If a protein has multiple functions within a single or several signaling pathways, or stabilizes other proteins in a complex, the phenotypic consequences of a gene knockout may manifest as a combination of several different perturbations. In these cases, gene targeting to 'knock in' subtle point mutations might provide more accurate insight into protein function. However, to be informative, such mutations must be carefully based on structural and biophysical data.

  15. Recursive regularization for inferring gene networks from time-course gene expression profiles

    PubMed Central

    Shimamura, Teppei; Imoto, Seiya; Yamaguchi, Rui; Fujita, André; Nagasaki, Masao; Miyano, Satoru

    2009-01-01

    Background Inferring gene networks from time-course microarray experiments with a vector autoregressive (VAR) model is the process of identifying functional associations between genes through multivariate time series. This problem can be cast as a variable selection problem in statistics. One of the promising methods for variable selection is the elastic net proposed by Zou and Hastie (2005). However, VAR modeling with the elastic net increases the number of true positives but also increases the number of false positives. Results By incorporating relative importance of the VAR coefficients into the elastic net, we propose a new class of regularization, called recursive elastic net, to increase the capability of the elastic net and estimate gene networks based on the VAR model. The recursive elastic net can reduce the number of false positives gradually by updating the importance. Numerical simulations and comparisons demonstrate that the proposed method succeeds in reducing the number of false positives drastically while keeping the high number of true positives in the network inference and achieves two or more times higher true discovery rate (the proportion of true positives among the selected edges) than the competing methods even when the number of time points is small. We also compared our method with various reverse-engineering algorithms on experimental data of MCF-7 breast cancer cells stimulated with two ErbB ligands, EGF and HRG. Conclusion The recursive elastic net is a powerful tool for inferring gene networks from time-course gene expression profiles. PMID:19386091
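
    The true discovery rate used as the evaluation measure above is defined in the abstract as the proportion of true positives among the selected edges. A minimal sketch of that calculation follows; the edge sets are hypothetical, and representing a directed edge as a (regulator, target) tuple is an assumption made only for this illustration.

```python
def true_discovery_rate(selected_edges, true_edges):
    """Proportion of selected edges that are present in the true network."""
    selected = set(selected_edges)
    if not selected:
        return 0.0  # no edges selected, so nothing was discovered
    true_positives = len(selected & set(true_edges))
    return true_positives / len(selected)


# Hypothetical directed edges written as (regulator, target).
true_edges = {("g1", "g2"), ("g2", "g3"), ("g3", "g4")}
selected_edges = {("g1", "g2"), ("g2", "g3"), ("g1", "g4"), ("g4", "g2")}

tdr = true_discovery_rate(selected_edges, true_edges)  # 2 of 4 edges are true
```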

  16. Differential gene expression of two extreme honey bee (Apis mellifera) colonies showing varroa tolerance and susceptibility.

    PubMed

    Jiang, S; Robertson, T; Mostajeran, M; Robertson, A J; Qiu, X

    2016-06-01

    Varroa destructor, an ectoparasitic mite of honey bees (Apis mellifera), is the most serious pest threatening the apiculture industry. In our honey bee breeding programme, two honey bee colonies showing extreme phenotypes for varroa tolerance/resistance (S88) and susceptibility (G4) were identified by natural selection from a large gene pool over a 6-year period. To investigate potential defence mechanisms for honey bee tolerance to varroa infestation, we employed DNA microarray and real-time quantitative PCR analyses to identify differentially expressed genes in the tolerant and susceptible colonies at pupa and adult stages. Our results showed that more differentially expressed genes were identified in the tolerant bees than in bees from the susceptible colony, indicating that the tolerant colony showed an increased genetic capacity to respond to varroa mite infestation. In both colonies, there were more differentially expressed genes identified at the pupa stage than at the adult stage, indicating that pupa bees are more responsive to varroa infestation than adult bees. Genes showing differential expression in the colony phenotypes were categorized into several groups based on their molecular functions, such as olfactory signalling, detoxification processes, exoskeleton formation, protein degradation and long-chain fatty acid metabolism, suggesting that these biological processes play roles in conferring varroa tolerance to naturally selected colonies. Identification of differentially expressed genes between the two colony phenotypes provides potential molecular markers for selecting and breeding varroa-tolerant honey bees. © 2016 The Royal Entomological Society.

  17. Transcriptome Profiling of Selectively Bred Pacific Oyster Crassostrea gigas Families that Differ in Tolerance of Heat Shock

    PubMed Central

    Bayne, Christopher J.; Camara, Mark D.; Cunningham, Charles; Jenny, Matthew J.; Langdon, Christopher J.

    2010-01-01

    Sessile inhabitants of marine intertidal environments commonly face heat stress, an important component of summer mortality syndrome in the Pacific oyster Crassostrea gigas. Marker-aided selection programs would be useful for developing oyster strains that resist summer mortality; however, there is currently a need to identify candidate genes associated with stress tolerance and to develop molecular markers associated with those genes. To identify candidate genes for further study, we used cDNA microarrays to test the hypothesis that oyster families that had high (>64%) or low (<29%) survival of heat shock (43°C, 1 h) differ in their transcriptional responses to stress. Based upon data generated by the microarray and by real-time quantitative PCR, we found that transcription after heat shock increased for genes putatively encoding heat shock proteins and genes for proteins that synthesize lipids, protect against bacterial infection, and regulate spawning, whereas transcription decreased for genes for proteins that mobilize lipids and detoxify reactive oxygen species. RNAs putatively identified as heat shock protein 27, collagen, peroxinectin, S-crystallin, and two genes with no match in Genbank had higher transcript concentrations in low-surviving families than in high-surviving families, whereas concentration of putative cystatin B mRNA was greater in high-surviving families. These ESTs should be studied further for use in marker-aided selection programs. Low survival of heat shock could result from a complex interaction of cell damage, opportunistic infection, and metabolic exhaustion. PMID:19205802

  18. Selection and evaluation of reference genes for expression studies with quantitative PCR in the model fungus Neurospora crassa under different environmental conditions in continuous culture.

    PubMed

    Cusick, Kathleen D; Fitzgerald, Lisa A; Pirlo, Russell K; Cockrell, Allison L; Petersen, Emily R; Biffinger, Justin C

    2014-01-01

    Neurospora crassa has served as a model organism for studying circadian pathways and more recently has gained attention in the biofuel industry due to its enhanced capacity for cellulase production. However, in order to optimize N. crassa for biotechnological applications, metabolic pathways during growth under different environmental conditions must be addressed. Reverse-transcription quantitative PCR (RT-qPCR) is a technique that provides a high-throughput platform from which to measure the expression of a large set of genes over time. The selection of a suitable reference gene is critical for gene expression studies using relative quantification, as this strategy is based on normalization of target gene expression to a reference gene whose expression is stable under the experimental conditions. This study evaluated twelve candidate reference genes for use with N. crassa when grown in continuous culture bioreactors under different light and temperature conditions. Based on combined stability values from the NormFinder and BestKeeper software packages, the following are the most appropriate reference genes under conditions of: (1) light/dark cycling: btl, asl, and vma1; (2) all-dark growth: btl, tbp, vma1, and vma2; (3) temperature flux: btl, vma1, act, and asl; (4) all conditions combined: vma1, vma2, tbp, and btl. Since N. crassa exists as different cell types (uni- or multi-nucleated), expression changes in a subset of the candidate genes were further assessed using absolute quantification. A strong negative correlation was found to exist between ratio and threshold cycle (CT) values, demonstrating that CT changes serve as a reliable reflection of transcript, and not gene copy number, fluctuations. The results of this study identified genes that are appropriate for use as reference genes in RT-qPCR studies with N. crassa and demonstrated that even with the presence of different cell types, relative quantification is an acceptable method for measuring gene expression changes during growth in bioreactors.
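
    Normalizing a target gene to a stable reference gene, as described above, is commonly done with the 2^-ΔΔCT calculation. The sketch below shows that arithmetic under the assumption of roughly 100% amplification efficiency for both genes; the CT values are hypothetical, and the abstract does not state that the authors used this exact formula.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCT calculation.

    Assumes both amplicons double every cycle (about 100% efficiency).
    """
    # Normalize the target to the reference gene within each condition.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical CT values: the reference gene is unchanged (18 cycles),
# while the target gene appears two cycles earlier after treatment.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
```

    With these numbers the target crosses the threshold two cycles earlier relative to the reference, which corresponds to a four-fold increase.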

  19. Comprehensive characterization of glutamine synthetase-mediated selection for the establishment of recombinant CHO cells producing monoclonal antibodies.

    PubMed

    Noh, Soo Min; Shin, Seunghyeon; Lee, Gyun Min

    2018-03-29

    To characterize a glutamine synthetase (GS)-based selection system, monoclonal antibody (mAb) producing recombinant CHO cell clones were generated by a single round of selection at various methionine sulfoximine (MSX) concentrations (0, 25, and 50 μM) using two different host cell lines (CHO-K1 and GS-knockout CHO). Regardless of the host cell lines used, the clones selected at 50 μM MSX had the lowest average specific growth rate and the highest average specific production rates of toxic metabolic wastes, lactate and ammonia. Unlike CHO-K1, high producing clones could be generated in the absence of MSX using GS-knockout CHO with an improved selection stringency. Regardless of the host cell lines used, the clones selected at various MSX concentrations showed no significant difference in the GS, heavy chain, and light chain gene copies (P > 0.05). Furthermore, there was no correlation between the specific mAb productivity and these three gene copies (R² ≤ 0.012). Taken together, GS-mediated gene amplification does not occur in a single round of selection at a MSX concentration up to 50 μM. The use of the GS-knockout CHO host cell line facilitates the rapid generation of high producing clones with reduced production of lactate and ammonia in the absence of MSX.

  20. Selection of reference genes for gene expression studies in virus-infected monocots using quantitative real-time PCR.

    PubMed

    Zhang, Kun; Niu, Shaofang; Di, Dianping; Shi, Lindan; Liu, Deshui; Cao, Xiuling; Miao, Hongqin; Wang, Xianbing; Han, Chenggui; Yu, Jialin; Li, Dawei; Zhang, Yongliang

    2013-10-10

    Both genome-wide transcriptomic surveys of the mRNA expression profiles and virus-induced gene silencing-based molecular studies of target gene during virus-plant interaction involve the precise estimation of the transcript abundance. Quantitative real-time PCR (qPCR) is the most widely adopted technique for mRNA quantification. In order to obtain reliable quantification of transcripts, identification of the best reference genes forms the basis of the preliminary work. Nevertheless, the stability of internal controls in virus-infected monocots needs to be fully explored. In this work, the suitability of ten housekeeping genes (ACT, EF1α, FBOX, GAPDH, GTPB, PP2A, SAND, TUBβ, UBC18 and UK) for potential use as reference genes in qPCR were investigated in five different monocot plants (Brachypodium, barley, sorghum, wheat and maize) under infection with different viruses including Barley stripe mosaic virus (BSMV), Brome mosaic virus (BMV), Rice black-streaked dwarf virus (RBSDV) and Sugarcane mosaic virus (SCMV). By using three different algorithms, the most appropriate reference genes or their combinations were identified for different experimental sets and their effectiveness for the normalisation of expression studies were further validated by quantitative analysis of a well-studied PR-1 gene. These results facilitate the selection of desirable reference genes for more accurate gene expression studies in virus-infected monocots. Copyright © 2013 Elsevier B.V. All rights reserved.

  1. Mobile genes in the human microbiome are structured from global to individual scales

    PubMed Central

    Brito, IL; Jupiter, SD; Jenkins, AP; Naisilisili, W; Tamminen, M; Smillie, CS; Wortman, JR; Birren, BW; Xavier, RJ; Blainey, PC; Singh, AK; Gevers, D; Alm, EJ

    2016-01-01

    Recent work has underscored the importance of the microbiome in human health, largely attributing differences in phenotype to differences in the species present across individuals [1-5]. But mobile genes can confer profoundly different phenotypes on different strains of the same species. Little is known about the function and distribution of mobile genes in the human microbiome, and in particular whether the gene pool is globally homogeneous or constrained by human population structure. Here, we investigate this question by comparing the mobile genes found in the microbiomes of 81 metropolitan North Americans with those of 172 agrarian Fiji islanders using a combination of single-cell genomics and metagenomics. We find large differences in mobile gene content between the Fijian and North American microbiomes, with functional variation that mirrors known dietary differences such as the excess of plant-based starch degradation genes. Remarkably, differences are also observed between the mobile gene pools of proximal Fijian villages, even though microbiome composition across villages is similar. Finally, we observe high rates of recombination leading to individual-specific mobile elements, suggesting that the abundance of some genes may reflect environmental selection rather than dispersal limitation. Together, these data support the hypothesis that human activities and behaviors provide selective pressures that shape mobile gene pools, and that acquisition of mobile genes is important to colonizing specific human populations. PMID:27409808

  2. Two-Way Gene Interaction From Microarray Data Based on Correlation Methods

    PubMed Central

    Alavi Majd, Hamid; Talebi, Atefeh; Gilany, Kambiz; Khayyer, Nasibeh

    2016-01-01

    Background Gene networks have generated a massive explosion in the development of high-throughput techniques for monitoring various aspects of gene activity. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. Objectives The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a Gene Co-expression Network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. Materials and Methods In the foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman’s rank correlation coefficient and Blomqvist’s measure, and compared them with Pearson’s correlation coefficient. We surveyed six genes of venous thrombosis disease, made a matrix entry representing the score for the corresponding gene pair, and obtained two-way interactions using Pearson’s correlation, Spearman’s rank correlation, and Blomqvist’s coefficient. Finally, these methods were compared with Cytoscape, based on BIND, and Gene Ontology, based on molecular function visual methods; R software version 3.2 and Bioconductor were used to perform these methods. Results Based on the Pearson and Spearman correlations, the results were the same and were confirmed by Cytoscape and GO visual methods; however, Blomqvist’s coefficient was not confirmed by visual methods. Conclusions Some results of the correlation coefficients do not agree with the visualization methods, possibly because of the small amount of data. PMID:27621916
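
    The two construction steps described above (score every gene pair, then connect the pairs whose score exceeds a threshold) can be sketched as follows for the three coefficients named in the abstract. The profiles are hypothetical, thresholding the absolute value of the score is an assumption of this sketch, and the code is written in Python rather than the R and Bioconductor tools the authors report using.

```python
from itertools import combinations
from statistics import mean, median


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def blomqvist(x, y):
    """Blomqvist's medial coefficient; points on a median are ignored."""
    mx, my = median(x), median(y)
    signs = [(a - mx) * (b - my) for a, b in zip(x, y)]
    same = sum(s > 0 for s in signs)
    opposite = sum(s < 0 for s in signs)
    total = same + opposite
    return (same - opposite) / total if total else 0.0


def coexpression_edges(profiles, score, threshold):
    """Connect every gene pair whose absolute score exceeds the threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if abs(score(profiles[g1], profiles[g2])) > threshold
    ]


# Hypothetical expression profiles for three genes over six samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8, 12.3],
    "geneC": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
pearson_edges = coexpression_edges(profiles, pearson, 0.8)
spearman_edges = coexpression_edges(profiles, spearman, 0.8)
blomqvist_edges = coexpression_edges(profiles, blomqvist, 0.8)
```

    On this toy input all three coefficients link only geneA and geneB; on real data the three networks can differ, as the abstract reports for Blomqvist's coefficient.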

  3. Molecular assays in detecting EGFR gene aberrations: an updated HER2-dependent algorithm for interpreting gene signals; a short technical report.

    PubMed

    Tsiambas, Evangelos; Ragos, Vasileios; Lefas, Alicia Y; Georgiannos, Stavros N; Rigopoulos, Dimitrios N; Georgakopoulos, Georgios; Stamatelopoulos, Athanasios; Grapsa, Dimitra; Syrigos, Konstantinos

    2016-01-01

    Purpose: Among oncogenes that have already been identified and cloned, Epidermal Growth Factor Receptor (EGFR) remains one of the most significant. Understanding its deregulation mechanisms critically improves patient selection for personalized therapies based on modern molecular biology and oncology guidelines. Anti-EGFR targeted therapeutic strategies have been developed based on specific genetic profiles and applied in subgroups of patients suffering from solid cancers of different histogenetic origin. Detection of specific EGFR somatic mutations leads to the application of tyrosine kinase inhibitors (TKIs) in subsets of them. Concerning EGFR gene numerical imbalances, identification of pure gene amplification is critical for targeting the molecule via monoclonal antibodies (mAbs). In the current technical paper we demonstrate the main molecular methods applied in EGFR analyses, focusing also on new data in interpreting numerical imbalances based on ASCO/ACAP guidelines for HER2 in situ hybridization (ISH) clarifications.

  4. Selection system and co-cultivation medium are important determinants of Agrobacterium-mediated transformation of sugarcane.

    PubMed

    Joyce, Priya; Kuwahata, Melissa; Turner, Nicole; Lakshmanan, Prakash

    2010-02-01

    A reproducible method for transformation of sugarcane using various strains of Agrobacterium tumefaciens (A. tumefaciens) (AGL0, AGL1, EHA105 and LBA4404) has been developed. The selection system and co-cultivation medium were the most important factors determining the success of transformation and transgenic plant regeneration. Plant regeneration at a frequency of 0.8-4.8% occurred only when callus was transformed with A. tumefaciens carrying a newly constructed superbinary plasmid containing neomycin phosphotransferase (nptII) and beta-glucuronidase (gusA) genes, both driven by the maize ubiquitin (ubi-1) promoter. Regeneration was successful in plants carrying the nptII gene but not the hygromycin phosphotransferase (hph) gene. NptII gene selection was imposed at a concentration of 150 mg/l paromomycin sulphate and applied either immediately or 4 days after the co-cultivation period. Co-cultivation on Murashige and Skoog (MS)-based medium for a period of 4 days produced the highest number of transgenic plants. Over 200 independent transgenic lines were created using this protocol. Regenerated plants appeared phenotypically normal and contained both gusA and nptII genes. Southern blot analysis revealed 1-3 transgene insertion events that were randomly integrated in the majority of the plants produced.

  5. Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts.

    PubMed

    Dashtban, M; Balafar, Mohammadali

    2017-03-01

    Gene selection is a demanding task for microarray data analysis. The diverse complexity of different cancers makes this issue still challenging. In this study, a novel evolutionary method based on genetic algorithms and artificial intelligence is proposed to identify predictive genes for cancer classification. A filter method was first applied to reduce the dimensionality of feature space followed by employing an integer-coded genetic algorithm with dynamic-length genotype, intelligent parameter settings, and modified operators. The algorithmic behaviors including convergence trends, mutation and crossover rate changes, and running time were studied, conceptually discussed, and shown to be coherent with literature findings. Two well-known filter methods, Laplacian and Fisher score, were examined considering similarities, the quality of selected genes, and their influences on the evolutionary approach. Several statistical tests concerning choice of classifier, choice of dataset, and choice of filter method were performed, and they revealed some significant differences between the performance of different classifiers and filter methods over datasets. The proposed method was benchmarked upon five popular high-dimensional cancer datasets; for each, top explored genes were reported. Comparing the experimental results with several state-of-the-art methods revealed that the proposed method outperforms previous methods in DLBCL dataset. Copyright © 2017 Elsevier Inc. All rights reserved.
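
    The filter step mentioned above ranks genes before the genetic algorithm runs. As a minimal sketch, one common form of the Fisher score divides the between-class scatter of a gene by its within-class scatter; the labels and expression values below are hypothetical, and the exact variant used by the authors is not given in the abstract.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Between-class scatter over within-class scatter for one gene."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else float("inf")


# Hypothetical expression values for two genes across six samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expression = {
    "geneX": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # class means differ
    "geneY": [3.0, 5.0, 4.0, 4.0, 3.0, 5.0],  # class means coincide
}
scores = {gene: fisher_score(vals, labels) for gene, vals in expression.items()}
top = sorted(scores, key=scores.get, reverse=True)  # best-separating gene first
```

    Keeping only the highest-scoring genes is one way to shrink the feature space before a wrapper search is applied.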

  6. Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information.

    PubMed

    Safo, Sandra E; Li, Shuzhao; Long, Qi

    2018-03-01

    Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach. © 2017, The International Biometric Society.

  7. Controversial opinion: evaluation of EGR1 and LAMA2 loci for high myopia in Chinese populations.

    PubMed

    Lin, Fang-yu; Huang, Zhu; Lu, Ning; Chen, Wei; Fang, Hui; Han, Wei

    2016-03-01

    Functional studies have suggested the important role of early growth response 1 (EGR1) and Laminin α2-chain (LAMA2) in human eye development. Genetic studies have reported a significant association of the single nucleotide polymorphism (SNP) in the LAMA2 gene with myopia. This study aimed to evaluate the association of the tagging SNPs (tSNPs) in the EGR1 and LAMA2 genes with high myopia in two independent Han Chinese populations. Four tSNPs (rs11743810 in the EGR1 gene; rs2571575, rs9321170, and rs1889891 in the LAMA2 gene) were selected, according to the HapMap database (http://hapmap.ncbi.nlm.nih.gov), and were genotyped using the ligase detection reaction (LDR) approach for 167 Han Chinese nuclear families with extremely highly myopic offspring (<-10.0 diopters) and an independent group with 485 extremely highly myopic cases (<-10.0 diopters) and 499 controls. Direct sequencing was used to confirm the LDR results in twenty randomly selected subjects. Family-based association analysis was performed using the family-based association test (FBAT) software package (Version 1.5.5). Population-based association analysis was performed using the Chi-square test. The association analysis power was estimated using online software (http://design.cs.ucla.edu). The FBAT demonstrated that all four tSNPs tested did not show association with high myopia (P>0.05). Haplotype analysis of tSNPs in the LAMA2 genes also did not show a significant association (P>0.05). Meanwhile, population-based association analysis also showed no significant association results with high myopia (P>0.05). On the basis of our family- and population-based analyses for the Han Chinese population, we did not find positive association signals of the four SNPs in the LAMA2 and EGR1 genes with high myopia.
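
    The population-based comparison above relies on a Chi-square test. The sketch below computes the Pearson Chi-square statistic for a 2x2 table of allele counts (cases versus controls) with one degree of freedom and no continuity correction. The counts are hypothetical and are not the study's genotype data, and testing at the allele level is an assumption of this sketch.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for the table
    [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts: rows are cases and controls,
# columns are minor and major alleles.
chi2, p = chi_square_2x2(400, 570, 410, 588)
significant = p < 0.05
```

    For these made-up counts the allele frequencies are nearly identical in both groups, so the test is far from significant.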

  8. Application of site and haplotype-frequency based approaches for detecting selection signatures in cattle

    PubMed Central

    2011-01-01

    Background 'Selection signatures' delimit regions of the genome that are, or have been, functionally important and have therefore been under either natural or artificial selection. In this study, two different and complementary methods--integrated Haplotype Homozygosity Score (|iHS|) and population differentiation index (FST)--were applied to identify traces of decades of intensive artificial selection for traits of economic importance in modern cattle. Results We scanned the genome of a diverse set of dairy and beef breeds from Germany, Canada and Australia genotyped with a 50 K SNP panel. Across breeds, a total of 109 extreme |iHS| values exceeded the empirical threshold level of 5% with 19, 27, 9, 10 and 17 outliers in Holstein, Brown Swiss, Australian Angus, Hereford and Simmental, respectively. Annotating the regions harboring clustered |iHS| signals revealed a panel of interesting candidate genes like SPATA17, MGAT1, PGRMC2 and ACTC1, COL23A1, MATN2, respectively, in the context of reproduction and muscle formation. In a further step, a new Bayesian FST-based approach was applied with a set of geographically separated populations including Holstein, Brown Swiss, Simmental, North American Angus and Piedmontese for detecting differentiated loci. In total, 127 regions exceeding the 2.5 per cent threshold of the empirical posterior distribution were identified as extremely differentiated. In a substantial number (56 out of 127 cases) the extreme FST values were found to be positioned in poor gene content regions which deviated significantly (p < 0.05) from the expectation assuming a random distribution. However, significant FST values were found in regions of some relevant genes such as SMCP and FGF1. Conclusions Overall, 236 regions putatively subject to recent positive selection in the cattle genome were detected. Both |iHS| and FST suggested selection in the vicinity of the Sialic acid binding Ig-like lectin 5 gene on BTA18. 
    This region was recently reported to be a major QTL with strong effects on productive life and fertility traits in Holstein cattle. We conclude that high-resolution genome scans of selection signatures can be used to identify genomic regions contributing to within- and inter-breed phenotypic variation. PMID:21679429
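
    To make the population differentiation index concrete, the sketch below computes the textbook Wright's FST at a single biallelic SNP from per-population allele frequencies, and then picks the loci in the upper tail of an empirical distribution. It is not the Bayesian FST approach used in the study; the function names, the equal weighting of populations, and all frequencies are assumptions for illustration.

```python
def fst_biallelic(freqs):
    """Wright's F_ST at one biallelic SNP from per-population allele frequencies.

    Populations are weighted equally. This is the textbook estimator,
    not the Bayesian method described in the abstract.
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)                    # expected heterozygosity, pooled
    h_within = sum(2 * p * (1 - p) for p in freqs) / k   # mean within-population value
    if h_total == 0:
        return 0.0  # every population is fixed for the same allele
    return (h_total - h_within) / h_total

def top_fraction(values, fraction):
    """Indices of the values lying in the upper `fraction` tail of `values`."""
    n_keep = max(1, int(len(values) * fraction))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:n_keep])

# Hypothetical allele frequencies at five SNPs in two breeds (illustrative only).
snp_freqs = [(0.50, 0.52), (0.10, 0.90), (0.30, 0.35), (0.20, 0.80), (0.45, 0.40)]
fst_values = [fst_biallelic(f) for f in snp_freqs]
outliers = top_fraction(fst_values, 0.2)  # the most differentiated 20% of loci
print(fst_values, outliers)
```

    In a real scan the tail fraction would be much smaller (the abstract mentions a 2.5 per cent threshold) and would be applied to many thousands of loci.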

  9. Integrative Bayesian variable selection with gene-based informative priors for genome-wide association studies.

    PubMed

    Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong; Zhu, Dianwen; Peng, Bin; Wiemels, Joseph L; Yang, Xiaowei

    2014-12-10

    Genome-wide Association Studies (GWAS) are typically designed to identify phenotype-associated single nucleotide polymorphisms (SNPs) individually using univariate analysis methods. Though providing valuable insights into genetic risks of common diseases, the genetic variants identified by GWAS generally account for only a small proportion of the total heritability for complex diseases. To solve this "missing heritability" problem, we implemented a strategy called integrative Bayesian Variable Selection (iBVS), which is based on a hierarchical model that incorporates an informative prior by considering the gene interrelationship as a network. It was applied here to both simulated and real data sets. Simulation studies indicated that the iBVS method was advantageous in its performance with highest AUC in both variable selection and outcome prediction, when compared to Stepwise and LASSO based strategies. In an analysis of a leprosy case-control study, iBVS selected 94 SNPs as predictors, while LASSO selected 100 SNPs. The Stepwise regression yielded a more parsimonious model with only 3 SNPs. The prediction results demonstrated that the iBVS method had comparable performance with that of LASSO, but better than Stepwise strategies. The proposed iBVS strategy is a novel and valid method for Genome-wide Association Studies, with the additional advantage in that it produces more interpretable posterior probabilities for each variable unlike LASSO and other penalized regression methods.

  10. Selective pressures for accurate altruism targeting: evidence from digital evolution for difficult-to-test aspects of inclusive fitness theory.

    PubMed

    Clune, Jeff; Goldsby, Heather J; Ofria, Charles; Pennock, Robert T

    2011-03-07

    Inclusive fitness theory predicts that natural selection will favour altruist genes that are more accurate in targeting altruism only to copies of themselves. In this paper, we provide evidence from digital evolution in support of this prediction by competing multiple altruist-targeting mechanisms that vary in their accuracy in determining whether a potential target for altruism carries a copy of the altruist gene. We compete altruism-targeting mechanisms based on (i) kinship (kin targeting), (ii) genetic similarity at a level greater than that expected of kin (similarity targeting), and (iii) perfect knowledge of the presence of an altruist gene (green beard targeting). Natural selection always favoured the most accurate targeting mechanism available. Our investigations also revealed that evolution did not increase the altruism level when all green beard altruists used the same phenotypic marker. The green beard altruism levels stably increased only when mutations that changed the altruism level also changed the marker (e.g. beard colour), such that beard colour reliably indicated the altruism level. For kin- and similarity-targeting mechanisms, we found that evolution was able to stably adjust altruism levels. Our results confirm that natural selection favours altruist genes that are increasingly accurate in targeting altruism to only their copies. Our work also emphasizes that the concept of targeting accuracy must include both the presence of an altruist gene and the level of altruism it produces.

  11. Mutation analysis in 129 genes associated with other forms of retinal dystrophy in 157 families with retinitis pigmentosa based on exome sequencing.

    PubMed

    Xu, Yan; Guan, Liping; Xiao, Xueshan; Zhang, Jianguo; Li, Shiqiang; Jiang, Hui; Jia, Xiaoyun; Yang, Jianhua; Guo, Xiangming; Yin, Ye; Wang, Jun; Zhang, Qingjiong

    2015-01-01

    Mutations in 60 known genes were previously identified by exome sequencing in 79 of 157 families with retinitis pigmentosa (RP). This study analyzed variants in 129 genes associated with other forms of hereditary retinal dystrophy in the same cohort. Apart from the 73 genes previously analyzed, a further 129 genes responsible for other forms of hereditary retinal dystrophy were selected based on RetNet. Variants in the 129 genes determined by whole exome sequencing were selected and filtered by bioinformatics analysis. Candidate variants were confirmed by Sanger sequencing and validated by analysis of available family members and controls. A total of 90 candidate variants were present in the 129 genes. Sanger sequencing confirmed 83 of the 90 variants. Analysis of family members and controls excluded 76 of these 83 variants. The remaining seven variants were considered to be potential pathogenic mutations; these were c.899A>G, c.1814C>G, and c.2107C>T in BBS2; c.1073C>T and c.1669C>T in INPP5E; and c.3582C>G and c.5704-5C>G in CACNA1F. Six of these seven mutations were novel. The mutations were detected in five unrelated patients without a family history, including three patients with homozygous or compound heterozygous mutations in BBS2 and INPP5E, and two patients with hemizygous mutations in CACNA1F. None of the patients had mutations in the genes associated with autosomal dominant retinal dystrophy. Only a small portion of patients with RP, about 3% (5/157), had causative mutations in the 129 genes associated with other forms of hereditary retinal dystrophy.

  12. Clinical and molecular characterization of a novel INS mutation identified in patients with MODY phenotype.

    PubMed

    Piccini, Barbara; Artuso, Rosangela; Lenzi, Lorenzo; Guasti, Monica; Braccesi, Giulia; Barni, Federica; Casalini, Emilio; Giglio, Sabrina; Toni, Sonia

    2016-11-01

    Correct diagnosis of Maturity-Onset Diabetes of the Young (MODY) is based on genetic tests requiring an appropriate subject selection by clinicians. Mutations in the insulin (INS) gene rarely occur in patients with MODY. This study is aimed at determining the genetic background and clinical phenotype in patients with suspected MODY. 34 patients with suspected MODY, negative for mutations in the GCK, HNF1α, HNF4α, HNF1β and PDX1 genes, were screened by next generation sequencing (NGS). A heterozygous INS mutation was identified in 4 members of the same family. First genetic tests performed identified two heterozygous silent nucleotide substitutions in MODY3/HNF1α gene. An ineffective attempt to suspend insulin therapy, administering repaglinide and sulphonylureas, was made. DNA was re-sequenced by NGS investigating a set of 102 genes. Genes implicated in the pathway of pancreatic β-cells, candidate genes for type 2 diabetes mellitus and genes causative of diabetes in mice were selected. A novel heterozygous variant in human preproinsulin INS gene (c.125T > C) was found in the affected family members. The new INS mutation broadens the spectrum of possible INS phenotypes. Screening for INS mutations is warranted not only in neonatal diabetes but also in MODYx patients and in selected patients with type 1 diabetes mellitus negative for autoantibodies. Subjects with complex diseases without a specific phenotype should be studied by NGS because Sanger sequencing is ineffective and time consuming in detecting rare variants. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  13. 40 CFR 158.2110 - Microbial pesticides data requirements.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ...: genetic engineering techniques used; the identity of the inserted or deleted gene segment (base sequence... evaluate genetic stability and exchange; and selected Tier II environmental expression and toxicology tests. ...

  14. 40 CFR 158.2110 - Microbial pesticides data requirements.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ...: genetic engineering techniques used; the identity of the inserted or deleted gene segment (base sequence... evaluate genetic stability and exchange; and selected Tier II environmental expression and toxicology tests. ...

  15. 40 CFR 158.2110 - Microbial pesticides data requirements.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ...: genetic engineering techniques used; the identity of the inserted or deleted gene segment (base sequence... evaluate genetic stability and exchange; and selected Tier II environmental expression and toxicology tests. ...

  16. 40 CFR 158.2110 - Microbial pesticides data requirements.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ...: genetic engineering techniques used; the identity of the inserted or deleted gene segment (base sequence... evaluate genetic stability and exchange; and selected Tier II environmental expression and toxicology tests. ...

  17. Positive and relaxed selection associated with flight evolution and loss in insect transcriptomes

    PubMed Central

    Mitterboeck, T. Fatima; Liu, Shanlin; Adamowicz, Sarah J.; Fu, Jinzhong; Zhang, Rui; Song, Wenhui; Meusemann, Karen

    2017-01-01

    Abstract The evolution of powered flight is a major innovation that has facilitated the success of insects. Previously, studies of birds, bats, and insects have detected molecular signatures of differing selection regimes in energy-related genes associated with flight evolution and/or loss. Here, using DNA sequences from more than 1000 nuclear and mitochondrial protein-coding genes obtained from insect transcriptomes, we conduct a broader exploration of which gene categories display positive and relaxed selection at the origin of flight as well as with multiple independent losses of flight. We detected a number of categories of nuclear genes more often under positive selection in the lineage leading to the winged insects (Pterygota), related to catabolic processes such as proteases, as well as splicing-related genes. Flight loss was associated with relaxed selection signatures in splicing genes, mirroring the results for flight evolution. Similar to previous studies of flight loss in various animal taxa, we observed consistently higher nonsynonymous-to-synonymous substitution ratios in mitochondrial genes of flightless lineages, indicative of relaxed selection in energy-related genes. While oxidative phosphorylation genes were not detected as being under selection with the origin of flight specifically, they were most often detected as being under positive selection in holometabolous (complete metamorphosis) insects as compared with other insect lineages. This study supports some convergence in gene-specific selection pressures associated with flight ability, and the exploratory analysis provided some new insights into gene categories potentially associated with the gain and loss of flight in insects. PMID:29020740
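
    The nonsynonymous-to-synonymous comparison mentioned in the record above rests on classifying codon changes by whether they alter the encoded amino acid. The sketch below shows only that first step for two aligned, gap-free coding sequences, using the standard genetic code. It returns raw counts of differing codons; a real dN/dS estimate additionally normalizes by the numbers of synonymous and nonsynonymous sites and corrects for multiple substitutions, which this sketch does not attempt. The function name and the example sequences are assumptions for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_codon_differences(seq_a, seq_b):
    """Count differing codons as synonymous or nonsynonymous.

    Both inputs must be aligned, gap-free, uppercase DNA of equal length
    that is a multiple of three.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, equal length, multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no information here
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: synonymous difference
        else:
            nonsyn += 1   # different amino acid: nonsynonymous difference
    return syn, nonsyn

# Illustrative sequences: Met-Ala-Lys versus Met-Ala-Arg.
print(classify_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```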

  18. Positive and relaxed selection associated with flight evolution and loss in insect transcriptomes.

    PubMed

    Mitterboeck, T Fatima; Liu, Shanlin; Adamowicz, Sarah J; Fu, Jinzhong; Zhang, Rui; Song, Wenhui; Meusemann, Karen; Zhou, Xin

    2017-10-01

    The evolution of powered flight is a major innovation that has facilitated the success of insects. Previously, studies of birds, bats, and insects have detected molecular signatures of differing selection regimes in energy-related genes associated with flight evolution and/or loss. Here, using DNA sequences from more than 1000 nuclear and mitochondrial protein-coding genes obtained from insect transcriptomes, we conduct a broader exploration of which gene categories display positive and relaxed selection at the origin of flight as well as with multiple independent losses of flight. We detected a number of categories of nuclear genes more often under positive selection in the lineage leading to the winged insects (Pterygota), related to catabolic processes such as proteases, as well as splicing-related genes. Flight loss was associated with relaxed selection signatures in splicing genes, mirroring the results for flight evolution. Similar to previous studies of flight loss in various animal taxa, we observed consistently higher nonsynonymous-to-synonymous substitution ratios in mitochondrial genes of flightless lineages, indicative of relaxed selection in energy-related genes. While oxidative phosphorylation genes were not detected as being under selection with the origin of flight specifically, they were most often detected as being under positive selection in holometabolous (complete metamorphosis) insects as compared with other insect lineages. This study supports some convergence in gene-specific selection pressures associated with flight ability, and the exploratory analysis provided some new insights into gene categories potentially associated with the gain and loss of flight in insects. © The Authors 2017. Published by Oxford University Press.

  19. JCoDA: a tool for detecting evolutionary selection.

    PubMed

    Steinway, Steven N; Dannenfelser, Ruth; Laucius, Christopher D; Hayes, James E; Nayak, Sudhir

    2010-05-27

    The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
    JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.

  20. JCoDA: a tool for detecting evolutionary selection

    PubMed Central

    2010-01-01

    Background The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. Results JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
    Conclusions JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda. PMID:20507581

  1. Genetic basis of interindividual susceptibility to cancer cachexia: selection of potential candidate gene polymorphisms for association studies.

    PubMed

    Johns, N; Tan, B H; MacMillan, M; Solheim, T S; Ross, J A; Baracos, V E; Damaraju, S; Fearon, K C H

    2014-12-01

    Cancer cachexia is a complex and multifactorial disease. Evolving definitions highlight the fact that a diverse range of biological processes contribute to cancer cachexia. Part of the variation in who will and who will not develop cancer cachexia may be genetically determined. As new definitions, classifications and biological targets continue to evolve, there is a need for reappraisal of the literature for future candidate association studies. This review summarizes genes identified or implicated as well as putative candidate genes contributing to cachexia, identified through diverse technology platforms and model systems to further guide association studies. A systematic search covering 1986-2012 was performed for potential candidate genes / genetic polymorphisms relating to cancer cachexia. All candidate genes were reviewed for functional polymorphisms or clinically significant polymorphisms associated with cachexia using the OMIM and GeneRIF databases. Pathway analysis software was used to reveal possible network associations between genes. Functionality of SNPs/genes was explored based on published literature, algorithms for detecting putative deleterious SNPs and interrogating the database for expression of quantitative trait loci (eQTLs). A total of 154 genes associated with cancer cachexia were identified and explored for functional polymorphisms. Of these 154 genes, 119 had a combined total of 281 polymorphisms with functional and/or clinical significance in terms of cachexia associated with them. Of these, 80 polymorphisms (in 51 genes) were replicated in more than one study with 24 polymorphisms found to influence two or more hallmarks of cachexia (i.e., inflammation, loss of fat mass and/or lean mass and reduced survival). Selection of candidate genes and polymorphisms is a key element of multigene study design. 
    The present study provides a contemporary basis to select genes and/or polymorphisms for further association studies in cancer cachexia, and to develop their potential as susceptibility biomarkers of cachexia.

  2. An efficient platform for genetic selection and screening of gene switches in Escherichia coli

    PubMed Central

    Muranaka, Norihito; Sharma, Vandana; Nomura, Yoko; Yokobayashi, Yohei

    2009-01-01

    Engineered gene switches and circuits that can sense various biochemical and physical signals, perform computation, and produce predictable outputs are expected to greatly advance our ability to program complex cellular behaviors. However, rational design of gene switches and circuits that function in living cells is challenging due to the complex intracellular milieu. Consequently, most successful designs of gene switches and circuits have relied, to some extent, on high-throughput screening and/or selection from combinatorial libraries of gene switch and circuit variants. In this study, we describe a generic and efficient platform for selection and screening of gene switches and circuits in Escherichia coli from large libraries. The single-gene dual selection marker tetA was translationally fused to green fluorescent protein (gfpuv) via a flexible peptide linker and used as a dual selection and screening marker for laboratory evolution of gene switches. Single-cycle (sequential positive and negative selections) enrichment efficiencies of >7000 were observed in mock selections of model libraries containing functional riboswitches in liquid culture. The technique was applied to optimize various parameters affecting the selection outcome, and to isolate novel thiamine pyrophosphate riboswitches from a complex library. Artificial riboswitches with excellent characteristics were isolated that exhibit up to 58-fold activation as measured by fluorescent reporter gene assay. PMID:19190095

  3. Selection and validation of reference genes for quantitative real-time PCR in Artemisia sphaerocephala based on transcriptome sequence data.

    PubMed

    Hu, Xiaowei; Zhang, Lijing; Nan, Shuzhen; Miao, Xiumei; Yang, Pengfang; Duan, Guoqin; Fu, Hua

    2018-05-30

    Artemisia sphaerocephala, a dicotyledonous perennial semi-shrub belonging to the Artemisia genus of the Compositae family, is widely distributed in northwestern China. This shrub is one of the most important pioneer plants capable of protecting rangelands from wind erosion. It therefore plays a vital role in maintaining desert ecosystem stability. In addition to its use as a forage grass, it has excellent prospective applications as a source of plant oil and as a plant-based fuel. The use of internal reference genes is the basis for accurate quantitative real-time PCR analysis. In this study, based on transcriptome data of A. sphaerocephala, we analyzed 21 candidate internal genes to determine the optimal internal genes in this shrub. The stabilities of candidate genes were evaluated in 16 samples of A. sphaerocephala. Finally, UBC9 and TIP41-like were determined to be the optimal reference genes in A. sphaerocephala by the Delta Ct method and three programs: GeNorm, NormFinder and BestKeeper. Copyright © 2018 Elsevier B.V. All rights reserved.
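
    As a rough sketch of how a comparative Delta Ct ranking can work, the code below takes Ct values for several candidate genes across the same samples, computes the standard deviation of the pairwise Ct differences, and scores each gene by its mean standard deviation (lower meaning more stable). The gene names and Ct values are hypothetical, the function name is an assumption, and this is not claimed to reproduce the authors' exact procedure or the behaviour of GeNorm, NormFinder or BestKeeper.

```python
from itertools import combinations
from statistics import mean, stdev

def delta_ct_stability(ct):
    """Rank genes from most to least stable by the comparative Delta Ct idea.

    `ct` maps a gene name to its Ct values, one per sample, with every gene
    using the same sample order. Needs at least two genes and two samples.
    """
    pair_sd = {gene: [] for gene in ct}
    for g1, g2 in combinations(ct, 2):
        # Delta Ct of the gene pair in each sample.
        diffs = [a - b for a, b in zip(ct[g1], ct[g2])]
        sd = stdev(diffs)
        pair_sd[g1].append(sd)
        pair_sd[g2].append(sd)
    # A gene's score is the mean SD over all pairs it takes part in.
    scores = {gene: mean(sds) for gene, sds in pair_sd.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical Ct values for three candidate genes in four samples.
ct_values = {
    "geneA": [20.0, 20.5, 21.0, 20.2],
    "geneB": [22.0, 22.5, 23.0, 22.2],
    "geneC": [25.0, 23.0, 27.0, 24.0],
}
ranking = delta_ct_stability(ct_values)
for gene, score in ranking:
    print(f"{gene}: mean SD of Delta Ct = {score:.3f}")
```

    Here geneA and geneB track each other across samples, so their pairwise difference barely varies, while geneC fluctuates independently and ranks last.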

  4. TOM: a web-based integrated approach for identification of candidate disease genes.

    PubMed

    Rossi, Simona; Masotti, Daniele; Nardini, Christine; Bonora, Elena; Romeo, Giovanni; Macii, Enrico; Benini, Luca; Volinia, Stefano

    2006-07-01

    The massive production of biological data by means of highly parallel devices like microarrays for gene expression has paved the way to new possible approaches in molecular genetics. Among them is the possibility of inferring biological answers by querying large amounts of expression data. Based on this principle, we present here TOM, a web-based resource for the efficient extraction of candidate genes for hereditary diseases. The service requires prior knowledge of at least one other gene responsible for the disease and the linkage area, or else of two disease-associated genetic intervals. The algorithm uses the information stored in public resources, including mapping, expression and functional databases. Given the queries, TOM will select and list one or more candidate genes. This approach allows the geneticist to bypass the costly and time-consuming tracing of genetic markers through entire families and might improve the chance of identifying disease genes, particularly for rare diseases. We present here the tool and the results obtained on known benchmarks and on hereditary predisposition to familial thyroid cancer. Our algorithm is available at http://www-micrel.deis.unibo.it/~tom/.

  5. PhytoPath: an integrative resource for plant pathogen genomics.

    PubMed

    Pedro, Helder; Maheswari, Uma; Urban, Martin; Irvine, Alistair George; Cuzick, Alayne; McDowall, Mark D; Staines, Daniel M; Kulesha, Eugene; Hammond-Kosack, Kim Elizabeth; Kersey, Paul Julian

    2016-01-04

    PhytoPath (www.phytopathdb.org) is a resource for genomic and phenotypic data from plant pathogen species, that integrates phenotypic data for genes from PHI-base, an expertly curated catalog of genes with experimentally verified pathogenicity, with the Ensembl tools for data visualization and analysis. The resource is focused on fungi, protists (oomycetes) and bacterial plant pathogens that have genomes that have been sequenced and annotated. Genes with associated PHI-base data can be easily identified across all plant pathogen species using a BioMart-based query tool and visualized in their genomic context on the Ensembl genome browser. The PhytoPath resource contains data for 135 genomic sequences from 87 plant pathogen species, and 1364 genes curated for their role in pathogenicity and as targets for chemical intervention. Support for community annotation of gene models is provided using the WebApollo online gene editor, and we are working with interested communities to improve reference annotation for selected species. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Contrasted evolutionary histories of two Toll-like receptors (Tlr4 and Tlr7) in wild rodents (MURINAE)

    PubMed Central

    2013-01-01

    Background In vertebrates, it has been repeatedly demonstrated that genes encoding proteins involved in pathogen-recognition by adaptive immunity (e.g. MHC) are subject to intensive diversifying selection. On the other hand, the role and the type of selection processes shaping the evolution of innate-immunity genes are currently far less clear. In this study we analysed the natural variation and the evolutionary processes acting on two genes involved in the innate-immunity recognition of Microbe-Associated Molecular Patterns (MAMPs). Results We sequenced genes encoding Toll-like receptor 4 (Tlr4) and 7 (Tlr7), two of the key bacterial- and viral-sensing receptors of innate immunity, across 23 species within the subfamily Murinae. Although we have shown that the phylogeny of both Tlr genes is largely congruent with the phylogeny of rodents based on a comparably sized non-immune sequence dataset, we also identified several potentially important discrepancies. The sequence analyses revealed that major parts of both Tlrs are evolving under strong purifying selection, likely due to functional constraints. Yet several signatures of positive selection were also found in both genes, with a more intense signal in the bacterial-sensing Tlr4 than in the viral-sensing Tlr7. 92% and 100% of sites evolving under positive selection in Tlr4 and Tlr7, respectively, were located in the extracellular domain. Directly in the Ligand-Binding Region (LBR) of TLR4 we identified two rapidly evolving amino acid residues and one site under positive selection, all three likely involved in species-specific recognition of lipopolysaccharide of gram-negative bacteria. In contrast, all putative sites of the LBR of TLR7 involved in the detection of viral nucleic acids were highly conserved across rodents.
    Interspecific differences in the predicted 3D-structure of the LBR of both Tlrs were not related to phylogenetic history, while analyses of protein charges clearly discriminated Rattini and Murini clades. Conclusions As a consequence of the constraints imposed by receptor protein function, purifying selection has been a dominant force in the evolution of Tlrs. Nevertheless, our results show that episodic diversifying parasite-mediated selection has shaped the present species-specific variability in rodent Tlrs. The intensity of diversifying selection was higher in Tlr4 than in Tlr7, presumably due to structural properties of their ligands. PMID:24028551

  7. Dominant positive and negative selection using a hygromycin phosphotransferase-thymidine kinase fusion gene.

    PubMed

    Lupton, S D; Brunton, L L; Kalberg, V A; Overell, R W

    1991-06-01

    The hygromycin phosphotransferase gene was fused in-frame with the herpes simplex virus type 1 thymidine kinase gene. The resulting fusion gene (termed HyTK) confers hygromycin B resistance for dominant positive selection and ganciclovir sensitivity for negative selection and provides a means by which these selectable phenotypes may be expressed and regulated as a single genetic entity.

  8. Phylogeny reconstruction in the Caesalpinieae grade (Leguminosae) based on duplicated copies of the sucrose synthase gene and plastid markers.

    PubMed

    Manzanilla, Vincent; Bruneau, Anne

    2012-10-01

    The Caesalpinieae grade (Leguminosae) forms a morphologically and ecologically diverse group of mostly tropical tree species with a complex evolutionary history. This grade comprises several distinct lineages, but the exact delimitation of the group relative to subfamily Mimosoideae and other members of subfamily Caesalpinioideae, as well as phylogenetic relationships among the lineages are uncertain. With the aim of better resolving phylogenetic relationships within the Caesalpinieae grade, we investigated the utility of several nuclear markers developed from genomic studies in the Papilionoideae. We cloned and sequenced the low copy nuclear gene sucrose synthase (SUSY) and combined the data with plastid trnL and matK sequences. SUSY has two paralogs in the Caesalpinieae grade and in the Mimosoideae, but occurs as a single copy in all other legumes tested. Bayesian and maximum likelihood phylogenetic analyses suggest the two nuclear markers are congruent with plastid DNA data. The Caesalpinieae grade is divided into four well-supported clades (Cassia, Caesalpinia, Tachigali and Peltophorum clades), a poorly supported clade of Dimorphandra Group genera, and two paraphyletic groups, one with other Dimorphandra Group genera and the other comprising genera previously recognized as the Umtiza clade. A selection analysis of the paralogs, using selection models from PAML, suggests that SUSY genes are subjected to a purifying selection. One of the SUSY paralogs, under slightly stronger positive selection, may be undergoing subfunctionalization. The low copy SUSY gene is useful for phylogeny reconstruction in the Caesalpinieae despite the presence of duplicate copies. This study confirms that the Caesalpinieae grade is an artificial group, and highlights the need for further analyses of lineages at the base of the Mimosoideae. Copyright © 2012 Elsevier Inc. All rights reserved.

  9. SiNoPsis: Single Nucleotide Polymorphisms selection and promoter profiling.

    PubMed

    Boloc, Daniel; Rodríguez, Natalia; Gassó, Patricia; Abril, Josep F; Bernardo, Miquel; Lafuente, Amalia; Mas, Sergi

    2017-09-14

    The selection of a Single Nucleotide Polymorphism (SNP) using bibliographic methods can be a very time-consuming task. Moreover, a SNP selected in this way may not be easily visualized in its genomic context by a standard user hoping to correlate it with other valuable information. Here we propose a web form built on top of Circos that can assist SNP-centred screening, based on their location in the genome and the regulatory modules they can disrupt. Its use may allow researchers to prioritize SNPs in genotyping and disease studies. SiNoPsis is bundled as a web portal. It focuses on the different structures involved in the genomic expression of a gene, especially those found in the core promoter upstream region. These structures include transcription factor binding sites (for promoter and enhancer signals), histones, and promoter flanking regions. Additionally, the tool provides eQTL and linkage disequilibrium (LD) properties for a given SNP query, yielding further clues about other indirectly associated SNPs. Possible disruptions of the aforementioned structures affecting gene transcription are reported using multiple resource databases. SiNoPsis has a simple user-friendly interface, which allows single queries by gene symbol, genomic coordinates, Ensembl gene identifiers, RefSeq transcript identifiers and SNPs. It is the only portal providing useful SNP selection based on regulatory modules and LD with functional variants in both textual and graphic modes (by properly defining the arguments and parameters needed to run Circos). SiNoPsis is freely available at https://compgen.bio.ub.edu/SiNoPsis/. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  10. A multi-SNP association test for complex diseases incorporating an optimal P-value threshold algorithm in nuclear families.

    PubMed

    Wang, Yi-Ting; Sung, Pei-Yuan; Lin, Peng-Lin; Yu, Ya-Wen; Chung, Ren-Hua

    2015-05-15

    Genome-wide association studies (GWAS) have become a common approach to identifying single nucleotide polymorphisms (SNPs) associated with complex diseases. As complex diseases are caused by the joint effects of multiple genes, while the effect of an individual gene or SNP is modest, a method considering the joint effects of multiple SNPs can be more powerful than testing individual SNPs. The multi-SNP analysis aims to test association based on a SNP set, usually defined based on biological knowledge such as gene or pathway, which may contain only a portion of SNPs with effects on the disease. Therefore, a challenge for the multi-SNP analysis is how to effectively select a subset of SNPs with promising association signals from the SNP set. We developed the Optimal P-value Threshold Pedigree Disequilibrium Test (OPTPDT). The OPTPDT uses general nuclear families. A variable p-value threshold algorithm is used to determine an optimal p-value threshold for selecting a subset of SNPs. A permutation procedure is used to assess the significance of the test. We used simulations to verify that the OPTPDT has correct type I error rates. Our power studies showed that the OPTPDT can be more powerful than the set-based test in PLINK, the multi-SNP FBAT test, and the p-value based test GATES. We applied the OPTPDT to a family-based autism GWAS dataset for gene-based association analysis and identified MACROD2-AS1 with genome-wide significance (p-value = 2.5×10^-6). Our simulation results suggested that the OPTPDT is a valid and powerful test. The OPTPDT will be helpful for gene-based or pathway association analysis. The method is ideal for the secondary analysis of existing GWAS datasets, which may identify a set of SNPs with joint effects on the disease.
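
    The variable p-value threshold idea with a permutation step can be illustrated with a minimal sketch. This is not the OPTPDT itself, which builds on pedigree disequilibrium statistics in nuclear families; the set statistic (sum of -log10 p over SNPs passing the threshold), the function names, and the input layout (per-SNP p-values for the observed data and for each permuted replicate) are all assumptions made for illustration.

```python
import math

def set_stat(pvals, threshold):
    """Hypothetical set statistic: sum of -log10(p) over SNPs with p <= threshold."""
    return sum(-math.log10(p) for p in pvals if p <= threshold)

def variable_threshold_test(observed, permuted, thresholds):
    """Return (best threshold for the observed data, overall permutation p-value).

    observed   : list of per-SNP p-values from the real data
    permuted   : list of lists, per-SNP p-values from each permuted replicate
    thresholds : candidate p-value thresholds to scan
    """
    replicates = [observed] + permuted
    n = len(replicates)
    # stats[r][j] = set statistic of replicate r at threshold j
    stats = [[set_stat(r, t) for t in thresholds] for r in replicates]

    def per_threshold_p(r, j):
        # Empirical p-value of replicate r at threshold j, relative to all replicates.
        return sum(1 for s in stats if s[j] >= stats[r][j]) / n

    # Smallest per-threshold p-value for every replicate (observed is index 0).
    min_ps = [min(per_threshold_p(r, j) for j in range(len(thresholds)))
              for r in range(n)]
    # Threshold at which the observed data look most extreme (first one on ties).
    best_j = min(range(len(thresholds)), key=lambda j: per_threshold_p(0, j))
    # Second layer: compare the observed minimum p-value with the replicates',
    # which accounts for having scanned several thresholds.
    overall_p = sum(1 for m in min_ps if m <= min_ps[0]) / n
    return thresholds[best_j], overall_p
```

    The second permutation layer is what keeps the type I error under control after choosing the best threshold; with real data one would use many more replicates than in a toy call.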

  11. Genes under weaker stabilizing selection increase network evolvability and rapid regulatory adaptation to an environmental shift.

    PubMed

    Laarits, T; Bordalo, P; Lemos, B

    2016-08-01

    Regulatory networks play a central role in the modulation of gene expression, the control of cellular differentiation, and the emergence of complex phenotypes. Regulatory networks could constrain or facilitate evolutionary adaptation in gene expression levels. Here, we model the adaptation of regulatory networks and gene expression levels to a shift in the environment that alters the optimal expression level of a single gene. Our analyses show signatures of natural selection on regulatory networks that both constrain and facilitate rapid evolution of gene expression level towards new optima. The analyses are interpreted from the standpoint of neutral expectations and illustrate the challenge to making inferences about network adaptation. Furthermore, we examine the consequence of variable stabilizing selection across genes on the strength and direction of interactions in regulatory networks and in their subsequent adaptation. We observe that directional selection on a highly constrained gene previously under strong stabilizing selection was more efficient when the gene was embedded within a network of partners under relaxed stabilizing selection pressure. The observation leads to the expectation that evolutionarily resilient regulatory networks will contain optimal ratios of genes whose expression is under weak and strong stabilizing selection. Altogether, our results suggest that the variable strengths of stabilizing selection across genes within regulatory networks might itself contribute to the long-term adaptation of complex phenotypes. © 2016 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2016 European Society For Evolutionary Biology.

  12. AucPR: an AUC-based approach using penalized regression for disease prediction with high-dimensional omics data.

    PubMed

    Yu, Wenbao; Park, Taesung

    2014-01-01

    It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus apply the penalized regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR to gene selection and classification using four real microarray datasets and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily-implementable linear classifier AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Besides gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.
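
    To make the objective concrete, the sketch below computes the empirical AUC of a linear combination of markers, i.e. the quantity a method such as AucPR aims to maximize. It does not implement the AucPR fitting procedure; the function names and the toy inputs are hypothetical.

```python
def linear_score(weights, markers):
    """Linear combination of marker values for one subject."""
    return sum(w * x for w, x in zip(weights, markers))

def empirical_auc(weights, cases, controls):
    """Fraction of (case, control) pairs ranked correctly by the linear score.

    A tie between a case and a control counts as half a correct pair.
    """
    case_scores = [linear_score(weights, c) for c in cases]
    control_scores = [linear_score(weights, c) for c in controls]
    correct = 0.0
    for cs in case_scores:
        for ns in control_scores:
            if cs > ns:
                correct += 1.0
            elif cs == ns:
                correct += 0.5
    return correct / (len(case_scores) * len(control_scores))
```

    Because this pairwise count is a step function of the weights, it is hard to optimize directly, which is the difficulty the abstract attributes to non-parametric AUC-based approaches.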

  13. A novel approach for human whole transcriptome analysis based on absolute gene expression of microarray data

    PubMed Central

    Bikel, Shirley; Jacobo-Albavera, Leonor; Sánchez-Muñoz, Fausto; Cornejo-Granados, Fernanda; Canizales-Quinteros, Samuel; Soberón, Xavier; Sotelo-Mundo, Rogerio R.; del Río-Navarro, Blanca E.; Mendoza-Vargas, Alfredo; Sánchez, Filiberto

    2017-01-01

    Background In spite of the emergence of RNA sequencing (RNA-seq), microarrays remain in widespread use for gene expression analysis in the clinic. There are over 767,000 RNA microarrays from human samples in public repositories, which are an invaluable resource for biomedical research and personalized medicine. The absolute gene expression analysis allows the transcriptome profiling of all expressed genes under a specific biological condition without the need of a reference sample. However, the background fluorescence represents a challenge to determine the absolute gene expression in microarrays. Given that the Y chromosome is absent in female subjects, we used it as a new approach for absolute gene expression analysis in which the fluorescence of the Y chromosome genes of female subjects was used as the background fluorescence for all the probes in the microarray. This fluorescence was used to establish an absolute gene expression threshold, allowing the differentiation between expressed and non-expressed genes in microarrays. Methods We extracted the RNA from 16 children leukocyte samples (nine males and seven females, ages 6–10 years). An Affymetrix Gene Chip Human Gene 1.0 ST Array was carried out for each sample and the fluorescence of 124 genes of the Y chromosome was used to calculate the absolute gene expression threshold. After that, several expressed and non-expressed genes according to our absolute gene expression threshold were compared against the expression obtained using real-time quantitative polymerase chain reaction (RT-qPCR). Results From the 124 genes of the Y chromosome, three genes (DDX3Y, TXLNG2P and EIF1AY) that displayed significant differences between sexes were used to calculate the absolute gene expression threshold. Using this threshold, we selected 13 expressed and non-expressed genes and confirmed their expression level by RT-qPCR. 
Then, we selected the top 5% most expressed genes and found that several KEGG pathways were significantly enriched. Interestingly, these pathways were related to the typical functions of leukocyte cells, such as antigen processing and presentation and natural killer cell mediated cytotoxicity. We also applied this method to obtain the absolute gene expression threshold in already published microarray data of liver cells, where the top 5% expressed genes showed an enrichment of typical KEGG pathways for liver cells. Our results suggest that the three selected genes of the Y chromosome can be used to calculate an absolute gene expression threshold, allowing a transcriptome profiling of microarray data without the need of an additional reference experiment. Discussion Our approach based on the establishment of a threshold for absolute gene expression analysis will allow a new way to analyze thousands of microarrays from public databases. This allows the study of different human diseases without the need of having additional samples for relative expression experiments. PMID:29230367

  14. An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

    PubMed

    Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

    2017-12-01

    Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
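
    The key idea, choosing initial centroids from dense regions that are adequately separated, can be sketched as follows. This is only an illustration of that idea, not the authors' algorithm: the neighbour-count density, the `radius` and `min_sep` parameters, and the greedy selection are assumptions.

```python
import math

def dense_separated_centroids(points, k, radius, min_sep):
    """Pick up to k initial centroids deterministically.

    Density of a point is taken (as an assumption) to be the number of points
    within `radius` of it, itself included. Points are visited from densest
    to sparsest, ties broken by input order, and a point is accepted only if
    it lies at least `min_sep` away from every centroid chosen so far.
    """
    density = [sum(1 for q in points if math.dist(p, q) <= radius)
               for p in points]
    order = sorted(range(len(points)), key=lambda i: (-density[i], i))
    chosen = []
    for i in order:
        if all(math.dist(points[i], points[j]) >= min_sep for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return [points[i] for i in chosen]
```

    Since no random numbers are involved, repeated runs on the same data return the same starting centroids, so a K-Means run started from them is reproducible.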

  15. Structural analysis, selection, and ontogeny of the shark new antigen receptor (IgNAR): identification of a new locus preferentially expressed in early development.

    PubMed

    Diaz, Marilyn; Stanfield, Robyn L; Greenberg, Andrew S; Flajnik, Martin F

    2002-10-01

    The new antigen receptor (IgNAR) family has been detected in all elasmobranch species so far studied and has several intriguing structural and functional features. IgNAR protein, found in both transmembrane and secretory forms, is a dimer of heavy chains with no associated light chains, with each chain of the dimer having a single free and flexible V region. Four rearrangement events (among 1V, 3D, and 1J germline genes) generate an expressed NAR V gene, resulting in long and diverse CDR3 regions that contain cysteine residues. IgNAR mutation frequency is very high and "selected" mutations are found only in genes encoding the secreted form, suggesting that the primary repertoire is entirely CDR3-based. Here we further analyzed the two IgNAR types, "type 1" having one cysteine in CDR3 and "type 2" with an even number (two or four) of CDR3 cysteines, and discovered that placement of the disulfide bridges in the IgNAR V domain differentially influences the selection of mutations in CDR1 and CDR2. Ontogenetic analyses showed that IgNAR sequences from young animals were infrequently mutated, consistent with the paradigm that the shark immune system must become mature before high levels of mutation accompanied with selection can occur. Nevertheless, also in agreement with the idea that the IgNAR repertoire is entirely CDR3-based, but unlike studies in most other vertebrates, N-region diversity is present in expressed IgNAR clones at birth. During the investigation of this early IgNAR repertoire we serendipitously detected a third type of IgNAR gene that is expressed in all neonatal tissues; later in life its expression is perpetuated only in the epigonal organ, a tissue recently shown to be a (the?) primary lymphoid tissue in elasmobranchs. 
This "type 3" IgNAR gene still undergoes three rearrangement events (two D regions are "germline-joined"), yet CDR3 sequences were exactly of the same length and very similar sequence, suggesting that "type 3" CDR3s are selected early in ontogeny, perhaps by a self-ligand.

  16. Silencing of six susceptibility genes results in potato late blight resistance.

    PubMed

    Sun, Kaile; Wolters, Anne-Marie A; Vossen, Jack H; Rouwet, Maarten E; Loonen, Annelies E H M; Jacobsen, Evert; Visser, Richard G F; Bai, Yuling

    2016-10-01

    Phytophthora infestans, the causal agent of late blight, is a major threat to commercial potato production worldwide. Significant costs are required for crop protection to secure yield. Many dominant genes for resistance (R-genes) to potato late blight have been identified, and some of these R-genes have been applied in potato breeding. However, the P. infestans population rapidly accumulates new virulent strains that render R-genes ineffective. Here we introduce a new class of resistance which is based on the loss-of-function of a susceptibility gene (S-gene) encoding a product exploited by pathogens during infection and colonization. Impaired S-genes primarily result in recessive resistance traits in contrast to recognition-based resistance that is governed by dominant R-genes. In Arabidopsis thaliana, many S-genes have been detected in screens of mutant populations. In the present study, we selected 11 A. thaliana S-genes and silenced orthologous genes in the potato cultivar Desiree, which is highly susceptible to late blight. The silencing of five genes resulted in complete resistance to the P. infestans isolate Pic99189, and the silencing of a sixth S-gene resulted in reduced susceptibility. The application of S-genes to potato breeding for resistance to late blight is further discussed.

  17. A Double Selection Approach to Achieve Specific Expression of Toxin Genes for Ovarian Cancer Gene Therapy

    DTIC Science & Technology

    2005-11-01

    biological properties. CAV-1 is known to cause allergic two knobs may in fact be distinct based on our uveitis, called the ’blue eye syndrome’ and rarely...system - first steps towards gene therapy of Alport syndrome. Gene Ther 3(1), 2 1-7. Hemminki, A., and Alvarez, R. D. 2002. Adenoviruses in oncology: a...Ad-IX- Ad-IX-imRFPI or Ad-IX-tdimer2(12) (10000 viral particles/ tdimer2(12) with plX modifications, and wild-type El/E3 cell) were added to the

  18. Genetic Variance in the F2 Generation of Divergently Selected Parents

    Treesearch

    M.P. Koshy; G. Namkoong; J.H. Roberds

    1998-01-01

    Either by selective breeding for population divergence or by using natural population differences, F2 and advanced generation hybrids can be developed with high variances. We relate the size of the genetic variance to the population divergence based on a forward and backward mutation model at a locus with two alleles with additive gene action....

  19. EnRICH: Extraction and Ranking using Integration and Criteria Heuristics.

    PubMed

    Zhang, Xia; Greenlee, M Heather West; Serb, Jeanne M

    2013-01-15

    High throughput screening technologies enable biologists to generate candidate genes at a rate that, due to time and cost constraints, cannot be studied by experimental approaches in the laboratory. Thus, it has become increasingly important to prioritize candidate genes for experiments. To accomplish this, researchers need to apply selection requirements based on their knowledge, which necessitates qualitative integration of heterogeneous data sources and filtration using multiple criteria. A similar approach can also be applied to putative candidate gene relationships. While automation can assist in this routine and imperative procedure, flexibility of data sources and criteria must not be sacrificed. A tool that can optimize the trade-off between automation and flexibility to simultaneously filter and qualitatively integrate data is needed to prioritize candidate genes and generate composite networks from heterogeneous data sources. We developed the Java application, EnRICH (Extraction and Ranking using Integration and Criteria Heuristics), in order to alleviate this need. Here we present a case study in which we used EnRICH to integrate and filter multiple candidate gene lists in order to identify potential retinal disease genes. As a result of this procedure, a candidate pool of several hundred genes was narrowed down to five candidate genes, of which four are confirmed retinal disease genes and one is associated with a retinal disease state. We developed a platform-independent tool that is able to qualitatively integrate multiple heterogeneous datasets and use different selection criteria to filter each of them, provided the datasets are tables that have distinct identifiers (required) and attributes (optional). With the flexibility to specify data sources and filtering criteria, EnRICH automatically prioritizes candidate genes or gene relationships for biologists based on their specific requirements. 
Here, we also demonstrate that this tool can be effectively and easily used to apply highly specific user-defined criteria and can efficiently identify high quality candidate genes from relatively sparse datasets.
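
    The general pattern described above (tables keyed by identifier, each filtered by its own criterion, then integrated into a ranking) can be sketched in a few lines. This is not EnRICH's implementation or interface; the function name, the table layout, and the rule of ranking by the number of datasets in which a gene passes are assumptions for illustration.

```python
def prioritize(datasets):
    """Rank identifiers by how many datasets they pass.

    datasets: list of (table, keep) pairs, where `table` maps an identifier
    to a dict of attributes and `keep` is a predicate applied to those
    attributes. Each dataset gets its own filtering criterion.
    Returns (identifier, pass_count) pairs, best first, ties by identifier.
    """
    counts = {}
    for table, keep in datasets:
        for gene_id, attrs in table.items():
            if keep(attrs):
                counts[gene_id] = counts.get(gene_id, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

    A call might pass one expression table filtered on a fold-change attribute and one mapping table filtered on a score attribute; identifiers that pass in both rise to the top of the list.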

  20. Does gene flow constrain adaptive divergence or vice versa? A test using ecomorphology and sexual isolation in Timema cristinae walking-sticks.

    PubMed

    Nosil, P; Crespi, B J

    2004-01-01

    Population differentiation often reflects a balance between divergent natural selection and the opportunity for homogenizing gene flow to erode the effects of selection. However, during ecological speciation, trait divergence results in reproductive isolation and becomes a cause, rather than a consequence, of reductions in gene flow. To assess both the causes and the reproductive consequences of morphological differentiation, we examined morphological divergence and sexual isolation among 17 populations of Timema cristinae walking-sticks. Individuals from populations adapted to using Adenostoma as a host plant tended to exhibit smaller overall body size, wide heads, and short legs relative to individuals using Ceanothus as a host. However, there was also significant variation in morphology among populations within host-plant species. Mean trait values for each single population could be reliably predicted based upon host-plant used and the potential for homogenizing gene flow, inferred from the size of the neighboring population using the alternate host and mitochondrial DNA estimates of gene flow. Morphology did not influence the probability of copulation in between-population mating trials. Thus, morphological divergence is facilitated by reductions in gene flow, but does not cause reductions in gene flow via the evolution of sexual isolation. Combined with rearing data indicating that size and shape have a partial genetic basis, evidence for parallel origins of the host-associated forms, and inferences from functional morphology, these results indicate that morphological divergence in T. cristinae reflects a balance between the effects of host-specific natural selection and gene flow. Our findings illustrate how data on mating preferences can help determine the causal associations between trait divergence and levels of gene flow.

  1. Selection and evaluation of reference genes for RT-qPCR expression studies on Burkholderia tropica strain Ppe8, a sugarcane-associated diazotrophic bacterium grown with different carbon sources or sugarcane juice.

    PubMed

    da Silva, Paula Renata Alves; Vidal, Marcia Soares; de Paula Soares, Cleiton; Polese, Valéria; Simões-Araújo, Jean Luís; Baldani, José Ivo

    2016-11-01

    Among the members of the genus Burkholderia, Burkholderia tropica has the ability to fix nitrogen and promote sugarcane plant growth as well as act as a biological control agent. There is little information about how this bacterium metabolizes carbohydrates as well as those carbon sources found in the sugarcane juice that accumulates in stems during plant growth. Reverse transcription quantitative PCR (RT-qPCR) can be used to evaluate changes in gene expression during bacterial growth on different carbon sources. Here we tested the expression of six reference genes, lpxC, gyrB, recA, rpoA, rpoB, and rpoD, when cells were grown with glucose, fructose, sucrose, mannitol, aconitic acid, and sugarcane juice as carbon sources. The lpxC, gyrB, and recA were selected as the most stable reference genes based on geNorm and NormFinder software analyses. Validation of these three reference genes during strain Ppe8 growth on the same carbon sources showed that genes involved in glycogen biosynthesis (glgA, glgB, glgC) and trehalose biosynthesis (treY and treZ) were highly expressed when Ppe8 was grown in aconitic acid relative to other carbon sources, while otsA expression (trehalose biosynthesis) was reduced with all carbon sources. In addition, the expression level of the ORF_6066 (gluconolactonase) gene was reduced on sugarcane juice. The results confirmed the stability of the three selected reference genes (lpxC, gyrB, and recA) during the RT-qPCR and also their robustness by evaluating the relative expression of genes involved in glycogen and trehalose biosynthesis when strain Ppe8 was grown on different carbon sources and sugarcane juice.
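
    For readers unfamiliar with how reference-gene stability is scored, the sketch below computes the geNorm-style stability measure M as it is commonly described: for each gene, the mean over all other candidate genes of the standard deviation, across samples, of the log2 expression ratio. It is an illustration only, not the geNorm software; the function name and input layout are assumptions.

```python
import math
from statistics import stdev

def genorm_m(expr):
    """Return the stability measure M for each candidate reference gene.

    expr: dict mapping gene name to a list of relative expression quantities
    (linear scale, positive, one value per sample, same sample order for all
    genes). A lower M means more stable expression across the samples.
    """
    m_values = {}
    for gene, values in expr.items():
        pairwise_sd = []
        for other, other_values in expr.items():
            if other == gene:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(values, other_values)]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values
```

    Two genes whose expression stays proportional across conditions give a pairwise standard deviation of zero, so a gene that varies independently of the others ends up with the highest M and would be the first to be excluded.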

  2. Potential of gene drives with genome editing to increase genetic gain in livestock breeding programs.

    PubMed

    Gonen, Serap; Jenko, Janez; Gorjanc, Gregor; Mileham, Alan J; Whitelaw, C Bruce A; Hickey, John M

    2017-01-04

    This paper uses simulation to explore how gene drives can increase genetic gain in livestock breeding programs. Gene drives are naturally occurring phenomena that cause a mutation on one chromosome to copy itself onto its homologous chromosome. We simulated nine different breeding and editing scenarios with a common overall structure. Each scenario began with 21 generations of selection, followed by 20 generations of selection based on true breeding values where the breeder used selection alone, selection in combination with genome editing, or selection with genome editing and gene drives. In the scenarios that used gene drives, we varied the probability of successfully incorporating the gene drive. For each scenario, we evaluated genetic gain, genetic variance, rate of change in inbreeding, number of distinct quantitative trait nucleotides (QTN) edited, rate of increase in favourable allele frequencies of edited QTN and the time to fix favourable alleles. Gene drives enhanced the benefits of genome editing in seven ways: (1) they amplified the increase in genetic gain brought about by genome editing; (2) they amplified the rate of increase in the frequency of favourable alleles and reduced the time it took to fix them; (3) they enabled more rapid targeting of QTN with lesser effect for genome editing; (4) they distributed fixed editing resources across a larger number of distinct QTN across generations; (5) they focussed editing on a smaller number of QTN within a given generation; (6) they reduced the level of inbreeding when editing a subset of the sires; and (7) they increased the efficiency of converting genetic variation into genetic gain. Genome editing in livestock breeding results in short-, medium- and long-term increases in genetic gain. The increase in genetic gain occurs because editing increases the frequency of favourable alleles in the population. 
Gene drives accelerate the increase in allele frequency caused by editing, which results in even higher genetic gain over a shorter period of time with no impact on inbreeding.

  3. Genome-Wide Characterization of bHLH Genes in Grape and Analysis of their Potential Relevance to Abiotic Stress Tolerance and Secondary Metabolite Biosynthesis

    PubMed Central

    Wang, Pengfei; Su, Ling; Gao, Huanhuan; Jiang, Xilong; Wu, Xinying; Li, Yi; Zhang, Qianqian; Wang, Yongmei; Ren, Fengshan

    2018-01-01

    Basic helix-loop-helix (bHLH) transcription factors are involved in many abiotic stress responses as well as flavonol and anthocyanin biosynthesis. In grapes (Vitis vinifera L.), flavonols including anthocyanins and condensed tannins are most abundant in the skins of the berries. Flavonols are important phytochemicals for viticulture and enology, but grape bHLH genes have rarely been examined. We identified 94 grape bHLH genes in a genome-wide analysis and performed Nr and GO function analyses for these genes. Phylogenetic analyses placed the genes into 15 clades, with some remaining orphans. 41 duplicate gene pairs were found in the grape bHLH gene family, and all of these duplicate gene pairs underwent purifying selection. Nine triplicate gene groups were found in the grape bHLH gene family and all of these triplicate gene groups underwent purifying selection. Twenty-two grape bHLH genes could be induced by PEG treatment and 17 grape bHLH genes could be induced by cold stress treatment including a homologous form of MYC2, VvbHLH007. Based on the GO or Nr function annotations, we found three other genes that are potentially related to anthocyanin or flavonol biosynthesis: VvbHLH003, VvbHLH007, and VvbHLH010. We also performed a cis-acting regulatory element analysis on some genes involved in flavonoid or anthocyanin biosynthesis and our results showed that most of these gene promoters contained G-box or E-box elements that could be recognized by bHLH family members. PMID:29449854

  4. Whole-Gene Positive Selection, Elevated Synonymous Substitution Rates, Duplication, and Indel Evolution of the Chloroplast clpP1 Gene

    PubMed Central

    Erixon, Per; Oxelman, Bengt

    2008-01-01

    Background Synonymous DNA substitution rates in the plant chloroplast genome are generally relatively slow and lineage dependent. Non-synonymous rates are usually even slower due to purifying selection acting on the genes. Positive selection is expected to speed up non-synonymous substitution rates, whereas synonymous rates are expected to be unaffected. Until recently, positive selection has seldom been observed in chloroplast genes, and large-scale structural rearrangements leading to gene duplications are hitherto supposed to be rare. Methodology/Principal Findings We found high substitution rates in the exons of the plastid clpP1 gene in Oenothera (the Evening Primrose family) and three separate lineages in the tribe Sileneae (Caryophyllaceae, the Carnation family). Introns have been lost in some of the lineages, but where present, the intron sequences have substitution rates similar to those found in other introns of their genomes. The elevated substitution rates of clpP1 are associated with statistically significant whole-gene positive selection in three branches of the phylogeny. In two of the lineages we found multiple copies of the gene. Neighboring genes present in the duplicated fragments do not show signs of elevated substitution rates or positive selection. Although non-synonymous substitutions account for most of the increase in substitution rates, synonymous rates are also markedly elevated in some lineages. Whereas plant clpP1 genes experiencing negative (purifying) selection are characterized by having very conserved lengths, genes under positive selection often have large insertions of more or less repetitive amino acid sequence motifs. Conclusions/Significance We found positive selection of the clpP1 gene in various plant lineages to be correlated with repeated duplication of the clpP1 gene and surrounding regions, repetitive amino acid sequences, and increase in synonymous substitution rates. 
The present study sheds light on the controversial issue of whether negative or positive selection is to be expected after gene duplications by providing evidence for the latter alternative. The observed increase in synonymous substitution rates in some of the lineages indicates that the detection of positive selection may be obscured under such circumstances. Future studies are required to explore the functional significance of the large inserted repeated amino acid motifs, as well as the possibility that synonymous substitution rates may be affected by positive selection. PMID:18167545

  5. Genomic Signatures Reveal New Evidences for Selection of Important Traits in Domestic Cattle

    PubMed Central

    Xu, Lingyang; Bickhart, Derek M.; Cole, John B.; Schroeder, Steven G.; Song, Jiuzhou; Tassell, Curtis P. Van; Sonstegard, Tad S.; Liu, George E.

    2015-01-01

    We investigated diverse genomic selections using high-density single nucleotide polymorphism data of five distinct cattle breeds. Based on allele frequency differences, we detected hundreds of candidate regions under positive selection across Holstein, Angus, Charolais, Brahman, and N'Dama. In addition to well-known genes such as KIT, MC1R, ASIP, GHR, LCORL, NCAPG, WIF1, and ABCA12, we found evidence for a variety of novel and less-known genes under selection in cattle, such as LAP3, SAR1B, LRIG3, FGF5, and NUDCD3. Selective sweeps near LAP3 were then validated by next-generation sequencing. Genome-wide association analysis involving 26,362 Holsteins confirmed that LAP3 and SAR1B were related to milk production traits, suggesting that our candidate regions were likely functional. In addition, haplotype network analyses further revealed distinct selective pressures and evolution patterns across these five cattle breeds. Our results provided a glimpse into diverse genomic selection during cattle domestication, breed formation, and recent genetic improvement. These findings will facilitate genome-assisted breeding to improve animal production and health. PMID:25431480

  6. Stem cell-based gene therapy activated using magnetic hyperthermia to enhance the treatment of cancer

    PubMed Central

    Yin, Perry T.; Shah, Shreyas; Pasquale, Nicholas J.; Garbuzenko, Olga B.; Minko, Tamara; Lee, Ki-Bum

    2015-01-01

    Stem cell-based gene therapies, wherein stem cells are genetically engineered to express therapeutic molecules, have shown tremendous potential for cancer applications owing to their innate ability to home to tumors. However, traditional stem cell-based gene therapies are hampered by our current inability to control when the therapeutic genes are actually turned on, thereby resulting in detrimental side effects. Here, we report the novel application of magnetic core-shell nanoparticles (MCNPs) for the dual purpose of delivering and activating a heat-inducible gene vector that encodes TNF-related apoptosis-inducing ligand (TRAIL) in adipose-derived mesenchymal stem cells (AD-MSCs). By combining the tumor tropism of the AD-MSCs with the spatiotemporal MCNP-based delivery and activation of TRAIL expression, this platform provides an attractive means with which to enhance our control over the activation of stem cell-based gene therapies. In particular, we found that these engineered AD-MSCs retained their innate ability to proliferate, differentiate, and, most importantly, home to tumors, making them ideal cellular carriers. Moreover, exposure of the engineered AD-MSCs to mild magnetic hyperthermia resulted in the selective expression of TRAIL from the engineered AD-MSCs and, as a result, induced significant ovarian cancer cell death in vitro and in vivo. PMID:26720500

  7. Stem cell-based gene therapy activated using magnetic hyperthermia to enhance the treatment of cancer.

    PubMed

    Yin, Perry T; Shah, Shreyas; Pasquale, Nicholas J; Garbuzenko, Olga B; Minko, Tamara; Lee, Ki-Bum

    2016-03-01

    Stem cell-based gene therapies, wherein stem cells are genetically engineered to express therapeutic molecules, have shown tremendous potential for cancer applications owing to their innate ability to home to tumors. However, traditional stem cell-based gene therapies are hampered by our current inability to control when the therapeutic genes are actually turned on, thereby resulting in detrimental side effects. Here, we report the novel application of magnetic core-shell nanoparticles (MCNPs) for the dual purpose of delivering and activating a heat-inducible gene vector that encodes TNF-related apoptosis-inducing ligand (TRAIL) in adipose-derived mesenchymal stem cells (AD-MSCs). By combining the tumor tropism of the AD-MSCs with the spatiotemporal MCNP-based delivery and activation of TRAIL expression, this platform provides an attractive means with which to enhance our control over the activation of stem cell-based gene therapies. In particular, we found that these engineered AD-MSCs retained their innate ability to proliferate, differentiate, and, most importantly, home to tumors, making them ideal cellular carriers. Moreover, exposure of the engineered AD-MSCs to mild magnetic hyperthermia resulted in the selective expression of TRAIL from the engineered AD-MSCs and, as a result, induced significant ovarian cancer cell death in vitro and in vivo. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Oligonucleotide-based strategies to combat polyglutamine diseases

    PubMed Central

    Fiszer, Agnieszka; Krzyzosiak, Wlodzimierz J.

    2014-01-01

    Considerable advances have been recently made in understanding the molecular aspects of pathogenesis and in developing therapeutic approaches for polyglutamine (polyQ) diseases. Studies on pathogenic mechanisms have extended our knowledge of mutant protein toxicity, confirmed the toxicity of mutant transcript and identified other toxic RNA and protein entities. One very promising therapeutic strategy is targeting the causative gene expression with oligonucleotide (ON) based tools. This straightforward approach aimed at halting the early steps in the cascade of pathogenic events has been widely tested for Huntington's disease and spinocerebellar ataxia type 3. In this review, we gather information on the use of antisense oligonucleotides and RNA interference triggers for the experimental treatment of polyQ diseases in cellular and animal models. We present studies testing non-allele-selective and allele-selective gene silencing strategies. The latter include targeting SNP variants associated with mutations or targeting the pathologically expanded CAG repeat directly. We compare gene silencing effectors of various types in a number of aspects, including their design, efficiency in cell culture experiments and pre-clinical testing. We discuss advantages, current limitations and perspectives of various ON-based strategies used to treat polyQ diseases. PMID:24848018

  9. Genomic identification of WRKY transcription factors in carrot (Daucus carota) and analysis of evolution and homologous groups for plants

    PubMed Central

    Li, Meng-Yao; Xu, Zhi-Sheng; Tian, Chang; Huang, Ying; Wang, Feng; Xiong, Ai-Sheng

    2016-01-01

    WRKY transcription factors belong to one of the largest transcription factor families. These factors possess functions in plant growth and development, signal transduction, and stress response. Here, we identified 95 DcWRKY genes in carrot based on the carrot genomic and transcriptomic data, and divided them into three groups. Phylogenetic analysis of WRKY proteins from carrot and Arabidopsis divided these proteins into seven subgroups. To elucidate the evolution and distribution of WRKY transcription factors in different species, we constructed a schematic of the phylogenetic tree and compared the WRKY family factors among 22 species, including plants, slime mold and protozoan. An in-depth study was performed to clarify the homologous factor groups of nine divergent taxa in lower and higher plants. Based on the orthologous factors between carrot and Arabidopsis, 38 DcWRKY proteins were calculated to interact with other proteins in the carrot genome. Yeast two-hybrid assay showed that DcWRKY20 can interact with DcMAPK1 and DcMAPK4. The expression patterns of the selected DcWRKY genes based on transcriptome data and qRT-PCR suggested that those selected DcWRKY genes are involved in root development and in biotic and abiotic stress responses. This comprehensive analysis provides a basis for investigating the evolution and function of WRKY genes. PMID:26975939

  10. Genomic identification of WRKY transcription factors in carrot (Daucus carota) and analysis of evolution and homologous groups for plants.

    PubMed

    Li, Meng-Yao; Xu, Zhi-Sheng; Tian, Chang; Huang, Ying; Wang, Feng; Xiong, Ai-Sheng

    2016-03-15

    WRKY transcription factors belong to one of the largest transcription factor families. These factors possess functions in plant growth and development, signal transduction, and stress response. Here, we identified 95 DcWRKY genes in carrot based on the carrot genomic and transcriptomic data, and divided them into three groups. Phylogenetic analysis of WRKY proteins from carrot and Arabidopsis divided these proteins into seven subgroups. To elucidate the evolution and distribution of WRKY transcription factors in different species, we constructed a schematic of the phylogenetic tree and compared the WRKY family factors among 22 species, including plants, slime mold and protozoan. An in-depth study was performed to clarify the homologous factor groups of nine divergent taxa in lower and higher plants. Based on the orthologous factors between carrot and Arabidopsis, 38 DcWRKY proteins were calculated to interact with other proteins in the carrot genome. Yeast two-hybrid assay showed that DcWRKY20 can interact with DcMAPK1 and DcMAPK4. The expression patterns of the selected DcWRKY genes based on transcriptome data and qRT-PCR suggested that those selected DcWRKY genes are involved in root development and in biotic and abiotic stress responses. This comprehensive analysis provides a basis for investigating the evolution and function of WRKY genes.

  11. Prediction of regulatory gene pairs using dynamic time warping and gene ontology.

    PubMed

    Yang, Andy C; Hsu, Hui-Huang; Lu, Ming-Da; Tseng, Vincent S; Shih, Timothy K

    2014-01-01

    Selecting informative genes is the most important task for data analysis on microarray gene expression data. In this work, we aim at identifying regulatory gene pairs from microarray gene expression data. However, microarray data often contain multiple missing expression values. Missing value imputation is thus needed before further processing for regulatory gene pairs becomes possible. We develop a novel approach to first impute missing values in microarray time series data by combining k-Nearest Neighbour (KNN), Dynamic Time Warping (DTW) and Gene Ontology (GO). After missing values are imputed, we then perform gene regulation prediction based on our proposed DTW-GO distance measurement of gene pairs. Experimental results show that our approach is more accurate when compared with existing missing value imputation methods on real microarray data sets. Furthermore, our approach can also discover more regulatory gene pairs that are known in the literature than other methods.
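
The Dynamic Time Warping component mentioned in this abstract can be sketched as follows. This is the textbook dynamic-programming DTW distance between two expression time series, not the authors' combined DTW-GO measure; the function name `dtw_distance` and the sample series are hypothetical, and the Gene Ontology and KNN parts are omitted.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical expression profiles: gene_y is gene_x with one time point held longer.
gene_x = [1.0, 2.0, 3.0]
gene_y = [1.0, 2.0, 2.0, 3.0]
shifted = dtw_distance(gene_x, gene_y)
```

Because DTW allows one series to stretch against the other, profiles with the same shape but a lag in timing stay close, which is the property that makes it attractive for picking neighbour genes in time series data.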

  12. Caste-biased gene expression in a facultatively eusocial bee suggests a role for genetic accommodation in the evolution of eusociality

    PubMed Central

    Jones, Beryl M.; Kingwell, Callum J.; Wcislo, William T.; Robinson, Gene E.

    2017-01-01

    Developmental plasticity may accelerate the evolution of phenotypic novelty through genetic accommodation, but studies of genetic accommodation often lack knowledge of the ancestral state to place selected traits in an evolutionary context. A promising approach for assessing genetic accommodation involves using a comparative framework to ask whether ancestral plasticity is related to the evolution of a particular trait. Bees are an excellent group for such comparisons because caste-based societies (eusociality) have evolved multiple times independently and extant species exhibit different modes of eusociality. We measured brain and abdominal gene expression in a facultatively eusocial bee, Megalopta genalis, and assessed whether plasticity in this species is functionally linked to eusocial traits in other bee lineages. Caste-biased abdominal genes in M. genalis overlapped significantly with caste-biased genes in obligately eusocial bees. Moreover, caste-biased genes in M. genalis overlapped significantly with genes shown to be rapidly evolving in multiple studies of 10 bee species, particularly for genes in the glycolysis pathway and other genes involved in metabolism. These results provide support for the idea that eusociality can evolve via genetic accommodation, with plasticity in facultatively eusocial species like M. genalis providing a substrate for selection during the evolution of caste in obligately eusocial lineages. PMID:28053060

  13. Caste-biased gene expression in a facultatively eusocial bee suggests a role for genetic accommodation in the evolution of eusociality.

    PubMed

    Jones, Beryl M; Kingwell, Callum J; Wcislo, William T; Robinson, Gene E

    2017-01-11

    Developmental plasticity may accelerate the evolution of phenotypic novelty through genetic accommodation, but studies of genetic accommodation often lack knowledge of the ancestral state to place selected traits in an evolutionary context. A promising approach for assessing genetic accommodation involves using a comparative framework to ask whether ancestral plasticity is related to the evolution of a particular trait. Bees are an excellent group for such comparisons because caste-based societies (eusociality) have evolved multiple times independently and extant species exhibit different modes of eusociality. We measured brain and abdominal gene expression in a facultatively eusocial bee, Megalopta genalis, and assessed whether plasticity in this species is functionally linked to eusocial traits in other bee lineages. Caste-biased abdominal genes in M. genalis overlapped significantly with caste-biased genes in obligately eusocial bees. Moreover, caste-biased genes in M. genalis overlapped significantly with genes shown to be rapidly evolving in multiple studies of 10 bee species, particularly for genes in the glycolysis pathway and other genes involved in metabolism. These results provide support for the idea that eusociality can evolve via genetic accommodation, with plasticity in facultatively eusocial species like M. genalis providing a substrate for selection during the evolution of caste in obligately eusocial lineages. © 2017 The Author(s).

  14. Divergence of RNA polymerase α subunits in angiosperm plastid genomes is mediated by genomic rearrangement

    PubMed Central

    Blazier, J. Chris; Ruhlman, Tracey A.; Weng, Mao-Lun; Rehman, Sumaiyah K.; Sabir, Jamal S. M.; Jansen, Robert K.

    2016-01-01

    Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled and analyzed from each of the three angiosperm families. Multiple lines of evidence indicated that the rpoA sequences are likely functional despite retaining as low as 30% nucleotide sequence identity with rpoA genes from outgroups in the same angiosperm order. The ratio of non-synonymous to synonymous substitutions indicated that these genes are under purifying selection, and bioinformatic prediction of conserved domains indicated that functional domains are preserved. One of the lineages (Pelargonium, Geraniaceae) contains species with multiple rpoA-like ORFs that show evidence of ongoing inter-paralog gene conversion. The plastid genomes containing these divergent rpoA genes have experienced extensive structural rearrangement, including large expansions of the inverted repeat. We propose that illegitimate recombination, not positive selection, has driven the divergence of rpoA. PMID:27087667

  15. Tumor regression following intravenous administration of lactoferrin- and lactoferricin-bearing dendriplexes

    PubMed Central

    Lim, Li Ying; Koh, Pei Yin; Somani, Sukrut; Al Robaian, Majed; Karim, Reatul; Yean, Yi Lyn; Mitchell, Jennifer; Tate, Rothwelle J.; Edrada-Ebel, RuAngelie; Blatchford, David R.; Mullin, Margaret; Dufès, Christine

    2015-01-01

    The possibility of using gene therapy for the treatment of cancer is limited by the lack of safe, intravenously administered delivery systems able to selectively deliver therapeutic genes to tumors. In this study, we investigated if the conjugation of the polypropylenimine dendrimer to lactoferrin and lactoferricin, whose receptors are overexpressed on cancer cells, could result in a selective gene delivery to tumors and a subsequently enhanced therapeutic efficacy. The conjugation of lactoferrin and lactoferricin to the dendrimer significantly increased the gene expression in the tumor while decreasing the non-specific gene expression in the liver. Consequently, the intravenous administration of the targeted dendriplexes encoding TNFα led to the complete suppression of 60% of A431 tumors and up to 50% of B16-F10 tumors over one month. The treatment was well tolerated by the animals. These results suggest that these novel lactoferrin- and lactoferricin-bearing dendrimers are promising gene delivery systems for cancer therapy. From the Clinical Editor: Specific targeting of cancer cells should enhance the delivery of chemotherapeutic agents. This is especially true for gene delivery. In this article, the authors utilized a dendrimer-based system and conjugated this with lactoferrin and lactoferricin to deliver anti-tumor genes. The positive findings in animal studies should provide the basis for further clinical studies. PMID:25933695

  16. Analysis of genotype diversity and evolution of Dengue virus serotype 2 using complete genomes

    PubMed Central

    Waman, Vaishali P.; Kolekar, Pandurang; Ramtirthkar, Mukund R.; Kale, Mohan M.

    2016-01-01

    Background Dengue is one of the most common arboviral diseases prevalent worldwide and is caused by Dengue viruses (genus Flavivirus, family Flaviviridae). There are four serotypes of Dengue Virus (DENV-1 to DENV-4), each of which is further subdivided into distinct genotypes. DENV-2 is frequently associated with severe dengue infections and epidemics. DENV-2 consists of six genotypes: Asian/American, Asian I, Asian II, Cosmopolitan, American and sylvatic. A comparative genomic study was carried out to infer the population structure of DENV-2 and to analyze the role of evolutionary and spatiotemporal factors in the emergence of diversifying lineages. Methods Complete genome sequences of 990 strains of DENV-2 were analyzed using Bayesian-based population genetics and phylogenetic approaches to infer genetically distinct lineages. The role of spatiotemporal factors, genetic recombination and selection pressure in the evolution of DENV-2 was examined using sequence-based bioinformatics approaches. Results DENV-2 genetic structure is complex and consists of fifteen subpopulations/lineages. The Asian/American genotype is observed to be diversified into seven lineages. The Asian I, Cosmopolitan and sylvatic genotypes were found to be subdivided into two lineages each. The populations of American and Asian II genotypes were observed to be homogeneous. Significant evidence of episodic positive selection was observed in all the genes, except NS4A. Positive selection operating on a few codons in the envelope gene confers antigenic and lineage diversity in the American strains of the Asian/American genotype. Selection on codons of non-structural genes was observed to impact diversification of lineages in Asian I, Cosmopolitan and sylvatic genotypes. Evidence of intra/inter-genotype recombination was obtained and the uncertainty in classification of recombinant strains was resolved using the population genetics approach. Discussion Complete genome-based analysis revealed that the worldwide population of DENV-2 strains is subdivided into fifteen lineages. The population structure of DENV-2 is spatiotemporal and is shaped by episodic positive selection and recombination. Intra-genotype diversity was observed in four genotypes (Asian/American, Asian I, Cosmopolitan and sylvatic). Episodic positive selection on envelope and non-structural genes translates into antigenic diversity and appears to be responsible for the emergence of strains/lineages in DENV-2 genotypes. Understanding of the genotype diversity and emerging lineages will be useful to design strategies for epidemiological surveillance and vaccine design. PMID:27635316

  17. Structure-related clustering of gene expression fingerprints of thp-1 cells exposed to smaller polycyclic aromatic hydrocarbons.

    PubMed

    Wan, B; Yarbrough, J W; Schultz, T W

    2008-01-01

    This study was undertaken to test the hypothesis that structurally similar PAHs induce similar gene expression profiles. THP-1 cells were exposed to a series of 12 selected PAHs at 50 µM for 24 hours, and gene expression profiles were analyzed using both unsupervised and supervised methods. Clustering analysis of gene expression profiles revealed that the 12 tested chemicals were grouped into five clusters. Within each cluster, the gene expression profiles are more similar to each other than to the ones outside the cluster. One-methylanthracene and 1-methylfluorene were found to have the most similar profiles; dibenzothiophene and dibenzofuran were found to share common profiles with fluorene. As expression pattern comparisons were expanded, similarity in genomic fingerprint dropped off dramatically. Prediction analysis of microarrays (PAM) based on the clustering pattern generated 49 predictor genes that can be used for sample discrimination. Moreover, significance analysis of microarrays (SAM) identified 598 genes modulated by the tested chemicals, with a variety of biological processes (such as cell cycle, metabolism, and protein binding) and KEGG pathways being significantly (p < 0.05) affected. It is feasible to distinguish structurally different PAHs based on their genomic fingerprints, which are mechanism based.

  18. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data.

    PubMed

    Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from a large volume of gene expression data. Recently, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, incorporating the imperialist competition algorithm (ICA), which performs global search, and tabu search (TS), which conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes.
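
To make the tabu search half of such a hybrid more concrete, here is a minimal sketch of tabu search over binary feature masks. It is not HICATS: the imperialist competition stage is left out, and the toy `score` function stands in for the classifier accuracy a real pipeline would compute. All names and parameter values (`tabu_search`, `tabu_len`, the "informative" gene indices) are assumptions for illustration.

```python
def tabu_search(score, n_features, n_iter=50, tabu_len=3):
    """Maximise score(mask) over 0/1 masks by single-bit flips with a tabu list."""
    current = [0] * n_features
    best, best_score = current[:], score(current)
    tabu = []  # indices flipped recently; they may not be flipped again yet
    for _ in range(n_iter):
        candidates = []
        for i in range(n_features):
            if i in tabu:
                continue
            neighbour = current[:]
            neighbour[i] ^= 1  # select or deselect feature i
            candidates.append((score(neighbour), i, neighbour))
        if not candidates:
            break
        # Move to the best non-tabu neighbour even if it is worse than current,
        # which is what lets tabu search leave local optima.
        s, i, current = max(candidates, key=lambda c: c[0])
        tabu.append(i)
        if len(tabu) > tabu_len:
            tabu.pop(0)
        if s > best_score:
            best, best_score = current[:], s
    return best, best_score


# Toy objective: genes 0 and 3 are "informative" (+2 each), any other gene costs 1.
INFORMATIVE = {0, 3}

def score(mask):
    return sum(2 if i in INFORMATIVE else -1 for i, bit in enumerate(mask) if bit)

best_mask, best_value = tabu_search(score, n_features=5)
```

In a real setting `score` would typically be cross-validated accuracy minus a penalty on subset size, and the starting mask would come from the global search stage rather than from an empty selection.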

  19. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data

    PubMed Central

    Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from a large volume of gene expression data. Recently, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, incorporating the imperialist competition algorithm (ICA), which performs global search, and tabu search (TS), which conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes. PMID:27579323

  20. Single feature polymorphism (SFP)-based selective sweep identification and association mapping of growth-related metabolic traits in Arabidopsis thaliana

    PubMed Central

    2010-01-01

    Background Natural accessions of Arabidopsis thaliana are characterized by a high level of phenotypic variation that can be used to investigate the extent and mode of selection on the primary metabolic traits. A collection of 54 A. thaliana natural accession-derived lines was subjected to deep genotyping through Single Feature Polymorphism (SFP) detection via genomic DNA hybridization to Arabidopsis Tiling 1.0 Arrays for the detection of selective sweeps, and identification of associations between sweep regions and growth-related metabolic traits. Results A total of 1,072,557 high-quality SFPs were detected and indications for 3,943 deletions and 1,007 duplications were obtained. A significantly lower than expected SFP frequency was observed in protein-, rRNA-, and tRNA-coding regions and in non-repetitive intergenic regions, while pseudogenes, transposons, and non-coding RNA genes are enriched with SFPs. Gene families involved in plant defence or in signalling were identified as highly polymorphic, while several other families including transcription factors are depleted of SFPs. 198 significant associations between metabolic genes and 9 metabolic and growth-related phenotypic traits were detected with annotation hinting at the nature of the relationship. Five significant selective sweep regions were also detected, of which one associated significantly with a metabolic trait. Conclusions We generated a high density polymorphism map for 54 A. thaliana accessions that highlights the variability of resistance genes across geographic ranges and used it to identify selective sweeps and associations between metabolic genes and metabolic phenotypes. Several associations show a clear biological relationship, while many require further investigation. PMID:20302660
